Papers by Jedidiah R. Crandall

Students Who Don’t Understand Information Flow Should Be Eaten: An Experience Paper

On a completely different and much lighter note from Wednesday, today we’re going to look at a paper about teaching students information-flow theory with Werewolf.

Werewolf (also known as Mafia) is a game in which the players are divided into two groups, the werewolves (or mafiosi) and the villagers, and each is trying to wipe the other out at a rate of one murder per turn. There are a whole bunch of variants. Normally this is played around a table, as a game of strategic deception: only the werewolves know who the werewolves are, and they participate as villagers on the villagers’ turn. In this paper, it becomes a game played online, of surveillance and countersurveillance: the villagers are actively encouraged to exploit information leaks in the game server and discover who the werewolves are. (In a normal game this would be cheating.)
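To make "information leak in the game server" concrete, here is a toy example of my own (not from the paper): a vote handler whose response body hides roles but whose timing does not. Hunting for, exploiting, and then patching exactly this kind of side channel is the game the students were really playing.

```python
import time

# Hypothetical hidden game state; only the server should know the roles.
ROLES = {"alice": "werewolf", "bob": "villager"}

def vote_handler(voter: str, target: str) -> str:
    """Record a vote against `target`; the response body reveals nothing."""
    if ROLES[target] == "werewolf":
        time.sleep(0.05)  # extra bookkeeping only on the werewolf code path
    return "vote recorded"

# A villager who times the responses learns what the body conceals.
for target in ROLES:
    start = time.perf_counter()
    vote_handler("carol", target)
    print(target, "suspect" if time.perf_counter() - start > 0.01 else "clear")
```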

The authors don’t report how this teaching method compares to traditional lectures on any quantitative basis (e.g. final exam scores, class grades). However, they do say that the students loved the exercise, met up outside of official class hours to discuss strategies and plan, and that over the term the exploits and countermeasures grew steadily more sophisticated, in some cases requiring adjustments to the game server to ensure that both sides could still win. It’s hard to imagine this level of student engagement not leading to better grades, better retention, and deeper interest in the material.

I think this is a brilliant idea and not just for teaching information flow. One of the hardest things to teach in a security course is what Bruce Schneier calls the security mindset: intentionally thinking about how a system can be caused to fail, to do something it’s not intended to do. In particular, it is in tension with the usual engineering mindset, which focuses on verifying that something works when used as designed. (Safety engineering and failure analysis have more of the security mindset about them, but usually focus on failures due to accident rather than malice.) But it is exactly how you need to think to successfully cheat at a game, or to notice when someone else is cheating—and in that context, it’s going to be familiar to most college students. Using a game of strategic deception as the backdrop will also encourage students to think along the right lines. I’d like to see this idea adapted to teach other challenging notions in security—penetration testing is the obvious one, but maybe also code exploitation and key management?

Whiskey, Weed, and Wukan on the World Wide Web

The subtitle of today’s paper is On Measuring Censors’ Resources and Motivations. It’s a position paper, whose goal is to get other researchers to start considering how economic constraints might affect the much-hypothesized arms race or tit-for-tat behavior of censors and people reacting to censorship: as they say,

[…] the censor and censored have some level of motivation to accomplish various goals, some limited amount of resources to expend, and real-time deadlines that are due to the timeliness of the information that is being spread.

They back up their position by presenting a few pilot studies, of which the most compelling is the investigation of keyword censorship on Weibo (a Chinese microblogging service). They observe that searches are much more aggressively keyword-censored than posts—that is, for many examples of known-censored keywords, one is permitted to make a post on Weibo containing that keyword, but searches for that keyword will produce either no results or very few results. (They don’t say whether unrelated searches will turn up posts containing censored keywords.) They also observe that, for some keywords that are not permitted to be posted, the server only bothers checking for variations on the keyword if the user making the post has previously tried to post the literal keyword. (Again, the exact scope of the phenomenon is unclear—does an attempt to post any blocked keyword make the server check more aggressively for variations on all blocked keywords, or just that one? How long does this escalation last?) And finally, whoever is maintaining the keyword blacklists at Weibo seems to care most about controlling the news cycle: terms associated with breaking news that the government does not like are immediately added to the blacklist, and removed again just as quickly when the event falls out of the news cycle or is resolved positively. They give detailed information about this phenomenon for one news item, the Wukan incident, and cite several other keywords that seem to have been treated the same.
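As a rough sketch of the distinction they're drawing, a measurement probe just has to exercise the two operations independently. The endpoints below are hypothetical placeholders standing in for whatever interface a real measurement would use, not Weibo's actual API:

```python
import requests

session = requests.Session()  # assume login cookies are already configured

def can_post(keyword: str) -> bool:
    # Placeholder endpoint: does a post containing the keyword survive?
    r = session.post("https://weibo.example/post", data={"text": keyword})
    return r.ok and "deleted" not in r.text

def can_search(keyword: str) -> bool:
    # Placeholder endpoint: does a search for the keyword return results?
    r = session.get("https://weibo.example/search", params={"q": keyword})
    return r.ok and "no results" not in r.text

# The paper's observation, restated: for many censored keywords,
# can_post(kw) is True while can_search(kw) is False.
```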

They compare Weibo’s behavior to similar keyword censorship by chat programs popular in China, where the same patterns appear, but whoever is maintaining the lists is sloppier and slower about it. This is clear evidence that the lists are not maintained centrally (by some government agency) and they suggest that many companies are not trying very hard:

At times, we often suspected that a keyword blacklist was being typed up by an over-worked college intern who was given vague instructions to filter out anything that might be against the law.

Sadly, I haven’t seen many people step up to the challenge this paper presents and design experiments that probe the economics of censorship. You can see similar data points in other studies of China [1] [2] [3] (it is still the case, as far as I know, that ignoring spurious TCP RST packets is sufficient to evade several aspects of the Great Firewall), and in reports from other countries. It is telling, for instance, that Pakistani censors did not bother to update their blacklist of porn sites to keep up with a shift in viewing habits. [4] George Danezis has been talking about the economics of anonymity and surveillance for quite some time now [5] [6], but that’s not quite the same thing. I mentioned above some obvious follow-on research just for Weibo, and I don’t think anyone has done that yet. Please tell me if I’ve missed something.
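As an aside, the RST-ignoring evasion in the parenthetical above boils down to a single packet-filter rule. Here is a sketch that installs it with iptables from Python; it needs root, and it drops legitimate resets along with forged ones, so it demonstrates the principle rather than being a deployable defense.

```python
import subprocess

# Drop every inbound TCP segment with the RST flag set, so the firewall's
# forged resets are silently ignored. (Legitimate resets are lost too.)
subprocess.run(
    ["iptables", "-A", "INPUT", "-p", "tcp",
     "--tcp-flags", "RST", "RST", "-j", "DROP"],
    check=True,
)
```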

Large-scale Spatiotemporal Characterization of Inconsistencies in the World’s Largest Firewall

Lots of academic research on Internet censorship treats the countries doing the censorship as monoliths: that is, measurements will typically only be conducted from one client in one fixed location (often a commercial VPS or colocation provider), and the results are assumed to reflect the situation countrywide. When you’re talking about a country as large as China, that assumption seems poorly justified, and there have been several studies aiming to collect more fine-grained information. [1] [2] [3] This paper is in that line of research, with a systematic survey of censorship of one application (Tor) in roughly 150 different locations across China, repeating the measurement at hourly intervals for 27 days. The measurement clients are diverse in both geographic location and network topology.

The results largely confirm what was already suspected. This particular application is indeed blocked consistently across China, with the possible exception of CERNET (China Education and Research Network), whose filtering is less aggressive. The filtering occurs at major China-wide IXPs, as suspected from previous studies. The firewall appears to operate primarily by dropping inbound traffic to China; the authors don’t try to explain this, but earlier related research [4] points out that the firewall must wait to see a TCP SYN/ACK packet before it can successfully forge RST packets in both directions. Finally, there is concrete evidence for failures, lasting hours at a time, uncorrelated with geographic location, where traffic passes uncensored. This was anecdotally known to happen but not previously studied in any kind of detail, to my knowledge. This paper doesn’t speculate at all on why the failures happen or how we could figure that out, which I think is unfortunate.

The techniques used to collect the data are more interesting, at least to me. The principal method is called hybrid idle scanning, first presented by some of the same authors in a different paper [5]. It allows a measurement host to determine whether a client can complete a TCP handshake with a server, without itself being either the client or the server; if the handshake does not complete successfully, it reveals whether client-server or server-client packets are being lost. It does rely on an information leak in older client TCP stacks (predictable IP-ID sequences [6]), but millions of hosts worldwide still run operating systems with these bugs—the authors report an estimate that they comprise 1% of the global IPv4 address space. Thus, it’s possible to find a measurement client in any geographic location with reasonably common Internet usage. Data from this technique is backed up with more detailed information from traceroutes and SYN probes from a smaller number of locations. The authors also describe a previously unreported server-side information leak in Linux’s handling of half-open TCP connections, which can be used to study what IP-based blacklisting of a server looks like from the perspective of that server, without needing access to it.
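To make the primitive concrete, here is a minimal sketch of the classic idle scan that hybrid idle scanning builds on, written with scapy. This is my illustration, not the paper’s exact protocol: it needs root privileges, it assumes the “zombie” host is otherwise idle and has one of the predictable, globally incrementing IP-ID counters mentioned above, and the addresses are placeholders.

```python
#!/usr/bin/env python3
# Classic idle scan: infer whether TARGET answered a SYN apparently sent
# by ZOMBIE, by watching ZOMBIE's globally incrementing IP-ID counter.
from scapy.all import IP, TCP, send, sr1  # pip install scapy; run as root

ZOMBIE = "192.0.2.10"   # placeholder: idle host with a predictable IP-ID
TARGET = "203.0.113.5"  # placeholder: server whose reachability we test
PORT = 443

def probe_ipid(host: str):
    """Elicit an RST from `host` and return the IP-ID it used."""
    resp = sr1(IP(dst=host)/TCP(dport=PORT, flags="SA"),
               timeout=2, verbose=0)
    return resp[IP].id if resp else None

before = probe_ipid(ZOMBIE)

# Forge a SYN that appears to come from the zombie. If TARGET's SYN/ACK
# reaches the zombie, the zombie replies RST, incrementing its IP-ID.
send(IP(src=ZOMBIE, dst=TARGET)/TCP(dport=PORT, flags="S"), verbose=0)

after = probe_ipid(ZOMBIE)
if before is not None and after is not None:
    # Delta of 2: zombie sent an RST, so TARGET's SYN/ACK got through.
    # Delta of 1: zombie sent nothing, so TARGET answered RST or was blocked.
    print("IP-ID delta:", (after - before) % 65536)
```

The hybrid variant in the paper elaborates on this so that, when the handshake fails, it can tell whether the client-to-server or the server-to-client direction is being blocked.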

I’m also impressed with the authors’ systematic presentation of the hypotheses they wanted to test and how they chose to test each of them. Anyone interested in network measurements could probably learn something about how to structure an experiment from this paper.