Papers published in 2012

k-Indistinguishable Traffic Padding in Web Applications

A Web application is a piece of software whose execution is split between the client and the server, and it sends data between the two sides every time you do something significant, perhaps even on every keystroke. Inevitably, the volume of data has some correlation with the action performed. Since most Web applications are offered to the general public at low or no cost, an adversary can use the application themselves, map out much of its internal state machine, and develop a model that lets them deduce what someone else is doing from an observed sequence of packets. [1] [2] This attack has been demonstrated (in the lab) against Web applications for highly sensitive tasks, such as tax preparation and medical advice.
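
As a toy illustration of the attack (this is not from the paper; the probe() interface and the single-character inputs are assumptions for illustration), the adversary drives the public application themselves, records how much data each input produces, and later looks up sizes observed on the wire:

    # Toy sketch of the traffic-analysis attack described above.  Not the
    # paper's model; probe() and the alphabet are hypothetical.
    fingerprint = {}  # observed response size (bytes) -> inputs that produce it

    def train(probe):
        """probe(text) returns the response size the live application sends back."""
        for ch in "abcdefghijklmnopqrstuvwxyz":
            fingerprint.setdefault(probe(ch), []).append(ch)

    def guess(observed_size):
        """Inputs consistent with a size seen on the wire; a unique match leaks the keystroke."""
        return fingerprint.get(observed_size, [])

In practice the model is built over sequences of request and response sizes rather than single responses, but the principle is the same.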

The only known solution to this problem is to pad the amount of data passed over the network in each direction, so that the adversary cannot learn anything about the state of the Web application from an observed sequence of packets. Unfortunately, the obvious way to do this involves unacceptably high overhead: this paper quotes [1] as observing 21074% overhead for a well-known online tax system. Today’s paper proposes that techniques from the field of privacy-preserving data publishing (PPDP, [3]) be brought to bear on the problem.

The paper is mathematically heavy going and I’m not going to go into details of their exact proposal, because it’s really just a starting point. Instead, I’m going to pull out some interesting observations:

  • More padding is not always better. They show an example case where padding messages to a multiple of 128 bytes is exactly as good as padding them to a multiple of 512 bytes, and padding them to a multiple of 520 bytes is worse.

  • Computing the optimal amount of padding for a Web application (as they define it) is NP-hard even with complete, ideal knowledge of its behavior. However, there are reasonably efficient approximations; if I understood them correctly, it might even be feasible to apply one of their algorithms on the fly for each client.

  • Because optimality is defined over an entire sequence of user interactions with an application, knowing how much padding to send in response to any given request may require the server to predict the future. Again, however, there are approximations that avoid this problem.

  • PPDP offers a whole bunch of different ways to model the degree to which one has mitigated the information leak: k-anonymity is the model they spend the most time on, but there’s also l-diversity, (α, k)-anonymity, t-closeness, and so on. I don’t follow that literature, but I have the impression people are still actively making up new models and haven’t spent a whole lot of time figuring out which is best in which situations. (A small sketch of the k-anonymity idea, with made-up numbers, follows this list.)
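
To make the k-anonymity idea concrete, here is a minimal sketch, not the paper’s algorithm, of how one might score a naive pad-to-a-multiple-of-B scheme: pad every response up to a multiple of B, count how many application states share each padded size, take the worst case as the achieved k, and track the byte overhead. The application states and response sizes below are invented.

    # Minimal sketch of scoring pad-to-a-multiple-of-B schemes by the k they
    # achieve.  Not the paper's algorithm; the state/size table is made up.
    from collections import Counter
    from math import ceil

    state_sizes = {  # hypothetical application state -> response size in bytes
        "suggest_a": 310, "suggest_b": 420, "suggest_c": 450,
        "suggest_d": 700, "suggest_e": 980,
    }

    def evaluate_padding(block):
        padded = {s: block * ceil(size / block) for s, size in state_sizes.items()}
        buckets = Counter(padded.values())
        k = min(buckets[padded[s]] for s in state_sizes)  # worst-case anonymity set
        overhead = sum(padded.values()) / sum(state_sizes.values()) - 1
        return k, overhead

    for block in (128, 512, 520):
        k, overhead = evaluate_padding(block)
        print(f"block={block}: k={k}, overhead={overhead:.0%}")

With these invented sizes, padding to multiples of 520 achieves the same k as 512 at strictly higher overhead, which is the flavor of the observation above that more padding is not automatically better.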

The great failing of this and all the other papers I’ve ever read in this area is operationalization. This paper’s algorithms need a complete, or nearly so, map of all possible sequences of states that a Web application can enter, and the authors shrug off the question of how you create this map in the first place. (I very strongly suspect that task can’t be mechanized in the general case—it will turn out to be the Halting Problem in another of its many disguises.) They also don’t seem to realize that things like Google search suggestions (the running example in the paper) are constantly changing. (Of course, that makes life harder for the adversary too.) This doesn’t mean there is no hope; it only means you need more approximations. I think it ought to be possible to build good-enough automatic padding into CMSes, and maybe even into web application frameworks. Trying that should be a priority for future research.

Inferring Mechanics of Web Censorship Around the World

I’ve talked a bunch about papers that investigate what is being censored online in various countries, but you might also want to know how it’s done. There are only a few ways it could be done, and this paper does a good job of laying them out:

  • By DNS name: intercept DNS queries either at the router or at the local DNS relay, and return either a “no such host” error or the address of a server that will hand out errors for everything.

  • By IP address: in an intermediate router, discard packets intended for particular servers, and/or respond with TCP RST packets (which make the client disconnect) or forged responses. (In principle, an intermediate router could pretend to be the remote host for an entire TCP session, but it doesn’t seem that anyone does.)

  • By data transferred in cleartext: again in an intermediate router, allow the initial connection to go through, but if blacklisted keywords are detected then forge a TCP RST.

There are a few elaborations and variations, but those are the basic options if you are implementing censorship in the backbone of the network. The paper demonstrates that all are used. It could also, of course, be done at either endpoint, but that is much less common (though not unheard of) and the authors of this paper ruled it out of scope. It’s important to understand that the usual modes of encryption used on the ’net today (e.g. HTTPS) do not conceal either the DNS name or the IP address of the remote host, but do conceal the remainder of an HTTP request. Pages of an HTTPS-only website cannot be censored individually, but the entire site can be censored by its DNS name or server IP address. This is why GitHub was being DDoSed a few months ago to try to get them to delete repositories being used to host circumvention tools [1]: Chinese censors cannot afford to block the entire site, as it is too valuable to their software industry, but they have no way to block access to the specific repositories they don’t like.

Now, if you want to find out which of these scenarios is being carried out by any given censorious country, you need to do detailed network traffic logging, because at the application level, several of them are indistinguishable from the site being down or the network being unreliable. This also means that the censor could choose to be stealthy: if Internet users in a particular country expect to see an explicit message when they try to load a blocked page, they might assume that a page that always times out is just broken. [2] The research contribution of this paper is in demonstrating how you do that, through a combination of packet logging and carefully tailored probes from hosts in-country. They could have explained themselves a bit better: I’m not sure whether they bothered to try to distinguish packets being dropped at the border router from packets being dropped by a misconfigured firewall on the site itself, for instance. Also, I’m not sure whether it’s worth going to the trouble of packet logging, frankly. You should be able to get the same high-level information by comparing the results you get from country A with those you get from country B.
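
As a rough illustration of how in-country probes map onto the mechanisms listed earlier (this is a sketch, not the paper’s measurement code; the Probe fields and the comparison against an uncensored reference are assumptions):

    # Sketch: classify a single in-country probe against the censorship
    # mechanisms listed above.  Not the paper's code; the Probe fields and the
    # uncensored reference values are assumptions.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Probe:
        dns_answer: Optional[str]     # address returned by the local resolver, or None
        reference_ip: str             # what an uncensored resolver returns
        syn_ack: bool                 # did the TCP handshake complete?
        rst_after_request: bool       # reset arrived once the cleartext request was sent?
        page_matches_reference: bool  # body matches a fetch from an uncensored vantage?

    def classify(p: Probe) -> str:
        if p.dns_answer is None or p.dns_answer != p.reference_ip:
            return "DNS tampering (wrong or missing answer)"
        if not p.syn_ack:
            return "IP blocking (packets dropped, or RST during the handshake)"
        if p.rst_after_request:
            return "keyword filtering (forged RST after the cleartext request)"
        if not p.page_matches_reference:
            return "content tampering, or an ordinary server-side difference"
        return "no interference observed"

A real study has to be much fuzzier than this: CDNs return different DNS answers in different places, and a site that is simply down looks a lot like IP blocking, which is exactly why repeated probes, packet logging, and cross-country comparison matter.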

Another common headache in this context is knowing whether the results you got from your measurement host truly reflect what a normal Internet user in the country would see. After all, you are probably using a commercial data center or academic network that may be under fewer restrictions. This problem is one of the major rationales for Encore, which I discussed a couple weeks ago [3]. This paper nods at that problem but doesn’t really dig into it. To be fair, they did use personal contacts to make some of their measurements, so those may have involved residential ISPs, but they are (understandably) vague about the details.

Students Who Don’t Understand Information Flow Should Be Eaten: An Experience Paper

On a completely different and much lighter note from Wednesday, today we’re going to look at a paper about teaching students information-flow theory with Werewolf.

Werewolf (also known as Mafia) is a game in which the players are divided into two groups, the werewolves (or mafiosi) and the villagers, and each is trying to wipe the other out at a rate of one murder per turn. There are a whole bunch of variants. Normally this is played around a table, as a game of strategic deception: only the werewolves know who the werewolves are, and they participate as villagers on the villagers’ turn. In this paper, it becomes a game played online, of surveillance and countersurveillance: the villagers are actively encouraged to exploit information leaks in the game server and discover who the werewolves are. (In a normal game this would be cheating.)

The authors don’t report how this teaching method compares to traditional lectures on any quantitative basis (e.g. final exam scores, class grades). However, they do say that the students loved the exercise, met up outside of official class hours to discuss strategies and plan, and that over the term the exploits and countermeasures grew steadily more sophisticated, in some cases requiring adjustments to the game server to ensure that both sides could still win. It’s hard to imagine this level of student engagement not leading to better grades, better retention, and deeper interest in the material.

I think this is a brilliant idea and not just for teaching information flow. One of the hardest things to teach in a security course is what Bruce Schneier calls the security mindset: intentionally thinking about how a system can be caused to fail, to do something it’s not intended to do. In particular, it is in tension with the usual engineering mindset, which focuses on verifying that something works when used as designed. (Safety engineering and failure analysis have more of the security mindset about them, but usually focus on failures due to accident rather than malice.) But it is exactly how you need to think to successfully cheat at a game, or to notice when someone else is cheating—and in that context, it’s going to be familiar to most college students. Using a game of strategic deception as the backdrop will also encourage students to think along the right lines. I’d like to see this idea adapted to teach other challenging notions in security—penetration testing is the obvious one, but maybe also code exploitation and key management?

Whiskey, Weed, and Wukan on the World Wide Web

The subtitle of today’s paper is “On Measuring Censors’ Resources and Motivations.” It’s a position paper whose goal is to get other researchers to start considering how economic constraints might affect the much-hypothesized arms race or tit-for-tat behavior of censors and people reacting to censorship: as they say,

[…] the censor and censored have some level of motivation to accomplish various goals, some limited amount of resources to expend, and real-time deadlines that are due to the timeliness of the information that is being spread.

They back up their position by presenting a few pilot studies, of which the most compelling is the investigation of keyword censorship on Weibo (a Chinese microblogging service). They observe that searches are much more aggressively keyword-censored than posts—that is, for many examples of known-censored keywords, one is permitted to make a post on Weibo containing that keyword, but searches for that keyword will produce either no results or very few results. (They don’t say whether unrelated searches will turn up posts containing censored keywords.) They also observe that, for some keywords that are not permitted to be posted, the server only bothers checking for variations on the keyword if the user making the post has previously tried to post the literal keyword. (Again, the exact scope of the phenomenon is unclear—does an attempt to post any blocked keyword make the server check more aggressively for variations on all blocked keywords, or just that one? How long does this escalation last?) And finally, whoever is maintaining the keyword blacklists at Weibo seems to care most about controlling the news cycle: terms associated with breaking news that the government does not like are immediately added to the blacklist, and removed again just as quickly when the event falls out of the news cycle or is resolved positively. They give detailed information about this phenomenon for one news item, the Wukan incident, and cite several other keywords that seem to have been treated the same.

They compare Weibo’s behavior to similar keyword censorship by chat programs popular in China, where the same patterns appear, but whoever is maintaining the lists is sloppier and slower about it. This is clear evidence that the lists are not maintained centrally (by some government agency) and they suggest that many companies are not trying very hard:

At times, we often suspected that a keyword blacklist was being typed up by an over-worked college intern who was given vague instructions to filter out anything that might be against the law.

Sadly, I haven’t seen much in the way of people stepping up to the challenge presented, designing experiments to probe the economics of censorship. You can see similar data points in other studies of China [1] [2] [3] (it is still the case, as far as I know, that ignoring spurious TCP RST packets is sufficient to evade several aspects of the Great Firewall), and in reports from other countries. It is telling, for instance, that Pakistani censors did not bother to update their blacklist of porn sites to keep up with a shift in viewing habits. [4] George Danezis has been talking about the economics of anonymity and surveillance for quite some time now [5] [6] but that’s not quite the same thing. I mentioned above some obvious follow-on research just for Weibo, and I don’t think anyone’s done that. Please tell me if I’ve missed something.

Taster’s Choice: A Comparative Analysis of Spam Feeds

Here’s another paper about spam; this time it’s email spam, and they are interested not so much in the spam itself, but in the differences between collections of spam (feeds) as used in research. They have ten different feeds, and they compare them to each other looking only at the domain names that appear in each. The goal is to figure out whether or not each feed is an unbiased sample of all the spam being sent at any given time, and whether some types of feed are better at detecting particular sorts of spam. (Given this goal, looking only at the domain names is probably the most serious limitation of the paper, despite being brushed off with a footnote. It means they can’t say anything about spam that doesn’t contain any domain names, which may be rare, but is interesting because it’s rare and different from all the rest. They should have at least analyzed the proportion of it that appeared in each feed.)
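
As a sketch of the kind of feed-to-feed comparison this implies (not the paper’s actual methodology; the file layout and the domain-extraction regex are assumptions), one could reduce each feed to the set of domains it mentions and measure pairwise overlap:

    # Sketch of comparing spam feeds by the domains they contain.  The feeds/
    # directory layout and the crude URL regex are assumptions for illustration.
    import re
    from itertools import combinations
    from pathlib import Path

    DOMAIN_RE = re.compile(r"https?://([^/\s:]+)", re.IGNORECASE)

    def domains_in(feed_path: Path) -> set[str]:
        text = feed_path.read_text(errors="replace")
        return {m.group(1).lower() for m in DOMAIN_RE.finditer(text)}

    def jaccard(a: set[str], b: set[str]) -> float:
        return len(a & b) / len(a | b) if (a or b) else 0.0

    feeds = {p.stem: domains_in(p) for p in Path("feeds").glob("*.txt")}
    for (name_a, doms_a), (name_b, doms_b) in combinations(feeds.items(), 2):
        print(f"{name_a} vs {name_b}: jaccard={jaccard(doms_a, doms_b):.2f}")

Low overlap between feeds that supposedly sample the same underlying spam is exactly the kind of feed bias the paper is worried about.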

The spam feeds differ primarily in how they collect their samples. There’s one source consisting exclusively of manually labeled spam (from a major email provider); two DNS blacklists (these provide only domain names, and are somehow derived from other types of feed); three MX honeypots (registered domains that accept email to any address, but are never used for legitimate mail); two seeded honey accounts (like honeypots, but a few addresses are made visible to attract more spam); one botnet-monitoring system; and one hybrid. They don’t have full details on exactly how they all work, which is probably the second most serious limitation.

The actual results of the analysis are basically what you would expect: manually labeled spam is lower-volume but has more unique examples in it, botnet spam is very high volume but has lots of duplication, and everything else is somewhere in between. They made an attempt to associate spam domains with affiliate networks (the business of spamming nowadays is structured as a multi-level marketing scheme) but they didn’t otherwise try to categorize the spam itself. I can think of plenty of additional things to do with the data set, which is the point: it says right in the abstract that “most studies [of email spam] use a single spam feed and there has been little examination of how such feeds may differ in content.” They’re not trying so much to produce a comprehensive analysis themselves as to alert people working in this subfield that they might be missing stuff by looking at only one data source.