Today we’re going to look at a position paper from the New America think tank’s Cybersecurity Initiative. If you’re someone like me, that descriptor probably raises at least four red flags: regarding anything to do with Internet security, there’s a tremendous gulf in groupthink between people associated with the US government, and people associated with commercial or pro bono development of computer stuff. Which is precisely why it’s useful to read papers from the other side of the gulf, like this one. (This particular think tank appears to be more on the left as that term is used in US politics. I haven’t dug into their other position papers.)

Swire starts by pointing out that the government’s understanding of secrecy was developed during the Cold War, when it was, frankly, much easier to keep secrets. Paper documents in an archive, which readers must physically visit, and demonstrate their need-to-know to the man with a gun at the entrance, are inherently difficult to duplicate. But that entire archive probably fits on a \$50 thumbdrive today. In a similar vein, regular readers will recall the standard military teletype with its data transfer rate of 75 bits per second, from Limitations of End-to-End Encryption (1978).

Also, once data has been exfiltrated, it’s much easier to broadcast it, because there are lots more news organizations who might take an interest—or you can just post it online yourself and rely on the tremendously accelerated speed of gossip. These things together are what Swire means by the declining half-life of secrets: secrets have always been expected to get out eventually, but the time scale is no longer decades. The metaphor of a reduced half-life seems spot on to me: leakage of secrets is inherently probabilistic, so exponential decay is the simplest model, and should give people the right first-order intuition.

Swire then moves on to discussing the effects of that groupthink gulf I mentioned. This bit is weaker, because it’s plain that he doesn’t understand why people might prefer the world-view of EFF. But it’s accurate as far as it goes. People associated with the government are starting from the premise that revealing a secret, regardless of its contents, is the worst possible thing anyone can do. (I confess to not understanding how one comes to think this, myself. It probably has to do with one’s default idea of a secret being something that really could get someone killed if it were revealed, never mind that only a tiny fraction of all classified information is that dangerous.) In contrast, the world-view of EFF begins with the premise that most information should be published, and that an organization doing something in secret from the general public probably means it knows, institutionally, that the general public would not approve. And, therefore, that it shouldn’t be doing it in the first place. Since most of the technology community takes this position, the government has an increasingly large problem trying to persuade that community to cooperate with its own attitude, and (Swire says) this will only get worse.

The paper concludes with some fairly weaksauce recommendations: plan for the possibility of disclosure; take the impact on public opinion (should the secret be revealed) into account when planning secret operations; put more effort into arguing for surveillance. Basically, business as usual but with more media savvy. This may be the best one can hope for in the short term, but I have some policy suggestions of my own:

• Apply Kerckhoffs’ Principle to all surveillance programs. The overall design of the system, its budget, the nature of the data collected, all relevant policies and procedures, everything except the collected data should be public knowledge, subject to normal public oversight (e.g. any Congressional hearings on the topic should be conducted in public and on the record), and debated in public prior to implementation—just like any other government program. If that would render the surveillance useless, the logic of Kerckhoffs’ principle says it was already useless. (I’ve made this point before, on my main blog.)

• Abandon the desire for exceptional access. The technology community has spent 20+ years explaining over and over and over again why exceptional access is impractical and makes security worse for everyone. Government agencies refusing to accept that message is probably the single strongest reason why the groupthink gulf is as wide as it is.

• More generally, whenever there is a tradeoff between offense and defense in computer security, choose defense. Design cryptographic standards that are secure for everyone, even if they happen to be enemies of the USA right now (or might be at some time in the future). Disclose all security-critical bugs to the vendors, so they get fixed, even if this means not being able to pull off another Stuxnet. Think of this as the Internet analogue of the SALT and START treaties.

• Split the NSA in half. Merge the offensive signals intelligence mission into the CIA, or scrap it, I don’t care. Merge the defensive cryptographic and cryptanalytic mission into NIST, declassify and publish everything, and do all future work in public (Kerckhoffs’ Principle again). Make it a bedrock policy that this organization only does offensive research in support of defensive programs (e.g. to demonstrate the (un)soundness of a cipher).

I’m willing to listen to reasons not to do these things, as long as they do not boil down to we’re scared of hypothetical enemy X.

As I’ve mentioned a few times here before, one of the biggest problems in measurement studies of Web censorship is taking the measurement from the right place. The easiest thing (and this may still be difficult) is to get access to a commercial VPN exit or university server inside each country of interest. But commercial data centers and universities have ISPs that are often somewhat less aggressive about censorship than residential and mobile ISPs in the same country—we think. [1] And, if the country is big enough, it probably has more than one residential ISP, and there’s no reason to think they behave exactly the same. [2] [3] What we’d really like is to enlist spare CPU cycles on a horde of residential computers across all of the countries we’re interested in.

This paper proposes a way to do just that. The authors propose to add a script to globally popular websites which, when the browser is idle, runs tests of censorship. Thus, anyone who visits the website will be enlisted. The first half of the paper is a technical demonstration that this is possible, and that you get enough information out of it to be useful. Browsers put a bunch of restrictions on what network requests a script can make—you can load an arbitrary webpage in an invisible `<iframe>`, but you don’t get notified of errors and the script can’t see the content of the page; conversely, `<img>` can only load images, but a script can ask to be notified of errors. Everything else is somewhere in between. Nonetheless, the authors make a compelling case for being able to detect censorship of entire websites with high accuracy and minimal overhead, and a somewhat less convincing case for being able to detect censorship of individual pages (with lower accuracy and higher overhead). You only get a yes-or-no answer for each thing probed, but that is enough for many research questions that we can’t answer right now. Deployment is made very easy, a simple matter of adding an additional third-party script to websites that want to participate.

The second half of the paper is devoted to ethical and practical considerations. Doing this at all is controversial—in a box on the first page, above the title of the paper, there’s a statement from the SIGCOMM 2015 program committee, saying the paper almost got rejected because some reviewers felt it was unethical to do anything of the kind without informed consent by the people whose computers are enlisted to make measurements. SIGCOMM also published a page-length review by John Byers, saying much the same thing. Against this, the authors argue that informed consent in this case is of dubious benefit, since it does not reduce the risk to the enlistees, and may actually be harmful by removing any traces of plausible deniability. They also point out that many people would need a preliminary course in how Internet censorship works and how Encore measures it before they could make an informed choice about whether to participate in this research. Limiting the pool of enlistees to those who already have the necessary technical background would dramatically reduce the scale and scope of measurements. Finally they observe that the benefits of collecting this data are clear, whereas the risks are nebulous. In a similar vein, George Danezis wrote a rebuttal of the public review, arguing that the reviewers’ concerns are based on a superficial understanding of what ethical research in this area looks like.

Let’s be concrete about the risks involved. Encore modifies a webpage such that web browsers accessing it will, automatically and invisibly to the user, also access a number of unrelated webpages (or resources). By design, those unrelated webpages contain material which is considered unacceptable, perhaps to the point of illegality, in at least some countries. Moreover, it is known that these countries mount active MITM attacks on much of the network traffic exiting the country, precisely to detect and block access to unacceptable material. Indeed, the whole point of the exercise is to provoke an observable response from the MITM, in order to discover what it will and won’t respond to.

The MITM has the power to do more than just block access. It almost certainly records the client IP address of each browser that accesses undesirable material, and since it’s operated by a state, those logs could be used to arrest and indict people for accessing illegal material. Or perhaps the state would just cut off their Internet access, which would be a lesser harm but still a punishment. It could also send back malware instead of the expected content (we don’t know if that has ever happened in real life, but very similar things have [4]), or turn around and mount an attack on the site hosting the material (this definitely has happened [5]). It could also figure out that certain accesses to undesirable material are caused by Encore and ignore them, causing the data collected to be junk, or it could use Encore itself as an attack vector (i.e. replacing the Encore program with malware).

In addition to the state MITM, we might also want to worry about other adversaries in a position to monitor user behavior online, such as employers, compromised coffee shop WiFi routers, and user-tracking software. Employers may have their own list of material that people aren’t supposed to access using corporate resources. Coffee shop WiFi is probably interested in finding a way to turn your laptop into a botnet zombie; any unencrypted network access is a chance to inject some malware. User-tracking software might become very confused about what someone’s demographic is, and start hitting them with ads that relate to whatever controversial topic Encore is looking for censorship of. (This last might actually be a Good Thing, considering the enormous harms behavioral targeting can do. [6])

All of these are harm to someone. It’s important to keep in mind that except for poisoning the data collected by Encore (harm to the research itself) all of them can happen in the absence of Encore. Malware, ad networks, embedded videos, embedded like buttons, third-party resources of any kind: all of these can and do cause a client computer to access material without its human operator’s knowledge or consent, including accesses to material that some countries consider undesirable. Many of them also offer an active MITM the opportunity to inject malware.

The ethical debate over this paper has largely focused on increased risk of legal, or quasilegal, sanctions taken against people whose browsers were enlisted to run Encore tests. I endorse the authors’ observation that informed consent would actually make that risk worse. Because there are so many reasons a computer might contact a network server without its owner’s knowledge, people already have plausible deniability regarding accesses to controversial material (i.e. I never did that, it must have been a virus or something). If Encore told its enlistees what it was doing and gave them a chance to opt out, it would take that away.

Nobody involved in the debate knows how serious this risk really is. We do know that many countries are not nearly as aggressive about filtering the Internet as they could be, [7] so it’s reasonable to think they can’t be bothered to prosecute people just for an occasional attempt to access stuff that is blocked. It could still be that they do prosecute people for bulk attempts to access stuff that is blocked, but Encore’s approach—many people doing a few tests—would tend to avoid that. But there’s enough uncertainty that I think the authors should be talking to people in a position to know for certain: lawyers and activists from the actual countries of interest. There is not one word either in the papers or the reviews to suggest that anyone has done this. The organizations that the authors are talking to (Citizen Lab, Oxford Internet Institute, the Berkman Center) should have appropriate contacts already or be able to find them reasonably quickly.

Meanwhile, all the worry over legal risks has distracted from worrying about the non-legal risks. The Encore authors are fairly dismissive of the possibility that the MITM might subvert Encore’s own code or poison the results; I think that’s a mistake. They consider the extra bandwidth costs Encore incurs, but they don’t consider the possibility of exposing the enlistee to malware on a page (when they load an entire page). More thorough monitoring and reportage on Internet censorship might cause the censor to change its behavior, and not necessarily for the better—for instance, if it’s known that some ISPs are less careful about their filtering, that might trigger sanctions against them. These are just the things I can think of off the top of my head.

In closing, I think the controversy over this paper is more about the community not having come to an agreement about its own research ethics than it is about the paper itself. If you read the paper carefully, the IRB at each author’s institution did not review this research. They declined to engage with it. This was probably a correct decision from the board’s point of view, because an IRB’s core competency is medical and psychological research. (They’ve come in for criticism in the past for reviewing sociological studies as if they were clinical trials.) They do not, in general, have the background or expertise to review this kind of research. There are efforts underway to change that: for instance, there was a Workshop on Ethics in Networked Systems Research at the very same conference that presented this paper. (I wish I could have attended.) Development of a community consensus here will, hopefully, lead to better handling of future, similar papers.

In the same vein as the ecological study of ad injectors I reviewed back in June, this paper does an ecological analysis of domain parking. Domain parking is the industry term of art for the practice of registering a whole bunch of domain names you don’t have any particular use for but hope someone will buy, and while you wait for the buyer, sticking a website consisting entirely of ads in the space, usually with This domain is for sale! at the top in big friendly letters. Like many ad-driven online business models, domain parking can be mostly harmless or it can be a lead-in to outright scamming people and infesting their computers with malware, and the research question in this paper is, how bad does it get?

In order to answer that question, the authors carried out a modest-sized survey of 3000 parked domains, identified by trawling the DNS for name servers associated with 15 known parking services. (Also like many other online businesses, domain parking runs on an affiliate marketing basis: lots of small fry register the domains and then hand the keys over to big services that do the actual work of maintaining the websites with the ads.) All of these services engage in all of the abusive behavior one would expect: typosquatting, aggressive behavioral ad targeting, drive-by malware infection, and feeding visitors to scam artists and phishers. I do not come away with a clear sense of how common any of these attacks are relative to the default parking page of advertisements and links—they have numbers, but they’re not very well organized, and different sets of parking pages were used in each section (discussing a different type of abuse) which makes it hard to compare across sections.

I’m most interested in the last section of the paper, in which they develop a machine classifier that can distinguish parking pages from normal webpages, based on things like the amount of text that is and isn’t a hyperlink, number of external links, total volume of resources drawn from third-party sources, and so on. The bulk of this section is devoted to enumerating all of the features that they tested, but doesn’t do a terribly good job of explaining which features wound up being predictive. Algorithmic choices also seem a little arbitrary. They got 97.9% true positive rate and 0.5% false positive rate out of it, though, which says to me that this isn’t a terribly challenging classification problem and probably most anything would have worked. (This is consistent with the intuitive observation that you, a human, can tell at a glance when you’ve hit a parking page.)

We’re closing out Unpublished Results Week here with a paper that was submitted to CCS 2015 last May, but doesn’t appear to have made it.

I’ve talked about side channels a little before, but let me repeat why time and cache channels keep coming up and why they’re hard to get rid of: they’re side effects of hardware-design choices that, most of the time, are exactly what you want, if you want your computer to go fast. Think about long versus short division, for instance. It takes less mental effort, and less scratch paper, to manually divide 123456789 by 3 than by 29; the same is true (sort of) for your computer. So if it’s your job to design the CPU, why shouldn’t the integer divide instruction finish quicker in the former case? Especially if you have profiling information that says that most of the time integer divide gets used with small divisors. Memory caches are even more vital—the other day I saw a graph suggesting that it takes order of 17,000 clock cycles for the CPU to come all the way back up to full number crunching speed after a context switch; most of that is probably due to stuff falling out of the cache and having to be pulled back in.

So: side channels are hard to get rid of in hardware because that would involve making design tradeoffs that would hurt performance for the supermajority of code that isn’t crunching sensitive data, and they’re hard to get rid of in software because the hardware, and the compiler, are actively fighting you. (The paper says several times that side-channel-free cryptographic primitives pretty much have to be done in hand-written assembly language, and that getting them right is extremely difficult even by the standards of people who write assembly language by hand.) Wouldn’t it be nice to find a big hammer that would make the problem just … go away? That’s this paper.

A principal design goal for the big hammer in this paper is ease of use. The only thing the application programmer has to do is label protected functions that need to have their side channels eliminated. And it’s not a compiler paper; they haven’t come up with a way to make a compiler do the extremely difficult assembly tuning for you. All the magic happens at runtime. Given all that, it works pretty much the only way it could work: it measures how much the timing varies and then it pads out the fast cases to the same length as the slow cases. It also flushes state from the protected function out of as many CPU caches as possible (memory, TLB, branch target buffer, …) after the function is done, and takes steps (page coloring, CPU and memory pinning, realtime scheduling priority) to ensure that the protected function is not perturbed by what malicious code has done before it starts executing. The authors demonstrate reliability by testing several different crypto primitives that definitely do have side-channel information leaks, and showing that the leak is rendered unavailable in all cases. They also claim the overhead is acceptable (8.5%).

I really like the concept and hope to see more in this vein in the future. I can’t speculate on why this particular paper didn’t get accepted at CCS, but I do feel that the overhead measurements were inadequate: they don’t consider indirect costs of flushing caches and pinning stuff to CPUs all the time. I’d like to see whole-system benchmarking of, say, a shared-hosting HTTPS server. And at least one round of adversarial testing would be nice. It is one thing to demonstrate that your measurements show that the side channels you already knew about have been closed; it is another thing if someone who spends all their time thinking of side-channel attacks cannot find any in your system.

We’re picking back up with a paper that’s brand new—so new that it exists only as an arXiv preprint and I don’t know if it is planned to be published anywhere. It probably hasn’t gone through formal peer review yet.

Wright and colleagues observe that because Tor is commonly used to evade censorship, changes in the number of people using Tor from any given country are a signal of a change in the censorship régime in that country. This isn’t a new idea: the Tor project itself has been doing something similar since 2011. What this paper does is present an improved algorithm for detecting such changes. It uses PCA to compare the time series of Tor active users across countries. The idea is that if there’s a change in Tor usage worldwide, that probably doesn’t indicate censorship, but a change in just a few countries is suspicious. To model this using PCA, they tune the number of principal components so that the projected data matrix is well-divided into what they call normal and anomalous subspaces; large components in the anomalous subspace for any data vector indicate that that country at that time is not well-predicted by all the other countries, i.e. something fishy is going on.

They show that their algorithm can pick out previously known cases where a change in Tor usage is correlated with a change in censorship, and that its top ten most anomalous countries are mostly the countries one would expect to be suspicious by this metric—but also a couple that nobody had previously suspected, which they highlight as a matter needing further attention.

PCA used as an anomaly detector is a new idea on me. It seems like they could be extracting more information from it than they are. The graphs in this paper show what’s probably a global jump in Tor usage in mid-2013; this has a clear explanation, and they show that their detector ignores it (as it’s supposed to), but can they make their detector call it out separately from country-specific events? PCA should be able to do that. Similarly, it seems quite probable that the ongoing revolutions and wars in the Levant and North Africa are causing correlated changes to degree of censorship region-wide; PCA should be able to pull that out as a separate explanatory variable. These would both involve taking a closer look at the normal subspace and what each of its dimensions mean.

It also seems to me that a bit of preprocessing, using standard time series decomposition techniques, would clean up the analysis and make its results easier to interpret. There’s not one word about that possibility in the paper, which seems like a major omission; decomposition is the first thing that anyone who knows anything about time series analysis would think of. In this case, I think seasonal variation should definitely be factored out, and removing linear per-country trends might also helpful.

It’s time for another of my occasional looks at a paper that doesn’t have anything to do with security. This one is about an attempt to bridge a gap between two modes of analysis in computational linguistics.

In a word vector analysis (also referred to as a distributed word representation in this paper) one maps words onto vectors in an abstract high-dimensional space, defined by the co-occurrence probabilities of each pair of words within some collection of documents (a corpus). This can be done entirely automatically with any source of documents; no manual preprocessing is required. Stock machine-learning techniques, applied to these vectors, perform surprisingly well on a variety of downstream tasks—classification of the words, for instance. However, the vectors are meaningless to humans, so it’s hard to use them in theoretical work that requires interpretability, or to combine them with other information (e.g. sentence structure). Lexical semantics, by contrast, relies on manual, word-by-word tagging with human-meaningful categories (part of speech, sense of word, role in sentence, etc) which is slow and expensive, but the results are much easier to use as a basis for a wide variety of further studies.

The paper proposes a technique for transforming a set of word vectors to make them more interpretable. It’s essentially the opposite of PCA. They project the vectors into a higher-dimensional space, one in which they are all sparse (concretely, more than 90% of the components of each vector are zero). Words that are semantically related will (they claim) share nonzero components of these vectors, so each component has more meaning than in the original space. The projection matrix can be generated by standard mathematical optimization techniques (specifically, gradient descent with some tweaks to ensure convergence).

To back up the claim that the sparse vectors are more interpretable, they first show that five stock downstream classification tasks achieve an average of 5% higher accuracy when fed sparse vectors than the dense vectors from which they were derived, and then that humans achieve 10% higher accuracy on a word intrusion task (one of these five words does not belong in the list, which is it?) when the words are selected by their rank along one dimension of the sparse vectors, than when they are selected the same way from the dense vectors. An example would probably help here: Table 6 of the paper quotes the top-ranked words from five dimensions of the sparse and the dense vectors:

Dense Sparse
combat, guard, honor, bow, trim, naval fracture, breathing, wound, tissue, relief
’ll, could, faced, lacking, seriously, scored relationships, connections, identity, relations
see, n’t, recommended, depending, part files, bills, titles, collections, poems, songs
due, positive, equal, focus, respect, better naval, industrial, technological, marine
sergeant, comments, critics, she, videos stadium, belt, championship, toll, ride, coach

Neither type of list is what you might call an ideal Platonic category,1 but it should be clear that the sparse lists have more meaning in them.

Because I’m venturing pretty far out of my field, it’s hard for me to tell how significant this paper is; I don’t have any basis for comparison. This is, in fact, the big thing missing from this paper: a comparison to other techniques for doing the same thing, if any. Perhaps the point is that there aren’t any, but I didn’t see them say so. I am also unclear on how you would apply this technique to anything other than an analysis of words. For instance, my own research right now involves (attempting to) mechanically assign topics to entire documents. Right now we’re reducing each document to a bag of words, carrying out LDA on the bags, and then manually labeling each LDA cluster with a topic. Could we use bags of sparse word vectors instead? Would that help LDA do its job better? Or is LDA already doing what this does? I don’t know the answers to these questions.

1 If you are under the impression that categories in natural language are even vaguely Platonic, go at once to your friendly local public library and request a copy of Women, Fire, and Dangerous Things.

RC4 is a very widely used stream cipher, developed in the late 1980s. As wikipedia puts it, RC4 is remarkable for its speed and simplicity in software, but it has weaknesses that argue against its use in new systems. Today’s paper demonstrates that it is even weaker than previously suspected.

To understand the paper you need to know how stream ciphers work. The core of a stream cipher is a random number generator. The encryption key is the starting seed for the random number generator, which produces a keystream—a sequence of uniformly random bytes. You then combine the keystream with your message using the mathematical XOR operation, and that’s your ciphertext. On the other end, the recipient knows the same key, so they can generate the same keystream, and they XOR the ciphertext with the keystream and get the message back.

If you XOR a ciphertext with the corresponding plaintext, you learn the keystream. This wouldn’t be a big deal normally, but, the basic problem with RC4 is that its keystream isn’t uniformly random. Some bytes of the keystream are more likely to take specific values than they ought to be. Some bytes are more likely to not take specific values. And some bytes are more likely to take the same value as another byte in the keystream. (The ABSAB bias, which is mentioned often in this paper, is an example of that last case: you have slightly elevated odds of encountering value A, value B, a middle sequence S, and then A and B again.) All of these are referred to as biases in the RC4 keystream.

To use RC4’s biases to decrypt a message, you need to get your attack target to send someone (not you) the same message many times, encrypted with many different keys. You then guess a keystream which exhibits as many of the biases as possible, and you XOR this keystream with all of the messages. Your guess won’t always be right, but it will be right slightly more often than chance, so the correct decryption will also appear slightly more often than chance. It helps that it doesn’t have to be exactly the same message. If there is a chunk that is always the same, and it always appears in the same position, that’s enough. It also helps if you already know some of the message; that lets you weed out bad guesses faster, and exploit more of the biases.

Asking the attack target to send the same message many times might seem ridiculous. But remember that many Internet protocols involve headers that are fixed or nearly so. The plaintext of a TKIP-encrypted message, for instance, will almost always be the WiFi encapsulation of an IPv4 packet. If you know the IP addresses of your target and of the remote host it’s communicating with, that means you know eight bytes of the message already, and they’ll always be the same and in the same position. The paper goes into some detail about how to get a Web browser to make lots and lots of HTTPS requests with a session cookie (always the same, but unknown—the secret you want to steal) in a predictable position, with known plaintext on either side of it.

All this was well-known already. What’s new in this paper is: first, some newly discovered biases in the RC4 keystream, and a statistical technique for finding even more; second, improved attacks on TKIP and TLS. Improved means that they’re easier to execute without anyone noticing, and that they take less time. Concretely, the best known cookie-stealing attack before this paper needed to see $13\cdot {2}^{30}$ HTTPS messages (that’s about 14 billion) and this paper cuts it to $9\cdot {2}^{27}$ messages (1.2 billion), which takes only 75 hours to run. That’s entering the realm of practical in real life, if only for a client computer that is left on all the time, with the web browser running.

It’s a truism in computer security (actually, security in general) that attacks only ever get better. What’s important, when looking at papers like this, is not so much the feasibility of any particular paper’s attack, but the trend. The first-known biases in RC4 keystream were discovered only a year after the algorithm was published, and since then there’s been a steady drumbeat of researchers finding more biases and more effective ways to exploit them. That means RC4 is no good, and everyone needs to stop using it before someone finds an attack that only takes a few minutes. Contrast the situation with AES, where there are no known biases, and fifteen years of people looking for some kind of attack has produced only a tiny improvement over brute force.

Advice to the general public: Does this affect you? Yes. What can you do about it? As always, make sure your Web browsers are fully patched up—current versions of Firefox, Chrome, and IE avoid using RC4, and upcoming versions will probably take it out altogether. Beyond that, the single most important thing for you to do is make sure everything you’ve got that communicates over WiFi—router, computers, tablets, smartphones, etc.—are all set to use WPA2 and CCMP only. (The configuration interface may refer to CCMP as AES-CCMP or just AES; in this context, those are three names for the same thing.) The alternatives WEP, WPA, and TKIP all unavoidably involve RC4 and are known to be broken to some extent. Any WiFi-capable widget manufactured after 2006 has no excuse for not supporting WPA2 with CCMP. It should definitely be possible to set your home router up this way; unfortunately, many client devices don’t expose the necessary knobs.

Today’s paper takes another shot at modeling how the physical topology of the Internet affects the security of Tor against passive adversaries with the ability to snoop on a lot of traffic. It’s by some of the same people who wrote Defending Tor from Network Adversaries and is quite closely related.

Most of the work of this paper goes into building a flexible, formal threat model, which Tor client software could (in principle) use to inform its routing decisions. Acknowledging that there’s always going to be a good deal of uncertainty about what adversaries are out there and what they are capable of, they make two key design decisions. The model is probabilistic (based on a Bayesian belief network), and it takes user input. For instance, if you have reason to think the government of Transbelvia has it in for you, you can instruct Tor to avoid paths that Transbelvia might be able to snoop on, and the model will expand that out to all the ways they might do that. Conversely, if you trust a particular organization you might like to preferentially use its guards or exit nodes, and it can do that too.

The model is very thorough about different ways a government might be able to snoop on network traffic—not just relays physically hosted in the country, but ASes and IXPs (Transbelvia hosts a major IXP for Eastern Europe), submarine cable landing sites (not relevant for a landlocked country), mutual legal assistance treaties (MLATs) which might be used to have another country do some snooping on Transbelvia’s behalf, and even hacking into and subverting routers at interesting points in the connectivity graph. (The pun in the title refers to their analysis of how MLATs could allow several of the usual suspects to snoop on 90+% of all submarine cable traffic, even though they host hardly any cable landings themselves.) Equally important, it can be expanded at need when new techniques for spying are revealed.

I think something like this is going to be an essential building block if we want to add any spy-aware routing algorithm to Tor, but I have two serious reservations. First, simplest, but less important, right now all Tor clients make routing decisions more-or-less the same way (there have been small changes to the algorithm over time, but everyone is strongly encouraged to stay close to the latest client release anyway, just because of bugs). If clients don’t all make routing decisions the same way, then that by itself might be usable to fingerprint them, and thus cut down the number of people who might’ve taken some action, from all Tor users to all Tor users who make routing decisions like THIS. If highly personalized threat models are allowed, the latter group might be just one person.

Second, and rather more serious, the user-input aspect of this system is going to require major user experience research and design to have any hope of not being worse than the problem it’s trying to solve. It’s not just a matter of putting a friendly face on the belief language (although that does need to happen)—the system will need to educate its users in the meaning of what it is telling them, and it will need to walk them through the consequences of their choices. And it might need to provide nudges if there’s a good reason to think the user’s assessment of their threat model is flat-out wrong (even just making that judgement automatically is fraught with peril—but so is not making that judgement).

Today’s paper is a pilot study, looking into differences in adoption rate of HTTPS between various broad categories of websites (as defined by Alexa). They looked at raw availabilty of TLS service on port 443, and they also checked whether an HTTP request for the front page of each Alexa domain would redirect to HTTPS or vice versa. This test was conducted only once, and supplemented with historical data from the University of Michigan’s HTTPS Ecosystem project.

As you would expect, there is a significant difference in the current level of HTTPS availability from one category to another. They only show this information for a few categories, but the pattern is not terribly surprising: shopping 82%, business 70%, advertisers 45%, adult 36%, news 30%, arts 26%. (They say The relatively low score for Adult sites is surprising given that the industry has a large amount of paid content, but I suspect this is explained by that industry’s habit of outsourcing payment processing, plus the ubiquitous (not just in the adult category) misapprehension that only payment processing is worth bothering to encrypt.) It is also not surprising to find that more popular sites are more likely to support HTTPS. And the enormous number of sites that redirect their front page away from HTTPS is depressing, but again, not surprising.

What’s more interesting to me is the trendlines, which show a steady, slow, category-independent, linear uptake rate. There’s a little bit of a bump in adult and news around 2013 but I suspect it’s just noise. (The response growth over time figure (number 2), which appears to show a category dependence, is improperly normalized and therefore misleading. You want to look only at figure 1.) The paper looks for a correlation with the Snowden revelations; I’d personally expect that the dominant causal factor here is the difficulty of setting up TLS, and I’d like to see them look for correlations with major changes in that: for instance, Cloudflare’s offering no-extra-cost HTTPS support [1], Mozilla publishing a server configuration guide [2], or the upcoming Let’s Encrypt no-hassle CA [3]. It might also be interesting to look at uptake rate as a function of ranking, rather than category; it seems like the big names are flocking to HTTPS lately, it would be nice to know for sure.

The study has a number of methodological problems, which is OK for a pilot, but they need to get fixed before drawing serious conclusions. I already mentioned the normalization problem in figure 2: I think they took percentages of percentages, which doesn’t make sense. The right thing would’ve been to just subtract the initial level seen in figure 1 from each line, which (eyeballing figure 1) would have demonstrated an increase of about 5% in each category over the time period shown, with no evidence for a difference between categories. But before we even get that far, there’s a question of the difference between an IP address (the unit of the UMich scans), a website (the unit of TLS certificates), and a domain (the unit of Alexa ranking). To take some obvious examples: There are hundreds, if not thousands, of IP addresses that will all answer to the name of `www.google.com`. Conversely, Server Name Indication permits one IP address to answer for dozens or even hundreds of encrypted websites, and that practice is even more common over unencrypted HTTP. And hovering around #150 in the Alexa rankings is `amazonaws.com`, which is the backing store for at least tens of thousands of different websites, each of which has its own subdomain and may or may not have configured TLS. The correct primary data sources for this experiment are not Alexa and IPv4 scanning, but DNS scanning and certificate transparency logs. (A major search engine’s crawl logs would also be useful, if you could get your hands on them.) Finally, they should pick one set of 10-20 mutually exclusive, exhaustive categories (one of which would have to be Other) and consistently use them throughout the paper.

This week’s non-PETS paper is a broad survey of research into improving either the security, or the performance, or both, of low-latency anonymity networks such as Tor. Nearly all of the research used Tor itself as a testbed, and the presentation here assumes Tor, but most of the work could be generalized to other designs.

There’s been a lot of work on this sort of thing in the eleven years since Tor was first introduced, and this paper does a generally good job of categorizing it, laying out lines of research, indicating which proposals have been integrated into Tor and which haven’t, etc. (I particularly liked the mindmap diagram near the beginning, and the discussion near the end of which problems still need to get solved.) One notable exception is the section on improved cryptography, where you need to have a solid cryptography background to get any idea of what the proposals are, let alone whether they worked. There are also a couple of places where connections to the larger literature of network protocol engineering would have been helpful: for instance, there’s not a single mention of bufferbloat, even though that is clearly an aspect of the congestion problems that one line of research aims to solve. And because it’s not mentioned, it’s not clear whether the researchers doing that work knew about it.

Tor is a difficult case in protocol design because its security goals are—as acknowledged in the original paper describing its design [1]—directly in conflict with its performance goals. Improvements in end-to-end latency, for instance, may make a traffic correlation attack easier. Improvements in queueing fairness or traffic prioritization may introduce inter-circuit crosstalk enabling an attacker to learn something about the traffic passing through a relay. Preferring to use high-bandwidth relays improves efficiency but reduces the number of possible paths that traffic can take. And so on. It is striking, reading through this survey, to see how often an apparently good idea for performance was discovered to have unacceptable consequences for anonymity.