Papers tagged ‘Tor’

Detecting Internet Filtering from Geographic Time Series

We’re picking back up with a paper that’s brand new: so new that it exists only as an arXiv preprint, and I don’t know whether it’s slated for publication anywhere. It probably hasn’t been through formal peer review yet.

Wright and colleagues observe that because Tor is commonly used to evade censorship, changes in the number of people using Tor from any given country are a signal of a change in the censorship régime in that country. This isn’t a new idea: the Tor project itself has been doing something similar since 2011. What this paper does is present an improved algorithm for detecting such changes. It uses principal components analysis (PCA) to compare the time series of active Tor users across countries. The idea is that if there’s a change in Tor usage worldwide, that probably doesn’t indicate censorship, but a change in just a few countries is suspicious. To model this using PCA, they tune the number of principal components so that the projected data matrix divides cleanly into what they call normal and anomalous subspaces; a large component in the anomalous subspace for any data vector indicates that that country, at that time, is not well predicted by all the other countries, i.e. something fishy is going on.
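To make the subspace split concrete, here’s a minimal sketch of the technique as I understand it (this is not the paper’s actual code; the standardization step and the choice of k are my assumptions):

    import numpy as np

    def anomaly_scores(X, k):
        """X: days-by-countries matrix of Tor user counts; k: number of
        principal components retained as the 'normal' subspace."""
        # Standardize each country's series so big countries don't dominate.
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        # Principal directions in country-space, via SVD of the standardized data.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        P = Vt[:k].T                       # columns span the normal subspace
        # Residual after projecting each day's vector onto the normal subspace.
        residual = Z - Z @ P @ P.T
        # Squared prediction error per day: a large value means some country is
        # poorly predicted by the cross-country correlations, i.e. anomalous.
        return (residual ** 2).sum(axis=1), residual

On a high-scoring day, the per-country entries of the residual point at which country is behaving suspiciously.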

They show that their algorithm can pick out previously known cases where a change in Tor usage is correlated with a change in censorship, and that its top ten most anomalous countries are mostly the ones one would expect to be suspicious by this metric, plus a couple that nobody had previously suspected, which they highlight as needing further attention.

PCA used as an anomaly detector is a new one on me. It seems like they could be extracting more information from it than they are. The graphs in this paper show what’s probably a global jump in Tor usage in mid-2013; this has a clear explanation, and they show that their detector ignores it (as it’s supposed to), but could they make their detector call it out separately from country-specific events? PCA should be able to do that. Similarly, it seems quite probable that the ongoing revolutions and wars in the Levant and North Africa are causing correlated changes in the degree of censorship region-wide; PCA should be able to pull that out as a separate explanatory variable. Both would involve taking a closer look at the normal subspace and what each of its dimensions means.
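Concretely, continuing the sketch above, interpreting the retained components would just be a matter of looking at their loadings (again my guess at how one would go about it, not anything from the paper):

    import numpy as np

    def describe_components(Vt, k, countries):
        """Rough interpretation of each retained principal direction.
        Vt: from the SVD in the earlier sketch; countries: column labels."""
        for i, row in enumerate(Vt[:k]):
            # Near-uniform loadings suggest a global effect (like the mid-2013
            # jump); weight concentrated on a few neighboring countries would
            # suggest a region-wide effect, such as the Levant/North Africa one.
            top = np.argsort(-np.abs(row))[:5]
            names = [countries[j] for j in top]
            print(f"component {i}: loading spread {np.abs(row).std():.3f}, "
                  f"dominated by {names}")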

It also seems to me that a bit of preprocessing, using standard time-series decomposition techniques, would clean up the analysis and make its results easier to interpret. There’s not one word about that possibility in the paper, which seems like a major omission; decomposition is the first thing anyone who knows anything about time-series analysis would think of. In this case, I think seasonal variation should definitely be factored out, and removing linear per-country trends might also be helpful.
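For instance, something like STL decomposition (sketched here with statsmodels; the weekly period is a guess on my part) would strip each country’s series down to a residual before the PCA step:

    import numpy as np
    from statsmodels.tsa.seasonal import STL

    def deseasonalize(series, period=7):
        """Strip trend and seasonality from one country's daily user counts
        before the PCA step. period=7 assumes a weekly cycle (my guess)."""
        result = STL(series, period=period, robust=True).fit()
        return result.resid   # what remains after removing trend and season

    # If STL's smooth trend proves too aggressive, a plain linear per-country
    # trend could be removed instead with np.polyfit(np.arange(len(s)), s, 1).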

20,000 In League Under The Sea: Anonymous Communication, Trust, MLATs, and Undersea Cables

Today’s paper takes another shot at modeling how the physical topology of the Internet affects the security of Tor against passive adversaries with the ability to snoop on a lot of traffic. It’s by some of the same people who wrote Defending Tor from Network Adversaries and is quite closely related.

Most of the work of this paper goes into building a flexible, formal threat model, which Tor client software could (in principle) use to inform its routing decisions. Acknowledging that there’s always going to be a good deal of uncertainty about what adversaries are out there and what they are capable of, they make two key design decisions. The model is probabilistic (based on a Bayesian belief network), and it takes user input. For instance, if you have reason to think the government of Transbelvia has it in for you, you can instruct Tor to avoid paths that Transbelvia might be able to snoop on, and the model will expand that out to all the ways they might do that. Conversely, if you trust a particular organization you might like to preferentially use its guards or exit nodes, and it can do that too.

The model is very thorough about different ways a government might be able to snoop on network traffic—not just relays physically hosted in the country, but ASes and IXPs (Transbelvia hosts a major IXP for Eastern Europe), submarine cable landing sites (not relevant for a landlocked country), mutual legal assistance treaties (MLATs) which might be used to have another country do some snooping on Transbelvia’s behalf, and even hacking into and subverting routers at interesting points in the connectivity graph. (The pun in the title refers to their analysis of how MLATs could allow several of the usual suspects to snoop on 90+% of all submarine cable traffic, even though they host hardly any cable landings themselves.) Equally important, it can be expanded at need when new techniques for spying are revealed.
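For a very crude flavor of the “expand that out” step (the paper’s actual model is a proper Bayesian belief network; every name and number below is invented for illustration):

    # Crude illustration only: this just shows the idea of expanding one
    # distrusted actor into all the channels through which it might observe
    # a path. All channel names and probabilities here are invented.
    CHANNELS = {
        "hosts_relay":   0.10,  # a relay on the path runs in its territory
        "controls_as":   0.30,  # the path transits an AS it controls
        "controls_ixp":  0.20,  # the path crosses an IXP it can tap
        "cable_landing": 0.05,  # the path crosses a cable landing it can tap
        "mlat_request":  0.15,  # a treaty partner can be asked to tap for it
    }

    def p_path_observed(applicable):
        """Probability that at least one applicable channel lets the
        adversary observe the path, assuming channel independence."""
        p_miss = 1.0
        for channel in applicable:
            p_miss *= 1.0 - CHANNELS[channel]
        return 1.0 - p_miss

    # A landlocked Transbelvia gets every channel except cable landings:
    print(p_path_observed(
        ["hosts_relay", "controls_as", "controls_ixp", "mlat_request"]))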

I think something like this is going to be an essential building block if we want to add any spy-aware routing algorithm to Tor, but I have two serious reservations. First, and simpler but less important: right now, all Tor clients make routing decisions in more or less the same way (there have been small changes to the algorithm over time, but everyone is strongly encouraged to stay close to the latest client release anyway, just because of bugs). If clients don’t all make routing decisions the same way, that by itself might be usable to fingerprint them, and thus to cut down the set of people who might have taken some action, from all Tor users to all Tor users who make routing decisions like THIS. If highly personalized threat models are allowed, the latter group might be just one person.

Second, and rather more serious, the user-input aspect of this system is going to require major user experience research and design to have any hope of not being worse than the problem it’s trying to solve. It’s not just a matter of putting a friendly face on the belief language (although that does need to happen)—the system will need to educate its users in the meaning of what it is telling them, and it will need to walk them through the consequences of their choices. And it might need to provide nudges if there’s a good reason to think the user’s assessment of their threat model is flat-out wrong (even just making that judgement automatically is fraught with peril—but so is not making that judgement).

Performance and Security Improvements for Tor: A Survey

This week’s non-PETS paper is a broad survey of research into improving the security, the performance, or both, of low-latency anonymity networks such as Tor. Nearly all of the research surveyed used Tor itself as a testbed, and the presentation here assumes Tor, but most of the work could be generalized to other designs.

There’s been a lot of work on this sort of thing in the eleven years since Tor was first introduced, and this paper does a generally good job of categorizing it, laying out lines of research, indicating which proposals have been integrated into Tor and which haven’t, etc. (I particularly liked the mindmap diagram near the beginning, and the discussion near the end of which problems still need to get solved.) One notable exception is the section on improved cryptography, where you need to have a solid cryptography background to get any idea of what the proposals are, let alone whether they worked. There are also a couple of places where connections to the larger literature of network protocol engineering would have been helpful: for instance, there’s not a single mention of bufferbloat, even though that is clearly an aspect of the congestion problems that one line of research aims to solve. And because it’s not mentioned, it’s not clear whether the researchers doing that work knew about it.

Tor is a difficult case in protocol design because its security goals are—as acknowledged in the original paper describing its design [1]—directly in conflict with its performance goals. Improvements in end-to-end latency, for instance, may make a traffic correlation attack easier. Improvements in queueing fairness or traffic prioritization may introduce inter-circuit crosstalk enabling an attacker to learn something about the traffic passing through a relay. Preferring to use high-bandwidth relays improves efficiency but reduces the number of possible paths that traffic can take. And so on. It is striking, reading through this survey, to see how often an apparently good idea for performance was discovered to have unacceptable consequences for anonymity.

Tor’s Usability for Censorship Circumvention

This is a report on a pilot usability study. The authors ran five journalists (there aren’t any more details than that) through the process of installing, activating, and using the Tor Browser for a small number of canned tasks, identifying a number of problems:

… people did have difficulty with installing Tor Browser (principally because of the Gatekeeper code-signing feature on OS X), did not understand what many of the many options meant, and were confused about why certain things were happening.

They are going to do a much larger study, and were soliciting feedback on experimental design. I have only two things to say. First, the proposal is to do a large test of 200 users and then, presumably, start making changes to the software to improve usability. The problem with this is that subtle (yet serious) UX issues are very likely being masked by the more blatant ones: no matter how many people you experiment on, you won’t detect the subtle problems until the blatant ones are fixed. It would therefore be far more valuable to do a series of smaller user studies, improving the software based on the results of each study before doing the next one. This strategy also ensures that the research results actually get incorporated into the product, rather than being lost in the shuffle once the paper is published.

The other point is more of a hypothesis about what would be good to aim for. To use Tor in a way that genuinely improves your security outcomes, you need to understand what it is doing and why, and to do that you have to wrap your head around some concepts that may be unfamiliar, especially if you haven’t previously needed to understand the Internet itself in any kind of detail. (For instance, the fact that every IP packet is labeled with its source and destination is obvious once you think about it, but a lot of people have never had any reason to think about it.) There probably needs to be a training manual, and this manual needs to take the attitude that yeah, this is a little tricky, and you have to think about it some, but don’t panic, you can understand it. Shoot for the “we understand” tone said to characterize Rust compiler errors (warning: Reddit). The place I’ve seen this done best, personally, was the tutorial and concepts guide for GnuCash, which took just this tone with regard to double-entry bookkeeping, also somewhat notorious for its inscrutability. (Note: I read this a long time ago, and I don’t know whether its current edition is still like that.)

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

In a similar vein to Tuesday’s paper, this is an investigation of how practical it might be to avoid exposing Tor circuits to traffic analysis by an adversary who controls an Autonomous System. Unlike Tuesday’s paper, it assumes that the adversary does not manipulate BGP to observe traffic they shouldn’t have seen, so the concern is simply to ensure that the two most sensitive links in the circuit (client to entry, and exit to destination) do not pass through the same AS. Previous papers have suggested that the Tor client should predict the AS-level paths involved in these links and select entries and exits accordingly [1] [2]. This paper observes that AS path prediction is itself a difficult problem, and that different techniques can give substantially different results. Therefore, the authors collected traceroute data from 28 Tor relays and compared AS paths inferred from those traces with paths predicted from BGP monitoring (using the algorithm of On AS-Level Path Inference [3]).
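The selection rule those proposals boil down to is easy to sketch; predict_as_path below is a hypothetical stand-in for whichever inference technique you trust, not a real API:

    def predict_as_path(src, dst):
        """Hypothetical stand-in for an AS-path inference technique
        (BGP-based or traceroute-based); returns a set of AS numbers."""
        raise NotImplementedError

    def safe_circuit_ends(client, destination, entries, exits):
        """Yield (entry, exit) pairs where no single AS can see both
        sensitive links: client->entry and exit->destination."""
        for entry in entries:
            ases_in = predict_as_path(client, entry)
            for exit_relay in exits:
                ases_out = predict_as_path(exit_relay, destination)
                if not ases_in & ases_out:   # the AS sets are disjoint
                    yield entry, exit_relay

If the generator yields nothing, you have hit exactly the conflict with the guard policy described a couple of paragraphs below.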

The core finding is that traceroute-based AS path inference does indeed give substantially different results from BGP-based path prediction. The authors assume that traceroute is more accurate; the discrepancy is consistently described as an error in the BGP-based prediction, and (since BGP-based prediction tends to indicate exposure to more different ASes) as overstating the risk exposure of any given Tor link. This seems unjustified to me. The standard traceroute algorithm is known to become confused in the presence of load-balancing routers, which are extremely common in the backbone [4]; refinements have been proposed (and implemented in the scamper tool used in this paper) but have problems of their own [5] [6]. More fundamentally, traceroute produces a snapshot: “these UDP packets did take this route just now.” Tor links are relatively long-lived TCP connections (tens of minutes) which could easily be rerouted among several different paths over their lifetime. I think it would be better to say that BGP path prediction produces a more conservative estimate of the ASes to which a Tor link could be exposed, and to highlight figuring out which technique is more accurate as future work.

A secondary finding is that AS-aware path selection by the Tor client interacts poorly with the guard policy, in which each Tor client selects a small number of entry nodes to use for an extended period. These nodes must be reliable and high-bandwidth; the economics of running a reliable, high-bandwidth Internet server mean that they are concentrated in a small number of ASes. Similar economics apply to the operation of exit nodes, plus additional legal headaches; as a result, it may not be possible to find any end-to-end path that obeys both the guard policy and the AS-selection policy. This situation is, of course, worsened if you take the more conservative, BGP-based estimation of AS exposure.

I’ve been concerned for some time that guards might actually be worse for anonymity than the problem they are trying to solve. The original problem statement [7] is that if you select an entry node at random for each circuit, and some fraction of entry nodes are malicious, with high probability you will eventually run at least one circuit through a malicious entry. With guards, either all your circuits pass through a malicious entry for an extended period of time, or none do. My fundamental concerns with this are, first, that having all your traffic exposed to a malicious entry for an extended period is probably much worse for your anonymity than having one circuit exposed every now and then; and second, that the hypothetical Tor adversary has deep pockets and can easily operate reliable, high-bandwidth nodes, which are disproportionately likely to get picked as guards. Concentration of guards in a small number of ASes only makes this easier for the adversary; concentration of guards together with exits in a small number of ASes makes it easier still. It’s tempting to suggest a complete about-face: preferentially choose entry nodes from the low-bandwidth, short-lived population and use them only for a short time. This would also mean that entry nodes could be drawn from a much broader pool of ASes, and it would be easier to avoid overlap with the AS path from exit to destination.
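The arithmetic behind the original problem statement makes the trade stark (f and n below are assumed numbers, purely for illustration):

    # With fraction f of entry capacity malicious and a fresh entry chosen
    # for each circuit, the chance that at least one of n circuits starts
    # at a malicious entry is:
    f, n = 0.05, 1000
    p_fresh_entries = 1 - (1 - f) ** n   # ~1.0: eventual exposure near-certain
    # With a single long-lived guard, the chance that ALL your circuits are
    # exposed is f itself, and the chance that none are is 1 - f:
    p_guard_exposed = f                  # 0.05
    print(p_fresh_entries, p_guard_exposed)

The guard design trades a near-certain occasional exposure for a small chance of total exposure; my concern is that total exposure is disproportionately damaging, and that the adversary can tilt f in its favor among the guard population specifically.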

Anonymity on QuickSand: Using BGP to Compromise Tor

One of the oldest research threads regarding Tor is trying to figure out how close you could get in real life to the global passive adversary that’s known to be able to deanonymize all communications. This is a new entry in that line of research, from HotNets 2014.

At the largest scale, the global Internet is administratively divided into autonomous systems (ASes) that exchange traffic with one another, using BGP to communicate routing information. Any given AS can communicate directly with only a small number of peers, so a stream of packets will normally pass through many different ASes on the way to its destination. It’s well known that AS-operated backbone routers are in an excellent position to mount traffic-correlation attacks on Tor, particularly if they collude [1] [2]. The key observation in this paper is that, by manipulating BGP, a malicious AS can observe traffic that wouldn’t naturally flow through it.

BGP is an old protocol, originally specified in 1989; like most of our older protocols, it assumes that all participants are cooperative and honest. Any backbone router can announce that it is now capable of forwarding packets to a prefix (a set of IP addresses) and the rest of the network will believe it. Incidents where traffic is temporarily redirected to an AS that either can’t get it to the destination at all, or can only do so suboptimally, are commonplace, and people argue about how malicious these are [3] [4] [5]. Suppose an adversary can observe one end of a Tor circuit; perhaps they control the ISP for a Tor client. They also have some reason to suspect a particular destination for the traffic. They use BGP to hijack traffic to the suspected destination, passing it on so that the client doesn’t notice anything. They can now observe both ends of the circuit and confirm their suspicions. They might not get to see traffic in both directions, but the authors also demonstrate that a traffic-correlation attack works in principle even if you can only see the packet flow in one direction, thanks to TCP acknowledgments.
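The correlation step itself is not sophisticated; here is a toy illustration with entirely synthetic traffic, where one observation point sees data packets and the other sees only the ACKs flowing back:

    import numpy as np

    def correlate_flows(a, b):
        """Pearson correlation of two per-interval packet-count series,
        e.g. client-side data packets vs. destination-side TCP ACKs."""
        a = a - a.mean()
        b = b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Synthetic check with invented traffic: bursty browsing, seen as data
    # packets at the client's ISP and as (roughly one-per-two-packets)
    # ACKs on the hijacked destination side.
    rng = np.random.default_rng(0)
    bursts = 5 + 30 * (rng.random(600) < 0.2)             # on/off intensity
    data = rng.poisson(bursts)                            # one side of the flow
    acks = rng.binomial(data, 0.5) + rng.poisson(1, 600)  # the other side
    print(correlate_flows(data, acks))                    # close to 1: same flow

Because the ACK stream tracks the burst structure of the data stream it acknowledges, even one-directional observation correlates strongly.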

Making this worse, high-bandwidth, long-lived Tor relays (which are disproportionately likely to be used for either end of a circuit) are clustered in a small number of ASes worldwide. This means an adversary can do dragnet surveillance by hijacking all traffic to some of those ASes; depending on its own position in the network, this might not even appear abnormal. The adversary might even be one of those ASes, or an agency in a position to lean on its operators.

The countermeasures proposed in this paper are pretty weak; they would operate only on a timescale of hours to days, whereas a BGP hijack can happen, and stop happening, in a matter of minutes. I don’t see a good fix happening anywhere but in the routing protocol itself. Unfortunately, routing protocols that do not assume honest participants are still a topic of basic research. (I may get to some of those papers eventually.) There are proposals for adding a notion of “this AS is authorized to announce this prefix” to BGP [6], but those have all the usual problems with substituting “I trust this organization” for “I have verified that this data is accurate.”