Papers by Aaron Johnson

20,000 In League Under The Sea: Anonymous Communication, Trust, MLATs, and Undersea Cables

Today’s paper takes another shot at modeling how the physical topology of the Internet affects the security of Tor against passive adversaries with the ability to snoop on a lot of traffic. It’s by some of the same people who wrote Defending Tor from Network Adversaries and is quite closely related.

Most of the work of this paper goes into building a flexible, formal threat model, which Tor client software could (in principle) use to inform its routing decisions. Acknowledging that there will always be a good deal of uncertainty about which adversaries are out there and what they are capable of, the authors make two key design decisions: the model is probabilistic (based on a Bayesian belief network), and it takes user input. For instance, if you have reason to think the government of Transbelvia has it in for you, you can instruct Tor to avoid paths that Transbelvia might be able to snoop on, and the model will expand that out to all the ways they might do so. Conversely, if you trust a particular organization, you might prefer to use its guards or exit nodes, and the model can accommodate that too.
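
To give a feel for what such a model computes, here is a minimal sketch in Python. This is not the paper's actual belief network; every location name and probability below is invented for illustration.

```python
from math import prod

# Invented beliefs: P(this adversary observes traffic at this location).
compromise = {
    "AS-TransbelviaTelecom": 0.8,   # network inside the distrusted country
    "IXP-EasternEurope":     0.5,   # the major IXP Transbelvia hosts
    "AS-NeutralTransit":     0.05,  # no particular reason for suspicion
}

def path_compromise_prob(path, default=0.01):
    """P(at least one location on the path is observed by the adversary),
    assuming locations are compromised independently (a simplification)."""
    return 1 - prod(1 - compromise.get(loc, default) for loc in path)

print(path_compromise_prob(["AS-TransbelviaTelecom", "IXP-EasternEurope"]))  # ~0.90: avoid
print(path_compromise_prob(["AS-NeutralTransit"]))                           # ~0.05: prefer
```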

The model is very thorough about the different ways a government might be able to snoop on network traffic: not just relays physically hosted in the country, but also ASes and IXPs (Transbelvia hosts a major IXP for Eastern Europe), submarine cable landing sites (not relevant for a landlocked country), mutual legal assistance treaties (MLATs), which might be used to have another country do some snooping on Transbelvia's behalf, and even hacking into and subverting routers at interesting points in the connectivity graph. (The pun in the title refers to the authors' analysis of how MLATs could allow several of the usual suspects to snoop on more than 90% of all submarine cable traffic, even though they host hardly any cable landings themselves.) Equally important, the model can be extended as needed when new spying techniques come to light.
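
The MLAT effect in particular is easy to illustrate as a reachability computation. A sketch, with entirely invented countries, treaties, and cables: a coalition's reach is the union of the landings it hosts directly and the landings reachable through one treaty request.

```python
landings = {                 # country -> submarine cables landing there
    "A": {"cable1"},
    "B": {"cable2", "cable3"},
    "C": {"cable4", "cable5"},
}
mlats = {                    # country -> countries it can send requests to
    "A": {"B", "C"},
    "B": {"C"},
    "C": set(),
}

def reachable_cables(coalition):
    """Cables observable directly or via one MLAT request."""
    countries = set(coalition)
    for country in coalition:
        countries |= mlats.get(country, set())
    return set().union(*(landings.get(c, set()) for c in countries))

# "A" hosts one landing itself but can reach all five cables via MLATs.
print(reachable_cables({"A"}))
```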

I think something like this is going to be an essential building block if we want to add any spy-aware routing algorithm to Tor, but I have two serious reservations. The first is simpler but less important: right now, all Tor clients make routing decisions in more or less the same way (there have been small changes to the algorithm over time, but everyone is strongly encouraged to stay close to the latest client release anyway, if only because of bugs). If clients don't all make routing decisions the same way, that difference by itself might be usable to fingerprint them, and thus to cut down the number of people who might have taken some action, from all Tor users to all Tor users who make routing decisions like THIS. If highly personalized threat models are allowed, the latter group might be just one person.
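
To put rough numbers on the partitioning worry (all population figures below are invented): anonymity is often measured in bits, and a user whose observable routing behavior is shared by k people gets log2(k) bits rather than log2(N).

```python
from math import log2

N = 2_000_000                        # hypothetical total Tor population
for k in (2_000_000, 10_000, 1):     # size of the user's behavioral group
    print(f"group of {k:>9,}: {log2(k):4.1f} bits (vs {log2(N):.1f} for everyone)")
```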

Second, and rather more serious: the user-input aspect of this system will require major user-experience research and design to have any hope of not being worse than the problem it is trying to solve. It's not just a matter of putting a friendly face on the belief language (although that does need to happen); the system will need to educate its users in the meaning of what it is telling them, and it will need to walk them through the consequences of their choices. It might also need to provide nudges when there is good reason to think the user's assessment of their threat model is flat-out wrong (even making that judgement automatically is fraught with peril, but so is not making it).

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

In a similar vein to Tuesday's paper, this is an investigation of how practical it might be to avoid exposing Tor circuits to traffic analysis by an adversary who controls an Autonomous System. Unlike Tuesday's paper, it assumes that the adversary does not manipulate BGP to observe traffic they shouldn't be able to see, so the concern is simply to ensure that the two most sensitive links in a circuit (from client to entry, and from exit to destination) do not pass through the same AS. Previous papers have suggested that the Tor client should predict the AS-level paths involved in these links and select entries and exits accordingly [1] [2]. This paper observes that AS path prediction is itself a difficult problem, and that different techniques can give substantially different results. The authors therefore collected traceroute data from 28 Tor relays and compared the AS paths inferred from those traces with the paths predicted from BGP monitoring (using the algorithm of On AS-Level Path Inference [3]).
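
As a toy illustration of the kind of comparison involved (the study works from real traceroute and BGP data; every AS number below is invented):

```python
# Two AS-level inferences for the same Tor link.
traceroute_path = ["AS100", "AS200", "AS300"]            # observed just now
bgp_predicted   = ["AS100", "AS250", "AS200", "AS300"]   # model prediction

print("predicted but never observed:",
      set(bgp_predicted) - set(traceroute_path))   # {'AS250'}
print("observed but never predicted:",
      set(traceroute_path) - set(bgp_predicted))   # set()
```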

The core finding is that traceroute-based AS path inference does indeed give substantially different results from BGP-based path prediction. The authors assume that traceroute is more accurate: the discrepancy is consistently described as an error in the BGP-based prediction, and (since BGP-based prediction tends to indicate exposure to more ASes) as overstating the risk exposure of any given Tor link. This seems unjustified to me. The standard traceroute algorithm is known to become confused in the presence of load-balancing routers, which are extremely common in the backbone [4]; refinements have been proposed (and implemented in the scamper tool used in this paper) but have problems of their own [5] [6]. More fundamentally, a traceroute is a snapshot: these UDP packets took this route just now. Tor links are relatively long-lived TCP connections (lasting tens of minutes) that could easily be rerouted among several different paths over their lifetime. I think it would be better to say that BGP path prediction produces a more conservative estimate of the ASes to which a Tor link could be exposed, and to flag the question of which technique is more accurate as future work.
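
A sketch of the snapshot problem, with invented AS paths: the exposure of a long-lived flow is the union over every route it might take during its lifetime, which a single traceroute can miss.

```python
snapshots = [
    ["AS100", "AS200", "AS300"],   # traceroute at t=0
    ["AS100", "AS210", "AS300"],   # load balancer picks another route
    ["AS100", "AS200", "AS300"],   # back again
]
first_only = set(snapshots[0])
lifetime   = set().union(*snapshots)
print(lifetime - first_only)       # {'AS210'}: invisible to one snapshot
```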

A secondary finding is that AS-aware path selection by the Tor client interacts poorly with the guard policy, under which each Tor client selects a small number of entry nodes to use for an extended period. These nodes must be reliable and high-bandwidth; the economics of running a reliable, high-bandwidth Internet server mean that they are concentrated in a small number of ASes. Similar economics apply to the operation of exit nodes, plus additional legal headaches; as a result, it may not be possible to find any end-to-end path that obeys both the guard policy and the AS-selection policy. This situation is, of course, worsened if you take the more conservative, BGP-based estimate of AS exposure.
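
A minimal sketch of the joint constraint itself, with invented guards, exits, and AS sets: as guards and exits concentrate into the same few ASes, fewer (and eventually no) pairs survive the filter.

```python
guards = {"guard1": {"AS100", "AS200"},   # ASes on the client->guard link
          "guard2": {"AS100", "AS300"}}
exits  = {"exit1":  {"AS200", "AS400"},   # ASes on the exit->destination link
          "exit2":  {"AS500"}}

ok_pairs = [(g, e) for g in guards for e in exits
            if not guards[g] & exits[e]]  # require disjoint AS sets
print(ok_pairs)   # guard1/exit1 is excluded: both links cross AS200
```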

I’ve been concerned for some time that guards might actually be worse for anonymity than the problem they are trying to solve. The original problem statement [7] is that if you select an entry node at random for each circuit, and some fraction of entry nodes are malicious, then with high probability you will eventually run at least one circuit through a malicious entry. With guards, either all your circuits pass through a malicious entry for an extended period of time, or none do. My concerns with this are, first, that having all your traffic exposed to a malicious entry for an extended period is probably much worse for your anonymity than having one circuit exposed every now and then; and second, that the hypothetical Tor adversary has deep pockets and can easily operate reliable, high-bandwidth nodes, which are disproportionately likely to be picked as guards. Concentration of guards in a small number of ASes only makes this easier for the adversary; concentration of guards together with exits in a small number of ASes makes it easier still. It's tempting to suggest a complete about-face: preferentially choose entry nodes from the low-bandwidth, short-lived population and use them only for a short time. This would also mean that entry nodes could be drawn from a much broader pool of ASes, making it easier to avoid overlap with the AS path from exit to destination.
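
The arithmetic behind the original guard motivation [7] is worth spelling out; the malicious fraction f and the circuit counts below are illustrative, not measured values.

```python
f = 0.05                       # fraction of entry nodes that are malicious
for n in (10, 100, 1000):      # circuits built with fresh random entries
    print(f"{n:5d} circuits: P(>=1 malicious entry) = {1 - (1 - f)**n:.3f}")
# 10 circuits: 0.401; 100 circuits: 0.994; 1000 circuits: ~1.000
```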