Papers tagged ‘Traffic analysis’

Performance and Security Improvements for Tor: A Survey

This week’s non-PETS paper is a broad survey of research into improving the security, the performance, or both, of low-latency anonymity networks such as Tor. Nearly all of the research used Tor itself as a testbed, and the presentation here assumes Tor, but most of the work could be generalized to other designs.

There’s been a lot of work on this sort of thing in the eleven years since Tor was first introduced, and this paper does a generally good job of categorizing it, laying out lines of research, indicating which proposals have been integrated into Tor and which haven’t, etc. (I particularly liked the mindmap diagram near the beginning, and the discussion near the end of which problems still need to be solved.) One notable exception is the section on improved cryptography, where you need to have a solid cryptography background to get any idea of what the proposals are, let alone whether they worked. There are also a couple of places where connections to the larger literature of network protocol engineering would have been helpful: for instance, there’s not a single mention of bufferbloat, even though that is clearly an aspect of the congestion problems that one line of research aims to solve. And because it’s not mentioned, it’s not clear whether the researchers doing that work knew about it.

Tor is a difficult case in protocol design because its security goals are—as acknowledged in the original paper describing its design [1]—directly in conflict with its performance goals. Improvements in end-to-end latency, for instance, may make a traffic correlation attack easier. Improvements in queueing fairness or traffic prioritization may introduce inter-circuit crosstalk enabling an attacker to learn something about the traffic passing through a relay. Preferring to use high-bandwidth relays improves efficiency but reduces the number of possible paths that traffic can take. And so on. It is striking, reading through this survey, to see how often an apparently good idea for performance was discovered to have unacceptable consequences for anonymity.

Protecting traffic privacy for massive aggregated traffic

Nowadays we are pretty darn sure we know how to encrypt network packets in transit such that there’s no realistic way to decrypt them. There are a whole host of second-order problems that are not as well solved, but basic message confidentiality is done—except for one thing: the length and timing of each packet are still visible to anyone who can monitor the wire. It turns out that this enables eavesdroppers to extract a frightening amount of information from those encrypted packets, such as which page of a medical-advice website you are reading [1], the language of your VoIP phone call [2], or even a transcript of your VoIP phone call [3].

In principle, we know how to close this information leak. It is simply necessary for Alice to send Bob a continuous stream of fixed-size packets, at a fixed rate, forever, whether or not she has anything useful to say. (When she doesn’t have anything useful to say, she can encrypt an endless stream of binary zeroes.) This is obviously a non-starter in any context where Alice cares about power consumption, shared channel capacity, or communicating with more than one Bob. Even when none of these is a concern (for instance, on high-volume VPN links between data centers, which run at significant utilization around the clock anyway), forcing all traffic to fit the constant packet length and transmission schedule adds significant overhead. Thus, there’s a whole line of research trying to minimize that overhead, and/or to see how far we can back down from the ideal-in-principle case without revealing too much.
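To make the ideal-in-principle scheme concrete, here is a minimal Python sketch of it. Everything specific in it is an assumption made for illustration, not something taken from the paper: the 512-byte cell size, the 2-byte length header, and the function names. Encryption is left out entirely; on a real link each cell would be encrypted before sending, so that dummy cells and real cells are indistinguishable on the wire.

```python
import struct

CELL_SIZE = 512                  # assumed fixed packet size, in bytes
HEADER = struct.Struct("!H")     # assumed header: length of the real payload
PAYLOAD_MAX = CELL_SIZE - HEADER.size


def to_cells(message: bytes) -> list[bytes]:
    """Split a message into fixed-size cells, zero-padding the last one."""
    cells = []
    for start in range(0, len(message), PAYLOAD_MAX):
        chunk = message[start:start + PAYLOAD_MAX]
        cells.append(HEADER.pack(len(chunk)) + chunk.ljust(PAYLOAD_MAX, b"\x00"))
    return cells


def dummy_cell() -> bytes:
    """A cell carrying no real data: declared length 0, all-zero payload."""
    return HEADER.pack(0) + bytes(PAYLOAD_MAX)


def schedule(queue: list[bytes], ticks: int) -> list[bytes]:
    """Emit exactly one cell per tick: a queued real cell if any, else a dummy."""
    pending = list(queue)
    out = []
    for _ in range(ticks):
        out.append(pending.pop(0) if pending else dummy_cell())
    return out


def from_cells(cells: list[bytes]) -> bytes:
    """Receiver side: strip padding and discard dummy cells."""
    parts = []
    for cell in cells:
        (length,) = HEADER.unpack(cell[:HEADER.size])
        parts.append(cell[HEADER.size:HEADER.size + length])
    return b"".join(parts)
```

The overhead is easy to see here: `schedule` produces the same number of bytes per tick whether the queue is full or empty, so an idle link costs exactly as much as a busy one.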

Today’s paper offers a theoretical model and test framework that should make it easier to experiment with the high-volume-VPN-link case. I like it for its concreteness; often theoretical modeling papers feel so divorced from practice that it’s hard to see how to do anything constructive with them. It is also easy to see how the model could be extended to more sophisticated traffic-shaping techniques, which would be the obvious next step. The big downside, though, is that it only considers a fixed network topology: Alice talking to Bob and no one else, ever. Even for inter-data-center links, topologies can change on quite short notice, and a topology-change event might be exactly what Eve is listening for (perhaps it gives her advance notice of some corporate organizational change). To be fair, the “continuous stream of fixed-size packets” scheme does completely solve the length-and-timing issue in principle; we do not have nearly as good schemes for concealing who is talking to whom, even in principle. Which is unfortunate, because you can do even more terrifying things with knowledge only of who is talking to whom. [4]

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

In a similar vein to Tuesday’s paper, this is an investigation of how practical it might be to avoid exposing Tor circuits to traffic analysis by an adversary who controls an Autonomous System. Unlike Tuesday’s paper, they assume that the adversary does not manipulate BGP to observe traffic that they shouldn’t have seen, so the concern is simply to ensure that the two most sensitive links in the circuit (from client to entry, and from exit to destination) do not pass through the same AS. Previous papers have suggested that the Tor client should predict the AS-level paths involved in these links, and select entries and exits accordingly [1] [2]. This paper observes that AS path prediction is itself a difficult problem, and that different techniques can give substantially different results. Therefore, they collected traceroute data from 28 Tor relays and compared AS paths inferred from these traces with those predicted from BGP monitoring (using the algorithm of “On AS-Level Path Inference” [3]).
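The selection rule itself is simple once the AS paths are known; the hard part is the prediction. As a rough Python sketch of the rule only, assume the predicted AS paths are already available as lists of AS numbers. The relay names, the function name, and the AS numbers (taken from the range reserved for documentation) are all made up for illustration; this is not the Tor client’s actual algorithm.

```python
def safe_pairs(entry_paths, exit_paths):
    """Return (entry, exit) pairs whose two sensitive links share no AS.

    entry_paths maps each candidate entry relay to the predicted AS path
    from the client to that relay; exit_paths maps each candidate exit
    relay to the predicted AS path from that relay to the destination.
    """
    pairs = []
    for entry, client_side in entry_paths.items():
        for exit_, dest_side in exit_paths.items():
            if not set(client_side) & set(dest_side):
                pairs.append((entry, exit_))
    return pairs


# Hypothetical predictions, using documentation-only AS numbers.
entry_paths = {
    "guardA": [64496, 64497, 64498],
    "guardB": [64496, 64499],
}
exit_paths = {
    "exitX": [64500, 64497, 64501],   # shares AS 64497 with guardA's path
    "exitY": [64502, 64503],
}

usable = safe_pairs(entry_paths, exit_paths)
```

Note that the result depends entirely on which path predictions are fed in: if a link is given several plausible AS paths, taking their union before the check makes the rule stricter and the list of usable pairs shorter.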

The core finding is that traceroute-based AS path inference does indeed give substantially different results from BGP-based path prediction. The authors assume that traceroute is more accurate; the discrepancy is consistently described as an error in the BGP-based prediction, and (since BGP-based prediction tends to indicate exposure to more distinct ASes) as overstating the risk exposure of any given Tor link. This seems unjustified to me. The standard traceroute algorithm is known to become confused in the presence of load-balancing routers, which are extremely common in the backbone [4]; refinements have been proposed (and implemented in the scamper tool used in this paper) but have problems themselves [5] [6]. More fundamentally, traceroute produces a snapshot: “these UDP packets did take this route just now.” Tor links are relatively long-lived TCP connections (tens of minutes) which could easily be rerouted among several different paths over their lifetime. I think it would be better to say that BGP path prediction produces a more conservative estimate of the ASes to which a Tor link could be exposed, and highlight figuring out which one is more accurate as future work.

A secondary finding is that AS-aware path selection by the Tor client interacts poorly with the guard policy, in which each Tor client selects a small number of entry nodes to use for an extended period. These nodes must be reliable and high-bandwidth; the economics of running a reliable, high-bandwidth Internet server mean that they are concentrated in a small number of ASes. Similar economics apply to the operation of exit nodes, plus additional legal headaches; as a result, it may not be possible to find any end-to-end path that obeys both the guard policy and the AS-selection policy. This situation is, of course, worsened if you take the more conservative, BGP-based estimation of AS exposure.

I’ve been concerned for some time that guards might actually be worse for anonymity than the problem they are trying to solve. The original problem statement [7] is that if you select an entry node at random for each circuit, and some fraction of entry nodes are malicious, with high probability you will eventually run at least one circuit through a malicious entry. With guards, either all your circuits pass through a malicious entry for an extended period of time, or none do. My concern with this is twofold. First, having all your traffic exposed to a malicious entry for an extended period is probably much worse for your anonymity than having one circuit exposed every now and then. Second, the hypothetical Tor adversary has deep pockets and can easily operate reliable high-bandwidth nodes, which are disproportionately likely to get picked as guards. Concentration of guards in a small number of ASes only makes this easier for the adversary; concentration of guards together with exits in a small number of ASes makes it even easier. It’s tempting to suggest a complete about-face, preferentially choosing entry nodes from the low-bandwidth, short-lived population and using them only for a short time; this would also mean that entry nodes could be taken from a much broader pool of ASes, and it would be easier to avoid overlap with the AS-path from exit to destination.
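The trade-off in the original problem statement can be put into numbers with a few lines of Python. This is a back-of-the-envelope sketch under strong assumptions that are mine, not the paper’s: entries are picked uniformly and independently, a fixed fraction `f` of them is malicious, and bandwidth weighting is ignored (which, per the argument above, flatters the guard case). The 5% figure and the counts are arbitrary.

```python
def p_at_least_one_malicious(f: float, picks: int) -> float:
    """Probability that at least one of `picks` independent, uniform
    entry selections lands on a malicious relay, given fraction f."""
    return 1 - (1 - f) ** picks


f = 0.05  # assumed fraction of malicious entry nodes

# No guards: a fresh random entry for each of 100 circuits.
p_random = p_at_least_one_malicious(f, picks=100)

# Guards: three entries chosen once and reused for every circuit.
p_guards = p_at_least_one_malicious(f, picks=3)

print(f"random entries, 100 circuits: {p_random:.3f}")
print(f"three guards:                 {p_guards:.3f}")
```

What the formula does not capture is the severity of each outcome: in the guard case a hit exposes a large share of your circuits for the guard’s whole lifetime, while in the random case a hit exposes one circuit. That difference is exactly what the concern above is about.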

Anonymity on QuickSand: Using BGP to Compromise Tor

One of the oldest research threads regarding Tor is trying to figure out how close you could get in real life to the global passive adversary that’s known to be able to deanonymize all communications. This is a new entry in that line of research, from HotNets 2014.

At the largest scale, the global Internet is administratively divided into autonomous systems (ASes) that exchange traffic, using BGP for configuration. Any given AS can only communicate with a small number of direct peers, so a stream of packets will normally pass through many different ASes on the way to its destination. It’s well known that AS-operated backbone routers are in an excellent position to mount traffic-correlation attacks on Tor, particularly if they collude [1] [2]. The key observation in this paper is that, by manipulating BGP, a malicious AS can observe traffic that wouldn’t naturally flow through it.

BGP is an old protocol, originally specified in 1989; like most of our older protocols, it assumes that all participants are cooperative and honest. Any backbone router can announce that it is now capable of forwarding packets to a prefix (set of IP addresses) and the rest of the network will believe it. Incidents where traffic is temporarily redirected to an AS that either can’t get it to the destination at all, or can only do so suboptimally, are commonplace, and people argue about how malicious these are [3] [4] [5].

Suppose an adversary can observe one end of a Tor circuit (perhaps they control the ISP for a Tor client). They also have some reason to suspect a particular destination for the traffic. They use BGP to hijack traffic to the suspected destination, passing it on so that the client doesn’t notice anything. They can now observe both ends of the circuit and confirm their suspicions. They might not get to see traffic in both directions, but the authors also demonstrate that a traffic-correlation attack works in principle even if you can only see the packet flow in one direction, thanks to TCP acknowledgments.
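To illustrate why an unauthenticated announcement is enough, here is a toy Python model of one well-known variant, the sub-prefix hijack, in which the attacker announces a more specific prefix than the legitimate origin and wins on longest-prefix match. Whether the paper relies on this particular variant is an assumption on my part, and real BGP route selection involves far more than prefix length. The prefixes and AS numbers come from ranges reserved for documentation, and `best_route` is a made-up helper.

```python
import ipaddress


def best_route(table, dst):
    """Pick the announcement with the longest matching prefix for dst.

    table is a list of (network, origin_asn) pairs, standing in for the
    announcements a router has accepted. Returns None if nothing matches.
    """
    addr = ipaddress.ip_address(dst)
    matches = [(net, asn) for net, asn in table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen, default=None)


LEGIT_AS = 64500      # hypothetical AS hosting the suspected destination
HIJACKER_AS = 64511   # hypothetical adversary AS

# Before the hijack: only the legitimate /24 is announced.
table = [(ipaddress.ip_network("203.0.113.0/24"), LEGIT_AS)]
before = best_route(table, "203.0.113.10")

# The adversary announces a more specific /25; nothing verifies the claim.
table.append((ipaddress.ip_network("203.0.113.0/25"), HIJACKER_AS))
after = best_route(table, "203.0.113.10")

print("origin before:", before[1])
print("origin after: ", after[1])
```

In the attack described above the adversary then forwards the traffic on to the real destination, which this model does not attempt to show.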

Making this worse, high-bandwidth, long-lived Tor relays (which are disproportionately likely to be used for either end of a circuit) are clustered in a small number of ASes worldwide. This means an adversary can do dragnet surveillance by hijacking all traffic to some of those ASes; depending on its own position in the network, this might not even appear abnormal. The adversary might even be one of those ASes, or an agency in a position to lean on its operators.

The countermeasures proposed in this paper are pretty weak; they would only operate on a timescale of hours to days, whereas a BGP hijack can happen, and stop happening, in a matter of minutes. I don’t see a good fix happening anywhere but in the routing protocol itself. Unfortunately, routing protocols that do not assume honest participants are still a topic of basic research. (I may get to some of those papers eventually.) There are proposals for adding a notion of “this AS is authorized to announce this prefix” to BGP [6] but those have all the usual problems with substituting “I trust this organization” for “I have verified that this data is accurate.”