Identifying Webbrowsers in Encrypted Communications

If you are a website, it is fairly easy to identify the web browser in use by each of your visitors, even if they take steps to suppress the blatant things like the User-Agent header. [1] [2] It is so easy, in fact, that researchers typically try to make it harder for themselves, trying instead to identify individual users even as they move around, change IP addresses, flush their cookies, etc. [3] [4]

If you are a passive eavesdropper in between the browser and the website, and the network traffic is encrypted, and particularly if you are isolated from the client’s IP address by anonymizing relays (e.g. Tor), the task should logically be much harder. Or is it? The authors of this short paper did the most obvious thing: capture packet traces and throw them at an off-the-shelf machine classifier. The feature vectors seen by the machine classifier are not described as clearly as I’d like, but I think they divided the trace into equal-length intervals and aggregated packet sizes in each direction in each interval; this is also one of the most basic and obvious things to do (the future work bit talks a little about better feature engineering). Despite the lack of tuning, they report 70–90% classification accuracy on a four-way choice among browsers (Chrome, Firefox, IE, Tor Browser) and 30–80% accuracy for a 13-way choice among browser and plugin combinations (by which they seem to mean whether or not JavaScript and Flash were enabled) (note that for a 13-way choice, chance performance would be 7.7% accuracy).

This is a short workshop paper, so it is to be expected that the results are a little crude and have missing pieces. The authors already know they need to tune their classifier. I hope someone has told them about ROC curves; raw accuracies make me grind my teeth. Besides that, the clear next step is to figure out what patterns the classifiers are picking up on, and then how to efface those patterns. I think it’s quite likely that the signal they see depends on gross characteristics of the different HTTP implementations used by each browser; for instance, at time of publication, Chrome implemented SPDY and QUIC, and the others didn’t.

The paper contains some handwaving in the direction of being able to fingerprint individual users with this information, but I’d want to see detectable variation among installations of the same browser before I’d be convinced that’s possible.

Game-theoretic Patrolling Strategies for Intrusion Detection in Collaborative Peer-to-Peer Networks

Commercial intrusion detection systems are designed for corporate networks; they almost always assume a small number of choke points between internal and external networks, and often they also assume centralized control of all the devices on the internal network. Neither assumption is valid for a peer-to-peer overlay network, where there are typically a large number of mutually distrusting human agencies operating a small number of network peers each, and the routes between them are diverse.

It might seem that in the peer-to-peer environment, each node would have no choice but to run its own IDS. However, if we are willing to assume some degree of trust vested in other node operators, perhaps the task could be delegated. That’s the germ of this paper. For an idealized peer-to-peer network, they derive a game-theoretically optimal strategy for rotating the job of running the IDS around all the super-peers (long-lived nodes with extra responsibilities; many real P2P networks have such nodes).

I like the concept, but the idealized scenario they used may be too idealized to be applicable in real life. Key assumptions which probably don’t hold include:

  • The attacker does not already control any super-peers.
  • The IDS is perfect: that is, if attack traffic passes through a node running an IDS, the attack will be detected and blocked.
  • The attacker’s goal is to take control of, or deny availability of, a specific set of super-peers.
  • The defender can predict in advance which nodes will be attacked. (I would accept this if it were probabilistic, e.g. assuming that the attacker is more likely to target nodes that contribute more to to the overall network capacity.)

I think a more realistic model would go something like this: The attacker is assumed already to control some fraction of the super-peers. (The attacker may also mount attacks from other computers, either independently or in collaboration with malicious super-peers.) The attacker seeks to avoid detection, and so does not mount overt attacks on other super-peers; instead, it has some strategy for violating the protocol to achieve an adversarial goal (e.g. forging blockchain transactions, deanonymizing users, delivering false data to users) The malicious peers execute the protocol honestly most of the time, but sometimes break the rules. The defender’s goal is to detect peers that are violating the protocol often enough that this can’t be an accident, while not wasting too many resources on monitoring overhead.

Note: This paper is said to have been published in the International Conference on Secure Knowledge Management in Big-data era, 2014 but I cannot confirm this, as the conference website no longer exists and the Internet Archive’s copy does not include a program.

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

In a similar vein as Tuesday’s paper, this is an investigation of how practical it might be to avoid exposing Tor circuits to traffic analysis by an adversary who controls an Autonomous System. Unlike Tuesday’s paper, they assume that the adversary does not manipulate BGP to observe traffic that they shouldn’t have seen, so the concern is simply to ensure that the two most sensitive links in the circuit—from client to entry, and from exit to destination—do not pass through the same AS. Previous papers have suggested that the Tor client should predict the AS-level paths involved in these links, and select entries and exits accordingly [1] [2]. This paper observes that AS path prediction is itself a difficult problem, and that different techniques can give substantially different results. Therefore, they collected traceroute data from 28 Tor relays and compared AS paths inferred from these traces with those predicted from BGP monitoring (using the algorithm of On AS-Level Path Inference [3]).

The core finding is that traceroute-based AS path inference does indeed give substantially different results from BGP-based path prediction. The authors assume that traceroute is more accurate; the discrepancy is consistently described as an error in the BGP-based prediction, and (since BGP-based prediction tends to indicate exposure to more different ASes) as overstating the risk exposure of any given Tor link. This seems unjustified to me. The standard traceroute algorithm is known to become confused in the presence of load-balancing routers, which are extremely common in the backbone [4]; refinements have been proposed (and implemented in the scamper tool used in this paper) but have problems themselves [5] [6]. More elementally, traceroute produces a snapshot: these UDP packets did take this route just now. Tor links are relatively long-lived TCP connections (tens of minutes) which could easily be rerouted among several different paths over their lifetime. I think it would be better to say that BGP path prediction produces a more conservative estimate of the ASes to which a Tor link could be exposed, and highlight figuring out which one is more accurate as future work.

A secondary finding is that AS-aware path selection by the Tor client interacts poorly with the guard policy, in which each Tor client selects a small number of entry nodes to use for an extended period. These nodes must be reliable and high-bandwidth; the economics of running a reliable, high-bandwidth Internet server mean that they are concentrated in a small number of ASes. Similar economics apply to the operation of exit nodes, plus additional legal headaches; as a result, it may not be possible to find any end-to-end path that obeys both the guard policy and the AS-selection policy. This situation is, of course, worsened if you take the more conservative, BGP-based estimation of AS exposure.

I’ve been concerned for some time that guards might actually be worse for anonymity than the problem they are trying to solve. The original problem statement [7] is that if you select an entry node at random for each circuit, and some fraction of entry nodes are malicious, with high probability you will eventually run at least one circuit through a malicious entry. With guards, either all your circuits pass through a malicious entry for an extended period of time, or none do. My fundamental concern with this is, first, having all your traffic exposed to a malicious entry for an extended period is probably much worse for your anonymity than having one circuit exposed every now and then; second, the hypothetical Tor adversary has deep pockets and can easily operate reliable high-bandwidth nodes, which are disproportionately likely to get picked as guards. Concentration of guards in a small number of ASes only makes this easier for the adversary; concentration of guards together with exits in a small number of ASes makes it even easier. It’s tempting to suggest a complete about-face, preferentially choosing entry nodes from the low-bandwidth, short-lived population and using them only for a short time; this would also mean that entry nodes could be taken from a much broader pool of ASes, and it would be easier to avoid overlap with the AS-path from exit to destination.

Green Lights Forever: Analyzing the Security of Traffic Infrastructure

Consider the humble traffic light. The basic design dates to the 1890s, the signals it displays can be explained to a toddler, they’re everywhere and no one thinks anything of them, but if they malfunction an entire city could be paralyzed. Also, for longer than you might realize, they have not been controlled by simple, independent timing circuits; coordinated light patterns, intended to ensure that traffic flows smoothly along the entire length of a boulevard, date to the 1940s. Nowadays each light’s control box tends to have a computer inside, and they talk to each other on radio links, because running wires between all the intersections is expensive.

Today’s paper is from WOOT 2014 and is a case study of the security of such networks of traffic lights. If you have any experience with embedded device security you can probably guess how it turned out: The radio links are in the clear. The protocols are proprietary, but easy to reverse-engineer (or acquire software for). The computers run an outdated proprietary embedded OS and there’s no mechanism for ensuring it gets patched when necessary. Operating-system-level debugging interfaces with complete control over the device may have been left active and exposed to the network. Management software fails to work correctly if passwords are changed from the vendor’s default. And once you’re connected, you can completely reconfigure the traffic lights—which is by design, so adjustments don’t require sending a maintenance crew to every intersection.

The saving grace is the malfunction management unit. This is a separate circuit board, built with discrete logic and configured with physical jumpers, which prevents the light at any single intersection from displaying an unsafe combination of signals (e.g. green in all directions). If the computer tries to put the lights in a state that isn’t on the MMU’s whitelist, or change them too rapidly, it will be overridden and the lights will enter a failsafe mode (such as blinking red in all directions) until someone physically resets the control box. This safety mechanism limits the damage an attacker can do, but it’s still perfectly possible to bring all traffic to a halt, malcoordinate the lights to hinder rather than help traffic flow (which might go completely unnoticed) or just give yourself green lights all the time at everyone else’s expense.

This paper is probably more valuable for its broader lessons (section 6) than for this one case study. Why is it that security experts are not surprised when yet another embedded, safety-critical system turns out to be wide open to remote explotation? The authors call this phenomenon the security phase change. You have an industry with decades of experience designing, manufacturing, and selling widgets which do not communicate with the outside world. The engineers have a solid understanding of how the widget might fail due to accidents, wear and tear, manufacturing defects, etc, and what needs to be done to make sure it fails safe—that malfunction management unit is not in the traffic light controller by chance. But no one thinks about failures due to malice, because why would anyone go around to each and every traffic light and monkeywrench it? It would take hours. Someone would probably call the police.

To such a widget, features are incrementally added, each desirable in its own terms. Microcontrollers are cheaper and more flexible than fixed-function logic boards. Radio links are cheaper and more flexible than running wires down the street. If the fire trucks can override the lights, they can get to the fire faster. If all the lights report diagnostics to a central monitoring station, we can dispatch repair crews more efficiently and not need to make as many inspections.

The security phase change happens when a series of these changes culminates in a device that has a general-purpose computer inside, connected directly to a network that anyone can tap into with relatively little effort. Suddenly the adversary doesn’t need to drive around the entire city to monkeywrench all the lights. Or crack open the hood of your car to get at the engine control computer’s maintenance interface. Or be physically onsite at your chemical plant to mess with the process controller. I understand jet engines are a service now. I would like to think that those engineers have thought about channel security, at the very least.

Anonymity on QuickSand: Using BGP to Compromise Tor

One of the oldest research threads regarding Tor is trying to figure out how close you could get in real life to the global passive adversary that’s known to be able to deanonymize all communications. This is a new entry in that line of research, from HotNets 2014.

At the largest scale, the global Internet is administratively divided into autonomous systems (ASes) that exchange traffic, using BGP for configuration. Any given AS can only communicate with a small number of direct peers, so a stream of packets will normally pass through many different ASes on the way to its destination. It’s well-known that AS-operated backbone routers are in an excellent position to mount traffic-correlation attacks on Tor, particularly if they collude [1] [2]. The key observation in this paper is that, by manipulating BGP, a malicious AS can observe traffic that wouldn’t naturally flow through it.

BGP is an old protocol, originally specified in 1989; like most of our older protocols, it assumes that all participants are cooperative and honest. Any backbone router can announce that it is now capable of forwarding packets to a prefix (set of IP addresses) and the rest of the network will believe it. Incidents where traffic is temporarily redirected to an AS that either can’t get it to the destination at all, or can only do so suboptimally, are commonplace, and people argue about how malicious these are. [3] [4] [5] Suppose an adversary can observe one end of a Tor circuit—perhaps they control the ISP for a Tor client. They also have some reason to suspect a particular destination for the traffic. They use BGP to hijack traffic to the suspected destination, passing it on so that the client doesn’t notice anything. They can now observe both ends of the circuit and confirm their suspicions. They might not get to see traffic in both directions, but the authors also demonstrate that a traffic-correlation attack works in principle even if you can only see the packet flow in one direction, thanks to TCP acknowledgments.

Making this worse, high-bandwidth, long-lived Tor relays (which are disproportionately likely to be used for either end of a circuit) are clustered in a small number of ASes worldwide. This means an adversary can do dragnet surveillance by hijacking all traffic to some of those ASes; depending on its own position in the network, this might not even appear abnormal. The adversary might even be one of those ASes, or an agency in a position to lean on its operators.

The countermeasures proposed in this paper are pretty weak; they would only operate on a timescale of hours to days, whereas a BGP hijack can happen, and stop happening, in a matter of minutes. I don’t see a good fix happening anywhere but in the routing protocol itself. Unfortunately, routing protocols that do not assume honest participants are still a topic of basic research. (I may get to some of those papers eventually.) There are proposals for adding a notion of this AS is authorized to announce this prefix to BGP [6] but those have all the usual problems with substituting I trust this organization for I have verified that this data is accurate.

The Correctness-Security Gap in Compiler Optimization

At this year’s LangSec Workshop, a paper on formal analysis of cases where compiled-code optimizations can introduce security holes that weren’t present in the source code. This class of problems has been a concern for some time; for a general overview, see this series of posts on John Regehr’s blog: [1] [2] [3]. I’ll give one concrete example, which happens to be the running example in the paper.

void crypt (...)
{
    key = load_secret_key();
    ...;          // work with the secure key
    key = 0x0;    // scrub memory
}

No correct C program can observe values in local variables after a function returns, so the compiler is very likely to delete the final store to key as unnecessary (dead in compiler jargon). However, the programmer’s intent was to ensure that the secret value in key does not survive in memory after crypt returns, where it might be accessible to malicious code—the adversary is not bound by the definition of correct C program. By removing the dead store, the compiler introduces a security hole.

This paper proposes a general strategy for analyzing compiler optimizations to determine whether or not they can introduce security holes. They observe that the definition of correct C program is in terms of an abstract machine that ignores or leaves undefined many low-level nuances. Continuing with the above example, the C standard says only that if code executing in the same program attempts to access the memory allocated to key after crypt returns, the behavior is undefined—anything at all can happen. In particular, reading from that memory might legitimately return either zero or the secret key … or some other value altogether, or it might crash the program. And the standard takes no notice at all of the possibility that another program (such as a debugger, or malware) might gain access to this program’s memory.

The real computer on which the program is executing, by contrast, exhibits some concrete and (usually) deterministic behavior for any arbitrary sequence of machine operations. Even when the architecture specification leaves an outcome undefined, some concrete thing will happen on a physical CPU and it will usually be the same concrete thing for all instances of that model of CPU. This paper’s proposal is simply that compiler optimizations have the potential to introduce security holes whenever they change the observable behavior of the physical machine. Therefore, a correctness proof for an optimization can be converted into a does not introduce security holes proof by changing its model from the abstract machine to a physical machine. In the crypt example, deleting the final store to key is invisible in the abstract machine, and therefore a valid optimization of the original code. But in the physical machine, it means the previous, secret value persists on the stack (a concept which doesn’t even exist in the C abstract machine) and may be retrievable later. If the stack is included in the machine model, the original correctness proof for dead-store elimination will detect the information leak.

Dead-store elimination is well understood as a source of this kind of problem, and library-level band-aids exist. I suspect the greatest value of this technique will be in identifying other optimizations that can potentially introduce information leaks. There is some discussion of this in the paper (common subexpression elimination as a source of timing-channel vulnerabilities) but they do not take it all the way to a model. It would be interesting to know, for instance, whether the scary-sophisticated algebraic loop-nest transformations that you might want applied to your cryptographic primitives can introduce data-dependent timing variation that wasn’t present in the source code.

Sadly, this technique by itself is not enough to prevent compiler-introduced vulnerabilities; a dead-store elimination pass that was valid in the physical machine for all memory writes would wind up doing almost nothing, which is not what we want. (The authors acknowledge this, implicitly, when they talk about the undesirability of turning all optimizations off.) To some extent, tightening up the language specification—making the abstract machine exhibit less undefined behavior—can help, by making the effects of optimization more predictable. John Regehr has a proposal for Friendly C along those lines; I think it throws several babies out with the bathwater, but as a starting point for discussion it’s worthwhile. It also won’t cure all the problems in this area: another example from the paper is

int *alloc(int nrelems)
{
    int size = nrelems * sizeof(int);  // may overflow
    if (size < nrelems) abort();       // attempt to check for overflow
    return malloc(size);
}

According to the rules of the C abstract machine, signed integer overflow causes undefined behavior, which means the compiler is entitled to assume that the comparison size < nrelems is always false. Unfortunately, making signed integer overflow well-defined does not render this code safe, because the check itself is inadequate: fixed-width twos-complement wraparound multiplication (which is the behavior of the integer multiply instruction on most CPUs) can produce values that are larger than either multiplicand but smaller than the mathematically correct result, even when one multiplicand is a small number. (In this example, with the usual sizeof(int) == 4, passing 1,431,655,766 for nrelems would produce 1,431,655,768 for size when it should be 5,726,623,064.) For the same reason, making all the variables unsigned does not help.

On a higher level, compiler authors will fight tooth and nail for aggressive optimization, because compiler authors (kinda by definition) like writing optimizations, but also because they’ve got to satisfy three conflicting user groups at once, and the latter two are more numerous and have deeper pockets:

  1. Systems programmers, who want their C to be translated directly and transparently to machine language, and believe that undefined behavior exists to allow variation in CPU behavior but not compiler behavior.

  2. Applications programmers, who want the compiler to crush as many abstraction penalties as possible out of their gigantic white-elephant C++ codebases which there will never be time or budget to clean up.

  3. Number crunchers, who care about four things: speed, speed, not having to code in FORTRAN or assembly language anymore, and speed.

And what makes this extra fun is that cryptographers are simultaneously in group 1 and group 3. I’m not going to make ETAPS 2015, but I am very curious to know what DJB is going to say about the death of optimizing compilers there.

Security Analysis of Pseudo-Random Number Generators with Input

Coming up in this year’s CCS is a paper with the provocative subtitle /dev/random is not Robust, and thanks to widespread availability of online preprints, it’s already scored quite a bit of online attention, e.g. at Bruce Schneier’s blog and on Hacker News.

I’m going to pass over the alleged attacks on the Linux kernel CSPRNG, which are adequately covered in both of the above discussions; suffice it to say that in this paper robust is a precisely defined theoretical security property, and there’s a solid case that it’s stronger than necessary to guarantee practical security. What I’m interested in is the simple and very efficient PRNG construction that is provably robust, advertised in the abstract. CSPRNG-robustness (as defined here) may not be necessary for practical security, but if you can have it essentially for free, why wouldn’t you adopt it? Unfortunately it turns out that there’s a major palmed card in their definition of a CSPRNG with input, and another one in the construction itself, rendering the algorithm useless in practice.

The first problem is that the definition of a CSPRNG with input includes a setup phase that produces a random seed which is not known to the adversary. Despite consistently calling this item a public parameter, the text of the paper makes it clear that it must not be known to the adversary, and they consider this both essential and unproblematic:

Unfortunately, it is well known that no deterministic extractor is capable to simultaneously extract good randomness from all efficiently samplable high-entropy distributions (e.g. consider nearly full entropy distribution I which is random, except the first bit of Extract(I) is zero).

… [Therefore], we chose … to assume the existence of the setup procedure … this will allow one to consider a seeded extractor … which can now extract entropy from all high-entropy distributions. As a warning, this nice extra generality comes at a price that the [seed] is not passed to the [adversarial] distribution sampler …

But this is impossible to achieve in real life. If the seed is generated on the device that will use it, we have a chicken-and-egg problem, because initial startup is precisely when environmental randomness is least available; the authors should have been aware of actual, catastrophic exploits of systems with little or no access to randomness on first boot, e.g. Mining your Ps and Qs. If the seed is not generated on the device that will use it, then the adversary can probably learn it; in particular, if the seed for all devices of a particular model is embedded in a firmware image that anyone can download off the manufacturer’s website, you have accomplished precisely nothing.

The second problem is in the technical details of their proposed CSPRNG construction, which I will now quote verbatim.

Let G:{0,1}m{0,1}n+l be a (deterministic) pseudorandom generator where m<n. We use the notation [y]1m to denote the first m bits of y{0,1}n. Our construction of PRNG with input has parameters n (state length), l (output length), and p=n (sample length), and is defined as follows:

  • setup: Output seed=(X,X){0,1}2n.
  • S=refresh(S,I): Given seed=(X,X), current state S{0,1}n, and a sample I{0,1}n, output S=SX+I, where all operations are over 𝔽2n.
  • (S,R)=next(S): Given seed=(X,X) and a state S{0,1}n, first compute U=[XS]1m. Then output (S,R)=G(U).

It’s a blink-and-you-miss-it sort of thing: by all operations are over 𝔽2n they are specifying that both the refresh and next steps make use of arithmetic over finite Galois fields (more usually referred to as GF(2n) in crypto literature). n has to be large—section 6.1 of the paper suggests specific fields with n = 489, 579, and 705. I am informed that such arithmetic is awful to implement in software, not just difficult but risking side-channel attacks if not done perfectly, and often quite slow.

Both problems arise from the same source, namely wanting to build the algorithm around a keyed hash so that the entropy extractor is fully general, and then taking a mathematical construct with convenient theoretical but inconvenient practical properties as that hash. If we back down to the use of an unkeyed, but computationally secure, hash as the entropy extractor, we’d have to redo the analysis, but I think it ought to be possible to arrange that, without any secret unknown to the adversary, it is still computationally infeasible for the adversary to bias the entropy pool; and we could then pick that hash for its practical utility.

… Oddly enough, this hypothetical construction looks an awful lot like the existing Linux CSPRNG.

Analysis of the HTTPS Certificate Ecosystem

The Internet Measurement Conference brings us an attempt to figure out just how X.509 server certificates are being used in the wild, specifically for HTTPS servers. Yet more specifically, they are looking for endemic operational problems that harm security. The basic idea is to scan the entire IPv4 number space for servers responding on port 443, make note of the certificate(s) presented, and then analyze them.

This research question is nothing new; the EFF famously ran a similar study back in 2010, the SSL Observatory. And operational concerns about the way certificates are used in the wild go back decades; see Peter Gutmann’s slide deck Everything you Never Wanted to Know about PKI but were Forced to Find Out (PDF). What makes this study interesting is, first, it’s three years later; things can change very fast in Internet land (although, in this case, they have not). Second, the scale: the authors claim to have successfully contacted 178% more TLS hosts (per scan) and harvested 736% more certificates (in total, over the course of 110 scans spanning a little more than a year) than any previous such study.

What do we learn? Mostly that yeah, the TLS PKI is a big mess, and it hasn’t gotten any better since 2010. There are too many root CAs. There are far too many unconstrained intermediate certificates, and yet, at the same time, there are too few intermediates! (The point of intermediates is that they’re easy to replace, so if they get compromised you don’t have a catastrophe on your hands. Well, according to this paper, some 26% of all currently valid HTTPS server certificates are signed by one intermediate. No way is that going to be easy to replace if it gets compromised.) Lots of CAs ignore the baseline policies for certificate issuance and get away with it. (Unfortunately, the paper doesn’t say whether there are similar problems with the EV policies.)

Zoom out: when you have a piece of critical infrastructure with chronic operational issues, it’s a safe bet that they’re symptoms and the real problem is with operator incentives. The paper doesn’t discuss this at all, unfortunately, so I’ll throw in some speculation here. The browser vendors are notionally in the best position to Do Something about this mess, but they don’t: because the only real option they have is to delete root certificates from the Official List. Not only does this tend to put the offending CA out of business, it also causes some uncertain-but-large number of websites (most or all of which didn’t do anything wrong) to stop working. Such a drastic sanction is almost never seen to be appropriate. Browsers have hardly any good positive incentives to offer the CAs to do things right; note that EV certificates, which get special treatment in the browser UI and can therefore be sold at a premium, do come with a tighter set of CA requirements (stronger crypto, reliable OCSP, that sort of thing) which are, as far as I’m aware, followed.

Zoom out again: there’s no shortage of technical suggestions that could turn into less drastic sanctions and incentives for the CAs, but they never get implemented: why? Well, you ask me, I say it’s because both OpenSSL and NSS are such terrible code that nobody wants to hack on them, and the brave souls who do it anyway are busy chipping away at the mountain of technical debt and/or at features that are even more overdue. This, though, we know how to fix. It only takes money.