A Survey on Encrypted Traffic Classification

This paper gives a brief overview of techniques for extracting information from encrypted network traffic, covering the past five or so years, mostly focusing on the (perceived) network-management need to know what application is communicating within an encrypted tunnel. It’s not very good—it’s a big list of other papers, each earning a couple sentences of description, and not much else. I would not be surprised to learn that the authors have completely missed important pieces of literature in this area.

I’m writing it up here for two reasons. First, it’s useful to read mediocre papers in order to learn how to do better. In this case, the missing element is synthesis. A good survey paper is research upon the existing published research and, like primary research, has a result of its own to demonstrate. Typically, that result will have something to do with the connections among the articles reviewed. It will highlight and compare lines of research, or it will demonstrate how people’s thinking about the topic has changed over time, or it will highlight gaps where new work would be useful, or perhaps all of the above. A big list of articles, briefly described and classified, is only the skeleton of such a survey.

Second, survey papers are often an entry point into the literature for people outside a subfield. Those people are less likely to share assumptions with the people deeply embedded in a subfield. For instance: in this paper, the authors consistently assume, with only a brief handwave in the direction of a justification, that it is necessary for network management tools to be able to identify at least the protocol and perhaps also something about the data being carried over an encrypted channel. Now, me, I tend to think that if a box in the middle can extract any information from an encrypted channel, that’s a bug in the cryptosystem. And I am working in very nearly this subfield, so I already know why the network-management people think they need to be able to extract data from an encrypted channel. Someone coming at this paper from an application-security, end-user privacy, or theory-of-crypto perspective might have an even stronger negative reaction. And that’s the thing: the authors should have anticipated that people who don’t share their assumptions would read their paper, so they should have taken steps to back up those assumptions.

Large-scale Spatiotemporal Characterization of Inconsistencies in the World’s Largest Firewall

Lots of academic research on Internet censorship treats the countries doing the censorship as monoliths: that is, measurements will typically only be conducted from one client in one fixed location (often a commercial VPS or colocation provider), and the results are assumed to reflect the situation countrywide. When you’re talking about a country as large as China, that assumption seems poorly justified, and there have been several studies aiming to collect more fine-grained information. [1] [2] [3] This paper is in that line of research, with a systematic survey of censorship of one application (Tor) in roughly 150 different locations across China, repeating the measurement at hourly intervals for 27 days. The measurement clients are diverse in terms of both geographic location and network topology.

The results largely confirm what was already suspected. This particular application is indeed blocked consistently across China, with the possible exception of CERNET (China Education and Research Network), whose filtering is less aggressive. The filtering occurs at major China-wide IXPs, as suspected from previous studies. The firewall appears to operate primarily by dropping inbound traffic to China; the authors don’t try to explain this, but earlier related research [4] points out that the firewall must wait to see a TCP SYN/ACK packet before it can successfully forge RST packets in both directions. Finally, there is concrete evidence for failures, lasting hours at a time, uncorrelated with geographic location, where traffic passes uncensored. This was anecdotally known to happen but not previously studied in any kind of detail, to my knowledge. This paper doesn’t speculate at all on why the failures happen or how we could figure that out, which I think is unfortunate.

The techniques used to collect the data are more interesting, at least to me. The principal method is called hybrid idle scanning, first presented by some of the same authors in a different paper [5]. It allows a measurement host to determine whether a client can complete a TCP handshake with a server, without itself being either the client or the server; if the handshake does not complete successfully, it reveals whether client-server or server-client packets are being lost. It does rely on an information leak in older client TCP stacks (predictable IP-ID sequences [6]), but millions of hosts worldwide still run operating systems with these bugs—the authors report an estimate that they comprise 1% of the global IPv4 address space. Thus, it’s possible to find a measurement client in any geographic location with reasonably common Internet usage. Data from this technique is backed up with more detailed information from traceroutes and SYN probes from a smaller number of locations. They describe a previously unreported server-side information leak in Linux’s handling of half-open TCP connections, which can be used to study what IP-based blacklisting of a server looks like to that server, without access to that server.

I’m also impressed with the authors’ systematic presentation of the hypotheses they wanted to test and how they chose to test each of them. Anyone interested in network measurements could probably learn something about how to structure an experiment from this paper.

The Security Flag in the IPv4 Header

Today I propose to explain a classic joke—or perhaps I should call it a parable instead.

RFC 3514 defines a bit in the IPv4 packet header as the evil bit: packets with malicious intent MUST set that bit, benign packets MUST NOT set it. It follows that a firewall router can protect its network from all attacks by dropping packets with the evil bit set (and the RFC requires them to do so). Yay! Network security is solved, we can all go home!
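For flavor, here’s roughly what the RFC’s firewall rule looks like in code. This is my sketch, not the RFC’s; it assumes the evil bit is the reserved flag bit of the IPv4 header (IP_RF), which is the bit RFC 3514 repurposes.

/* Sketch (mine, not the RFC's) of the firewall rule: drop any packet whose
 * evil bit -- the reserved flag bit of the IPv4 header, IP_RF -- is set. */
#include <netinet/ip.h>   /* struct ip, IP_RF */
#include <arpa/inet.h>    /* ntohs */
#include <stdbool.h>

static bool should_drop(const struct ip *hdr)
{
    return (ntohs(hdr->ip_off) & IP_RF) != 0;   /* evil bit set: drop it */
}

Network security in one predicate, if only the attackers would cooperate.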

From the fact that we are all still chained to our desks twelve years later, you may deduce that network security is not solved. The flaw is obvious, once pointed out: nothing can force an attacker to set the evil bit. Nonetheless, there are people who believe that something resembling this ought to work; Matthew Skala called that belief the phenomenon of Colour back in 2004 [1] [2]. Bellovin did not intend his RFC to be implemented; he knew perfectly well it would not help at all. But by writing it out so baldly, he gave us something to point at when people propose the same thing in less obvious form, which happens all the time. I’m going to give three examples in increasing order of intractability; the first does actually have a technical solution (sort of), the second is a long-standing open problem which could have a technical solution (but no one’s thought of it yet), and the third is known to have no solution at all.

The evil bit is the difference between a cryptographic hash and a digital signature. A cryptographic hash is a large but fixed-length number (usually written as a string of hexadecimal) algorithmically derived from a document. In principle, many different documents have the same hash, but it’s computationally infeasible to actually find two documents with the same hash—that is, nobody knows a better way to do it than guess-and-check, which would take so long that the Sun would be burnt out before you were done. Lots of people think that this means: when I give you a document and its hash, you can recompute the hash and if you get the same thing, you must have the document I meant to give you. It is not so. Mallory, the evil mastermind who can tamper with everything I send you, can change the document, and then change the hash to be the hash of Mallory’s document, and you’ll never know the difference. Mallory can’t do this with a digital signature, which is also a large but fixed-length number algorithmically derived from a document, because—unlike a hash—the algorithm depends on information that only I have (namely, my signing key). That you don’t also need that information to check the signature is the miracle of public key cryptography.
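To make the Mallory scenario concrete, here’s a minimal sketch (mine; it uses OpenSSL’s SHA-256, but any hash would do): whoever can replace the document can replace the hash too, so your check passes and tells you nothing.

/* Sketch (mine): a bare hash doesn't authenticate anything, because Mallory
 * can swap both the document and its hash in transit. */
#include <openssl/sha.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char sent[SHA256_DIGEST_LENGTH], check[SHA256_DIGEST_LENGTH];

    const char *mine = "pay Alice five dollars";              /* what I sent */
    SHA256((const unsigned char *)mine, strlen(mine), sent);

    const char *forged = "pay Mallory five thousand dollars"; /* what Mallory delivers */
    SHA256((const unsigned char *)forged, strlen(forged), sent);   /* ...plus a new hash */

    SHA256((const unsigned char *)forged, strlen(forged), check);  /* your verification */
    puts(memcmp(sent, check, sizeof sent) == 0
         ? "hash matches -- but it's Mallory's document"
         : "hash mismatch");
    return 0;
}

A signature made with my private key can’t be regenerated this way; Mallory would need the key.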

Public key cryptography isn’t perfect. The evil bit is also the difference between my public key, truthfully labeled with my name and address, and Mallory’s public key, which is also labeled with my name and address, because nothing stops Mallory from doing that if Mallory wants. How do you tell the difference? Well, you can look at https://www.owlfolio.org/contact/ where I have posted a string of hexadecimal called the fingerprint of my public key, and painstakingly compare that to the fingerprint of the key you just pulled off the PGP keyservers, and—wait a minute, how do you know that’s my website you’re looking at? How do you know Mallory hasn’t modified the fingerprint, and carefully left everything else alone?

Well, it’s an HTTPS website, so every page has a digital signature that the browser verifies for you, and you can click on the lock icon and observe that GeoTrust, Inc. assures you that you are talking to a web server operated by the person who controls the domain name owlfolio.org, and then you can consult whois and learn that that person is known only as Domain Administrator at a P.O. box in Florida, because spam. That didn’t help much, did it? Besides, why should you believe GeoTrust, Inc. about this, and how do you know you have their public key? If you know anything about the certificate authority system you will know that it is, in fact, turtles all the way down, and each one of those turtles could be Mallory in disguise. There are other ways you could attempt to decide whose public key you have, but those are also turtles all the way down.

Enough about Mallory, though; let’s talk about Reyes. Special Agent Reyes of the F.B.I., that is. Reyes’ job is tracking down murderers. Sometimes Reyes could catch the murderers faster if he could listen in on their telephone conversations; sometimes this might, in fact, save someone’s life. The F.B.I. wants Reyes to catch murderers as fast as possible, so they get Congress to pass a law that requires telephone companies to facilitate wiretapping. Reyes knows most of what people talk about on the phone is useless to him, even when those people are murderers, so he only listens in when he’s sure he will hear something that will help one of his cases.

The F.B.I. also employs Special Agent Spender. Spender does not track down murderers, he monitors extremist political groups. He uses a broad definition of extremism: if someone is even slightly famous and has said something negative about the government in the last 20 years, Spender has a file on them, and all their friends too. Spender’s job is also made considerably easier if he can listen in on these people’s telephone conversations. Unlike Reyes, Spender records all their telephone conversations, all the time, because Spender is looking for dirt. Spender wants to be able to publicly humiliate each and every one of those people, at a moment’s notice, should it become necessary. Who decides if it’s necessary? Spender.

The difference between Reyes and Spender is the evil bit.

Identifying Webbrowsers in Encrypted Communications

If you are a website, it is fairly easy to identify the web browser in use by each of your visitors, even if they take steps to suppress the blatant things like the User-Agent header. [1] [2] It is so easy, in fact, that researchers typically try to make it harder for themselves, trying instead to identify individual users even as they move around, change IP addresses, flush their cookies, etc. [3] [4]

If you are a passive eavesdropper in between the browser and the website, and the network traffic is encrypted, and particularly if you are isolated from the client’s IP address by anonymizing relays (e.g. Tor), the task should logically be much harder. Or is it? The authors of this short paper did the most obvious thing: capture packet traces and throw them at an off-the-shelf machine-learning classifier. The feature vectors seen by the classifier are not described as clearly as I’d like, but I think they divided the trace into equal-length intervals and aggregated packet sizes in each direction in each interval; this is also one of the most basic and obvious things to do (the future work bit talks a little about better feature engineering). Despite the lack of tuning, they report 70–90% classification accuracy on a four-way choice among browsers (Chrome, Firefox, IE, Tor Browser) and 30–80% accuracy for a 13-way choice among browser and plugin combinations (by which they seem to mean whether or not JavaScript and Flash were enabled; note that for a 13-way choice, chance performance would be 7.7% accuracy).
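For what it’s worth, here is my guess at the feature construction (the paper doesn’t spell it out): bucket the trace into fixed-length time intervals, total up the bytes seen in each direction per interval, and hand the flattened array to the classifier.

/* My guess (not the paper's code) at the feature construction: split a trace
 * into fixed-length time intervals and sum packet sizes per direction per
 * interval; the flattened array becomes the classifier's feature vector. */
#include <stddef.h>

struct packet {
    double time;       /* seconds since the start of the trace */
    int    size;       /* bytes on the wire */
    int    outbound;   /* 1 = client to server, 0 = server to client */
};

/* features must have room for 2*nbins entries: [out_0, in_0, out_1, in_1, ...] */
static void extract_features(const struct packet *trace, size_t npackets,
                             double interval, size_t nbins, double *features)
{
    for (size_t i = 0; i < 2 * nbins; i++)
        features[i] = 0.0;
    for (size_t i = 0; i < npackets; i++) {
        size_t bin = (size_t)(trace[i].time / interval);
        if (bin >= nbins)
            bin = nbins - 1;              /* clamp anything past the last interval */
        features[2 * bin + (trace[i].outbound ? 0 : 1)] += trace[i].size;
    }
}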

This is a short workshop paper, so it is to be expected that the results are a little crude and have missing pieces. The authors already know they need to tune their classifier. I hope someone has told them about ROC curves; raw accuracies make me grind my teeth. Besides that, the clear next step is to figure out what patterns the classifiers are picking up on, and then how to efface those patterns. I think it’s quite likely that the signal they see depends on gross characteristics of the different HTTP implementations used by each browser; for instance, at time of publication, Chrome implemented SPDY and QUIC, and the others didn’t.

The paper contains some handwaving in the direction of being able to fingerprint individual users with this information, but I’d want to see detectable variation among installations of the same browser before I’d be convinced that’s possible.

Game-theoretic Patrolling Strategies for Intrusion Detection in Collaborative Peer-to-Peer Networks

Commercial intrusion detection systems are designed for corporate networks; they almost always assume a small number of choke points between internal and external networks, and often they also assume centralized control of all the devices on the internal network. Neither assumption is valid for a peer-to-peer overlay network, where there are typically a large number of mutually distrusting human agencies operating a small number of network peers each, and the routes between them are diverse.

It might seem that in the peer-to-peer environment, each node would have no choice but to run its own IDS. However, if we are willing to assume some degree of trust vested in other node operators, perhaps the task could be delegated. That’s the germ of this paper. For an idealized peer-to-peer network, they derive a game-theoretically optimal strategy for rotating the job of running the IDS around all the super-peers (long-lived nodes with extra responsibilities; many real P2P networks have such nodes).

I like the concept, but the idealized scenario they used may be too idealized to be applicable in real life. Key assumptions which probably don’t hold include:

  • The attacker does not already control any super-peers.
  • The IDS is perfect: that is, if attack traffic passes through a node running an IDS, the attack will be detected and blocked.
  • The attacker’s goal is to take control of, or deny availability of, a specific set of super-peers.
  • The defender can predict in advance which nodes will be attacked. (I would accept this if it were probabilistic, e.g. assuming that the attacker is more likely to target nodes that contribute more to the overall network capacity.)

I think a more realistic model would go something like this: The attacker is assumed already to control some fraction of the super-peers. (The attacker may also mount attacks from other computers, either independently or in collaboration with malicious super-peers.) The attacker seeks to avoid detection, and so does not mount overt attacks on other super-peers; instead, it has some strategy for violating the protocol to achieve an adversarial goal (e.g. forging blockchain transactions, deanonymizing users, delivering false data to users). The malicious peers execute the protocol honestly most of the time, but sometimes break the rules. The defender’s goal is to detect peers that are violating the protocol often enough that this can’t be an accident, while not wasting too many resources on monitoring overhead.

Note: This paper is said to have been published in the International Conference on Secure Knowledge Management in Big-data era, 2014 but I cannot confirm this, as the conference website no longer exists and the Internet Archive’s copy does not include a program.

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

In a similar vein to Tuesday’s paper, this is an investigation of how practical it might be to avoid exposing Tor circuits to traffic analysis by an adversary who controls an Autonomous System. Unlike Tuesday’s paper, they assume that the adversary does not manipulate BGP to observe traffic that they shouldn’t have seen, so the concern is simply to ensure that the two most sensitive links in the circuit—from client to entry, and from exit to destination—do not pass through the same AS. Previous papers have suggested that the Tor client should predict the AS-level paths involved in these links, and select entries and exits accordingly [1] [2]. This paper observes that AS path prediction is itself a difficult problem, and that different techniques can give substantially different results. Therefore, they collected traceroute data from 28 Tor relays and compared AS paths inferred from these traces with those predicted from BGP monitoring (using the algorithm of On AS-Level Path Inference [3]).

The core finding is that traceroute-based AS path inference does indeed give substantially different results from BGP-based path prediction. The authors assume that traceroute is more accurate; the discrepancy is consistently described as an error in the BGP-based prediction, and (since BGP-based prediction tends to indicate exposure to more different ASes) as overstating the risk exposure of any given Tor link. This seems unjustified to me. The standard traceroute algorithm is known to become confused in the presence of load-balancing routers, which are extremely common in the backbone [4]; refinements have been proposed (and implemented in the scamper tool used in this paper) but have problems themselves [5] [6]. More fundamentally, traceroute produces a snapshot: these UDP packets did take this route just now. Tor links are relatively long-lived TCP connections (tens of minutes) which could easily be rerouted among several different paths over their lifetime. I think it would be better to say that BGP path prediction produces a more conservative estimate of the ASes to which a Tor link could be exposed, and highlight figuring out which one is more accurate as future work.

A secondary finding is that AS-aware path selection by the Tor client interacts poorly with the guard policy, in which each Tor client selects a small number of entry nodes to use for an extended period. These nodes must be reliable and high-bandwidth; the economics of running a reliable, high-bandwidth Internet server mean that they are concentrated in a small number of ASes. Similar economics apply to the operation of exit nodes, plus additional legal headaches; as a result, it may not be possible to find any end-to-end path that obeys both the guard policy and the AS-selection policy. This situation is, of course, worsened if you take the more conservative, BGP-based estimation of AS exposure.

I’ve been concerned for some time that guards might actually be worse for anonymity than the problem they are trying to solve. The original problem statement [7] is that if you select an entry node at random for each circuit, and some fraction of entry nodes are malicious, with high probability you will eventually run at least one circuit through a malicious entry. With guards, either all your circuits pass through a malicious entry for an extended period of time, or none do. My fundamental concern with this is, first, having all your traffic exposed to a malicious entry for an extended period is probably much worse for your anonymity than having one circuit exposed every now and then; second, the hypothetical Tor adversary has deep pockets and can easily operate reliable high-bandwidth nodes, which are disproportionately likely to get picked as guards. Concentration of guards in a small number of ASes only makes this easier for the adversary; concentration of guards together with exits in a small number of ASes makes it even easier. It’s tempting to suggest a complete about-face, preferentially choosing entry nodes from the low-bandwidth, short-lived population and using them only for a short time; this would also mean that entry nodes could be taken from a much broader pool of ASes, and it would be easier to avoid overlap with the AS-path from exit to destination.
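For concreteness, here is the arithmetic behind the original problem statement, with illustrative numbers of my own choosing: if a fraction $f$ of entry capacity is malicious and a client picks a fresh entry for each of $k$ circuits, then

$$\Pr[\text{at least one circuit has a malicious entry}] = 1 - (1 - f)^k,$$

which for $f = 0.05$ and $k = 100$ is already about $0.994$. With guards, the client is caught with probability only about $f$, but when it is caught, everything it does is exposed for as long as it keeps that guard.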

Green Lights Forever: Analyzing the Security of Traffic Infrastructure

Consider the humble traffic light. The basic design dates to the 1890s, the signals it displays can be explained to a toddler, they’re everywhere and no one thinks anything of them, but if they malfunction an entire city could be paralyzed. Also, for longer than you might realize, they have not been controlled by simple, independent timing circuits; coordinated light patterns, intended to ensure that traffic flows smoothly along the entire length of a boulevard, date to the 1940s. Nowadays each light’s control box tends to have a computer inside, and they talk to each other on radio links, because running wires between all the intersections is expensive.

Today’s paper is from WOOT 2014 and is a case study of the security of such networks of traffic lights. If you have any experience with embedded device security you can probably guess how it turned out: The radio links are in the clear. The protocols are proprietary, but easy to reverse-engineer (or acquire software for). The computers run an outdated proprietary embedded OS and there’s no mechanism for ensuring it gets patched when necessary. Operating-system-level debugging interfaces with complete control over the device may have been left active and exposed to the network. Management software fails to work correctly if passwords are changed from the vendor’s default. And once you’re connected, you can completely reconfigure the traffic lights—which is by design, so adjustments don’t require sending a maintenance crew to every intersection.

The saving grace is the malfunction management unit. This is a separate circuit board, built with discrete logic and configured with physical jumpers, which prevents the light at any single intersection from displaying an unsafe combination of signals (e.g. green in all directions). If the computer tries to put the lights in a state that isn’t on the MMU’s whitelist, or change them too rapidly, it will be overridden and the lights will enter a failsafe mode (such as blinking red in all directions) until someone physically resets the control box. This safety mechanism limits the damage an attacker can do, but it’s still perfectly possible to bring all traffic to a halt, malcoordinate the lights to hinder rather than help traffic flow (which might go completely unnoticed) or just give yourself green lights all the time at everyone else’s expense.
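The MMU is discrete logic, not software, but the policy it enforces amounts to something like the following cartoon (mine, not the paper’s, and ignoring the rate-of-change checks):

/* Cartoon (mine) of the malfunction management unit's policy: a requested
 * signal combination must appear on a hard-wired whitelist; anything else
 * latches the intersection into failsafe until someone resets it on site. */
#include <stdbool.h>
#include <stddef.h>

enum signal { RED, YELLOW, GREEN };
struct phase { enum signal north_south, east_west; };

static const struct phase whitelist[] = {
    { GREEN, RED    }, { YELLOW, RED    },
    { RED,   GREEN  }, { RED,    YELLOW },
    { RED,   RED    },
};

static bool failsafe;                     /* once set, only a site visit clears it */

static bool allow_phase(struct phase p)
{
    if (failsafe)
        return false;                     /* already blinking red in all directions */
    for (size_t i = 0; i < sizeof whitelist / sizeof whitelist[0]; i++)
        if (whitelist[i].north_south == p.north_south &&
            whitelist[i].east_west == p.east_west)
            return true;                  /* safe combination: pass it through */
    failsafe = true;                      /* e.g. green in both directions */
    return false;
}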

This paper is probably more valuable for its broader lessons (section 6) than for this one case study. Why is it that security experts are not surprised when yet another embedded, safety-critical system turns out to be wide open to remote exploitation? The authors call this phenomenon the security phase change. You have an industry with decades of experience designing, manufacturing, and selling widgets which do not communicate with the outside world. The engineers have a solid understanding of how the widget might fail due to accidents, wear and tear, manufacturing defects, etc., and what needs to be done to make sure it fails safe—that malfunction management unit is not in the traffic light controller by chance. But no one thinks about failures due to malice, because why would anyone go around to each and every traffic light and monkeywrench it? It would take hours. Someone would probably call the police.

To such a widget, features are incrementally added, each desirable in its own terms. Microcontrollers are cheaper and more flexible than fixed-function logic boards. Radio links are cheaper and more flexible than running wires down the street. If the fire trucks can override the lights, they can get to the fire faster. If all the lights report diagnostics to a central monitoring station, we can dispatch repair crews more efficiently and not need to make as many inspections.

The security phase change happens when a series of these changes culminates in a device that has a general-purpose computer inside, connected directly to a network that anyone can tap into with relatively little effort. Suddenly the adversary doesn’t need to drive around the entire city to monkeywrench all the lights. Or crack open the hood of your car to get at the engine control computer’s maintenance interface. Or be physically onsite at your chemical plant to mess with the process controller. I understand jet engines are a service now. I would like to think that those engineers have thought about channel security, at the very least.

Anonymity on QuickSand: Using BGP to Compromise Tor

One of the oldest research threads regarding Tor is trying to figure out how close you could get in real life to the global passive adversary that’s known to be able to deanonymize all communications. This is a new entry in that line of research, from HotNets 2014.

At the largest scale, the global Internet is administratively divided into autonomous systems (ASes) that exchange traffic, using BGP to announce which destinations each of them can reach. Any given AS can only communicate with a small number of direct peers, so a stream of packets will normally pass through many different ASes on the way to its destination. It’s well-known that AS-operated backbone routers are in an excellent position to mount traffic-correlation attacks on Tor, particularly if they collude [1] [2]. The key observation in this paper is that, by manipulating BGP, a malicious AS can observe traffic that wouldn’t naturally flow through it.

BGP is an old protocol, originally specified in 1989; like most of our older protocols, it assumes that all participants are cooperative and honest. Any backbone router can announce that it is now capable of forwarding packets to a prefix (set of IP addresses) and the rest of the network will believe it. Incidents where traffic is temporarily redirected to an AS that either can’t get it to the destination at all, or can only do so suboptimally, are commonplace, and people argue about how malicious these are. [3] [4] [5] Suppose an adversary can observe one end of a Tor circuit—perhaps they control the ISP for a Tor client. They also have some reason to suspect a particular destination for the traffic. They use BGP to hijack traffic to the suspected destination, passing it on so that the client doesn’t notice anything. They can now observe both ends of the circuit and confirm their suspicions. They might not get to see traffic in both directions, but the authors also demonstrate that a traffic-correlation attack works in principle even if you can only see the packet flow in one direction, thanks to TCP acknowledgments.

Making this worse, high-bandwidth, long-lived Tor relays (which are disproportionately likely to be used for either end of a circuit) are clustered in a small number of ASes worldwide. This means an adversary can do dragnet surveillance by hijacking all traffic to some of those ASes; depending on its own position in the network, this might not even appear abnormal. The adversary might even be one of those ASes, or an agency in a position to lean on its operators.

The countermeasures proposed in this paper are pretty weak; they would only operate on a timescale of hours to days, whereas a BGP hijack can happen, and stop happening, in a matter of minutes. I don’t see a good fix happening anywhere but in the routing protocol itself. Unfortunately, routing protocols that do not assume honest participants are still a topic of basic research. (I may get to some of those papers eventually.) There are proposals for adding a notion of this AS is authorized to announce this prefix to BGP [6] but those have all the usual problems with substituting I trust this organization for I have verified that this data is accurate.

The Correctness-Security Gap in Compiler Optimization

At this year’s LangSec Workshop, there was a paper presenting a formal analysis of cases where compiled-code optimizations can introduce security holes that weren’t present in the source code. This class of problems has been a concern for some time; for a general overview, see this series of posts on John Regehr’s blog: [1] [2] [3]. I’ll give one concrete example, which happens to be the running example in the paper.

void crypt (...)
{
    key = load_secret_key();
    ...;          // work with the secure key
    key = 0x0;    // scrub memory
}

No correct C program can observe values in local variables after a function returns, so the compiler is very likely to delete the final store to key as unnecessary (dead in compiler jargon). However, the programmer’s intent was to ensure that the secret value in key does not survive in memory after crypt returns, where it might be accessible to malicious code—the adversary is not bound by the definition of correct C program. By removing the dead store, the compiler introduces a security hole.
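One widely used band-aid, not taken from the paper, is to route the scrub through something the optimizer cannot see through, such as a volatile function pointer; C11’s optional memset_s standardizes the same idea. A minimal sketch:

/* Common band-aid (not from the paper): call memset through a volatile
 * function pointer, so the compiler cannot prove the store is dead and
 * delete it. */
#include <string.h>

static void *(*const volatile memset_vol)(void *, int, size_t) = memset;

static void scrub(void *p, size_t n)
{
    memset_vol(p, 0, n);
}

With this, crypt would end with scrub(&key, sizeof key) instead of the plain assignment.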

This paper proposes a general strategy for analyzing compiler optimizations to determine whether or not they can introduce security holes. They observe that the definition of correct C program is in terms of an abstract machine that ignores or leaves undefined many low-level nuances. Continuing with the above example, the C standard says only that if code executing in the same program attempts to access the memory allocated to key after crypt returns, the behavior is undefined—anything at all can happen. In particular, reading from that memory might legitimately return either zero or the secret key … or some other value altogether, or it might crash the program. And the standard takes no notice at all of the possibility that another program (such as a debugger, or malware) might gain access to this program’s memory.

The real computer on which the program is executing, by contrast, exhibits some concrete and (usually) deterministic behavior for any arbitrary sequence of machine operations. Even when the architecture specification leaves an outcome undefined, some concrete thing will happen on a physical CPU and it will usually be the same concrete thing for all instances of that model of CPU. This paper’s proposal is simply that compiler optimizations have the potential to introduce security holes whenever they change the observable behavior of the physical machine. Therefore, a correctness proof for an optimization can be converted into a does not introduce security holes proof by changing its model from the abstract machine to a physical machine. In the crypt example, deleting the final store to key is invisible in the abstract machine, and therefore a valid optimization of the original code. But in the physical machine, it means the previous, secret value persists on the stack (a concept which doesn’t even exist in the C abstract machine) and may be retrievable later. If the stack is included in the machine model, the original correctness proof for dead-store elimination will detect the information leak.

Dead-store elimination is well understood as a source of this kind of problem, and library-level band-aids exist. I suspect the greatest value of this technique will be in identifying other optimizations that can potentially introduce information leaks. There is some discussion of this in the paper (common subexpression elimination as a source of timing-channel vulnerabilities) but they do not take it all the way to a model. It would be interesting to know, for instance, whether the scary-sophisticated algebraic loop-nest transformations that you might want applied to your cryptographic primitives can introduce data-dependent timing variation that wasn’t present in the source code.

Sadly, this technique by itself is not enough to prevent compiler-introduced vulnerabilities; a dead-store elimination pass that was valid in the physical machine for all memory writes would wind up doing almost nothing, which is not what we want. (The authors acknowledge this, implicitly, when they talk about the undesirability of turning all optimizations off.) To some extent, tightening up the language specification—making the abstract machine exhibit less undefined behavior—can help, by making the effects of optimization more predictable. John Regehr has a proposal for Friendly C along those lines; I think it throws several babies out with the bathwater, but as a starting point for discussion it’s worthwhile. It also won’t cure all the problems in this area: another example from the paper is

int *alloc(int nrelems)
{
    int size = nrelems * sizeof(int);  // may overflow
    if (size < nrelems) abort();       // attempt to check for overflow
    return malloc(size);
}

According to the rules of the C abstract machine, signed integer overflow causes undefined behavior, which means the compiler is entitled to assume that the comparison size < nrelems is always false. Unfortunately, making signed integer overflow well-defined does not render this code safe, because the check itself is inadequate: fixed-width two’s-complement wraparound multiplication (which is the behavior of the integer multiply instruction on most CPUs) can produce values that are larger than either multiplicand but smaller than the mathematically correct result, even when one multiplicand is a small number. (In this example, with the usual sizeof(int) == 4, passing 1,431,655,766 for nrelems would produce 1,431,655,768 for size when it should be 5,726,623,064.) For the same reason, making all the variables unsigned does not help.
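For contrast, a check that actually works (my version, not the paper’s) rejects the operands before multiplying, instead of inspecting the wrapped-around product afterward:

/* A correct version of the check (mine, not the paper's): reject nrelems
 * before multiplying, so the overflow never happens. */
#include <stdint.h>
#include <stdlib.h>

int *alloc_checked(int nrelems)
{
    if (nrelems < 0 || (size_t)nrelems > SIZE_MAX / sizeof(int))
        abort();                          /* product would not fit in a size_t */
    return malloc((size_t)nrelems * sizeof(int));
}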

On a higher level, compiler authors will fight tooth and nail for aggressive optimization, because compiler authors (kinda by definition) like writing optimizations, but also because they’ve got to satisfy three conflicting user groups at once, and the latter two are more numerous and have deeper pockets:

  1. Systems programmers, who want their C to be translated directly and transparently to machine language, and believe that undefined behavior exists to allow variation in CPU behavior but not compiler behavior.

  2. Applications programmers, who want the compiler to crush as many abstraction penalties as possible out of their gigantic white-elephant C++ codebases which there will never be time or budget to clean up.

  3. Number crunchers, who care about four things: speed, speed, not having to code in FORTRAN or assembly language anymore, and speed.

And what makes this extra fun is that cryptographers are simultaneously in group 1 and group 3. I’m not going to make ETAPS 2015, but I am very curious to know what DJB is going to say about the death of optimizing compilers there.

Security Analysis of Pseudo-Random Number Generators with Input

Coming up in this year’s CCS is a paper with the provocative subtitle /dev/random is not Robust, and thanks to widespread availability of online preprints, it’s already scored quite a bit of online attention, e.g. at Bruce Schneier’s blog and on Hacker News.

I’m going to pass over the alleged attacks on the Linux kernel CSPRNG, which are adequately covered in both of the above discussions; suffice it to say that in this paper robust is a precisely defined theoretical security property, and there’s a solid case that it’s stronger than necessary to guarantee practical security. What I’m interested in is the simple and very efficient PRNG construction that is provably robust, advertised in the abstract. CSPRNG-robustness (as defined here) may not be necessary for practical security, but if you can have it essentially for free, why wouldn’t you adopt it? Unfortunately it turns out that there’s a major palmed card in their definition of a CSPRNG with input, and another one in the construction itself, rendering the algorithm useless in practice.

The first problem is that the definition of a CSPRNG with input includes a setup phase that produces a random seed which is not known to the adversary. Despite consistently calling this item a public parameter, the text of the paper makes it clear that it must not be known to the adversary, and they consider this both essential and unproblematic:

Unfortunately, it is well known that no deterministic extractor is capable to simultaneously extract good randomness from all efficiently samplable high-entropy distributions (e.g. consider nearly full entropy distribution $I$ which is random, except the first bit of $\operatorname{Extract}(I)$ is zero).

… [Therefore], we chose … to assume the existence of the setup procedure … this will allow one to consider a seeded extractor … which can now extract entropy from all high-entropy distributions. As a warning, this nice extra generality comes at a price that the [seed] is not passed to the [adversarial] distribution sampler …

But this is impossible to achieve in real life. If the seed is generated on the device that will use it, we have a chicken-and-egg problem, because initial startup is precisely when environmental randomness is least available; the authors should have been aware of actual, catastrophic exploits of systems with little or no access to randomness on first boot, e.g. Mining your Ps and Qs. If the seed is not generated on the device that will use it, then the adversary can probably learn it; in particular, if the seed for all devices of a particular model is embedded in a firmware image that anyone can download off the manufacturer’s website, you have accomplished precisely nothing.

The second problem is in the technical details of their proposed CSPRNG construction, which I will now quote verbatim.

Let $\mathbf{G}: \{0,1\}^m \to \{0,1\}^{n+l}$ be a (deterministic) pseudorandom generator where $m < n$. We use the notation $[y]_1^m$ to denote the first $m$ bits of $y \in \{0,1\}^n$. Our construction of PRNG with input has parameters $n$ (state length), $l$ (output length), and $p = n$ (sample length), and is defined as follows:

  • $\operatorname{setup}$: Output $\text{seed} = (X, X') \leftarrow \{0,1\}^{2n}$.
  • $S' = \operatorname{refresh}(S, I)$: Given $\text{seed} = (X, X')$, current state $S \in \{0,1\}^n$, and a sample $I \in \{0,1\}^n$, output $S' = S \cdot X + I$, where all operations are over $\mathbb{F}_{2^n}$.
  • $(S', R) = \operatorname{next}(S)$: Given $\text{seed} = (X, X')$ and a state $S \in \{0,1\}^n$, first compute $U = [X' \cdot S]_1^m$. Then output $(S', R) = \mathbf{G}(U)$.

It’s a blink-and-you-miss-it sort of thing: by all operations are over $\mathbb{F}_{2^n}$ they are specifying that both the refresh and next steps make use of arithmetic over finite, nonprime fields (more usually referred to as $\operatorname{GF}(2^n)$ in crypto literature). $n$ has to be large—section 6.1 of the paper suggests specific fields with $n$ = 489, 579, and 705. The known techniques for this kind of arithmetic are all very slow, require large look-up tables (and therefore permit side-channel attacks), or cannot be used for a software implementation. [1] [2] This makes them impractical for a cryptographic component of an operating system that cannot rely on special-purpose hardware.
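To make the cost concrete, here is a toy sketch of mine (not from the paper) of the generic shift-and-xor approach, shown for $\operatorname{GF}(2^8)$; for the paper’s $n$ of roughly 500, the same loop has to run over multi-word operands for hundreds of iterations per multiply, unless you fall back on big tables or carry-less-multiply hardware.

/* Toy illustration (mine): bit-serial multiplication in GF(2^8) with
 * reduction polynomial x^8 + x^4 + x^3 + x + 1.  The paper's construction
 * needs the same operation in GF(2^n) for n around 500. */
#include <stdint.h>

static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;                              /* add (xor) a shifted copy of a */
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));  /* multiply a by x, reduce */
    }
    return r;
}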

Both problems arise from the same source, namely wanting to build the algorithm around a keyed hash so that the entropy extractor is fully general, and then taking a mathematical construct with convenient theoretical but inconvenient practical properties as that hash. If we back down to the use of an unkeyed, but computationally secure, hash as the entropy extractor, we’d have to redo the analysis, but I think it ought to be possible to arrange that, without any secret unknown to the adversary, it is still computationally infeasible for the adversary to bias the entropy pool; and we could then pick that hash for its practical utility.

… Oddly enough, this hypothetical construction looks an awful lot like the existing Linux CSPRNG.