Like Monday’s paper, today’s has not been officially published anywhere, and in fact appears never to have been finished. It’s on a fairly arcane topic, and one that’s not directly related to security—as I said a while back, floating-point numbers usually aren’t used in security-critical algorithms. However, it deserves more attention, being both a case of everyone everywhere is Doing It Wrong, and a nice example of the danger of forgetting that an abstraction is hiding something from you.

For numerical and statistical computation, one frequently needs random floating-point numbers in the open interval $(0,\,1)$. However, all off-the-shelf pseudorandom number generators produce either random integers, or a stream of random bits (which is functionally the same thing—bits are trivially put together into integers and vice versa). The easy, obvious way to bridge the gap is to divide (in floating point) the integers by the largest value the PRNG can produce. This is how it’s done basically everywhere, including industrial-grade statistical languages like R…but what the paper is saying is, it’s wrong.

To understand why it’s wrong, you need to understand in detail how floating-point numbers are not mathematical real numbers. On one level this is obvious: an IEEE single occupies 32 bits and a double 64, so at most they could represent $2^{32}$ or $2^{64}$ different numbers, respectively; that isn’t even countably infinite. (A large chunk of the encoding is reserved for not a number values, so the true count is somewhat smaller.) But high-level programming languages encourage us to use floating point numbers as if they were the reals, and most of the time we’re working with values that are well-approximated by floats, and we get away with it.

The numbers that a binary floating-point format can represent are all of the form

$±m.mmm…m×2±e.eee…e$

where $±m.mmm…m$ (the mantissa) and $±e.eee…e$ (the exponent) are binary numbers of fixed length. For IEEE singles, the mantissa has 25 bits and the exponent 8, counting their respective signs in both cases, and there’s a clever trick (not relevant here) to fit that all into 32 bits total. This is just the scientific notation you probably learned in high school, in base 2 instead of base 10, and if the mantissa and exponent were infinitely wide it could represent all real numbers. However, they are finite, and this means not only that it can’t, but that the values it can are not uniformly distributed over the real line. Every time you pass a power of two, the exponent ticks up or down by one, but the mantissa stays the same length, so it loses or gains one bit of precision. Therefore, there are the same number of representable values between 1 and 2 as there are between 0.5 and 1, between 0.25 and 0.5, and so on, and each range’s representable numbers are twice as far apart as those in the next smaller range. Some of the numbers representable by an IEEE single-precision float (top row) and producible by dividing a 32-bit unsigned integer by 232−12^{32}-1 (bottom row).

Now, when you generate a random integer and divide it by the largest integer the RNG can produce, the result is uniformly distributed over $(0,\,1)$. If the largest integer the RNG can produce is $2^{32}-1$, as is often the case (even on systems with 64-bit integers), the gap between values producible by this method will be about $2 \times 10^{-10}$. The figure above compares the spacing of these values along the real line with the spacing of the values representable by an IEEE single-precision float. Only from 0.001953125 ($2^{-9}$) to 0.003906250 ($2^{-8}$) is there a one-to-one correspondence. Above $2^{-8}$, the representable numbers are more widely spaced than the values that the RNG will produce, which means clusters of RNG-produced values will get rounded to the same representable number, which introduces non-uniformity. Below $2^{-9}$, it’s the other way around: all of the RNG-produced values are exactly representable, but there are many more that can’t be produced. Downey estimates that over the entire range, only 7% of the numbers are producible.

The most significant gap is right at zero. The smallest number IEEE single can represent is on the order of $10^{-38}$ or $10^{-45}$ (depending on whether you allow denormals), and IEEE double goes all the way down to $2 \times 10^{-308}$ (or $5 \times 10^{-324}$ with denormals). But, the smallest number producible by the simple RNG method is $2 \times 10^{-10}$, dozens or hundreds of decimal orders of magnitude bigger. Downey suggests that this could cause serious problems for statistical algorithms, such as inverse transform sampling of the exponential distribution.

Downey’s proposed fix rests on the observation that a random number uniformly distributed over $(0,\,1)$ will be greater than 0.5 exactly half of the time; when it isn’t, it will be greater than 0.25 exactly half of the time; and so on. Therefore, his algorithm first selects an exponent by flipping coins in a loop—that is, drawing one bit at a time from the output of the RNG. The exponent starts out as −1, corresponding to a number between 0.5 and 1; each time the bit is 0, the exponent is reduced by 1 and another bit is drawn, until either a 1 is drawn or the exponent reaches its minimum allowable value. Then the algorithm fills in the mantissa with random bits. Finally, there’s a small correction: if the mantissa happens to come out zero, flip another bit and if it’s 1, increase the exponent by 1 again. This accounts for values that are exactly 1 over a power of two, which straddle a boundary between exponents and therefore can be selected from either side; without the correction they would be selected slightly more often than is appropriate.

It’s too bad Downey never finished this paper. The biggest missing piece is a clear and convincing demonstration that the naïve algorithm does introduce significant errors into common calculations. For exponential sampling by inverse transform, there is an inarguable error (the distribution is truncated on the right) but one could argue that it doesn’t rise to the level of significance because exponential deviates larger than 9.3 should only get drawn one time in $10^{10}$ or so. There are no other examples of potentially problematic tasks, and I am not enough of a statistician to think of any myself. IEEE double has enough mantissa, even in the range from 0.5 to 1, that it can represent every multiple of $2 \times 10^{-10}$, so nonuniformity does not occur if you’re generating doubles by the simple method, only missing values.

There are also a couple of practical problems with the algorithm. A potentially-lengthy run of coin tosses, with the requirement that every bit is independent and identically distributed, is poorly handled by many ordinary RNGs; I would only use this algorithm with a cryptographic RNG. Relatedly, on average the coin-tossing phase will terminate quickly, but if you do hit a long run of zeroes it’d be observably slower in that case. I don’t see a good way to implement the algorithm so that it uses a fixed number of random bits per float generated, though, short of generate all the coin tosses in advance which would consume 1078 bits of randomness for every IEEE double; this would probably be unacceptably slow overall.

RC4 is a very widely used stream cipher, developed in the late 1980s. As wikipedia puts it, RC4 is remarkable for its speed and simplicity in software, but it has weaknesses that argue against its use in new systems. Today’s paper demonstrates that it is even weaker than previously suspected.

To understand the paper you need to know how stream ciphers work. The core of a stream cipher is a random number generator. The encryption key is the starting seed for the random number generator, which produces a keystream—a sequence of uniformly random bytes. You then combine the keystream with your message using the mathematical XOR operation, and that’s your ciphertext. On the other end, the recipient knows the same key, so they can generate the same keystream, and they XOR the ciphertext with the keystream and get the message back.

If you XOR a ciphertext with the corresponding plaintext, you learn the keystream. This wouldn’t be a big deal normally, but, the basic problem with RC4 is that its keystream isn’t uniformly random. Some bytes of the keystream are more likely to take specific values than they ought to be. Some bytes are more likely to not take specific values. And some bytes are more likely to take the same value as another byte in the keystream. (The ABSAB bias, which is mentioned often in this paper, is an example of that last case: you have slightly elevated odds of encountering value A, value B, a middle sequence S, and then A and B again.) All of these are referred to as biases in the RC4 keystream.

To use RC4’s biases to decrypt a message, you need to get your attack target to send someone (not you) the same message many times, encrypted with many different keys. You then guess a keystream which exhibits as many of the biases as possible, and you XOR this keystream with all of the messages. Your guess won’t always be right, but it will be right slightly more often than chance, so the correct decryption will also appear slightly more often than chance. It helps that it doesn’t have to be exactly the same message. If there is a chunk that is always the same, and it always appears in the same position, that’s enough. It also helps if you already know some of the message; that lets you weed out bad guesses faster, and exploit more of the biases.

Asking the attack target to send the same message many times might seem ridiculous. But remember that many Internet protocols involve headers that are fixed or nearly so. The plaintext of a TKIP-encrypted message, for instance, will almost always be the WiFi encapsulation of an IPv4 packet. If you know the IP addresses of your target and of the remote host it’s communicating with, that means you know eight bytes of the message already, and they’ll always be the same and in the same position. The paper goes into some detail about how to get a Web browser to make lots and lots of HTTPS requests with a session cookie (always the same, but unknown—the secret you want to steal) in a predictable position, with known plaintext on either side of it.

All this was well-known already. What’s new in this paper is: first, some newly discovered biases in the RC4 keystream, and a statistical technique for finding even more; second, improved attacks on TKIP and TLS. Improved means that they’re easier to execute without anyone noticing, and that they take less time. Concretely, the best known cookie-stealing attack before this paper needed to see $13 \cdot 2^{30}$ HTTPS messages (that’s about 14 billion) and this paper cuts it to $9 \cdot 2^{27}$ messages (1.2 billion), which takes only 75 hours to run. That’s entering the realm of practical in real life, if only for a client computer that is left on all the time, with the web browser running.

It’s a truism in computer security (actually, security in general) that attacks only ever get better. What’s important, when looking at papers like this, is not so much the feasibility of any particular paper’s attack, but the trend. The first-known biases in RC4 keystream were discovered only a year after the algorithm was published, and since then there’s been a steady drumbeat of researchers finding more biases and more effective ways to exploit them. That means RC4 is no good, and everyone needs to stop using it before someone finds an attack that only takes a few minutes. Contrast the situation with AES, where there are no known biases, and fifteen years of people looking for some kind of attack has produced only a tiny improvement over brute force.

Advice to the general public: Does this affect you? Yes. What can you do about it? As always, make sure your Web browsers are fully patched up—current versions of Firefox, Chrome, and IE avoid using RC4, and upcoming versions will probably take it out altogether. Beyond that, the single most important thing for you to do is make sure everything you’ve got that communicates over WiFi—router, computers, tablets, smartphones, etc.—are all set to use WPA2 and CCMP only. (The configuration interface may refer to CCMP as AES-CCMP or just AES; in this context, those are three names for the same thing.) The alternatives WEP, WPA, and TKIP all unavoidably involve RC4 and are known to be broken to some extent. Any WiFi-capable widget manufactured after 2006 has no excuse for not supporting WPA2 with CCMP. It should definitely be possible to set your home router up this way; unfortunately, many client devices don’t expose the necessary knobs.

Coming up in this year’s CCS is a paper with the provocative subtitle /dev/random is not Robust, and thanks to widespread availability of online preprints, it’s already scored quite a bit of online attention, e.g. at Bruce Schneier’s blog and on Hacker News.

I’m going to pass over the alleged attacks on the Linux kernel CSPRNG, which are adequately covered in both of the above discussions; suffice it to say that in this paper robust is a precisely defined theoretical security property, and there’s a solid case that it’s stronger than necessary to guarantee practical security. What I’m interested in is the simple and very efficient PRNG construction that is provably robust, advertised in the abstract. CSPRNG-robustness (as defined here) may not be necessary for practical security, but if you can have it essentially for free, why wouldn’t you adopt it? Unfortunately it turns out that there’s a major palmed card in their definition of a CSPRNG with input, and another one in the construction itself, rendering the algorithm useless in practice.

The first problem is that the definition of a CSPRNG with input includes a setup phase that produces a random seed which is not known to the adversary. Despite consistently calling this item a public parameter, the text of the paper makes it clear that it must not be known to the adversary, and they consider this both essential and unproblematic:

Unfortunately, it is well known that no deterministic extractor is capable to simultaneously extract good randomness from all efficiently samplable high-entropy distributions (e.g. consider nearly full entropy distribution $I$ which is random, except the first bit of $\operatorname{Extract}(I)$ is zero).

… [Therefore], we chose … to assume the existence of the setup procedure … this will allow one to consider a seeded extractor … which can now extract entropy from all high-entropy distributions. As a warning, this nice extra generality comes at a price that the [seed] is not passed to the [adversarial] distribution sampler …

But this is impossible to achieve in real life. If the seed is generated on the device that will use it, we have a chicken-and-egg problem, because initial startup is precisely when environmental randomness is least available; the authors should have been aware of actual, catastrophic exploits of systems with little or no access to randomness on first boot, e.g. Mining your Ps and Qs. If the seed is not generated on the device that will use it, then the adversary can probably learn it; in particular, if the seed for all devices of a particular model is embedded in a firmware image that anyone can download off the manufacturer’s website, you have accomplished precisely nothing.

The second problem is in the technical details of their proposed CSPRNG construction, which I will now quote verbatim.

Let $\mathbf{G}: \{0,1\}^m \to \{0,1\}^{n+l}$ be a (deterministic) pseudorandom generator where $m < n$. We use the notation $[y]_1^m$ to denote the first $m$ bits of $y \in \{0,1\}^n$. Our construction of PRNG with input has parameters $n$ (state length), $l$ (output length), and $p = n$ (sample length), and is defined as follows:

• $\operatorname{setup}:$ Output $\text{seed} = (X, X^\prime)\leftarrow\{0,1\}^{2n}$.
• $S^\prime = \operatorname{refresh}(S,I):$ Given $\text{seed} = (X, X^\prime)$, current state $S \in \{0,1\}^n$, and a sample $I \in \{0,1\}^n$, output $S^\prime = S \cdot X + I$, where all operations are over $\mathbb{F}_{2^n}$.
• $(S^\prime, R) = \operatorname{next}(S):$ Given $\text{seed} = (X, X^\prime)$ and a state $S \in \{0,1\}^n$, first compute $U = [X^\prime \cdot S]_1^m$. Then output $(S^\prime, R) = \mathbf{G}(U)$.

It’s a blink-and-you-miss-it sort of thing: by all operations are over $\mathbb{F}_{2^n}$ they are specifying that both the refresh and next steps make use of arithmetic over finite, nonprime fields (more usually referred to as $\operatorname{GF}(2^n)$ in crypto literature). $n$ has to be large—section 6.1 of the paper suggests specific fields with $n$ = 489, 579, and 705. The known techniques for this kind of arithmetic are all very slow, require large look-up tables (and therefore permit side-channel attacks), or cannot be used for a software implementation.   This makes them impractical for a cryptographic component of an operating system that cannot rely on special-purpose hardware.

Both problems arise from the same source, namely wanting to build the algorithm around a keyed hash so that the entropy extractor is fully general, and then taking a mathematical construct with convenient theoretical but inconvenient practical properties as that hash. If we back down to the use of an unkeyed, but computationally secure, hash as the entropy extractor, we’d have to redo the analysis, but I think it ought to be possible to arrange that, without any secret unknown to the adversary, it is still computationally infeasible for the adversary to bias the entropy pool; and we could then pick that hash for its practical utility.

… Oddly enough, this hypothetical construction looks an awful lot like the existing Linux CSPRNG.