Today’s paper takes another shot at modeling how the physical topology of the Internet affects the security of Tor against passive adversaries with the ability to snoop on a lot of traffic. It’s by some of the same people who wrote Defending Tor from Network Adversaries and is quite closely related.

Most of the work of this paper goes into building a flexible, formal threat model, which Tor client software could (in principle) use to inform its routing decisions. Acknowledging that there’s always going to be a good deal of uncertainty about what adversaries are out there and what they are capable of, they make two key design decisions. The model is probabilistic (based on a Bayesian belief network), and it takes user input. For instance, if you have reason to think the government of Transbelvia has it in for you, you can instruct Tor to avoid paths that Transbelvia might be able to snoop on, and the model will expand that out to all the ways they might do that. Conversely, if you trust a particular organization you might like to preferentially use its guards or exit nodes, and it can do that too.

The model is very thorough about different ways a government might be able to snoop on network traffic—not just relays physically hosted in the country, but ASes and IXPs (Transbelvia hosts a major IXP for Eastern Europe), submarine cable landing sites (not relevant for a landlocked country), mutual legal assistance treaties (MLATs) which might be used to have another country do some snooping on Transbelvia’s behalf, and even hacking into and subverting routers at interesting points in the connectivity graph. (The pun in the title refers to their analysis of how MLATs could allow several of the usual suspects to snoop on 90+% of all submarine cable traffic, even though they host hardly any cable landings themselves.) Equally important, it can be expanded at need when new techniques for spying are revealed.

I think something like this is going to be an essential building block if we want to add any spy-aware routing algorithm to Tor, but I have two serious reservations. First, simplest, but less important, right now all Tor clients make routing decisions more-or-less the same way (there have been small changes to the algorithm over time, but everyone is strongly encouraged to stay close to the latest client release anyway, just because of bugs). If clients don’t all make routing decisions the same way, then that by itself might be usable to fingerprint them, and thus cut down the number of people who might’ve taken some action, from all Tor users to all Tor users who make routing decisions like THIS. If highly personalized threat models are allowed, the latter group might be just one person.

Second, and rather more serious, the user-input aspect of this system is going to require major user experience research and design to have any hope of not being worse than the problem it’s trying to solve. It’s not just a matter of putting a friendly face on the belief language (although that does need to happen)—the system will need to educate its users in the meaning of what it is telling them, and it will need to walk them through the consequences of their choices. And it might need to provide nudges if there’s a good reason to think the user’s assessment of their threat model is flat-out wrong (even just making that judgement automatically is fraught with peril—but so is not making that judgement).

I attended this year’s PETS conference a couple weeks ago, and for the next several weeks I’m going to be focusing on highlights from there. For variety’s sake, there will be one unrelated thing each week.

I’m starting with a paper which is heavier mathematical going than I usually pick. The author claims to have developed an optimal (meta-)strategy for obfuscating a person’s location before revealing it to some sort of service. The person wants to reveal enough information to get whatever utility the service provides, while not telling it exactly where they are, because they don’t entirely trust the service. The service, meanwhile, wants to learn as much as possible about the person’s location—it doesn’t matter why.

There are two standard mathematical techniques for obfuscating a location: differential privacy and distortion privacy. They both work by adding noise to a location value, but they use different metrics for how much the adversary can learn given the noisy value. The paper asserts: differential privacy is sensitive to the likelihood of observation given data, distortion privacy is sensitive to the joint probability of observation given data. Thus, by guaranteeing both, we encompass all the defense that is theoretically possible. (This is a bold claim and I don’t know enough about this subfield to verify it.) The paper proceeds to demonstrate, first, that you can construct a linear-programming problem whose solution gives an appropriate amount of noise to add to a location value, and second, that this solution is game-theoretically optimal, even against an adversary who knows your strategy and can adapt to it. Then they analyze the degree to which the data has to be obfuscated, comparing this with more limited algorithms from previous literature.

I unfortunately couldn’t make any sense of the comparisons, because the figures are poorly labeled and explained. And the bulk of the paper is just mathematical derivations, to which I have nothing to add. But there is an important point I want to highlight. The game-theoretic analysis only works if everything the adversary already knows (before they learn the obfuscated location) is captured by a distance metric which defines how far the obfuscated location has been displaced. Mathematically, this is fine; any given fact that the adversary might know about the person revealing their location can be modeled by adjusting the metric. (Person is in a city, so their path is constrained to the street grid. Person just left a large park, there is only one coffee shop within radius X of that park. Person never goes to that restaurant.) However, the system doing the obfuscation might not be able to predict all of those factors, and if it distorts the metric too much it might contradict itself (person cannot possibly have walked from A to B in the time available). In the future I’d like to see some analysis of this. What is it plausible for the adversary to know? At what point does the obfuscation break down because it can’t cover all the known facts without contradicting itself?

Commercial intrusion detection systems are designed for corporate networks; they almost always assume a small number of choke points between internal and external networks, and often they also assume centralized control of all the devices on the internal network. Neither assumption is valid for a peer-to-peer overlay network, where there are typically a large number of mutually distrusting human agencies operating a small number of network peers each, and the routes between them are diverse.

It might seem that in the peer-to-peer environment, each node would have no choice but to run its own IDS. However, if we are willing to assume some degree of trust vested in other node operators, perhaps the task could be delegated. That’s the germ of this paper. For an idealized peer-to-peer network, they derive a game-theoretically optimal strategy for rotating the job of running the IDS around all the super-peers (long-lived nodes with extra responsibilities; many real P2P networks have such nodes).

I like the concept, but the idealized scenario they used may be too idealized to be applicable in real life. Key assumptions which probably don’t hold include:

• The attacker does not already control any super-peers.
• The IDS is perfect: that is, if attack traffic passes through a node running an IDS, the attack will be detected and blocked.
• The attacker’s goal is to take control of, or deny availability of, a specific set of super-peers.
• The defender can predict in advance which nodes will be attacked. (I would accept this if it were probabilistic, e.g. assuming that the attacker is more likely to target nodes that contribute more to to the overall network capacity.)

I think a more realistic model would go something like this: The attacker is assumed already to control some fraction of the super-peers. (The attacker may also mount attacks from other computers, either independently or in collaboration with malicious super-peers.) The attacker seeks to avoid detection, and so does not mount overt attacks on other super-peers; instead, it has some strategy for violating the protocol to achieve an adversarial goal (e.g. forging blockchain transactions, deanonymizing users, delivering false data to users) The malicious peers execute the protocol honestly most of the time, but sometimes break the rules. The defender’s goal is to detect peers that are violating the protocol often enough that this can’t be an accident, while not wasting too many resources on monitoring overhead.

Note: This paper is said to have been published in the International Conference on Secure Knowledge Management in Big-data era, 2014 but I cannot confirm this, as the conference website no longer exists and the Internet Archive’s copy does not include a program.

At this year’s LangSec Workshop, a paper on formal analysis of cases where compiled-code optimizations can introduce security holes that weren’t present in the source code. This class of problems has been a concern for some time; for a general overview, see this series of posts on John Regehr’s blog: [1] [2] [3]. I’ll give one concrete example, which happens to be the running example in the paper.

void crypt (...)
{
...;          // work with the secure key
key = 0x0;    // scrub memory
}

No correct C program can observe values in local variables after a function returns, so the compiler is very likely to delete the final store to key as unnecessary (dead in compiler jargon). However, the programmer’s intent was to ensure that the secret value in key does not survive in memory after crypt returns, where it might be accessible to malicious code—the adversary is not bound by the definition of correct C program. By removing the dead store, the compiler introduces a security hole.

This paper proposes a general strategy for analyzing compiler optimizations to determine whether or not they can introduce security holes. They observe that the definition of correct C program is in terms of an abstract machine that ignores or leaves undefined many low-level nuances. Continuing with the above example, the C standard says only that if code executing in the same program attempts to access the memory allocated to key after crypt returns, the behavior is undefined—anything at all can happen. In particular, reading from that memory might legitimately return either zero or the secret key … or some other value altogether, or it might crash the program. And the standard takes no notice at all of the possibility that another program (such as a debugger, or malware) might gain access to this program’s memory.

The real computer on which the program is executing, by contrast, exhibits some concrete and (usually) deterministic behavior for any arbitrary sequence of machine operations. Even when the architecture specification leaves an outcome undefined, some concrete thing will happen on a physical CPU and it will usually be the same concrete thing for all instances of that model of CPU. This paper’s proposal is simply that compiler optimizations have the potential to introduce security holes whenever they change the observable behavior of the physical machine. Therefore, a correctness proof for an optimization can be converted into a does not introduce security holes proof by changing its model from the abstract machine to a physical machine. In the crypt example, deleting the final store to key is invisible in the abstract machine, and therefore a valid optimization of the original code. But in the physical machine, it means the previous, secret value persists on the stack (a concept which doesn’t even exist in the C abstract machine) and may be retrievable later. If the stack is included in the machine model, the original correctness proof for dead-store elimination will detect the information leak.

Dead-store elimination is well understood as a source of this kind of problem, and library-level band-aids exist. I suspect the greatest value of this technique will be in identifying other optimizations that can potentially introduce information leaks. There is some discussion of this in the paper (common subexpression elimination as a source of timing-channel vulnerabilities) but they do not take it all the way to a model. It would be interesting to know, for instance, whether the scary-sophisticated algebraic loop-nest transformations that you might want applied to your cryptographic primitives can introduce data-dependent timing variation that wasn’t present in the source code.

Sadly, this technique by itself is not enough to prevent compiler-introduced vulnerabilities; a dead-store elimination pass that was valid in the physical machine for all memory writes would wind up doing almost nothing, which is not what we want. (The authors acknowledge this, implicitly, when they talk about the undesirability of turning all optimizations off.) To some extent, tightening up the language specification—making the abstract machine exhibit less undefined behavior—can help, by making the effects of optimization more predictable. John Regehr has a proposal for Friendly C along those lines; I think it throws several babies out with the bathwater, but as a starting point for discussion it’s worthwhile. It also won’t cure all the problems in this area: another example from the paper is

int *alloc(int nrelems)
{
int size = nrelems * sizeof(int);  // may overflow
if (size < nrelems) abort();       // attempt to check for overflow
return malloc(size);
}

According to the rules of the C abstract machine, signed integer overflow causes undefined behavior, which means the compiler is entitled to assume that the comparison size < nrelems is always false. Unfortunately, making signed integer overflow well-defined does not render this code safe, because the check itself is inadequate: fixed-width twos-complement wraparound multiplication (which is the behavior of the integer multiply instruction on most CPUs) can produce values that are larger than either multiplicand but smaller than the mathematically correct result, even when one multiplicand is a small number. (In this example, with the usual sizeof(int) == 4, passing 1,431,655,766 for nrelems would produce 1,431,655,768 for size when it should be 5,726,623,064.) For the same reason, making all the variables unsigned does not help.

On a higher level, compiler authors will fight tooth and nail for aggressive optimization, because compiler authors (kinda by definition) like writing optimizations, but also because they’ve got to satisfy three conflicting user groups at once, and the latter two are more numerous and have deeper pockets:

1. Systems programmers, who want their C to be translated directly and transparently to machine language, and believe that undefined behavior exists to allow variation in CPU behavior but not compiler behavior.

2. Applications programmers, who want the compiler to crush as many abstraction penalties as possible out of their gigantic white-elephant C++ codebases which there will never be time or budget to clean up.

3. Number crunchers, who care about four things: speed, speed, not having to code in FORTRAN or assembly language anymore, and speed.

And what makes this extra fun is that cryptographers are simultaneously in group 1 and group 3. I’m not going to make ETAPS 2015, but I am very curious to know what DJB is going to say about the death of optimizing compilers there.

Coming up in this year’s CCS is a paper with the provocative subtitle /dev/random is not Robust, and thanks to widespread availability of online preprints, it’s already scored quite a bit of online attention, e.g. at Bruce Schneier’s blog and on Hacker News.

I’m going to pass over the alleged attacks on the Linux kernel CSPRNG, which are adequately covered in both of the above discussions; suffice it to say that in this paper robust is a precisely defined theoretical security property, and there’s a solid case that it’s stronger than necessary to guarantee practical security. What I’m interested in is the simple and very efficient PRNG construction that is provably robust, advertised in the abstract. CSPRNG-robustness (as defined here) may not be necessary for practical security, but if you can have it essentially for free, why wouldn’t you adopt it? Unfortunately it turns out that there’s a major palmed card in their definition of a CSPRNG with input, and another one in the construction itself, rendering the algorithm useless in practice.

The first problem is that the definition of a CSPRNG with input includes a setup phase that produces a random seed which is not known to the adversary. Despite consistently calling this item a public parameter, the text of the paper makes it clear that it must not be known to the adversary, and they consider this both essential and unproblematic:

Unfortunately, it is well known that no deterministic extractor is capable to simultaneously extract good randomness from all efficiently samplable high-entropy distributions (e.g. consider nearly full entropy distribution $I$ which is random, except the first bit of $\operatorname{Extract}(I)$ is zero).

… [Therefore], we chose … to assume the existence of the setup procedure … this will allow one to consider a seeded extractor … which can now extract entropy from all high-entropy distributions. As a warning, this nice extra generality comes at a price that the [seed] is not passed to the [adversarial] distribution sampler …

But this is impossible to achieve in real life. If the seed is generated on the device that will use it, we have a chicken-and-egg problem, because initial startup is precisely when environmental randomness is least available; the authors should have been aware of actual, catastrophic exploits of systems with little or no access to randomness on first boot, e.g. Mining your Ps and Qs. If the seed is not generated on the device that will use it, then the adversary can probably learn it; in particular, if the seed for all devices of a particular model is embedded in a firmware image that anyone can download off the manufacturer’s website, you have accomplished precisely nothing.

The second problem is in the technical details of their proposed CSPRNG construction, which I will now quote verbatim.

Let $\mathbf{G}: \{0,1\}^m \to \{0,1\}^{n+l}$ be a (deterministic) pseudorandom generator where $m < n$. We use the notation $[y]_1^m$ to denote the first $m$ bits of $y \in \{0,1\}^n$. Our construction of PRNG with input has parameters $n$ (state length), $l$ (output length), and $p = n$ (sample length), and is defined as follows:

• $\operatorname{setup}:$ Output $\text{seed} = (X, X^\prime)\leftarrow\{0,1\}^{2n}$.
• $S^\prime = \operatorname{refresh}(S,I):$ Given $\text{seed} = (X, X^\prime)$, current state $S \in \{0,1\}^n$, and a sample $I \in \{0,1\}^n$, output $S^\prime = S \cdot X + I$, where all operations are over $\mathbb{F}_{2^n}$.
• $(S^\prime, R) = \operatorname{next}(S):$ Given $\text{seed} = (X, X^\prime)$ and a state $S \in \{0,1\}^n$, first compute $U = [X^\prime \cdot S]_1^m$. Then output $(S^\prime, R) = \mathbf{G}(U)$.

It’s a blink-and-you-miss-it sort of thing: by all operations are over $\mathbb{F}_{2^n}$ they are specifying that both the refresh and next steps make use of arithmetic over finite, nonprime fields (more usually referred to as $\operatorname{GF}(2^n)$ in crypto literature). $n$ has to be large—section 6.1 of the paper suggests specific fields with $n$ = 489, 579, and 705. The known techniques for this kind of arithmetic are all very slow, require large look-up tables (and therefore permit side-channel attacks), or cannot be used for a software implementation. [1] [2] This makes them impractical for a cryptographic component of an operating system that cannot rely on special-purpose hardware.

Both problems arise from the same source, namely wanting to build the algorithm around a keyed hash so that the entropy extractor is fully general, and then taking a mathematical construct with convenient theoretical but inconvenient practical properties as that hash. If we back down to the use of an unkeyed, but computationally secure, hash as the entropy extractor, we’d have to redo the analysis, but I think it ought to be possible to arrange that, without any secret unknown to the adversary, it is still computationally infeasible for the adversary to bias the entropy pool; and we could then pick that hash for its practical utility.

… Oddly enough, this hypothetical construction looks an awful lot like the existing Linux CSPRNG.