Papers published in 2015

Automated Experiments on Ad Privacy Settings

You’ve probably noticed the creepy effect where you consider buying something online, or maybe just look at a page for something that happens to be for sale, and for weeks afterward you get ads on totally unrelated websites for that thing or similar things. The more reputable online ad brokerages offer a degree of control over this effect (e.g.: Google, Microsoft). This study investigates exactly what effect those settings have on the ads observed by automated browsing agents. The basic idea is to set some of the knobs, visit websites that will tell the ad provider something about the simulated customer’s preferences, possibly adjust the knobs again, and finally record what is being advertised on a general-interest website.

A great deal of the paper is devoted to statistical methodology. Because the ad provider is a stateful black box, and one whose behavior may depend on uncontrollable external factors (e.g. an advertiser has exhausted its budget for the month), it’s vital to make as few statistical assumptions as possible. They use permutation tests and supervised classification (logistic regression), both of which make minimal assumptions. They’re also very careful about drawing conclusions from their results. I’m not much of a statistician, but it all sounds carefully thought out and plausible, with one exception: heavy reliance on significance testing, which has come in for serious criticism [1], to the point where some journals no longer accept its use at all [2]. This is exactly the sort of research where p-values can mislead; had I been reviewing this prior to publication, I would have encouraged the authors to think about how they could present the same conclusions without using significance testing.
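
For readers who haven’t met them, a permutation test needs nothing more than the ability to shuffle labels. Here is a minimal sketch (my own illustration, not the authors’ code; the example numbers are invented) of how one might compare ad counts between two groups of simulated users:

```python
# Minimal permutation test: if the treatment had no effect, how often would
# shuffled group labels produce a difference in means at least this large?
import random

def permutation_test(group_a, group_b, trials=10_000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    return extreme / trials   # estimated p-value, no distributional assumptions

# Hypothetical example: counts of one ad shown to "male" vs. "female" profiles.
print(permutation_test([9, 7, 8, 10, 12], [1, 0, 2, 1, 3]))
```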

Now, the actual conclusions. Only Google’s ads were tested. (Expanding the tests to additional ad brokers is listed as future work.) They confirm that turning a particular topic (dating) off in the preferences does cause those ads to go away. They observe that two highly sensitive topics (substance abuse, disability) that do trigger targeted ads are not controllable via the preferences; in fact, they are completely invisible on that screen. And the most interesting case: they set the ad preferences to explicitly reveal a gender (man or woman) and then browsed a bunch of sites related to job searching. Simulated customers who claimed to be men got ads for a career-coaching service that promised better odds of being hired into $200K+ executive positions; those who claimed to be women did not see these ads.

This last example clearly reflects the well-known glass ceiling that affects business in general, but (as the authors point out) it’s impossible to tell, from outside the black box, why it shows up in this case. The career-coaching service could have chosen to advertise only to men. Google’s ad-targeting algorithm could have been coded with (likely unconscious) bias in its category structure, or—this is the most probable explanation—its feedback mechanism could have detected that men are more likely to click on ads with those keywords, and reinforced that pattern by showing the ads preferentially to men. There’s a telling comment at the very end of the paper:

… We consider it more likely that Google has lost control over its massive, automated advertising system. Even without advertisers placing inappropriate bids, large-scale machine learning can behave in unexpected ways.

There’s a lesson here for all the big-data companies: the best an unbiased machine-learning system can hope to do is produce an accurate reflection of its training set—including whatever biases are in there. If you want to avoid reproducing all the systemic biases of the offline world, you will have to write code to that effect.

Tor’s Usability for Censorship Circumvention

This is a report on a pilot usability study. The authors ran five journalists (there aren’t any more details than that) through the process of installing, activating, and using the Tor Browser for a small number of canned tasks, identifying a number of problems:

… people did have difficulty with installing Tor Browser (principally because of the Gatekeeper code-signing feature on OS X), did not understand what many of the many options meant, and were confused about why certain things were happening.

They are planning a much larger study and were soliciting feedback on its experimental design. I have only two things to say. First, the proposal is to run a large test with 200 users and then, presumably, start making changes to the software to improve usability. The problem with this is that subtle (yet serious) UX issues are very likely being masked by the more blatant ones: no matter how many people you experiment on, you won’t detect the subtle problems until the blatant ones are fixed. Therefore, it would be far more valuable to do a series of smaller user studies, improving the software based on the results of each study before doing the next one. This strategy also ensures that the research results actually get incorporated into the product, rather than being lost in the shuffle once the paper is published.

The other point is more of a hypothesis about what would be good to aim for. To use Tor in a way that genuinely improves your security outcomes, you need to understand what it is doing and why, and to do that you have to wrap your head around some concepts that may be unfamiliar—especially if you haven’t previously needed to understand the Internet itself in any kind of detail. (For instance, the fact that every IP packet is labeled with its source and destination is obvious once you think about it, but it may well never have occurred to a lot of people.) There probably needs to be a training manual, and this manual needs to take the attitude that “yeah, this is a little tricky, and you have to think about it some, but don’t panic, you can understand it.” Shoot for the “we understand” tone said to characterize Rust compiler errors (warning: Reddit). The place I’ve seen this done best, personally, was the tutorial and concepts guide for GnuCash, which took just this tone with regard to double-entry bookkeeping—also somewhat notorious for its inscrutability. (Note: I read this a long time ago, and I don’t know whether its current edition is still like that.)

The Harmful Consequences of Postel’s Maxim

Postel’s Maxim of protocol design (also known as the Robustness Principle or the Internet Engineering Principle) is “Be liberal in what you accept, conservative in what you send.” It was first stated as such (by Jon Postel) in the 1979 and 1980 specifications (e.g. RFC 760) of the protocol that we now call IPv4. [1] It’s been tremendously influential (quoted as an axiom in Tim Berners-Lee’s design principles for the Web [2], for instance) but has also come in for a fair bit of criticism [3] [4]. (An expanded version of the principle, in RFC 1122, anticipates many of these criticisms and is well worth reading if you haven’t already.) Now we have an Internet-Draft arguing that it is fatally flawed:

… there are negative long-term consequences to interoperability if an implementation applies Postel’s advice. Correcting the problems caused by divergent behavior in implementations can be difficult or impossible.

and arguing that instead

Protocol designs and implementations should be maximally strict.

Generating fatal errors for what would otherwise be a minor or recoverable error is preferred, especially if there is any risk that the error represents an implementation flaw. A fatal error provides excellent motivation for addressing problems.

The primary function of a specification is to proscribe behavior in the interest of interoperability.

This is the first iteration of an Internet-Draft, so it’s not intended to be final. Rather than express an opinion as such, I want to put forward some examples of real-world situations from the last couple of decades of Internet protocol design that the author may or may not have considered, and ask how he feels they should be, or should have been, handled. I also invite readers to suggest further examples where strictness, security, upward compatibility, incremental deployment, ergonomics, and so on may be in tension.

  • The original IP and TCP (v4) specifications left a number of bits reserved in their respective packet headers. In 2001 the ECN specification gave meaning to some of those bits. It was almost immediately discovered that many intermediate routers would silently discard packets with the ECN bits set; in consequence, fourteen years later ECN is still quite rarely used, even though there are far fewer such routers than there were in 2001. [5] [6]

  • Despite the inclusion of a protocol version number in SSL/TLS, and a clear specification of how servers were supposed to react to clients offering a newer protocol than they supported, servers that drop connections from too-new clients are historically quite common, so until quite recently Web browsers would retry such connections with an older protocol version. This enables a man-in-the-middle to force negotiation of an old, possibly insecure version, even if both sides support something better. [7] [8] [9] Similar to the ECN situation, this problem was originally noticed in 2001 and continues to be an issue in 2015.

  • Cryptographic protocols (such as TLS) can be subverted—and I mean complete breach of confidentiality subverted—if they reveal why a message failed to decrypt, or how long it took to decrypt / fail to decrypt a message, to an attacker that can forge messages. [10] [11] To close these holes it may be necessary to run every message through the complete decryption process even if you already know it’s going to fail. (A small sketch of this same-code-path discipline follows the list.)

  • In the interest of permitting future extensions, HTML5 [12] and CSS [13] take pains to specify exact error recovery behavior; the idea is that older software will predictably ignore stuff it doesn’t understand, so that authors can be sure of how their websites will look in browsers that both do and don’t implement each shiny new feature. However, this means you can predict how the CSS parser will parse HTML (and vice versa). And in conjunction with the general unreliability of MIME types it means you used to be able to exploit that to extract information from a document you shouldn’t be able to read. [14] (Browsers fixed this by becoming pickier about MIME types.)
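
To make the third bullet concrete, here is a tiny sketch (mine, not drawn from any of the cited papers) of the sort of uniform-code-path message check that avoids telling a forger why, or how quickly, a message was rejected. It shows only the MAC-verification corner of the problem; a real protocol stack also has to keep padding checks, decryption, and error reporting on the same path.

```python
# Encrypt-then-MAC receiver sketch: always compute the full MAC, compare it
# in constant time, and expose a single, reason-free failure result.
import hmac
import hashlib

MAC_LEN = 32  # SHA-256 tag length

def check_message(mac_key: bytes, message: bytes):
    """Return the ciphertext if the tag verifies, otherwise None."""
    if len(message) < MAC_LEN:
        message = message.ljust(MAC_LEN, b"\x00")  # keep malformed input on the same path
    ciphertext, tag = message[:-MAC_LEN], message[-MAC_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest's running time does not depend on where the bytes differ.
    return ciphertext if hmac.compare_digest(expected, tag) else None

key = b"k" * 32
msg = b"hello"
tagged = msg + hmac.new(key, msg, hashlib.sha256).digest()
tampered = tagged[:-1] + bytes([tagged[-1] ^ 1])
print(check_message(key, tagged))      # b'hello'
print(check_message(key, tampered))    # None, and no hint as to why
```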

An Automated Approach for Complementing Ad Blockers’ Blacklists

Last week’s long PETS paper was very abstract; this week’s paper is very concrete. The authors are concerned that manually curated blacklists, as currently used by most ad-blocking software, cannot hope to keep up with the churn in the online ad industry. (I saw a very similar talk at WPES back in 2012 [1] which quoted the statistic that the default AdBlock Plus filter list contains 18,000 unique URLs, with new ones added at a rate of five to fifteen every week.) They propose to train a machine classifier on network-level characteristics that differ between ad services and normal web sites, to automate detection of new ad providers and/or third-party privacy-invasive analytics services. (This is the key difference from the paper at WPES 2012: that project used static analysis of JavaScript delivered by third-party services to extract features for its classifier.)

They find that a set of five features provides a reasonably effective classification: proportion of requests that are third-party (for transclusion into another website), number of unique referrers, ratio of received to sent bytes, and proportion of requests including cookies. For the training set they used, they achieve 83% precision and 85% recall, which is reasonable for a system that will be used to identify candidates for manual inspection and addition to blacklists.
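
As a concrete sketch of the pipeline (my own reconstruction; the field names, and the choice of logistic regression, are assumptions rather than details taken from the paper), feature extraction from an HTTP log might look something like this:

```python
# Compute per-service features of the kind described above from a request log,
# then train an off-the-shelf classifier on services labeled via a blacklist.
from collections import defaultdict
from sklearn.linear_model import LogisticRegression   # classifier choice is mine

def service_features(requests):
    """requests: iterable of dicts with (hypothetical) keys
       service, is_third_party, referrer, bytes_in, bytes_out, has_cookie."""
    agg = defaultdict(lambda: {"n": 0, "third": 0, "refs": set(),
                               "b_in": 0, "b_out": 0, "cookies": 0})
    for r in requests:
        a = agg[r["service"]]
        a["n"] += 1
        a["third"] += bool(r["is_third_party"])
        a["refs"].add(r["referrer"])
        a["b_in"] += r["bytes_in"]
        a["b_out"] += r["bytes_out"]
        a["cookies"] += bool(r["has_cookie"])
    return {svc: [a["third"] / a["n"],                 # share of third-party requests
                  len(a["refs"]),                      # number of unique referrers
                  a["b_in"] / max(a["b_out"], 1),      # received-to-sent byte ratio
                  a["cookies"] / a["n"]]               # share of requests carrying cookies
            for svc, a in agg.items()}

# feats = service_features(traffic_log)
# X = [feats[s] for s in labeled_services]; y = [labels[s] for s in labeled_services]
# model = LogisticRegression(max_iter=1000).fit(X, y)   # then inspect its predictions
```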

There are several methodological bits in here which I liked. They use entropy-based discretization and information gain to identify valuable features and discard unhelpful ones. They compare a classifier trained on manually-labeled data (from a large HTTP traffic trace) with a classifier trained on the default AdBlock Plus filter list; both find similar features, but the ABP filter list has better coverage of infrequently used ads or analytics services, whereas the manually labeled training set catches a bunch of common ads and analytics services that ABP missed.
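
For the feature-selection step, information gain is simple enough to show in a few lines. This is a generic sketch under my own simplification (a single threshold split; the paper’s entropy-based discretization may bin values differently):

```python
# Score a candidate feature by how much knowing it reduces label entropy.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """H(label) - H(label | value > threshold)."""
    gain = entropy(labels)
    split = [v > threshold for v in values]
    for side in (True, False):
        subset = [lab for s, lab in zip(split, labels) if s == side]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Toy data: received-to-sent byte ratio vs. ad / not-ad label.
ratios = [50, 80, 120, 3, 90, 5, 2, 4]
labels = ["ad", "ad", "ad", "site", "ad", "site", "site", "site"]
print(information_gain(ratios, labels, threshold=10))   # 1.0 bit: perfectly informative here
```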

One fairly significant gap is that the training set is limited to cleartext HTTP. There’s a strong trend nowadays toward HTTPS for everything, including ads, but established ad providers are finding it difficult to cut all their services over efficiently, which might provide an opportunity for new providers—and thus cause a shift toward providers that have been missed by the blacklists.

There’s almost no discussion of false positives. Toward the end there is a brief mention of third-party services like Gravatar and Flattr, which share a usage pattern with ads/analytics and showed up as false positives. But it’s possible to enumerate common types of third-party services (other than ads and analytics) a priori: outsourced commenting (Disqus, hypothes.is), social media share buttons (Facebook, Twitter), shared hosting of resources (jQuery, Google Fonts), static-content CDNs, and so on. Most of these are probably weeded out by the received-to-sent byte ratio check, but I would still have liked to see an explicit check of at least a few of them.

And finally, nobody seems to have bothered talking to the people who actually maintain the ABP filter lists to find out how they do it. (I suspect it relies strongly on manual, informal reporting to a forum or something.) If this is to turn into anything more than an experiment, people need to be thinking about integration and operationalization.

Tensions and frictions in researching activists’ digital security and privacy practices

Presentations at HotPETS are not as technical as those at PETS proper, and are chosen for relevance, novelty, and potential to generate productive discussion. The associated papers are very short and are meant only as a starting point for a conversation. Several of this year’s presentations were from people and organizations that are concerned with making effective use of technology we already have, rather than building new stuff, and with highlighting the places where actual problems are not being solved.

The Tactical Technology Collective (tacticaltech.org, exposingtheinvisible.org; if you don’t have time to watch a bunch of videos, I strongly recommend at least reading the transcribed interview with Jesus Robles Maloof) is a practitioner NGO working on digital security and privacy, with deep connections to activist communities around the world. What this means is that they spend a lot of time going places and talking to people—often people at significant risk due to their political activities—about their own methods, the risks that they perceive, and how existing technology does or doesn’t help.

Their presentation was primarily about the tension between research and support goals from the perspective of an organization that wants to do both. Concretely: if you are helping activists do their thing by training them in more effective use of technology, you’re changing their behavior, which is what you want to study. (It was not mentioned, and they might not know it by this name, but I suspect they would recognize the Hawthorne Effect in which being the subject of research also changes people’s behavior.) They also discussed potential harm to activists due to any sort of intervention from the outside, whether that is research, training, or just contact, and the need for cultural sensitivity in working out the best way to go about things. (I can’t find the talk now—perhaps it was at the rump session—but at last year’s PETS someone mentioned a case where democracy activists in a Southeast Asian country specifically did not want to be anonymous, because it would be harder for the government to make them disappear if they were known by name to the Western media.)

There were several other presentations on related topics, and “technology developers and researchers aren’t paying enough attention to what activists actually need” has been a refrain, both at PETS and elsewhere, for several years now. I’m not sure further reiteration of that point helps all that much. What would help? I’m not sure about that either. It’s not so much that people aren’t building the right tools as that the network effects of the wrong tools are so dominant that the activists have to use them even though they’re dangerous.

Cache Timing Attacks Revisited: Efficient and Repeatable Browser History, OS, and Network Sniffing

Cache timing attacks use the time some operation takes to learn whether or not a datum is in a cache. They’re inherently hard to fix, because the entire point of a cache is to speed up access to data that was used recently. They’ve been studied as far back as Multics. [1] In the context of the Web, the cache in question (usually) is the browser’s cache of resources retrieved from the network, and the attacker (usually) is a malicious website that wants to discover something about the victim’s activity on other websites. [2] [3] This paper gives examples of using cache timing to learn information needed for phishing, discover the victim’s location, and monitor the victim’s search queries. It’s also known to be possible to use the victim’s browsing habits as an identifying fingerprint [4].

The possibility of this attack is well-known, but to date it has been dismissed as an actual risk for two reasons: it was slow, and probing the cache was a destructive operation, i.e. the attacker could only probe any given resource once, after which it would be cached whether or not the victim had ever loaded it. This paper overcomes both of those limitations. It uses Web Workers to parallelize cache probing, and it sets a very short timeout on all its background network requests—so short that the request can only succeed if it’s cached. Otherwise, it will be cancelled and the cache will not be perturbed. (Network requests will always take at least a few tens of milliseconds, whereas the cache can respond in less than a millisecond.) In combination, these achieve two orders of magnitude speedup over the best previous approach, and, more importantly, they render the attack repeatable.

What do we do about it? Individual sites can mark highly sensitive data as not to be cached at all, but this slows the web down, and you’ll never get everyone to do it for everything that could be relevant. Browsers aren’t about to disable caching; it’s too valuable. One possibility (not in this paper): the browser could notice the first time a cached resource is requested in the context of a new site, and deliberately delay satisfying that request from the cache by approximately the time the resource originally took to load over the network. I’d want to implement and test that to make sure it doesn’t leave a hole, but it seems like it would let us have our cake and eat it too.
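
A toy sketch of that idea (entirely mine, and untested against a determined attacker): a cache wrapper that remembers how long each resource originally took over the network, and replays roughly that delay the first time a new site would otherwise get an instant cache hit.

```python
import time

class PaddedCache:
    """Wraps a fetch function; the first cross-site cache hit is padded to look uncached."""
    def __init__(self, fetch):
        self.fetch = fetch        # fetch(url) -> body, over the network
        self.store = {}           # url -> (body, original network latency in seconds)
        self.served = set()       # (site, url) pairs this cache has already answered

    def get(self, url, requesting_site):
        if url not in self.store:
            start = time.monotonic()
            body = self.fetch(url)
            self.store[url] = (body, time.monotonic() - start)
        body, latency = self.store[url]
        if (requesting_site, url) not in self.served:
            self.served.add((requesting_site, url))
            time.sleep(latency)   # a probing site can't tell a padded hit from a miss
        return body
```

A real implementation would presumably key this on the top-level origin and add some jitter, so that the padding itself doesn’t become a measurable signal.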

In closing, I want to point out that this is a beautiful example of the maxim that attacks always get better. Cryptographers have been banging that drum for decades, trying to get people to move away from cipher primitives that are only a little weak before it’s too late, without much impact. The same, it seems, goes for information leaks.

Privacy Games: Optimal User-Centric Data Obfuscation

I attended this year’s PETS conference a couple weeks ago, and for the next several weeks I’m going to be focusing on highlights from there. For variety’s sake, there will be one unrelated thing each week.

I’m starting with a paper which is heavier mathematical going than I usually pick. The author claims to have developed an optimal (meta-)strategy for obfuscating a person’s location before revealing it to some sort of service. The person wants to reveal enough information to get whatever utility the service provides, while not telling it exactly where they are, because they don’t entirely trust the service. The service, meanwhile, wants to learn as much as possible about the person’s location—it doesn’t matter why.

There are two standard mathematical techniques for obfuscating a location: differential privacy and distortion privacy. They both work by adding noise to a location value, but they use different metrics for how much the adversary can learn given the noisy value. The paper asserts: differential privacy is sensitive to the likelihood of observation given data, distortion privacy is sensitive to the joint probability of observation given data. Thus, by guaranteeing both, we encompass all the defense that is theoretically possible. (This is a bold claim and I don’t know enough about this subfield to verify it.) The paper proceeds to demonstrate, first, that you can construct a linear-programming problem whose solution gives an appropriate amount of noise to add to a location value, and second, that this solution is game-theoretically optimal, even against an adversary who knows your strategy and can adapt to it. Then they analyze the degree to which the data has to be obfuscated, comparing this with more limited algorithms from previous literature.
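
To give a flavor of what such a linear program looks like, here is a toy version (my own construction, in the style of earlier location-privacy LPs rather than the paper’s exact game, which also folds in distortion privacy and an adaptive adversary): choose reporting probabilities p(o|s) that minimize expected distortion, subject to a differential-privacy-style constraint scaled by distance.

```python
# Tiny location-obfuscation LP: variables are p(o|s), the probability of
# reporting location o when the true location is s.  Minimize expected
# distance subject to p(o|s) <= exp(eps*d(s,s')) * p(o|s') for all s, s', o.
import numpy as np
from scipy.optimize import linprog

locs = np.array([0.0, 1.0, 2.0])            # toy 1-D "map" with three places
n = len(locs)
prior = np.full(n, 1.0 / n)                 # adversary's prior over true location
eps = np.log(2.0)                           # privacy parameter per unit distance
d = np.abs(locs[:, None] - locs[None, :])   # d[s, o] = distance between s and o

idx = lambda s, o: s * n + o                # flatten p(o|s) into one vector

# Objective: expected distance between true and reported location.
c = np.zeros(n * n)
for s in range(n):
    for o in range(n):
        c[idx(s, o)] = prior[s] * d[s, o]

# Each conditional distribution p(.|s) must sum to one.
A_eq = np.zeros((n, n * n)); b_eq = np.ones(n)
for s in range(n):
    for o in range(n):
        A_eq[s, idx(s, o)] = 1.0

# Privacy constraints: p(o|s) - exp(eps*d(s,s')) * p(o|s') <= 0.
rows = []
for s in range(n):
    for s2 in range(n):
        if s == s2:
            continue
        for o in range(n):
            row = np.zeros(n * n)
            row[idx(s, o)] = 1.0
            row[idx(s2, o)] = -np.exp(eps * d[s, s2])
            rows.append(row)
A_ub = np.array(rows); b_ub = np.zeros(len(rows))

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * (n * n), method="highs")
print(res.x.reshape(n, n))                  # optimal p(o|s), one row per true location
```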

I unfortunately couldn’t make any sense of the comparisons, because the figures are poorly labeled and explained. And the bulk of the paper is just mathematical derivations, to which I have nothing to add. But there is an important point I want to highlight. The game-theoretic analysis only works if everything the adversary already knows (before they learn the obfuscated location) is captured by a distance metric which defines how far the obfuscated location has been displaced. Mathematically, this is fine; any given fact that the adversary might know about the person revealing their location can be modeled by adjusting the metric. (Person is in a city, so their path is constrained to the street grid. Person just left a large park, there is only one coffee shop within radius X of that park. Person never goes to that restaurant.) However, the system doing the obfuscation might not be able to predict all of those factors, and if it distorts the metric too much it might contradict itself (person cannot possibly have walked from A to B in the time available). In the future I’d like to see some analysis of this. What is it plausible for the adversary to know? At what point does the obfuscation break down because it can’t cover all the known facts without contradicting itself?

Keys Under Doormats: Mandating insecurity by requiring government access to all data and communications

Today’s paper is not primary research, but an expert opinion on a matter of public policy; three of its authors have posted their own summaries [1] [2] [3], the general press has picked it up [4] [5] [6] [7] [8], and it was mentioned during Congressional hearings on the topic [9]. I will, therefore, only briefly summarize it, before moving on to some editorializing of my own. I encourage all of you to read the paper itself; it’s clearly written, for a general audience, and you can probably learn something about how to argue a position from it.

The paper is a direct response to FBI Director James Comey, who has for some time been arguing that data storage and communications systems must be “designed for exceptional access by law enforcement agencies” (quoting the paper); his recent Lawfare editorial can be taken as representative. British politicians have also been making similar noises (see the general-press articles above). The paper, in short, says that this would cause much worse technical problems than it solves, and that even if, by some magic, those problems could be avoided, it would still be a terrible idea for political reasons.

At slightly more length, exceptional access means: law enforcement agencies (like the FBI) and espionage agencies (like the NSA) want to be able to wiretap all communications on the ’net, even if those communications are encrypted. This is a bad idea technically for the same reasons that master-key systems for doors can be more trouble than they’re worth. The locks are more complicated, and easier to pick than they would be otherwise. If the master key falls into the wrong hands you have to change all the locks. Whoever has access to the master keys can misuse them—which makes the keys, and the people who control them, a target. And it’s a bad idea politically because, if the USA gets this capability, every other sovereign nation gets it too, and a universal wiretapping capability is more useful to a totalitarian state that wants to suppress the political opposition, than it is to a detective who wants to arrest murderers. I went into this in more detail toward the end of my review of RFC 3514.

I am certain that James Comey knows all of this, in at least as much detail as it is explained in the paper. Moreover, he knows that the robust democratic debate he calls for already happened, in the 1990s, and wiretapping lost. [10] [11] [12] Why, then, is he trying to relitigate it? Why does he keep insisting that it must somehow be both technically and politically feasible, against all evidence to the contrary? Perhaps most important of all, why does he keep insisting that it’s desperately important for his agency to be able to break encryption, when it was only an obstacle nine times in all of 2013? [13]

On one level, I think it’s a failure to understand the scale of the problems with the idea. On the technical side, if you don’t live your life down in the gears it’s hard to bellyfeel the extent to which everything is broken, and therefore any sort of wiretap requirement cannot help but make the situation worse. And it doesn’t help, I’m sure, that Comey has heard (correctly) that what he wants is mathematically possible, so he thinks everyone saying “this is impossible” is trying to put one over on him, rather than just communicate “this isn’t practically possible.”

The geopolitical problems with the idea are perhaps even harder to convey, because the Director of the FBI wrestles with geopolitical problems every day, so he thinks he does know the scale there. For instance, the paper spends quite some time on a discussion of the jurisdictional conflict that would naturally come up in an investigation where the suspect is a citizen of country A, the crime was committed in B, and the computers involved are physically in C but communicate with the whole world—and it elaborates from there. But we already have treaties to cover that sort of investigation. Comey probably figures they can be bent to fit, or at worst, we’ll have to negotiate some new ones.

If so, what he’s missing is that he’s imagining too small a group of wiretappers: law enforcement and espionage agencies from countries that are on reasonably good terms with the USA. He probably thinks export control can keep the technology out of the hands of countries that aren’t on good terms with the USA (it can’t) and hasn’t even considered non-national actors: local law enforcement, corporations engaged in industrial espionage, narcotraficantes, mafiosi, bored teenagers, Anonymous, religious apocalypse-seekers, and corrupt insiders in all the above. People the USA can’t negotiate treaties with. People who would already have been thrown in jail if anyone could make charges stick. People who may not even share premises like what good governance or due process of law or basic human decency mean. There are a bit more than seven billion people on the planet today, and this is the true horror of the Internet: roughly 40% of those people [14] could, if they wanted, ruin your life, right now. It’s not hard. [15] (The other 60% could too, if they could afford to buy a mobile, or at worst a satellite phone.)

But these points, too, have already been made, repeatedly. Why does none of it get through? I am only guessing, but my best guess is: the War On Some Drugs [16] and the aftermath of 9/11 [17] (paywalled, sorry; please let me know if you have a better cite) have both saddled the American homeland security complex with impossible, Sisyphean labors. In an environment where failure is politically unacceptable, yet inevitable, the loss of any tool—even if it’s only marginally helpful—must seem like an existential threat. To get past that, we would have to be prepared to back off on the must never happen again / must be stopped at all cost posturing; the good news is, that has an excellent chance of delivering better, cheaper law enforcement results overall. [18]

Security Analysis of Consumer-Grade Anti-Theft Solutions Provided by Android Mobile Anti-Virus Apps

Today we have another analysis of Android device security against scenarios where the owner loses control of the phone, by the same researchers who wrote Security Analysis of Android Factory Resets. Here, instead of looking at what happens when the owner deliberately erases a phone in order to sell it, they study what happens when the phone is stolen and the owner tries to wipe or disable it remotely. A remote disable mechanism is commonly included in anti-virus programs for Android, and this study looks at ten such mechanisms.

As with factory reset, the core finding boils down to “none of them work.” The root causes are slightly different, though. The core of Android, naturally, is at pains to prevent normal applications from doing something as destructive as initiating a memory wipe, rendering the phone unresponsive to input, or changing passwords. To be properly effective, the anti-theft program needs administrative privileges, which can be revoked at any time, so there’s an inherent race between the owner activating the anti-theft program and the thief disabling it. To make matters worse, UX bugs make it easy for these programs to appear to be installed correctly but not have the privileges they need; implementation bugs (possibly caused by unclear Android documentation) may leave loopholes even when the program was installed correctly; and several device vendors added backdoor capabilities (probably for in-store troubleshooting) that allow a thief to bypass the entire thing. For those familiar with Android development, we’re talking about devices that, when turned off and then plugged into a computer, boot into recovery mode and activate ADB.

There is a curious omission in this paper: since 2013, Google itself has provided a remote lock/wipe feature as part of the Google Apps bundle that’s installed on most Android devices [1]. Since the Apps bundle is, nowadays, Google’s way of getting around vendors who can’t be bothered to patch the base operating system in a timely fashion, it has all the privileges it needs to do this job correctly, and the feature should be available regardless of Android base version. The UX is reasonable, and one would presume that the developers are at least somewhat less likely to make mistakes in the implementation. This paper, however, doesn’t mention that at all, despite (correctly) pointing out that this is a difficult thing for a third-party application to get right and the device vendors should step up.

Practical advice for phone owners continues to be: encrypt the phone on first boot, before giving it any private information, and use a nontrivial unlock code. Practical advice for antivirus vendors, I think, is “you’re not testing your software adversarially enough.” The implementation bugs, in particular, suggest that the vendors’ test strategy confirmed that remote lock/wipe works when set up correctly, but did not put enough effort into thinking of ways to defeat the lock.

On Subnormal Floating Point and Abnormal Timing

Most of the time, people think the floating-point unit has no security significance, simply because you don’t use the floating-point unit for anything with security significance. Cryptography, authentication, access control, process isolation, virtualization, trusted paths, it’s all done with integers. Usually not even negative integers.

Today’s paper presents some situations where that is not the case: where the low-level timing behavior of floating-point arithmetic is, in fact, security-critical. The most elementary of these situations involves displaying stuff on the screen—nowadays, everything on the screen gets run through a 3D rendering pipeline even if it looks completely flat, because the hardware is just sitting there, and that process intrinsically involves floating point. And there’s an API to tell you how long it took to render the previous frame of animation, because if you were actually animating something you would genuinely need to know that. So if something is being displayed that you, a malicious program, are not allowed to see, but whose rendering you can influence, you may be able to make the secret affect the rendering time, and thus extract it—slowly, but not too slowly to be practical. There is also a scenario involving differentially private databases, where you’re allowed to know an approximation to an aggregate value but not see individual table rows; the aggregate is computed with floating point, and, again, computation time can reveal the secret values.

In both cases, the floating-point computations are uniform, with no data-dependent branches or anything like that, so how does timing variation sneak in? It turns out that on all tested CPUs and GPUs, primitive floating-point arithmetic operations—add, multiply, divide—don’t always take the same amount of time to execute. Certain combinations of input values are slower than others, and predictably so. As the authors point out, this is a well-known problem for numerical programmers. It has to do with a feature of IEEE floating point known as subnormal numbers. These allow IEEE floating point to represent numbers that are very close to zero, so close that they don’t fit into the bit representation without bending the rules. This is mathematically desirable because it means that if two floating-point values are unequal, subtracting one from the other will never produce zero. However, subnormals are awkward to implement in hardware; so awkward that CPUs in the 1990s were notorious for suffering a slowdown of 100x or more for basic arithmetic on subnormals. Nowadays it’s not so bad; if I’m reading the (poorly designed) charts in this paper correctly, it’s only a 2–4x slowdown on modern hardware. But that’s still enough to detect and build a timing channel out of.
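
Two quick demonstrations of the ingredients here, both mine: gradual underflow is easy to see from Python, and on many CPUs the slowdown shows up even through NumPy (whether, and by how much, depends on your hardware and on whether flush-to-zero is enabled):

```python
import sys
import timeit
import numpy as np

print(sys.float_info.min)          # smallest normal double, about 2.225e-308
a, b = 3e-308, 2e-308
print(a - b, (a - b) == 0.0)       # ~1e-308 False: the difference is subnormal, not zero

normal = np.full(1_000_000, 1e-300)
subnormal = np.full(1_000_000, 1e-310)    # below the normal range for float64
print(timeit.timeit(lambda: normal * 1.5, number=50))
print(timeit.timeit(lambda: subnormal * 1.5, number=50))   # often noticeably slower
```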

Timing channels are a perennial problem because they tend to be side effects of something desirable. Algorithms are often easier to understand if you dispose of special cases up-front—this paper also talks about how division by zero might be much faster than division by a normal value, presumably because the CPU doesn’t bother running through the divide circuit in that case. Running fast most of the time and slow in unusual cases is often an excellent algorithmic choice for overall performance: hash tables, quicksort, and so on. The papers that defined the concept of covert channels [1] [2] discuss timing channels introduced by data caching; and without caching, Moore’s Law would have been hamstrung by fundamental physics decades ago.

However, I don’t think there’s any excuse for variable-time arithmetic or logic operations, even in floating point. (I might be persuaded to allow it for division, which is genuinely algorithmically hard.) In particular I have never understood why subnormal numbers are a problem for hardware. Now I don’t know beans about hardware design, but I have written floating-point emulation code, in assembly language, and here’s the thing: in software, subnormals are straightforward to implement. You need larger internal exponent precision, an additional test in both decode and encode, a count-leading-zeroes operation on the way in and an extra bit-shift on the way out. I didn’t try to make my emulator constant-time, but I can imagine doing it without too much trouble, at least for addition and multiplication. In hardware, that seems like it ought to translate to a wider internal data bus, a couple of muxes, a count-leading-zeroes widget, and a barrel shifter that you probably need anyway, and it should be easy to make it constant-time across both subnormals and normals, without sacrificing speed at all.
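
To illustrate the software side of that claim, here is a sketch (mine, not taken from the paper or from my old emulator) of the decode step for IEEE binary64, with subnormals handled by exactly the count-leading-zeroes-and-shift trick described above:

```python
# Decode a double into (sign, unbiased exponent, 53-bit significand) such that
# value == (-1)**sign * significand * 2**exponent, treating subnormals by
# renormalizing into a wider internal exponent range.
import math
import struct

def decode(x: float):
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0:                          # zero or subnormal
        if frac == 0:
            return sign, 0, 0
        shift = 53 - frac.bit_length()    # count leading zeroes...
        frac <<= shift                    # ...and shift the significand up
        exp = 1 - shift                   # exponent drops below the normal range
    else:                                 # normal: make the implicit leading 1 explicit
        frac |= 1 << 52
    return sign, exp - 1075, frac         # 1075 = exponent bias (1023) + fraction bits (52)

for x in (1.0, -2.5, 5e-324, 2.2250738585072014e-308):
    s, e, m = decode(x)
    assert math.ldexp(m, e) * (-1) ** s == x
print("all round-trips OK")
```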

Constant-time floating-point operations for all inputs and outputs do strike me as harder, but only because IEEE floating point is specified to generate hardware faults and/or out-of-band exception bits under some circumstances. This was a mistake. A floating-point standard that has neither control nor status registers, and generates exclusively in-band results, is decades overdue. It would require sacrificing a bit of mantissa for a sticky “this number is inexact” flag, but I think that would be acceptable.

As a final headache, how long do you suppose it’ll be before someone comes up with an exploit that can only be fixed by making all of the transcendental function library (sin, cos, log, exp, …) constant-time? Never mind performance, I doubt this can be done without severely compromising accuracy.