Tensions and frictions in researching activists’ digital security and privacy practices

Presentations at HotPETS are not as technical as those at PETS proper, and are chosen for relevance, novelty, and potential to generate productive discussion. The associated papers are very short, and both are meant only as a starting point for a conversation. Several of this year’s presentations were from people and organizations that are concerned with making effective use of technology we already have, rather than building new stuff, and with highlighting the places where actual problems are not being solved.

The Tactical Technology Collective (tacticaltech.org, exposingtheinvisible.org; if you don’t have time to watch a bunch of videos, I strongly recommend at least reading the transcribed interview with Jesus Robles Maloof) is a practitioner NGO working on digital security and privacy, with deep connections to activist communities around the world. What this means is that they spend a lot of time going places and talking to people—often people at significant risk due to their political activities—about their own methods, the risks that they perceive, and how existing technology does or doesn’t help.

Their presentation was primarily about the tension between research and support goals from the perspective of an organization that wants to do both. Concretely: if you are helping activists do their thing by training them in more effective use of technology, you’re changing their behavior, which is what you want to study. (It was not mentioned, and they might not know it by this name, but I suspect they would recognize the Hawthorne Effect in which being the subject of research also changes people’s behavior.) They also discussed potential harm to activists due to any sort of intervention from the outside, whether that is research, training, or just contact, and the need for cultural sensitivity in working out the best way to go about things. (I can’t find the talk now—perhaps it was at the rump session—but at last year’s PETS someone mentioned a case where democracy activists in a Southeast Asian country specifically did not want to be anonymous, because it would be harder for the government to make them disappear if they were known by name to the Western media.)

There were several other presentations on related topics, and “technology developers and researchers aren’t paying enough attention to what activists actually need” has been a refrain, both at PETS and elsewhere, for several years now. I’m not sure further reiteration of that point helps all that much. What would help? I’m not sure about that either. It’s not so much that people aren’t building the right tools as that the network effects of the wrong tools are so dominant that activists have to use them even though they’re dangerous.

Cache Timing Attacks Revisited: Efficient and Repeatable Browser History, OS, and Network Sniffing

Cache timing attacks use the time some operation takes to learn whether or not a datum is in a cache. They’re inherently hard to fix, because the entire point of a cache is to speed up access to data that was used recently. They’ve been studied as far back as Multics. [1] In the context of the Web, the cache in question (usually) is the browser’s cache of resources retrieved from the network, and the attacker (usually) is a malicious website that wants to discover something about the victim’s activity on other websites. [2] [3] This paper gives examples of using cache timing to learn information needed for phishing, discover the victim’s location, and monitor the victim’s search queries. It’s also known to be possible to use the victim’s browsing habits as an identifying fingerprint [4].

The possibility of this attack is well-known, but to date it has been dismissed as an actual risk for two reasons: it was slow, and probing the cache was a destructive operation, i.e. the attacker could only probe any given resource once, after which it would be cached whether or not the victim had ever loaded it. This paper overcomes both of those limitations. It uses Web Workers to parallelize cache probing, and it sets a very short timeout on all its background network requests—so short that the request can only succeed if it’s cached. Otherwise, it will be cancelled and the cache will not be perturbed. (Network requests will always take at least a few tens of milliseconds, whereas the cache can respond in less than a millisecond.) In combination, these achieve two orders of magnitude speedup over the best previous approach, and, more importantly, they render the attack repeatable.
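The paper’s actual probe code is JavaScript running in Web Workers, which I won’t reproduce here, but the core trick is simple enough to sketch. The latencies and URLs below are made up and the “cache” is just a set, so this is only an illustration of the timeout logic, not a working attack:

```python
# Toy illustration of non-destructive cache probing (not the paper's actual
# JavaScript).  The key idea: give each probe a deadline far shorter than any
# network round trip, so a cached resource answers in time but an uncached one
# is abandoned before the probe itself can populate the cache.

import time

NETWORK_LATENCY = 0.050   # assumed: ~50 ms to fetch over the network
CACHE_LATENCY   = 0.0005  # assumed: ~0.5 ms to answer from the cache
PROBE_DEADLINE  = 0.005   # 5 ms: well above cache latency, well below network

cache = {"https://bank.example/logo.png"}  # resources the victim has loaded

def probe(url):
    """Return True if `url` appears to be cached, without disturbing the cache."""
    start = time.monotonic()
    if url in cache:
        time.sleep(CACHE_LATENCY)          # cache hit: answers fast
        return (time.monotonic() - start) < PROBE_DEADLINE
    # Cache miss: a real attacker cancels the request at the deadline, so the
    # resource is never actually fetched and the cache is left untouched.
    time.sleep(PROBE_DEADLINE)
    return False

for url in ("https://bank.example/logo.png", "https://forum.example/logo.png"):
    print(url, "cached" if probe(url) else "not cached (probe cancelled)")
```

The point is that the deadline sits comfortably between the two latency regimes, so a hit is always observed and a miss is always abandoned before it can pollute the cache.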

What do we do about it? Individual sites can mark highly sensitive data not to be cached at all, but this slows the web down, and you’ll never get everyone to do it for everything that could be relevant. Browsers aren’t about to disable caching; it’s too valuable. A possibility (not in this paper) is that we could notice the first time a resource was loaded from a new domain, and deliberately delay satisfying it from the cache by approximately the amount of time it took to load off the network originally. I’d want to implement and test that to make sure it didn’t leave a hole, but it seems like it would let us have our cake and eat it too.
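For concreteness, here is a sketch of what I have in mind, with entirely hypothetical names and a Python stand-in for the browser cache; the real thing would live inside the network stack, and would need careful review to make sure the recorded delay itself doesn’t leak anything:

```python
# Sketch of the mitigation suggested above (my idea, not the paper's): the
# first time a given first-party origin asks for a resource we already have
# cached, delay the answer by roughly the time the original network fetch
# took, so a timing probe from a new site learns nothing.  Names hypothetical.

import time

class TimingSafeCache:
    def __init__(self):
        self.entries = {}   # url -> (data, original_fetch_seconds)
        self.seen = set()   # (first_party_origin, url) pairs already served

    def store(self, url, data, fetch_seconds):
        self.entries[url] = (data, fetch_seconds)

    def lookup(self, first_party, url):
        if url not in self.entries:
            return None                      # genuine miss: caller must fetch
        data, fetch_seconds = self.entries[url]
        if (first_party, url) not in self.seen:
            self.seen.add((first_party, url))
            time.sleep(fetch_seconds)        # first use by this site: fake a miss
        return data                          # later uses: full-speed cache hit
```

Subsequent loads of the same resource by the same site take the normal fast path, so the cost is a single artificial delay per (site, resource) pair.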

In closing, I want to point out that this is a beautiful example of the maxim that attacks always get better. Cryptographers have been banging that drum for decades, trying to get people to move away from cipher primitives that are only a little weak before it’s too late, without much impact. The same, it seems, goes for information leaks.

Privacy Games: Optimal User-Centric Data Obfuscation

I attended this year’s PETS conference a couple weeks ago, and for the next several weeks I’m going to be focusing on highlights from there. For variety’s sake, there will be one unrelated thing each week.

I’m starting with a paper which is heavier mathematical going than I usually pick. The author claims to have developed an optimal (meta-)strategy for obfuscating a person’s location before revealing it to some sort of service. The person wants to reveal enough information to get whatever utility the service provides, while not telling it exactly where they are, because they don’t entirely trust the service. The service, meanwhile, wants to learn as much as possible about the person’s location—it doesn’t matter why.

There are two standard mathematical techniques for obfuscating a location: differential privacy and distortion privacy. They both work by adding noise to a location value, but they use different metrics for how much the adversary can learn given the noisy value. The paper asserts: differential privacy is sensitive to the likelihood of observation given data, while distortion privacy is sensitive to the joint probability of observation and data. Thus, by guaranteeing both, we encompass all the defense that is theoretically possible. (This is a bold claim and I don’t know enough about this subfield to verify it.) The paper proceeds to demonstrate, first, that you can construct a linear-programming problem whose solution gives an appropriate amount of noise to add to a location value, and second, that this solution is game-theoretically optimal, even against an adversary who knows your strategy and can adapt to it. Then they analyze the degree to which the data has to be obfuscated, comparing this with more limited algorithms from previous literature.
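To make the linear-programming part concrete, here is a toy version in Python. It is not the paper’s formulation (which also folds in the distortion-privacy guarantee and the game-theoretic argument); it is just a four-location example showing how “choose the noise distribution” becomes an LP, using geo-indistinguishability-style constraints as a stand-in for the differential-privacy side:

```python
# Toy LP: pick an obfuscation channel p(report | true) over four locations on
# a line, minimizing expected service-quality loss subject to
# geo-indistinguishability-style constraints
#     p(r|x) <= exp(eps * d(x, x')) * p(r|x')   for all r, x, x'.
# Everything here (locations, prior, eps) is made up for illustration.

import numpy as np
from scipy.optimize import linprog

locs = np.array([0.0, 1.0, 2.0, 3.0])
n = len(locs)
prior = np.full(n, 1.0 / n)                 # prior over true locations
d = np.abs(locs[:, None] - locs[None, :])   # distance matrix d(x, r)
eps = 1.0                                   # privacy parameter

# Variables: p[x, r] flattened row-major.
# Objective: sum_x prior[x] * sum_r p(r|x) * d(x, r)  (expected distortion).
c = (prior[:, None] * d).ravel()

A_ub, b_ub = [], []
for r in range(n):
    for x in range(n):
        for x2 in range(n):
            if x == x2:
                continue
            row = np.zeros(n * n)
            row[x * n + r] = 1.0
            row[x2 * n + r] = -np.exp(eps * d[x, x2])
            A_ub.append(row)
            b_ub.append(0.0)

A_eq = np.zeros((n, n * n))                 # each row of the channel sums to 1
for x in range(n):
    A_eq[x, x * n:(x + 1) * n] = 1.0
b_eq = np.ones(n)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print("expected distortion:", res.fun)
print("channel p(report|true):\n", res.x.reshape(n, n).round(3))
```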

I unfortunately couldn’t make any sense of the comparisons, because the figures are poorly labeled and explained. And the bulk of the paper is just mathematical derivations, to which I have nothing to add. But there is an important point I want to highlight. The game-theoretic analysis only works if everything the adversary already knows (before they learn the obfuscated location) is captured by a distance metric which defines how far the obfuscated location has been displaced. Mathematically, this is fine; any given fact that the adversary might know about the person revealing their location can be modeled by adjusting the metric. (Person is in a city, so their path is constrained to the street grid. Person just left a large park, and there is only one coffee shop within radius X of that park. Person never goes to that restaurant.) However, the system doing the obfuscation might not be able to predict all of those factors, and if it distorts the metric too much it might contradict itself (person cannot possibly have walked from A to B in the time available). In the future I’d like to see some analysis of this. What is it plausible for the adversary to know? At what point does the obfuscation break down because it can’t cover all the known facts without contradicting itself?

Keys Under Doormats: Mandating insecurity by requiring government access to all data and communications

Today’s paper is not primary research, but an expert opinion on a matter of public policy; three of its authors have posted their own summaries [1] [2] [3], the general press has picked it up [4] [5] [6] [7] [8], and it was mentioned during Congressional hearings on the topic [9]. I will, therefore, only briefly summarize it, before moving on to some editorializing of my own. I encourage all of you to read the paper itself; it’s clearly written, for a general audience, and you can probably learn something about how to argue a position from it.

The paper is a direct response to FBI Director James Comey, who has for some time been arguing that data storage and communications systems must be designed for “exceptional access by law enforcement agencies” (the paper’s phrase); his recent Lawfare editorial can be taken as representative. British politicians have also been making similar noises (see the above general-press articles). The paper, in short, says that this would cause much worse technical problems than it solves, and that even if, by some magic, those problems could be avoided, it would still be a terrible idea for political reasons.

At slightly more length, exceptional access means: law enforcement agencies (like the FBI) and espionage agencies (like the NSA) want to be able to wiretap all communications on the ’net, even if those communications are encrypted. This is a bad idea technically for the same reasons that master-key systems for doors can be more trouble than they’re worth. The locks are more complicated, and easier to pick than they would be otherwise. If the master key falls into the wrong hands you have to change all the locks. Whoever has access to the master keys can misuse them—which makes the keys, and the people who control them, a target. And it’s a bad idea politically because, if the USA gets this capability, every other sovereign nation gets it too, and a universal wiretapping capability is more useful to a totalitarian state that wants to suppress the political opposition, than it is to a detective who wants to arrest murderers. I went into this in more detail toward the end of my review of RFC 3514.

I am certain that James Comey knows all of this, in at least as much detail as it is explained in the paper. Moreover, he knows that the robust democratic debate he calls for already happened, in the 1990s, and wiretapping lost. [10] [11] [12] Why, then, is he trying to relitigate it? Why does he keep insisting that it must somehow be both technically and politically feasible, against all evidence to the contrary? Perhaps most important of all, why does he keep insisting that it’s desperately important for his agency to be able to break encryption, when it was only an obstacle nine times in all of 2013? [13]

On one level, I think it’s a failure to understand the scale of the problems with the idea. On the technical side, if you don’t live your life down in the gears it’s hard to bellyfeel the extent to which everything is broken and therefore any sort of wiretap requirement cannot help but make the situation worse. And it doesn’t help, I’m sure, that Comey has heard (correctly) that what he wants is mathematically possible, so he thinks everyone saying “this is impossible” is trying to put one over on him, rather than just communicating “this isn’t practically possible.”

The geopolitical problems with the idea are perhaps even harder to convey, because the Director of the FBI wrestles with geopolitical problems every day, so he thinks he does know the scale there. For instance, the paper spends quite some time on a discussion of the jurisdictional conflict that would naturally come up in an investigation where the suspect is a citizen of country A, the crime was committed in B, and the computers involved are physically in C but communicate with the whole world—and it elaborates from there. But we already have treaties to cover that sort of investigation. Comey probably figures they can be bent to fit, or at worst, we’ll have to negotiate some new ones.

If so, what he’s missing is that he’s imagining too small a group of wiretappers: law enforcement and espionage agencies from countries that are on reasonably good terms with the USA. He probably thinks export control can keep the technology out of the hands of countries that aren’t on good terms with the USA (it can’t) and hasn’t even considered non-national actors: local law enforcement, corporations engaged in industrial espionage, narcotraficantes, mafiosi, bored teenagers, Anonymous, religious apocalypse-seekers, and corrupt insiders in all the above. People the USA can’t negotiate treaties with. People who would already have been thrown in jail if anyone could make charges stick. People who may not even share premises like what good governance or due process of law or basic human decency mean. There are a bit more than seven billion people on the planet today, and this is the true horror of the Internet: roughly 40% of those people [14] could, if they wanted, ruin your life, right now. It’s not hard. [15] (The other 60% could too, if they could afford to buy a mobile, or at worst a satellite phone.)

But these points, too, have already been made, repeatedly. Why does none of it get through? I am only guessing, but my best guess is: the War On Some Drugs [16] and the aftermath of 9/11 [17] (paywalled, sorry; please let me know if you have a better cite) have both saddled the American homeland security complex with impossible, Sisyphean labors. In an environment where failure is politically unacceptable, yet inevitable, the loss of any tool—even if it’s only marginally helpful—must seem like an existential threat. To get past that, we would have to be prepared to back off on the “must never happen again / must be stopped at all costs” posturing; the good news is, that has an excellent chance of delivering better, cheaper law enforcement results overall. [18]

Censorship in the Wild: Analyzing Internet Filtering in Syria

Last week we looked at a case study of Internet filtering in Pakistan; this week we have a case study of Syria. (I think this will be the last such case study I review, unless I come across a really compelling one; there’s not much new I have to say about them.)

This study is chiefly interesting for its data source: a set of log files from the Blue Coat brand DPI routers that are allegedly used [1] [2] to implement Syria’s censorship policy, covering a 9-day period in July and August of 2011, leaked by the Telecomix hacktivist group. Assuming that these log files are genuine, this gives the researchers what we call ground truth: they can be certain that sites appearing in the logs are, or are not, censored. (This doesn’t mean they know the complete policy, though. The routers’ blacklists could include sites or keywords that nobody tried to visit during the time period covered by the logs.)

With ground truth it is possible to make more precise deductions from the phenomena. For instance, when the researchers see URLs of the form http://a1b2.cdn.example/adproxy/cyber/widget blocked by the filter, they know (because the logs say so) that the block is due to a keyword match on the string “proxy”, rather than the domain name, the IP address, or any other string in the HTTP request. This, in turn, enables them to describe the censorship policy quite pithily: Syrian dissident political organizations, anything and everything to do with Israel, instant messaging tools, and circumvention tools are all blocked. This was not possible in the Pakistani case—for instance, they had to guess at the exact scope of the porn filter.
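To illustrate what that kind of attribution looks like, here is a trivial sketch; the rule set and log format are invented, but it shows why having the router’s own records, rather than external probes, lets you say which rule fired:

```python
# Minimal sketch of why ground truth helps: with the filter's own rules and
# log entry in hand, you can say which rule fired instead of guessing from
# outside.  The rule set here is made up for illustration.

BLOCKED_DOMAINS = {"example-dissident.org"}
BLOCKED_KEYWORDS = {"proxy", "israel", "hotspotshield"}

def explain_block(url):
    host = url.split("/")[2]
    if host in BLOCKED_DOMAINS:
        return f"domain blacklist: {host}"
    for kw in BLOCKED_KEYWORDS:
        if kw in url.lower():
            return f"keyword match: {kw!r}"
    return "not blocked by these rules"

print(explain_block("http://a1b2.cdn.example/adproxy/cyber/widget"))
# -> keyword match: 'proxy'  (the whole URL is scanned, not just the hostname)
```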

Because the leaked logs cover only a very short time window, it’s not possible to say anything about the time evolution of Syrian censorship, which is unfortunate, considering the tumultuous past few years that the country has had.

The leak is from several years ago, and it shows heavy reliance on keyword filtering; it would be interesting to know whether this has changed since, what with the increasing use of HTTPS making keyword filtering less useful. For instance, since 2013 Facebook has defaulted to HTTPS for all users. This would have made it much harder for Syria to block access to specific Facebook pages, as they were doing in this study.

Security Analysis of Consumer-Grade Anti-Theft Solutions Provided by Android Mobile Anti-Virus Apps

Today we have another analysis of Android device security against scenarios where the owner loses control of the phone, by the same researchers who wrote Security Analysis of Android Factory Resets. Here, instead of looking at what happens when the owner deliberately erases a phone in order to sell it, they study what happens when the phone is stolen and the owner tries to wipe or disable it remotely. A remote disable mechanism is commonly included in anti-virus programs for Android, and this study looks at ten such mechanisms.

As with factory reset, the core finding boils down to “none of them work.” The root causes are slightly different, though. The core of Android, naturally, is at pains to prevent normal applications from doing something as destructive as initiating a memory wipe, rendering the phone unresponsive to input, or changing passwords. To be properly effective the anti-theft program needs administrative privileges, which can be revoked at any time, so there’s an inherent race between the owner activating the anti-theft program and the thief disabling it. To make matters worse, UX bugs make it easy for these programs to appear to be installed correctly but not have the privileges they need; implementation bugs (possibly caused by unclear Android documentation) may leave loopholes even when the program was installed correctly; and several device vendors added backdoor capabilities (probably for in-store troubleshooting) that allow a thief to bypass the entire thing—for those familiar with Android development, we’re talking about devices that, if turned off and then plugged into a computer, boot into recovery mode and activate ADB.

There is a curious omission in this paper: since 2013, Google itself has provided a remote lock/wipe feature as part of the Google Apps bundle that’s installed on most Android devices [1]. Since the Apps bundle is, nowadays, Google’s way of getting around vendors who can’t be bothered to patch the base operating system in a timely fashion, it has all the privileges it needs to do this job correctly, and the feature should be available regardless of Android base version. The UX is reasonable, and one would presume that the developers are at least somewhat less likely to make mistakes in the implementation. This paper, however, doesn’t mention that at all, despite (correctly) pointing out that this is a difficult thing for a third-party application to get right and the device vendors should step up.

Practical advice for phone-owners continues to be: encrypt the phone on first boot, before giving it any private information, and use a nontrivial unlock code. Practical advice for antivirus vendors, I think, is “you’re not testing your software adversarially enough.” The implementation bugs, in particular, suggest that the vendors confirmed that remote lock/wipe works when set up correctly, but did not put enough effort into thinking of ways a thief might defeat the lock.

Using Memory Errors to Attack a Virtual Machine

All modern operating systems include some form of process isolation—a mechanism for ensuring that two programs can run simultaneously on the same machine without interfering with each other. Standard techniques include virtual memory and static verification. Whatever the technique, though, it relies on the axiom that the computer faithfully executes its specified instruction set (as the authors of today’s paper put it). In other words, a hardware fault can potentially ruin everything. But how practical is it to exploit a hardware fault? They are unpredictable, and the most common syndrome is for a single bit in RAM to flip. How much damage could one bit-flip do, when you have no control over where or when it will happen?

Today’s paper demonstrates that, if you are able to run at least one program yourself on a target computer, a single bit-flip can give you total control. Their exploit has two pieces. First, they figured out a way to escalate a single bit-flip to total control of a Java virtual machine; basically, once the bit flips, they have two pointers to the same datum but with different types (integer, pointer) and that allows them to read and write arbitrary memory locations, which in turn enables them to overwrite the VM’s security policy with one that lets the program do whatever it wants. That’s the hard part. The easy part is, their attack program fills up the computer’s memory with lots of copies of the data structure that will give them total control if a bit flips inside it. That way, the odds of the bit flip being useful are greatly increased. (This trick is known as heap spraying and it turns out to be useful for all sorts of exploits, not just the ones where you mess with the hardware.)

This is all well and good, but you’re still waiting for a cosmic ray to hit the computer and flip a bit for you, aren’t you? Well, maybe there are other ways to induce memory errors. Cosmic rays are ionizing radiation, which you can also get from radioactivity. But readily available radioactive material (e.g. in smoke detectors) isn’t powerful enough, and powerful enough radioactive material is hard to get (there’s a joke where the paper suggests that the adversary could purchase a small oil drilling company in order to get their hands on a neutron source). But there’s also heat, and that turns out to work quite well, at least on the DRAM that was commonly available in 2003. Just heat the chips to 80–100°C and they start flipping bits. The trick here is inducing enough bit-flips to exploit, but not so many that the entire computer crashes; they calculate an ideal error rate at which the exploit program has a 71% chance of succeeding before the computer crashes.
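The paper has its own model for that calculation; here is a back-of-envelope version with numbers I made up, just to show the shape of the argument: what matters is the ratio between flips that help the attacker and flips that kill the machine.

```python
# Back-of-envelope for the heap-spraying argument (my numbers, not the
# paper's).  Suppose a fraction `spray` of physical memory holds the
# attacker's exploitable structures, a fraction `useful` of the bits inside
# them give control if flipped, and a fraction `fatal` of all bits crash the
# machine if flipped.  Treating flips as uniform and independent, the chance
# that the first decisive flip is useful rather than fatal is:

spray  = 0.60    # assumed: attacker fills 60% of RAM with sprayed objects
useful = 0.02    # assumed: 2% of sprayed bits are exploitable if flipped
fatal  = 0.005   # assumed: 0.5% of all bits crash the machine if flipped

p_useful = spray * useful
p_win = p_useful / (p_useful + fatal)
print(f"chance of control before crash: {p_win:.0%}")  # ~71% with these numbers
```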

The attack in this paper probably isn’t worth worrying about in real life. But attacks only ever get nastier. For instance, about a decade later some people figured out how to control where the memory errors happen, and how to induce them without pointing a heat gun at the hardware: that was An Experimental Study of DRAM Disturbance Errors which I reviewed back in April. That attack is probably practical, if the hardware isn’t taking any steps to prevent it.

A Look at the Consequences of Internet Censorship Through an ISP Lens

When a national government decides to block access to an entire category of online content, naturally people who wanted to see that content—whatever it is—will try to find workarounds. Today’s paper is a case study of just such behavior. The authors were given access to a collection of bulk packet logs taken by an ISP in Pakistan. The ISP had captured a day’s worth of traffic on six days ranging from October 2011 through August 2013, a period that included two significant changes to the national censorship policy. In late 2011, blocking access to pornography became a legal mandate (implemented as a blacklist of several thousand sites, maintained by the government and disseminated to ISPs in confidence—the authors were not allowed to see this blacklist). In mid-2012, access to Youtube was also blocked, in retaliation for hosting anti-Islamic videos [1]. The paper analyzes the traffic in aggregate to understand broad trends in user behavior and how these changed in response to the censorship.

The Youtube block triggered an immediate and obvious increase in encrypted traffic, which the authors attribute to an increased use of circumvention tools—the packet traces did not record enough information to identify exactly what tool, or to discriminate circumvention from other encrypted traffic, but it seems a reasonable assumption. Over the next several months, alternative video sharing/streaming services rose in popularity; as of the last trace in the study, they had taken over roughly 80% of the market share formerly held by Youtube.

Users responded quite differently to the porn block: roughly half of the inbound traffic formerly attributable to porn just disappeared, but the other half was redirected to different porn sites that didn’t happen to be on the official blacklist. The censorship authority did not react by adding the newly popular sites to the blacklist. Perhaps a 50% reduction in overall consumption of porn was good enough for the politicians who wanted the blacklist in the first place.

The paper also contains some discussion of the mechanism used to block access to censored domains. This confirms prior literature [2] so I’m not going to go into it in great detail; we’ll get to those papers eventually. One interesting tidbit (also previously reported) is that Pakistan has two independent filters: one implemented by local ISPs, which falsifies DNS responses, and another operating in the national backbone, which forges TCP RSTs and/or HTTP redirections.

The authors don’t talk much about why user response to the Youtube block was so different from the response to the porn block, but it’s evident from their discussion of what people do right after they hit a block in each case. This is very often a search engine query (unencrypted, so visible in the packet trace). For Youtube, people either search for proxy/circumvention services, or they enter keywords for the specific video they wanted to watch, hoping to find it elsewhere, or at least a transcript. For porn, people enter keywords corresponding to a general type of material (sex act, race and gender of performers, that sort of thing), which suggests that they don’t care about finding a specific video, and will be content with whatever they find on a site that isn’t blocked. This is consistent with analysis of viewing patterns on a broad-spectrum porn hub site [3]. It’s also consistent with the way Youtube is integrated into online discourse—people very often link to or even embed a specific video on their own website, in order to talk about it; if you can’t watch that video you can’t participate in the conversation. I think this is really the key finding of the paper, since it gets at when people will go to the trouble of using a circumvention tool.

What the authors do talk about is the consequences of these blocks on the local Internet economy. In particular, Youtube had donated a caching server to the ISP in the case study, so that popular videos would be available locally rather than clogging up international data channels. With the block and the move to proxied, encrypted traffic, the cache became useless and the ISP had to invest in more upstream bandwidth. On the other hand, some of the video services that came to substitute for Youtube were Pakistani businesses, so that was a net win for the local economy. This probably wasn’t intended by the Pakistani government, but in similar developments in China [4] and Russia [5], import substitution is clearly one of the motivating factors. From the international-relations perspective, that’s also highly relevant: censorship only for ideology’s sake probably won’t motivate a bureaucracy as much as censorship that’s seen to be in the economic interest of the country.

Why Doesn’t Jane Protect Her Privacy?

Today’s paper is very similar to What Deters Jane from Preventing Identification and Tracking on the Web? and shares an author. The main difference is that it’s about email rather than the Web. The research question is the same: why don’t people use existing tools for enhancing the security and privacy of their online communications? (In this case, specifically tools for end-to-end encryption of email.) The answers are also closely related. As before, many people think no one would bother snooping on them because they aren’t important people. They may understand that their webmail provider reads their email in order to select ads to display next to it, but find this acceptable, and believe that the provider can be trusted not to do anything else with its knowledge. They may believe that the only people in a position to read their email are nation-state espionage agencies, and that trying to stop them is futile. All of these are broadly consistent with the results for the Web.

A key difference, though, is that users’ reported understanding of email-related security risks is often about a different category of threat that end-to-end encryption doesn’t help with: spam, viruses, and phishing. In fact, it may hurt: one of Gmail’s (former) engineers went on record with a detailed argument for why their ability to read all their users’ mail was essential to their ability to filter spam. [1] I’m not sure that isn’t just a case of not being able to see out of their local optimum, but it certainly does make the job simpler. Regardless, it seems to me that spam, viruses, and phishing are a much more visible and direct threat to the average email user’s personal security than any sort of surveillance. Choosing to use a service that’s very good at filtering, even at some cost in privacy, therefore strikes me as a savvy choice rather than an ignorant one. Put another way, I think a provider of end-to-end encrypted email needs to demonstrate that it can filter junk just as effectively if it wants to attract users away from existing services.

(In the current world, encryption is a signal of not being spam, but in a world where most messages were encrypted, spammers would start using encryption, and so would your PHB who keeps sending you virus-infected spreadsheets that you have to look at for your job.)

Another key difference is, you can unilaterally start using Tor, anti-tracking browser extensions, and so on, but you can’t unilaterally start encrypting your email. You can only send encrypted email to people who can receive encrypted email. Right now, that means there is a strong network effect against the use of encrypted email. There’s not a single word about this in the paper, and I find that a serious omission. It does specifically say that they avoided asking people about their experiences (if any) with PGP and similar software because they didn’t want to steer their thinking that way, but I think that was a mistake. It means they can’t distinguish what people think about email privacy in general from what they think about end-to-end encryption tools that they may have tried, or at least heard of. There may be a substantial population of people who looked into PGP just enough to discover that it’s only useful if the recipient also uses it, and no longer think about it unless specifically prompted.

On Subnormal Floating Point and Abnormal Timing

Most of the time, people think the floating-point unit has no security significance, simply because you don’t use the floating-point unit for anything with security significance. Cryptography, authentication, access control, process isolation, virtualization, trusted paths, it’s all done with integers. Usually not even negative integers.

Today’s paper presents some situations where that is not the case: where the low-level timing behavior of floating-point arithmetic is, in fact, security-critical. The most elemental of these situations involves displaying stuff on the screen—nowadays, everything on the screen gets run through a 3D rendering pipeline even if it looks completely flat, because you have the hardware just sitting there, and that process intrinsically involves floating point. And there’s an API to tell you how long it took to render the previous frame of animation because if you were actually animating something you would genuinely need to know that. So if there’s something being displayed on the screen that you, a malicious program, are not allowed to know what it is, but you can influence how it is being displayed, you might be able to make the information you’re not allowed to know affect the rendering time, and thus extract the information—slowly, but not too slowly to be practical. There is also a scenario involving differentially private databases, where you’re allowed to know an approximation to an aggregate value but not see individual table rows; the aggregate is computed with floating point, and, again, computation time can reveal the secret values.

In both cases, the floating-point computations are uniform, with no data-dependent branches or anything like that, so how does timing variation sneak in? It turns out that on all tested CPUs and GPUs, primitive floating-point arithmetic operations—add, multiply, divide—don’t always take the same amount of time to execute. Certain combinations of input values are slower than others, and predictably so. As the authors point out, this is a well-known problem for numerical programmers. It has to do with a feature of IEEE floating point known as subnormal numbers. These allow IEEE floating point to represent numbers that are very close to zero, so close that they don’t fit into the bit representation without bending the rules. This is mathematically desirable because it means that if two floating-point values are unequal, subtracting one from the other will never produce zero. However, subnormals are awkward to implement in hardware; so awkward that CPUs in the 1990s were notorious for suffering a slowdown of 100x or more for basic arithmetic on subnormals. Nowadays it’s not so bad; if I’m reading the (poorly designed) charts in this paper correctly, it’s only a 2–4x slowdown on modern hardware. But that’s still enough to detect and build a timing channel out of.
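You can get a rough sense of this from Python, although interpreter overhead swamps all but a large penalty; on hardware that handles subnormals at full speed, or with flush-to-zero enabled, the two timings below should come out about the same.

```python
# Quick-and-dirty check for the subnormal slowdown.  Python adds a lot of
# interpreter overhead, so this only shows a difference if the hardware
# penalty is large; on CPUs that handle subnormals at full speed (or with
# flush-to-zero enabled) the two timings will be about the same.

import timeit

normal    = 1.0e-300   # small, but still a normal IEEE double
subnormal = 1.0e-310   # below ~2.2e-308: subnormal for binary64

def burn(x, scale=0.5, n=100_000):
    for _ in range(n):
        x = x * scale + x * scale   # stays in the same range, no branches
    return x

for name, seed in (("normal", normal), ("subnormal", subnormal)):
    t = timeit.timeit(lambda: burn(seed), number=10)
    print(f"{name:9s}: {t:.3f} s")
```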

Timing channels are a perennial problem because they tend to be side-effects of something desirable. Algorithms are often easier to understand if you dispose of special cases up-front—this paper also talks about how division by zero might be much faster than division by a normal value, presumably because the CPU doesn’t bother running through the divide circuit in that case. Running fast most of the time, slow in unusual cases, is often an excellent algorithmic choice for overall performance: hash tables, quicksort, etc. The papers that defined the concept of covert channels [1] [2] discuss timing channels introduced by data caching, a technique without which Moore’s Law would have been hamstrung by fundamental physics decades ago.

However, I don’t think there’s any excuse for variable-time arithmetic or logic operations, even in floating point. (I might be persuaded to allow it for division, which is genuinely algorithmically hard.) In particular I have never understood why subnormal numbers are a problem for hardware. Now I don’t know beans about hardware design, but I have written floating-point emulation code, in assembly language, and here’s the thing: in software, subnormals are straightforward to implement. You need larger internal exponent precision, an additional test in both decode and encode, a count-leading-zeroes operation on the way in and an extra bit-shift on the way out. I didn’t try to make my emulator constant-time, but I can imagine doing it without too much trouble, at least for addition and multiplication. In hardware, that seems like it ought to translate to a wider internal data bus, a couple of muxes, a count-leading-zeroes widget, and a barrel shifter that you probably need anyway, and it should be easy to make it constant-time across both subnormals and normals, without sacrificing speed at all.
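To illustrate what I mean by “straightforward”, here is the decode step in Python rather than assembly; the subnormal branch is exactly one extra test, a count-leading-zeroes, and a shift.

```python
# Sketch of the decode path described above, in Python for legibility:
# unpack an IEEE binary64 into (sign, unbiased exponent, 53-bit significand),
# handling subnormals with one extra test, a count-leading-zeroes, and a shift.

import math
import struct

def decode_binary64(x):
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if biased_exp == 0:                    # the extra test: subnormal (or zero)
        if frac == 0:
            return sign, 0, 0              # signed zero
        shift = 53 - frac.bit_length()     # count leading zeroes of the fraction
        mantissa = frac << shift           # normalize into the hidden-bit position
        exponent = 1 - 1023 - shift        # compensate in the exponent
    else:                                  # normal number: implicit leading 1
        mantissa = frac | (1 << 52)
        exponent = biased_exp - 1023
    return sign, exponent, mantissa

# The value is (-1)**sign * mantissa * 2**(exponent - 52).
for x in (1.5, 1e-310):
    s, e, m = decode_binary64(x)
    print(x, "->", (s, e, hex(m)), "check:", (-1)**s * math.ldexp(m, e - 52))
```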

Constant-time floating-point operations for all inputs and outputs do strike me as harder, but only because IEEE floating point is specified to generate hardware faults and/or out-of-band exception bits under some circumstances. This was a mistake. A floating-point standard that has neither control nor status registers, and generates exclusively in-band results, is decades overdue. It would require sacrificing a bit of mantissa for a sticky “this number is inexact” flag, but I think that should be acceptable.

As a final headache, how long do you suppose it’ll be before someone comes up with an exploit that can only be fixed by making all of the transcendental function library (sin, cos, log, exp, …) constant-time? Never mind performance, I doubt this can be done without severely compromising accuracy.