Papers published in 2015

Ad Injection at Scale: Assessing Deceptive Advertisement Modifications

Today we have a study of ad injection software, which runs on your computer and inserts ads into websites that didn’t already have them, or replaces the website’s ads with its own. (The authors concentrate on browser extensions, but there are several other places where such programs could be installed with the same effect.) Such software is, in 98 out of 100 cases (figure taken from the paper), not intentionally installed; instead it is a side-load, packaged together with something else that the user intended to install, or else it is loaded onto the computer by malware.

The injected ads cannot easily be distinguished from ads that a website intended to run, by the person viewing the ads or by the advertisers. A website subjected to ad injection, however, can figure it out, because it knows what its HTML page structure is supposed to look like. This is how the authors detected injected ads on a variety of Google sites; they say that they developed software that can be reused by anyone, but I haven’t been able to find it. They say that Content-Security-Policy should also work, but that doesn’t seem right to me, because page modifications made by a browser extension should, in general, be exempt from CSP.

The bulk of the paper is devoted to characterizing the ecosystem of ad-injection software: who makes it, how does it get onto people’s computers, what does it do? Like the malware ecosystem [1] [2], the core structure of this ecosystem is a layered affiliate network, in which a small number of vendors market ad-injection modules which are added to a wide variety of extensions, and broker ad delivery and clicks from established advertising exchanges. Browser extensions are in an ideal position to surveil the browser user and build up an ad-targeting profile, and indeed, all of the injectors do just that. Ad injection is often observed in conjunction with other malicious behaviors, such as search engine hijacking, affiliate link hijacking, social network spamming, and preventing uninstallation, but it’s not clear whether the ad injectors themselves are responsible for that (it could equally be that the extension developer is trying to monetize by every possible means).

There are some odd gaps. There is no mention of click-fraud; it is easy for an extension to forge clicks, so I’m a little surprised the authors did not discuss the possibility. There is also no discussion of parasitic repackaging. This is a well-known problem with desktop software, with entire companies whose business model is to take software that someone else wrote and gives away for free, package it together with ad injectors and worse, and arrange to be what people find when they try to download that software. [3] [4] It wouldn’t surprise me if these companies were also responsible for an awful lot of the problematic extensions discussed in the paper.

An interesting tidbit, not followed up on, is that ad injection is much more common in South America, parts of Africa, South Asia, and Southeast Asia than in Europe, North America, Japan, or South Korea. (They don’t have data for China, North Korea, or all of Africa.) This could be because Internet users in the latter countries are more likely to know how to avoid deceptive software installers and malicious extensions, or, perhaps, just less likely to click on ads in general.

The solutions presented in this paper are rather weak: more aggressive weeding of malicious extensions from the Chrome Web Store and similar repositories, and reaching out to ad exchanges to encourage them to refuse service to injectors (if they can detect them, anyway). A more compelling solution would probably start with a look at who profits from bundling ad injectors with their extensions, and what alternative sources of revenue might be viable for them. Relatedly, I would have liked to see some analysis of what the problematic extensions’ overt functions were. There are legitimate reasons for an extension to add content to all webpages, e.g. [5] [6], but extension repositories could reasonably require more careful scrutiny of an extension that asks for that privilege.

It would also help if the authors acknowledged that the only difference between an ad injector and a legitimate ad provider is that the latter runs ads on a site only with that site’s consent. All of the negative impact on end users—behavioral tracking, pushing organic content below the fold or under interstitials, slowing down page loads, and so on—is present with site-solicited advertising. And the same financial catch-22 is present for website proprietors as for extension developers: advertising is one of the only proven ways to earn revenue for a website, but it doesn’t work all that well, and it harms your relationship with your end users. In the end I think the industry has to find some other way to make money.

Security Analysis of Android Factory Resets

Unlike the last two papers, today’s isn’t theoretical at all. It’s a straightforward experimental study of whether or not any data can be recovered from an Android phone that’s undergone a factory reset. Factory reset is supposed to render it safe to sell a phone on the secondhand market, which means any personal data belonging to the previous owner should be erased at least thoroughly enough that the new owner cannot get at it without disassembling the phone. (This is NIST logical media sanitization, for those who are familiar with those rules—digital sanitization, where the new owner cannot extract any data short of taking an electron microscope to the memory chip, would of course be better, but in the situations where the difference matters, merely having a cell phone may mean you have already failed.)

The paper’s core finding is also straightforward: due to a combination of UI design errors, major oversights in the implementation of the factory-reset feature, driver bugs, and the market’s insistence that flash memory chips have to pretend to be traditional spinning rust hard disks even though they don’t work anything like that, none of the phones in the study erased everything that needed to get erased. The details are confusingly presented, but they’re not that important anyway. It ends with a concrete set of recommendations for future improvements, which is nice to see.

There are a few caveats. The study only covers Android 2.3 through 4.3; it is unclear to me whether the situation is improved in newer versions of Android. They don’t discuss device encryption at all—this is at least available in all versions studied, and should completely mitigate the problem provided that a factory reset successfully wipes out the old encryption keys, and that there’s no way for unencrypted data to survive the encryption process. (Unfortunately, the normal setup flow for a new phone, at least in the versions I’ve used, asks for a bunch of sensitive info before offering to encrypt the phone. You can bypass this but it’s not obvious how.) It would be great if someone tested whether encryption is effective in practice.

Beyond the obvious, this is also an example of Android’s notorious update problem. Many of the cell carriers cannot be bothered to push security updates—or any updates—to phones once they have been sold. [1] They are also notorious for making clumsy modifications to the software that harm security, usability, or both. [2] In this case, this means driver bugs that prevent the core OS from erasing everything it meant to erase, and failure to deploy patches that made the secure erase mechanism work better. Nobody really has a good solution to that. I personally am inclined to say that neither the telcos nor Google should be in the business of selling phones, and conversely, the consumer electronics companies who are in that business should not have the opportunity to make any modifications to the operating system. Whatever the hardware is, it’s got to work with a stock set of drivers, and the updates have to come direct from Google. That would at least simplify the problem.

(Arguably Google shouldn’t be in the OS business either, but that’s a more general antitrust headache. One thing at a time.)

Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice

Diffie-Hellman key exchange is a cryptographic primitive used in nearly all modern security protocols. Like many cryptographic primitives, it is difficult to break because it is difficult to solve a particular mathematical problem; in this case, the discrete logarithm problem. Generally, when people try to break a security protocol, they either look at the pure math of it—searching for an easier way to solve discrete logarithms—or they look for mistakes in how the protocol is implemented—something that will mean you don’t have to solve a discrete logarithm to break the protocol.

This paper does a bit of both. It observes something about the way Diffie-Hellman is used in Internet security protocols that makes it unusually easy to solve the discrete logarithms that will break the protocol. Concretely: to break an instance of Diffie-Hellman as studied in this paper, you have to solve for x in the equation

g^x \equiv y \;(\text{mod}\,p)

where all numbers are positive integers, g and p are fixed in advance, and y is visible on the wire. The trouble is with the “fixed in advance” part. It turns out that the most efficient known way to solve this kind of equation, the general number field sieve, can be broken into two stages. The first stage is much more expensive than the second, and it depends only on g and p. So if the same g and p were reused for many communications, an eavesdropper could do the first stage in advance, and then breaking individual communications would be much easier—perhaps easy enough to do on the fly, as needed.
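
For concreteness, here is the textbook exchange behind that equation (a sketch in my own notation, not the paper’s exact presentation). Alice and Bob agree on g and p in advance, pick random secrets a and b, and exchange

A \equiv g^{a} \;(\text{mod}\,p), \qquad B \equiv g^{b} \;(\text{mod}\,p), \qquad k \equiv B^{a} \equiv A^{b} \equiv g^{ab} \;(\text{mod}\,p)

Eve sees g, p, A, and B; recovering a from A is exactly the discrete-logarithm problem above, and it hands her the shared key k. Since the expensive first stage of the number field sieve depends only on the group parameters, a shared group lets her pay that cost once and amortize it over every later exchange.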

At least three common Internet security protocols (TLS, IPsec, and SSH) do reuse g and p, if they are not specifically configured otherwise. As the paper puts it, if the attacker can precompute for one 1024-bit group, they can compromise 37% of HTTPS servers in the Alexa Top 1 million, 66% of all probeable IPsec servers, and 26% of all probeable SSH servers. A group is a specific pair of values for g and p, and the number of bits essentially refers to how large p is. 1024 bits is the smallest size that was previously considered secure; this paper demonstrates that the precomputation for one such group would cost only a little less than a billion dollars, most of which goes to constructing the necessary supercomputer—the incremental cost for more groups is much smaller. As such we have to move 1024 bits onto the “not secure anymore” pile. (There’s an entire section devoted to the possibility that the NSA might already have done this.)

(Another section of the paper demonstrates that 512-bit groups can be precomputed by a small compute cluster, and 768-bit groups by a substantial (but not enormous) cluster: 110 and 36,500 core-years of computation, respectively. The former took one week of wall-clock time with the equipment available to the authors. We already knew those groups were insecure; unfortunately, they are still accepted by ~10% of servers.)

What do we do about it? If you’re running a server, the thing you should do right now is jump to a 2048-bit group; the authors have instructions for common TLS servers and SSH, and generic security configuration guidelines for HTTPS servers and SSH also cover this topic. (If you know where to find instructions for IPsec, please let me know.) 2048 bits is big enough that you probably don’t need to worry about using the same group as anyone else, but generating your own groups is also not difficult. It is also important to make sure that you have completely disabled support for export ciphersuites. This eliminates the 512- and 768-bit groups and also several other primitives that we know are insecure.

At the same time, it would be a good idea to turn on support for TLSv1.2 and modern ciphersuites, including elliptic curve Diffie-Hellman, which requires an attacker to solve a harder problem and is therefore much more difficult to break. It’s still a discrete logarithm problem, but in a different kind of group (the points of an elliptic curve over a finite field) that is much harder to work with. I don’t understand the details myself, but an elliptic curve group only needs 256 bits to provide at least the security of a 2048-bit group for ordinary Diffie-Hellman. There is a catch: while the number field sieve itself does not apply to elliptic curves (which is exactly why the keys can be so much smaller), there are generic precomputation attacks that work against any fixed group, and I wish the authors had estimated how big your curve group needs to be for that kind of precomputation to be infeasible. 256 bits is almost certainly big enough; but how about 160 bits, which is usually considered equivalent to 1024 bits for ordinary DH?

In the longer timeframe, where we think about swapping out algorithms entirely, I’m going to say this paper means cryptographers should take “precomputation should not help the attacker” as a design principle. We already do this for passwords, where the whole point of salt is to make precomputation not help; it’s time to start thinking about that in a larger context.

Also in the longer timeframe, this is yet another paper demonstrating that old clients and servers that support only primitives now known to be insecure hurt everyone’s security. Just about every time one peer continues to support a broken primitive for backward compatibility, it has turned out that a man in the middle can downgrade it—trick it into communicating insecurely even with counterparties that do support good primitives. (There’s a downgrade attack to 512-bit Diffie-Hellman groups in this paper.) This is one of those thorny operational/incentive alignment problems that hasn’t been solved to anyone’s satisfaction. Windows XP makes an instructive example: it would have been technically possible for Microsoft to backport basically all of the security improvements (and other user-invisible improvements) from later versions of Windows, and there are an enormous number of people who would have benefited from that approach. But it wouldn’t have been in Microsoft’s own best interest, so instead they did things geared to force people onto newer versions of the OS even at the expense of security for people who didn’t want to budge, and people who needed to continue interoperating with people who didn’t want to budge (schannel.dll in XP still doesn’t support TLS 1.2). I could spin a similar story for basically every major player in the industry.

Protecting traffic privacy for massive aggregated traffic

Nowadays we are pretty darn sure we know how to encrypt network packets in transit such that there’s no realistic way to decrypt them. There are a whole host of second-order problems that are not as well solved, but basic message confidentiality is done—except for one thing: the length and timing of each packet are still visible to anyone who can monitor the wire. It turns out that this enables eavesdroppers to extract a frightening amount of information from those encrypted packets, such as which page of a medical-advice website you are reading [1], the language of your VoIP phone call [2], or even a transcript of your VoIP phone call [3].

In principle, we know how to close this information leak. It is simply necessary for Alice to send Bob a continuous stream of fixed-size packets, at a fixed rate, forever, whether or not she has anything useful to say. (When she doesn’t have anything useful to say, she can encrypt an endless stream of binary zeroes.) This is obviously a non-starter in any context where Alice cares about power consumption, shared channel capacity, or communicating with more than one Bob. Even when none of these is a concern—for instance, high-volume VPN links between data centers, which are running at significant utilization 24x7x365 anyway—forcing all traffic to fit the constant packet length and transmission schedule adds significant overhead. Thus, there’s a whole line of research trying to minimize that overhead, and/or see how far we can back down from the ideal-in-principle case without revealing too much.
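
To make the in-principle scheme concrete, here is a minimal sketch in C; the cell size, the interval, and the dequeue/encrypt/send helpers are hypothetical stand-ins of my own, not anything taken from the paper.

#include <stdint.h>
#include <string.h>

#define CELL_SIZE        1024     // every cell is exactly this many bytes on the wire
#define CELL_INTERVAL_MS 10       // and one cell goes out every 10 ms, no matter what

// Hypothetical helpers standing in for the VPN's real machinery.
extern size_t dequeue_app_data(uint8_t *buf, size_t max);   // returns 0 when idle
extern void   encrypt_cell(uint8_t *cell, size_t len);      // AEAD over the whole cell
extern void   send_cell(const uint8_t *cell, size_t len);
extern void   sleep_ms(unsigned ms);

void constant_rate_sender(void)
{
    uint8_t cell[CELL_SIZE];
    for (;;) {
        memset(cell, 0, sizeof cell);                        // padding is encrypted zeroes
        size_t used = dequeue_app_data(cell + 2, sizeof cell - 2);
        cell[0] = (uint8_t)(used >> 8);                      // in-band length header, so the
        cell[1] = (uint8_t)(used & 0xff);                    // receiver can discard the padding
        encrypt_cell(cell, sizeof cell);                     // ciphertext length never varies
        send_cell(cell, sizeof cell);
        sleep_ms(CELL_INTERVAL_MS);                          // fixed transmission schedule
    }
}

Nothing Eve can measure from outside (size or timing) depends on whether there was real data to send; everything that does depend on it is inside the ciphertext. The cost is equally plain: the link burns CELL_SIZE bytes every CELL_INTERVAL_MS whether anyone is talking or not, which is exactly the overhead this line of research tries to reduce.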

Today’s paper offers a theoretical model and test framework that should make it easier to experiment with the high-volume-VPN-link case. I like it for its concreteness—often theoretical modeling papers feel so divorced from practice that it’s hard to see how to do anything constructive with them. It is also easy to see how the model could be extended to more sophisticated traffic-shaping techniques, which would be the obvious next step. The big downside, though, is that it only considers a fixed network topology: Alice talking to Bob and no one else, ever. Even for inter-data-center links, topologies can change on quite short notice, and a topology-change event might be exactly what Eve is listening for (perhaps it gives her advance notice of some corporate organizational change). To be fair, the continuous stream of fixed size packets scheme does completely solve the length-and-timing issue in principle; we do not have nearly as good schemes for concealing who is talking to whom, even in principle. Which is unfortunate, because you can do even more terrifying things with knowledge only of who is talking to whom. [4]

Secrets, Lies, and Account Recovery

Today’s paper appeared in WWW 2015 and was also summarized on Google’s security blog. The subtitle is Lessons from the Use of Personal Knowledge Questions at Google, and the basic conclusion is: security questions (for recovering access to an account when you’ve forgotten your password) are terribly insecure. Well, duh, you are probably thinking, the answers are right there on my social network profile for anyone to read. That’s true, but it’s not the angle this paper takes. This paper is more interested in guessing by attackers who haven’t read your social network profile. At most, they know your preferred language and country of residence. It turns out that, if you only ask one security question for account recovery, three guesses by this adversary are sufficient to penetrate 10–20% of accounts; that is how many people typically share the most frequent three answers per language. (Depending on the question and language, it might be much worse; for instance, the three most common places of birth for Korean speakers cover 40% of that population.)

They also observe that questions where there should be no shared answers (what is your frequent flyer number?) do have shared answers, because people give fake answers, and the fake answers tend to be things like 123456790. This brings the security of such questions down to the level of all the others. Finally, there is a user survey, from which the two most telling items are these: people think account recovery via security question is more secure and more reliable than account recovery via a secret code sent to them in a text message or email (precisely the opposite is true), and people give fake answers because they want to avoid having the answers be visible on their social network profile (or otherwise easy to find).

If someone knows almost nothing about you, why are they bothering to try to take over your account? Probably the two most important reasons are “they need throwaway addresses to send spam with” and “if they steal $100 from ten thousand people, that adds up to a million dollars.” These are compelling enough that any major online service provider (such as Google) sees continuous, bulk, brute-force attacks by adversaries who don’t care which accounts they compromise. The authors don’t say, but I wouldn’t be surprised if those attacks constituted a supermajority of all attacks on Google accounts.

Turning it around, though, the probable negative consequences of having your account cracked by a spammer who doesn’t know or care about you as a person are really not that bad, compared to what someone who knows you and bears you a personal grudge could do. (Ransomware is the most prominent counterexample at present.) Google as a corporate entity is right to worry more about the most frequent type of attack they see—but you as an individual are probably also right to worry more about an attack that’s much less common but has way more downside for you. I think this contrast of priorities explains fake answers and the general reluctance to adopt two-factor authentication, recovery codes, etc. all of which might provide a security benefit in the future but definitely make life more inconvenient now. So Long, And No Thanks for the Externalities and How Do We Design For Learned Helplessness? go into considerably more detail on that point. (Facebook may well have improved their password reset sequence since the latter was written.) On a related note, if someone happens to have an email address or phone number that isn’t already looped back into the account that needs a recovery contact point, they might well consider it critically important that the panopticon running that account does not find out about it!

Defending Tor from Network Adversaries: A Case Study of Network Path Prediction

In a similar vein as Tuesday’s paper, this is an investigation of how practical it might be to avoid exposing Tor circuits to traffic analysis by an adversary who controls an Autonomous System. Unlike Tuesday’s paper, they assume that the adversary does not manipulate BGP to observe traffic that they shouldn’t have seen, so the concern is simply to ensure that the two most sensitive links in the circuit—from client to entry, and from exit to destination—do not pass through the same AS. Previous papers have suggested that the Tor client should predict the AS-level paths involved in these links, and select entries and exits accordingly [1] [2]. This paper observes that AS path prediction is itself a difficult problem, and that different techniques can give substantially different results. Therefore, they collected traceroute data from 28 Tor relays and compared AS paths inferred from these traces with those predicted from BGP monitoring (using the algorithm of On AS-Level Path Inference [3]).

The core finding is that traceroute-based AS path inference does indeed give substantially different results from BGP-based path prediction. The authors assume that traceroute is more accurate; the discrepancy is consistently described as an error in the BGP-based prediction, and (since BGP-based prediction tends to indicate exposure to more different ASes) as overstating the risk exposure of any given Tor link. This seems unjustified to me. The standard traceroute algorithm is known to become confused in the presence of load-balancing routers, which are extremely common in the backbone [4]; refinements have been proposed (and implemented in the scamper tool used in this paper) but have problems themselves [5] [6]. More fundamentally, traceroute produces a snapshot: “these UDP packets did take this route just now.” Tor links are relatively long-lived TCP connections (tens of minutes) which could easily be rerouted among several different paths over their lifetime. I think it would be better to say that BGP path prediction produces a more conservative estimate of the ASes to which a Tor link could be exposed, and to highlight the question of which estimate is more accurate as future work.

A secondary finding is that AS-aware path selection by the Tor client interacts poorly with the guard policy, in which each Tor client selects a small number of entry nodes to use for an extended period. These nodes must be reliable and high-bandwidth; the economics of running a reliable, high-bandwidth Internet server mean that they are concentrated in a small number of ASes. Similar economics apply to the operation of exit nodes, plus additional legal headaches; as a result, it may not be possible to find any end-to-end path that obeys both the guard policy and the AS-selection policy. This situation is, of course, worsened if you take the more conservative, BGP-based estimation of AS exposure.

I’ve been concerned for some time that guards might actually be worse for anonymity than the problem they are trying to solve. The original problem statement [7] is that if you select an entry node at random for each circuit, and some fraction of entry nodes are malicious, with high probability you will eventually run at least one circuit through a malicious entry. With guards, either all your circuits pass through a malicious entry for an extended period of time, or none do. My fundamental concern with this is, first, having all your traffic exposed to a malicious entry for an extended period is probably much worse for your anonymity than having one circuit exposed every now and then; second, the hypothetical Tor adversary has deep pockets and can easily operate reliable high-bandwidth nodes, which are disproportionately likely to get picked as guards. Concentration of guards in a small number of ASes only makes this easier for the adversary; concentration of guards together with exits in a small number of ASes makes it even easier. It’s tempting to suggest a complete about-face, preferentially choosing entry nodes from the low-bandwidth, short-lived population and using them only for a short time; this would also mean that entry nodes could be taken from a much broader pool of ASes, and it would be easier to avoid overlap with the AS-path from exit to destination.

The Correctness-Security Gap in Compiler Optimization

At this year’s LangSec Workshop there was a paper on formal analysis of cases where compiler optimizations can introduce security holes that weren’t present in the source code. This class of problems has been a concern for some time; for a general overview, see this series of posts on John Regehr’s blog: [1] [2] [3]. I’ll give one concrete example, which happens to be the running example in the paper.

extern long load_secret_key(void);   // assumed declaration, so the example compiles

void crypt (/* ... */)
{
    long key = load_secret_key();
    /* ... */                        // work with the secret key
    key = 0x0;                       // scrub memory
}

No correct C program can observe values in local variables after a function returns, so the compiler is very likely to delete the final store to key as unnecessary (“dead” in compiler jargon). However, the programmer’s intent was to ensure that the secret value in key does not survive in memory after crypt returns, where it might be accessible to malicious code—the adversary is not bound by the definition of “correct C program”. By removing the dead store, the compiler introduces a security hole.
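
As an aside, the usual library-level band-aid (mentioned again below) is to route the scrub through something the optimizer cannot see through. A minimal sketch, assuming C11’s optional memset_s and the platform-specific explicit_bzero variants are unavailable, is the volatile-function-pointer idiom:

#include <string.h>

// The call goes through a volatile function pointer, so the compiler must
// assume it could do anything, and cannot prove the store is dead.
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

static void secure_wipe(void *buf, size_t len)
{
    memset_ptr(buf, 0, len);
}

crypt would then end with secure_wipe(&key, sizeof key); instead of a plain assignment. This is a workaround rather than a guarantee; it defeats the optimizers we have today, which is rather the paper’s point.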

This paper proposes a general strategy for analyzing compiler optimizations to determine whether or not they can introduce security holes. They observe that the definition of correct C program is in terms of an abstract machine that ignores or leaves undefined many low-level nuances. Continuing with the above example, the C standard says only that if code executing in the same program attempts to access the memory allocated to key after crypt returns, the behavior is undefined—anything at all can happen. In particular, reading from that memory might legitimately return either zero or the secret key … or some other value altogether, or it might crash the program. And the standard takes no notice at all of the possibility that another program (such as a debugger, or malware) might gain access to this program’s memory.

The real computer on which the program is executing, by contrast, exhibits some concrete and (usually) deterministic behavior for any arbitrary sequence of machine operations. Even when the architecture specification leaves an outcome undefined, some concrete thing will happen on a physical CPU and it will usually be the same concrete thing for all instances of that model of CPU. This paper’s proposal is simply that compiler optimizations have the potential to introduce security holes whenever they change the observable behavior of the physical machine. Therefore, a correctness proof for an optimization can be converted into a “does not introduce security holes” proof by changing its model from the abstract machine to a physical machine. In the crypt example, deleting the final store to key is invisible in the abstract machine, and therefore a valid optimization of the original code. But in the physical machine, it means the previous, secret value persists on the stack (a concept which doesn’t even exist in the C abstract machine) and may be retrievable later. If the stack is included in the machine model, the original correctness proof for dead-store elimination will detect the information leak.

Dead-store elimination is well understood as a source of this kind of problem, and library-level band-aids exist. I suspect the greatest value of this technique will be in identifying other optimizations that can potentially introduce information leaks. There is some discussion of this in the paper (common subexpression elimination as a source of timing-channel vulnerabilities) but they do not take it all the way to a model. It would be interesting to know, for instance, whether the scary-sophisticated algebraic loop-nest transformations that you might want applied to your cryptographic primitives can introduce data-dependent timing variation that wasn’t present in the source code.
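
To make the common-subexpression-elimination case concrete, here is a hypothetical illustration of my own (not an example taken from the paper). In the source, both branches cost one multiply, so the running time does not depend on the secret bit; after CSE notices that a * b has already been computed, one branch degenerates into a register copy while the other keeps its multiply, and the timing now correlates with the secret.

#include <stdint.h>

uint32_t mix(uint32_t a, uint32_t b, uint32_t c, int secret_bit)
{
    uint32_t t = a * b;      // computed unconditionally
    uint32_t r;
    if (secret_bit)
        r = a * b;           // CSE will likely rewrite this as r = t (cheap)
    else
        r = a * c;           // no common subexpression: a real multiply remains
    return r ^ t;
}

In the abstract machine the transformation is unquestionably correct; only a machine model that exposes execution time can see the difference.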

Sadly, this technique by itself is not enough to prevent compiler-introduced vulnerabilities; a dead-store elimination pass that was valid in the physical machine for all memory writes would wind up doing almost nothing, which is not what we want. (The authors acknowledge this, implicitly, when they talk about the undesirability of turning all optimizations off.) To some extent, tightening up the language specification—making the abstract machine exhibit less undefined behavior—can help, by making the effects of optimization more predictable. John Regehr has a proposal for Friendly C along those lines; I think it throws several babies out with the bathwater, but as a starting point for discussion it’s worthwhile. It also won’t cure all the problems in this area: another example from the paper is

#include <stdlib.h>   // malloc, abort

int *alloc(int nrelems)
{
    int size = nrelems * (int) sizeof(int);  // signed multiply; may overflow
    if (size < nrelems) abort();             // attempt to check for overflow
    return malloc(size);
}

According to the rules of the C abstract machine, signed integer overflow causes undefined behavior, which means the compiler is entitled to assume the multiplication never overflows, conclude that (for any positive nrelems) the comparison size < nrelems is always false, and delete the check. Unfortunately, making signed integer overflow well-defined does not render this code safe, because the check itself is inadequate: fixed-width two’s-complement wraparound multiplication (which is the behavior of the integer multiply instruction on most CPUs) can produce values that are larger than either multiplicand but smaller than the mathematically correct result, even when one multiplicand is a small number. (In this example, with the usual sizeof(int) == 4, passing 1,431,655,766 for nrelems would produce 1,431,655,768 for size when it should be 5,726,623,064.) For the same reason, making all the variables unsigned does not help.
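
For contrast, here is one way to write the check so the bad case is rejected before any multiplication happens. This is a sketch of my own (alloc_checked is not a name from the paper); it bounds nrelems against SIZE_MAX and then does the arithmetic in size_t, which cannot wrap once the bound holds.

#include <stdint.h>
#include <stdlib.h>

int *alloc_checked(int nrelems)
{
    // Reject anything that could not possibly fit before multiplying,
    // so the size computation below can never wrap around.
    if (nrelems < 0 || (size_t)nrelems > SIZE_MAX / sizeof(int))
        abort();
    return malloc((size_t)nrelems * sizeof(int));
}

The pattern (divide the limit by the element size and compare against the count) is the same one that calloc and the BSDs’ reallocarray apply internally.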

On a higher level, compiler authors will fight tooth and nail for aggressive optimization, because compiler authors (kinda by definition) like writing optimizations, but also because they’ve got to satisfy three conflicting user groups at once, and the latter two are more numerous and have deeper pockets:

  1. Systems programmers, who want their C to be translated directly and transparently to machine language, and believe that undefined behavior exists to allow variation in CPU behavior but not compiler behavior.

  2. Applications programmers, who want the compiler to crush as many abstraction penalties as possible out of their gigantic white-elephant C++ codebases which there will never be time or budget to clean up.

  3. Number crunchers, who care about four things: speed, speed, not having to code in FORTRAN or assembly language anymore, and speed.

And what makes this extra fun is that cryptographers are simultaneously in group 1 and group 3. I’m not going to make ETAPS 2015, but I am very curious to know what DJB is going to say about “the death of optimizing compilers” there.