Regional Variation in Chinese Internet Filtering

This is one of the earlier papers that looked specifically for regional variation in China’s internet censorship; as I mentioned when reviewing Large-scale Spatiotemporal Characterization of Inconsistencies in the World’s Largest Firewall, assuming that censorship is monolithic is unwise in general, and especially so for a country as large, diverse, and technically sophisticated as China. This paper concentrates on variation in DNS-based blocking: the authors probed 187 DNS servers in 29 Chinese cities (concentrated, like the population, toward the east of the country) for a relatively small list of sites, some highly likely and some highly unlikely to be censored within China.

The results reported are maybe better described as inconsistencies among DNS servers than regional variation. For instance, there are no sites called out as accessible from one province but not another. Rather, roughly the same set of sites is blocked in all locales, but all of the blocking is somewhat leaky, and some DNS servers are more likely to leak—regardless of the site—than others. The type of DNS response when a site is blocked also varies from server to server and site to site; observed behaviors include no response at all, an error response, or (most frequently) a success response with an incorrect IP address. Newer papers (e.g. [1] [2]) have attempted to explain some of this in terms of the large-scale network topology within China, plus periodic outages when nothing is filtered at all, but I’m not aware of any wholly compelling analysis (and short of a major leak of internal policy documents, I doubt we can ever have one).
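To make the kind of probe concrete, here is a minimal sketch using the dnspython package (assumed installed). The classification buckets mirror the behaviors just described: no response at all, an error response, or a success response carrying an address the site does not actually use. The KNOWN_GOOD allowlist and the example resolver are hypothetical stand-ins for data you would have to gather from an uncensored vantage point.

```python
# Hedged sketch of a single DNS censorship probe, not the paper's tooling.
import dns.exception
import dns.message
import dns.query
import dns.rcode
import dns.rdatatype

KNOWN_GOOD = {"93.184.216.34"}   # hypothetical: addresses the site really uses

def classify(resolver_ip: str, domain: str) -> str:
    query = dns.message.make_query(domain, "A")
    try:
        response = dns.query.udp(query, resolver_ip, timeout=5)
    except dns.exception.Timeout:
        return "no response"
    if response.rcode() != dns.rcode.NOERROR:
        return f"error response ({dns.rcode.to_text(response.rcode())})"
    addresses = {item.address
                 for rrset in response.answer if rrset.rdtype == dns.rdatatype.A
                 for item in rrset}
    if addresses and not addresses & KNOWN_GOOD:
        return f"success with suspect addresses {sorted(addresses)}"
    return "apparently unfiltered"

# e.g. classify("114.114.114.114", "example.com") from a suitable vantage point
```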

There’s also an excellent discussion of the practical and ethical problems with this class of research. I suspect this was largely included to justify the author’s choice to only look at DNS filtering, despite its being well-known that China also uses several other techniques for online censorship. It nonetheless provides valuable background for anyone wondering about methodological choices in this kind of paper. To summarize:

  • Many DNS servers accept queries from the whole world, so they can be probed directly from a researcher’s computer; however, they might vary their response depending on the apparent location of the querent; their use of UDP makes it hard to tell censorship by the server itself from censorship by an intermediate DPI router; and there’s no way to know the geographic distribution of their intended clientele.

  • Studying most other forms of filtering requires measurement clients within the country of interest. These can be dedicated proxy servers of various types, or computers volunteered for the purpose. Regardless, the researcher risks inflicting legal penalties (or worse) on the operators of the measurement clients; even if the censorship authority normally takes no direct action against people who merely try to access blocked material, they might respond to a sufficiently high volume of such attempts.

  • Dedicated proxy servers are often blacklisted by sites seeking to reduce their exposure to spammers, scrapers, trolls, and DDoS attacks; a study relying exclusively on such servers will therefore tend to overestimate censorship.

  • Even in countries with a strong political commitment to free expression, there are some things that are illegal to download or store; researchers must take care not to do so, and the simplest way to do that is to avoid retrieving anything other than text.

Ad Injection at Scale: Assessing Deceptive Advertisement Modifications

Today we have a study of ad injection software, which runs on your computer and inserts ads into websites that didn’t already have them, or replaces a site’s existing ads with its own. (The authors concentrate on browser extensions, but there are several other places where such programs could be installed with the same effect.) Such software is, in 98 out of 100 cases (figure taken from the paper), not intentionally installed; instead it is side-loaded, packaged together with something else that the user did intend to install, or else it is dropped onto the computer by malware.

The injected ads cannot easily be distinguished from ads that a website intended to run, either by the person viewing the ads or by the advertisers. A website subjected to ad injection, however, can figure it out, because it knows what its HTML page structure is supposed to look like. This is how the authors detected injected ads on a variety of Google sites; they say that they developed software that can be reused by anyone, but I haven’t been able to find it. They say that Content-Security-Policy should also work, but that doesn’t seem right to me, because page modifications made by a browser extension should, in general, be exempt from CSP.
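Since the authors’ tooling doesn’t seem to be available, here is a hedged sketch of the site-side idea, assuming the page ships a small reporting script that sends back a snapshot of the rendered DOM: the server extracts the origins of script/iframe/img elements from the snapshot and flags anything outside the origins the page is supposed to contain. All the names here (EXPECTED_ORIGINS, the sample snapshot) are hypothetical.

```python
# Illustrative server-side check of a reported DOM snapshot, not the paper's tool.
from html.parser import HTMLParser
from urllib.parse import urlparse

EXPECTED_ORIGINS = {"www.example.com", "ads.example-partner.net"}  # hypothetical

class OriginCollector(HTMLParser):
    """Collect the hostnames referenced by resource-loading tags."""
    def __init__(self):
        super().__init__()
        self.origins = set()
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe", "img"):
            src = dict(attrs).get("src")
            if src and src.startswith(("http://", "https://")):
                self.origins.add(urlparse(src).hostname)

def injected_origins(reported_dom_html: str) -> set:
    collector = OriginCollector()
    collector.feed(reported_dom_html)
    return collector.origins - EXPECTED_ORIGINS

snapshot = '<script src="https://adinjector.example/payload.js"></script>'
print(injected_origins(snapshot))   # {'adinjector.example'}
```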

The bulk of the paper is devoted to characterizing the ecosystem of ad-injection software: who makes it, how does it get onto people’s computers, what does it do? Like the malware ecosystem [1] [2], the core structure of this ecosystem is a layered affiliate network, in which a small number of vendors market ad-injection modules which are added to a wide variety of extensions, and broker ad delivery and clicks from established advertising exchanges. Browser extensions are in an ideal position to surveil the browser user and build up an ad-targeting profile, and indeed, all of the injectors do just that. Ad injection is often observed in conjunction with other malicious behaviors, such as search engine hijacking, affiliate link hijacking, social network spamming, and preventing uninstallation, but it’s not clear whether the ad injectors themselves are responsible for that (it could equally be that the extension developer is trying to monetize by every possible means).

There are some odd gaps. There is no mention of click fraud; it is easy for an extension to forge clicks, so I’m a little surprised the authors did not discuss the possibility. There is also no discussion of parasitic repackaging. This is a well-known problem with desktop software: there are entire companies whose business model is to take software that someone else wrote and gives away for free, package it together with ad injectors and worse, and arrange to be what people find when they try to download that software. [3] [4] It wouldn’t surprise me if such repackagers were also responsible for an awful lot of the problematic extensions discussed in the paper.

An interesting tidbit, not followed up on, is that ad injection is much more common in South America, parts of Africa, South Asia, and Southeast Asia than in Europe, North America, Japan, or South Korea. (They don’t have data for China, North Korea, or all of Africa.) This could be because Internet users in the latter countries are more likely to know how to avoid deceptive software installers and malicious extensions, or, perhaps, just less likely to click on ads in general.

The solutions presented in this paper are rather weak: more aggressive weeding of malicious extensions from the Chrome Web Store and similar repositories, and reaching out to ad exchanges to encourage them to refuse service to injectors (if they can detect them, anyway). A more compelling solution would probably start with a look at who profits from bundling ad injectors with their extensions, and what alternative sources of revenue might be viable for them. Relatedly, I would have liked to see some analysis of what the problematic extensions’ overt functions were. There are legitimate reasons for an extension to add content to all webpages, e.g. [5] [6], but extension repositories could reasonably require more careful scrutiny of an extension that asks for that privilege.

It would also help if the authors acknowledged that the only difference between an ad injector and a legitimate ad provider is that the latter runs ads only with the site’s consent. All of the negative impact on end users—behavioral tracking, pushing organic content below the fold or under interstitials, slowing down page loads, and so on—is present with site-solicited advertising. And the same financial catch-22 confronts website proprietors as confronts extension developers: advertising is one of the only proven ways to earn revenue for a website, but it doesn’t work all that well, and it harms your relationship with your end users. In the end I think the industry has to find some other way to make money.

Security Analysis of Android Factory Resets

Unlike the last two papers, today’s isn’t theoretical at all. It’s a straightforward experimental study of whether or not any data can be recovered from an Android phone that’s undergone a factory reset. Factory reset is supposed to render it safe to sell a phone on the secondhand market, which means any personal data belonging to the previous owner should be erased at least thoroughly enough that the new owner cannot get at it without disassembling the phone. (This is NIST logical media sanitization, for those who are familiar with those rules—digital sanitization, where the new owner cannot extract any data short of taking an electron microscope to the memory chip, would of course be better, but in the situations where the difference matters, merely having a cell phone may mean you have already failed.)

The paper’s core finding is also straightforward: due to a combination of UI design errors, major oversights in the implementation of the factory-reset feature, driver bugs, and the market’s insistence that flash memory chips have to pretend to be traditional spinning rust hard disks even though they don’t work anything like that, none of the phones in the study erased everything that needed to get erased. The details are confusingly presented, but they’re not that important anyway. It ends with a concrete set of recommendations for future improvements, which is nice to see.

There are a few caveats. The study only covers Android 2.3 through 4.3; it is unclear to me whether the situation is improved in newer versions of Android. They don’t discuss device encryption at all—this is at least available in all versions studied, and should completely mitigate the problem provided that a factory reset successfully wipes out the old encryption keys, and that there’s no way for unencrypted data to survive the encryption process. (Unfortunately, the normal setup flow for a new phone, at least in the versions I’ve used, asks for a bunch of sensitive info before offering to encrypt the phone. You can bypass this but it’s not obvious how.) It would be great if someone tested whether encryption is effective in practice.

Beyond the obvious, this is also an example of Android’s notorious update problem. Many of the cell carriers cannot be bothered to push security updates—or any updates—to phones once they have been sold. [1] They are also notorious for making clumsy modifications to the software that harm security, usability, or both. [2] In this case, this means driver bugs that prevent the core OS from erasing everything it meant to erase, and failure to deploy patches that made the secure erase mechanism work better. Nobody really has a good solution to that. I personally am inclined to say that neither the telcos nor Google should be in the business of selling phones, and conversely, the consumer electronics companies who are in that business should not have the opportunity to make any modifications to the operating system. Whatever the hardware is, it’s got to work with a stock set of drivers, and the updates have to come direct from Google. That would at least simplify the problem.

(Arguably Google shouldn’t be in the OS business either, but that’s a more general antitrust headache. One thing at a time.)

Censorship Resistance: Let a Thousand Flowers Bloom?

This short paper presents a simple game-theoretic analysis of a late stage of the arms race between a censorious national government and the developers of tools for circumventing that censorship. Keyword blocking, IP-address blocking, and protocol blocking for known circumvention protocols have all been instituted and then evaded. The circumvention tool is now steganographically masking its traffic so it is indistinguishable from some commonly used, innocuous cover protocol or protocols; the censor, having no way to unmask this traffic, must either block all use of the cover protocol, or give up.

The game-theoretic question is, how many cover protocols should the circumvention tool implement? Obviously, if there are several protocols, then the tool is resilient as long as not all of them are blocked. On the other hand, implementing more cover protocols requires more development effort, and increases the probability that some of them will be imperfectly mimicked, making the tool detectable. [1] This might seem like an intractable question, but the lovely thing about game theory is it lets you demonstrate that nearly all the fine details of each player’s utility function are irrelevant. The answer: if there’s good reason to believe that protocol X will never be blocked, then the tool should only implement protocol X. Otherwise, it should implement several protocols, based on some assessment of how likely each protocol is to be blocked.
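To make the shape of that conclusion concrete, here is a toy expected-utility calculation. It is not the paper’s model: the paper treats blocking as the censor’s strategic choice, whereas this sketch treats per-protocol block probabilities as given, and every number below is invented.

```python
# Toy portfolio choice for a circumvention tool: pay a development cost per
# cover protocol, and "win" as long as at least one chosen protocol is unblocked.
from itertools import combinations

protocols = {        # name: (assumed probability of being blocked, assumed dev cost)
    "https":  (0.05, 3.0),
    "voip":   (0.30, 2.0),
    "email":  (0.50, 1.0),
    "gaming": (0.60, 1.0),
}
VALUE_OF_WORKING_TOOL = 10.0

def expected_utility(chosen) -> float:
    p_all_blocked, cost = 1.0, 0.0
    for name in chosen:
        p_block, dev_cost = protocols[name]
        p_all_blocked *= p_block          # tool fails only if every protocol is blocked
        cost += dev_cost
    return VALUE_OF_WORKING_TOOL * (1 - p_all_blocked) - cost

best = max((subset
            for r in range(1, len(protocols) + 1)
            for subset in combinations(protocols, r)),
           key=expected_utility)
print("best set:", best, "utility:", round(expected_utility(best), 2))
```

With these numbers the single low-risk protocol wins outright, which is the paper’s first case; make every protocol individually risky and the optimum shifts to a portfolio, which is the second.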

In real life there probably won’t be a clear answer to the question “will protocol X ever be blocked?” As the authors themselves point out, the censors can change their minds about that quite abruptly, in response to political conditions. So, in real life several protocols will be needed, and that part of the analysis in this paper is not complete enough to give concrete advice. Specifically, it offers a stable strategy for the Nash equilibrium (that is, neither party can improve their outcome by unilaterally changing strategy), but, again, the censors might abruptly change their utility function in response to political conditions, disrupting the equilibrium. (The circumvention tool’s designers are probably philosophically committed to free expression, so their utility function can be assumed to be stable.) This calls for an adaptive strategy. The obvious adaptive strategy is for the tool to use only one or two protocols at any given time (using more than one protocol may also improve the verisimilitude of the overall traffic being surveilled by the censors) but implement several others, and be able to activate them if one of the active ones stops working. The catch here is that the change in behavior may itself reveal the tool to the censor. It also requires all the engineering effort of implementing multiple protocols, some fraction of which may go to waste.

The paper also doesn’t consider what happens if the censor is capable of disrupting a protocol in a way that only mildly inconveniences normal users of that protocol, but renders the circumvention tool unusable. (For instance, the censor could be able to remove the steganography without necessarily knowing that it is there. [2]) I think this winds up being equivalent to the censor being able to block that protocol without downside, but I’m not sure.

Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice

Diffie-Hellman key exchange is a cryptographic primitive used in nearly all modern security protocols. Like many cryptographic primitives, it is difficult to break because it is difficult to solve a particular mathematical problem; in this case, the discrete logarithm problem. Generally, when people try to break a security protocol, they either look at the pure math of it—searching for an easier way to solve discrete logarithms—or they look for mistakes in how the protocol is implemented—something that will mean you don’t have to solve a discrete logarithm to break the protocol.

This paper does a bit of both. It observes something about the way Diffie-Hellman is used in Internet security protocols that makes it unusually easy to solve the discrete logarithms that will break the protocol. Concretely: to break an instance of Diffie-Hellman as studied in this paper, you have to solve for x in the equation

g^x \equiv y \pmod{p}

where all numbers are positive integers, g and p are fixed in advance, and y is visible on the wire. The trouble is with the “fixed in advance” part. It turns out that the most efficient known way to solve this kind of equation, the general number field sieve, can be broken into two stages. The first stage is much more expensive than the second, and it depends only on g and p. So if the same g and p were reused for many communications, an eavesdropper could do the first stage in advance, and then breaking individual communications would be much easier—perhaps easy enough to do on the fly, as needed.
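Here is a toy illustration of that amortization, with the big caveat that it is not the paper’s attack: the real precomputation uses the number field sieve against 1024-bit primes, while this sketch uses the much simpler baby-step/giant-step algorithm on a deliberately tiny group. The structural point is the same, though: the expensive table depends only on the group parameters (g, p), never on a particular key exchange, so one precomputation serves to break every exchange that reuses the group.

```python
# Toy demonstration of precomputation amortized across many key exchanges.
import secrets
from math import isqrt

p = 2**31 - 1     # deliberately tiny prime; deployed groups were 1024-bit
g = 7             # a generator of the multiplicative group mod p

def precompute(g: int, p: int):
    """One-time work per (g, p): a table of g**j mod p for all j < m."""
    m = isqrt(p - 1) + 1
    table, val = {}, 1
    for j in range(m):
        table.setdefault(val, j)
        val = val * g % p
    return m, table

def discrete_log(y: int, g: int, p: int, m: int, table: dict) -> int:
    """Per-target work: find x with g**x = y (mod p), reusing the shared table."""
    step = pow(g, -m, p)                 # g^(-m) mod p (needs Python >= 3.8)
    gamma = y % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * step % p
    raise ValueError("y is not a power of g")

m, table = precompute(g, p)              # expensive part, done once per group
for _ in range(3):                       # cheap part, once per "connection"
    x = secrets.randbelow(p - 2) + 1     # someone's secret exponent
    y = pow(g, x, p)                     # the value visible on the wire
    assert discrete_log(y, g, p, m, table) == x
```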

At least three common Internet security protocols (TLS, IPsec, and SSH) do reuse g and p, if they are not specifically configured otherwise. As the paper puts it, if the attacker can precompute for one 1024-bit group, they can compromise 37% of HTTPS servers in the Alexa Top 1 million, 66% of all probeable IPsec servers, and 26% of all probeable SSH servers. A group is a specific pair of values for g and p, and the number of bits essentially refers to how large p is. 1024 bits is the smallest size that was previously considered secure; this paper demonstrates that the precomputation for one such group would cost only a little less than a billion dollars, most of which goes to constructing the necessary supercomputer—the incremental cost for more groups is much smaller. As such we have to move 1024 bits onto the “not secure anymore” pile. (There’s an entire section devoted to the possibility that the NSA might already have done this.)

(Another section of the paper demonstrates that 512-bit groups can be precomputed by a small compute cluster, and 768-bit groups by a substantial (but not enormous) cluster: 110 and 36,500 core-years of computation, respectively. The former took one week of wall-clock time with the equipment available to the authors. We already knew those groups were insecure; unfortunately, they are still accepted by ~10% of servers.)

What do we do about it? If you’re running a server, the thing you should do right now is jump to a 2048-bit group; the authors have instructions for common TLS servers and SSH, and generic security configuration guidelines for HTTPS servers and SSH also cover this topic. (If you know where to find instructions for IPsec, please let me know.) 2048 bits is big enough that you probably don’t need to worry about using the same group as anyone else, but generating your own groups is also not difficult. It is also important to make sure that you have completely disabled support for export ciphersuites. This eliminates the 512- and 768-bit groups and also several other primitives that we know are insecure.
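If you would rather not share a group with anyone, here is a hedged sketch of generating fresh parameters with the pyca/cryptography package (assumed installed); it does the same job as running openssl dhparam 2048, and can take several minutes of CPU time.

```python
# Generate a fresh 2048-bit Diffie-Hellman group and print it in PEM form.
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import dh

params = dh.generate_parameters(generator=2, key_size=2048)   # slow: minutes
pem = params.parameter_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.ParameterFormat.PKCS3,
)
print(pem.decode())   # paste into whatever server configuration accepts
                      # custom DH parameters
```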

At the same time, it would be a good idea to turn on support for TLSv1.2 and modern ciphersuites, including elliptic curve Diffie-Hellman, which requires an attacker to solve a more complicated equation and is therefore much harder to break. It’s still a discrete logarithm problem, but in a different kind of group that is much harder to work with. I don’t understand the details myself, but an elliptic curve group only needs about 256 bits to provide at least the security of a 2048-bit group for ordinary Diffie-Hellman. There is a catch: everyone tends to use the same handful of standardized curves. The general number field sieve does not apply to elliptic curves, which is exactly why the groups can be so much smaller, but I’m not aware of a thorough argument for why no analogous precomputation attack could exist, and I wish the authors had estimated how big your curve group needs to be for that to be infeasible. 256 bits is almost certainly big enough; but how about the smaller curve sizes that are usually considered equivalent to 1024-bit ordinary DH?
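For what it’s worth, here is a minimal elliptic-curve Diffie-Hellman exchange using the pyca/cryptography package (assumed installed) and the 256-bit NIST P-256 curve; the HKDF step at the end is how a real protocol would turn the raw shared secret into a session key.

```python
# Minimal ECDH sketch on P-256; names and key sizes are illustrative.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

alice_key = ec.generate_private_key(ec.SECP256R1())
bob_key = ec.generate_private_key(ec.SECP256R1())

# each side combines its own private key with the other's public key
alice_shared = alice_key.exchange(ec.ECDH(), bob_key.public_key())
bob_shared = bob_key.exchange(ec.ECDH(), alice_key.public_key())
assert alice_shared == bob_shared

# derive a symmetric session key from the raw shared secret
session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                   info=b"handshake").derive(alice_shared)
```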

In the longer timeframe, where we think about swapping out algorithms entirely, I’m going to say this paper means cryptographers should take “precomputation should not help the attacker” as a design principle. We already do this for passwords, where the whole point of salt is to make precomputation not help; it’s time to start thinking about that in a larger context.

Also in the longer timeframe, this is yet another paper demonstrating that old clients and servers, that only support old primitives that are now insecure, hurt everyone’s security. Just about every time one peer continues to support a broken primitive for backward compatibility, it has turned out that a man in the middle can downgrade it—trick it into communicating insecurely even with counterparties that do support good primitives. (There’s a downgrade attack to 512-bit Diffie-Hellman groups in this paper.) This is one of those thorny operational/incentive alignment problems that hasn’t been solved to anyone’s satisfaction. Windows XP makes an instructive example: it would have been technically possible for Microsoft to backport basically all of the security improvements (and other user-invisible improvements) in later versions of Windows, and there are an enormous number of people who would have benefited from that approach. But it wouldn’t have been in Microsoft’s own best interest, so instead they did things geared to force people onto newer versions of the OS even at the expense of security for people who didn’t want to budge, and people who needed to continue interoperating with people who didn’t want to budge (schannel.dll in XP still doesn’t support TLS 1.2). I could spin a similar story for basically every major player in the industry.

Links that speak: The global language network and its association with global fame

The paper we’re looking at today isn’t about security, but it’s relevant to anyone who’s doing field studies of online culture, which can easily become part of security research. My own research right now, for instance, touches on how the Internet is used for online activism and how that may or may not be safe for the activists; if you don’t have a handle on online culture—and how it varies worldwide—you’re going to have a bad time trying to study that.

What we have here is an analysis of language pairings as found on Twitter, Wikipedia, and a database of book translations. Either the same person uses both languages in a pair, or text has been translated from one to the other. Using these pairs, they identify hub languages that are very likely to be useful to connect people in distant cultures. These are mostly, but not entirely, the languages with the greatest number of speakers. Relative to their number of speakers, regional second languages like Malay and Russian show increased importance, and languages that are poorly coupled to English (which is unsurprisingly right in the middle of the connectivity graph), like Chinese, Arabic, and the languages of India, show reduced importance.
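As I understand it, the paper’s notion of a hub is essentially eigenvector centrality on the weighted co-occurrence graph. Here is an illustrative sketch with the networkx package (assumed installed) and a handful of invented edge weights, just to show the mechanics; the real datasets are of course vastly larger.

```python
# Rank languages by eigenvector centrality in a toy co-occurrence graph.
import networkx as nx

edges = [   # (language_a, language_b, count of bilingual users/translations); invented
    ("english", "spanish", 5200),
    ("english", "french", 4100),
    ("english", "malay", 900),
    ("spanish", "portuguese", 800),
    ("russian", "ukrainian", 700),
    ("english", "russian", 1500),
]

G = nx.Graph()
G.add_weighted_edges_from(edges)
centrality = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
for lang, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{lang:12s} {score:.3f}")
```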

There is then a bunch of hypothesizing about how the prominence of a language as a translation hub might influence how famous someone is and/or how easy it is for something from a particular region to reach a global audience. That’s probably what the authors of the paper thought was most important, but it’s not what I’m here for. What I’m here for is what the deviation between translation hub and widely spoken language tells us about how to do global field studies. It is obviously going to be more difficult to study an online subculture that conducts itself in a language you don’t speak yourself, and the fewer people who do speak that language, the harder it will be for you to get some help. But if a language is widely spoken but not a translation hub, it may be difficult for you to get the right kind of help.

For instance, machine translations between the various modern vernaculars of Arabic and English are not very good at present. I could find someone who speaks any given vernacular Arabic without too much difficulty, but I’d probably have to pay them a lot of money to get them to translate 45,000 Arabic documents into English, or even just to tell me which of those documents were in their vernacular. (That number happens to be how many documents in my research database were identified as some kind of Arabic by a machine classifier—and even that doesn’t work as well as it could; for instance, it is quite likely to be confused by various Central Asian languages that can be written with an Arabic-derived script and have a number of Arabic loanwords but are otherwise unrelated.)

What can we (Western researchers, communicating primarily in English) do about it? First is just to be aware that global field studies conducted by Anglophones are going to be weak when it comes to languages poorly coupled to English, even when they are widely spoken. In fact, the very fact of the poor coupling makes me skeptical of the results in this paper when it comes to those languages. They only looked at three datasets, all of which are quite English-centric. Would it not have made sense to supplement that with polylingual resources centered on, say, Mandarin Chinese, Modern Standard Arabic, Hindi, and Swahili? These might be difficult to find, but not being able to find them would tend to confirm the original result, and if you could find them, you could both improve the lower bounds for coupling to English, and get a finer-grained look at the languages that are well-translated within those clusters.

Down the road, it seems to me that whenever you see a language cluster that’s widely spoken but not very much in communication with any other languages, you’ve identified a gap in translation resources and cultural cross-pollination, and possibly an underserved market. Scanlations make a good example: the lack of officially-licensed translations of comic books (mostly of Japanese-language manga into English) spawned an entire subculture of unofficial translators, and that subculture is partially responsible for increasing mainstream interest in the material to the point where official translations are more likely to happen.

Protecting traffic privacy for massive aggregated traffic

Nowadays we are pretty darn sure we know how to encrypt network packets in transit such that there’s no realistic way to decrypt them. There are a whole host of second-order problems that are not as well solved, but basic message confidentiality is done—except for one thing: the length and timing of each packet are still visible to anyone who can monitor the wire. It turns out that this enables eavesdroppers to extract a frightening amount of information from those encrypted packets, such as which page of a medical-advice website you are reading [1], the language of your VoIP phone call [2], or even a transcript of your VoIP phone call [3].

In principle, we know how to close this information leak. It is simply necessary for Alice to send Bob a continuous stream of fixed-size packets, at a fixed rate, forever, whether or not she has anything useful to say. (When she doesn’t have anything useful to say, she can encrypt an endless stream of binary zeroes.) This is obviously a non-starter in any context where Alice cares about power consumption, shared channel capacity, or communicating with more than one Bob. Even when none of these is a concern—for instance, high-volume VPN links between data centers, which are running at significant utilization 24x7x365 anyway—forcing all traffic to fit the constant packet length and transmission schedule adds significant overhead. Thus, there’s a whole line of research trying to minimize that overhead, and/or see how far we can back down from the ideal-in-principle case without revealing too much.
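Here is a minimal sketch of that ideal scheme, assuming a stand-in encrypt() in place of the real record layer: exactly one fixed-size cell goes out every tick, carrying queued data when there is any and zero padding otherwise, so neither length nor timing depends on what Alice actually has to say.

```python
# Constant-rate, fixed-size sender loop; sizes and rates are arbitrary choices.
import queue
import time

CELL_SIZE = 512            # bytes of plaintext per cell
INTERVAL = 0.02            # seconds between cells, fixed regardless of load

outbound: "queue.Queue[bytes]" = queue.Queue()

def encrypt(cell: bytes) -> bytes:
    return cell            # stand-in; a real link would apply an AEAD here

def constant_rate_sender(sock) -> None:
    while True:
        started = time.monotonic()
        try:
            chunk = outbound.get_nowait()
        except queue.Empty:
            chunk = b""                      # nothing to say: send pure padding
        # 2-byte length prefix lets the receiver tell data from padding;
        # a real implementation would re-queue any remainder of an oversized chunk
        payload = chunk[:CELL_SIZE - 2]
        cell = len(payload).to_bytes(2, "big") + payload.ljust(CELL_SIZE - 2, b"\0")
        sock.sendall(encrypt(cell))          # same size, same cadence, always
        time.sleep(max(0.0, INTERVAL - (time.monotonic() - started)))
```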

Today’s paper offers a theoretical model and test framework that should make it easier to experiment with the high-volume-VPN-link case. I like it for its concreteness—often theoretical modeling papers feel so divorced from practice that it’s hard to see how to do anything constructive with them. It is also easy to see how the model could be extended to more sophisticated traffic-shaping techniques, which would be the obvious next step. The big downside, though, is that it only considers a fixed network topology: Alice talking to Bob and no one else, ever. Even for inter-data-center links, topologies can change on quite short notice, and a topology-change event might be exactly what Eve is listening for (perhaps it gives her advance notice of some corporate organizational change). To be fair, the continuous stream of fixed size packets scheme does completely solve the length-and-timing issue in principle; we do not have nearly as good schemes for concealing who is talking to whom, even in principle. Which is unfortunate, because you can do even more terrifying things with knowledge only of who is talking to whom. [4]

Secrets, Lies, and Account Recovery

Today’s paper appeared in WWW 2015 and was also summarized on Google’s security blog. The subtitle is Lessons from the Use of Personal Knowledge Questions at Google, and the basic conclusion is: security questions (for recovering access to an account when you’ve forgotten your password) are terribly insecure. Well, duh, you are probably thinking, the answers are right there on my social network profile for anyone to read. That’s true, but it’s not the angle this paper takes. This paper is more interested in guessing by attackers who haven’t read your social network profile. At most, they know your preferred language and country of residence. It turns out that, if you only ask one security question for account recovery, three guesses by this adversary are sufficient to penetrate 10–20% of accounts; that is how many people typically share the most frequent three answers per language. (Depending on the question and language, it might be much worse; for instance, the three most common places of birth for Korean speakers cover 40% of that population.)
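The arithmetic behind that headline number is worth making explicit: given a frequency distribution over answers to a single question, the attacker’s success rate with k guesses is just the probability mass of the k most common answers. The sketch below uses an invented distribution, not the paper’s data.

```python
# Fraction of accounts covered by the k most common answers to one question.
from collections import Counter

def guess_rate(answers, k=3):
    counts = Counter(answers)
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total

# hypothetical survey of "what is your favorite food?" answers, with a long tail
sample = (["pizza"] * 120 + ["sushi"] * 60 + ["chocolate"] * 40
          + [f"answer-{i}" for i in range(780)])
print(f"3 guesses cover {guess_rate(sample):.0%} of this sample")   # 22%
```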

They also observe that questions which should have no shared answers (“what is your frequent flyer number?”) do have shared answers, because people give fake answers, and the fake answers tend to be things like 123456790. This brings the security of such questions down to the level of all the others. Finally, there is a user survey, from which the two most telling items are: people think account recovery via security question is more secure and more reliable than account recovery via a secret code sent to them in a text message or email (precisely the opposite is true); and people give fake answers because they want to avoid having the real answers be visible on their social network profile (or otherwise easy to find).

If someone knows almost nothing about you, why are they bothering to try to take over your account? Probably the two most important reasons are that they need throwaway addresses to send spam from, and that stealing $100 from each of ten thousand people adds up to a million dollars. These are compelling enough that any major online service provider (such as Google) sees continuous, bulk, brute-force attacks by adversaries who don’t care which accounts they compromise. The authors don’t say, but I wouldn’t be surprised if those attacks constituted a supermajority of all attacks on Google accounts.

Turning it around, though, the probable negative consequences of having your account cracked by a spammer who doesn’t know or care about you as a person are really not that bad, compared to what someone who knows you and bears you a personal grudge could do. (Ransomware is the most prominent counterexample at present.) Google as a corporate entity is right to worry more about the most frequent type of attack they see—but you as an individual are probably also right to worry more about an attack that’s much less common but has way more downside for you. I think this contrast of priorities explains fake answers and the general reluctance to adopt two-factor authentication, recovery codes, etc., all of which might provide a security benefit in the future but definitely make life more inconvenient now. So Long, And No Thanks for the Externalities and How Do We Design For Learned Helplessness? go into considerably more detail on that point. (Facebook may well have improved their password reset sequence since the latter was written.) On a related note, if someone happens to have an email address or phone number that isn’t already linked to the account that needs a recovery contact point, they might well consider it critically important that the panopticon running that account does not find out about it!

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

A few weeks ago, a new exploit technique called rowhammering was making the rounds of all the infosec news outlets. It turns out you can flip bits in RAM by reading over and over and over again from bits that are physically adjacent on the RAM chip, and that with sufficient cleverness you can leverage that to escape security restrictions being enforced in software. Abstractly, it’s as simple as toggling the “this process is allowed to do whatever it wants” bit, which is a real bit in RAM on many operating systems—most of the cleverness is in grooming the layout of physical RAM so that that bit is physically adjacent to a group of bits that you’re allowed to read from.

This paper describes and analyzes the underlying physical flaw that makes this exploit possible. It is (as far as I know) the first suggestion in the literature that this might be a security vulnerability; chip manufacturers have been aware of it as a reliability problem for much longer. [1] [2] If you’re not familiar with the details of how DRAM works, there are two important things to understand. First, DRAM deliberately trades stability for density. You can (presently) pack more bits into a given amount of chip surface area with DRAM than with any other design, but the bits don’t retain their values indefinitely; each bit has an electrical charge that slowly leaks away. DRAM is therefore refreshed continuously; every few tens of milliseconds, each bit will be read and written back, restoring the full charge.

Second, software people are accustomed to thinking of memory, at the lowest level of abstraction, as a one-dimensional array of bytes: an address space. But DRAM is physically organized as a two-dimensional array of bits, laid out as a grid on the two-dimensional surface of the chip. To read data from memory, the CPU first instructs the memory bank to load an entire row into a buffer (unimaginatively called the row buffer), and then to transmit columns (which are the width of entire cache lines) from that row. Rows are huge—32 to 256 kilobits, typically; larger than the page granularity of memory access isolation. This means that bits which are far apart in the CPU’s address spaces (even the so-called physical address space!) may be right next to each other on a RAM chip. And bits which are right next to each other on a RAM chip may belong to different memory-isolation regions, so it’s potentially a security problem if they can affect each other.
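A purely hypothetical illustration of that last point: carve a physical address into DRAM coordinates. Real memory controllers use vendor-specific, largely undocumented mappings, so the field widths below are invented; the only point is that addresses in different 4 KiB pages can land on adjacent rows of the same bank.

```python
# Invented address-to-DRAM-geometry mapping, for illustration only.
ROW_BITS, BANK_BITS, COL_BITS = 15, 3, 13   # invented layout: 8 banks, 32K rows

def dram_coordinates(phys_addr: int):
    col  = phys_addr & ((1 << COL_BITS) - 1)
    bank = (phys_addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row  = (phys_addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col

# two addresses 64 KiB apart in this toy mapping land on adjacent rows of the
# same bank, even though they belong to different 4 KiB pages
a = 0x1234000
b = a + (1 << (COL_BITS + BANK_BITS))
print(dram_coordinates(a), dram_coordinates(b))   # (291, 2, 0) (292, 2, 0)
```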

Now, the physical flaw is simply that loading one row of a DRAM chip into the row buffer can cause adjacent rows to leak their charges faster, due to unavoidable electromagnetic phenomena, e.g. capacitive coupling between rows. Therefore, if you force the memory bank to load a particular row over and over again, enough charge might leak out of an adjacent row to make the bits change their values. (Reading a row implicitly refreshes it, so the attack row itself won’t be affected.) It turns out to be possible to do this using only ordinary CPU instructions.

The bulk of this paper is devoted to studying exactly when and how this can happen on a wide variety of different DRAM modules from different vendors. Important takeaways include: All vendors have affected products. Higher-capacity chips are more vulnerable, and newer chips are more vulnerable—both of these are because the bit-cells in those chips are smaller and packed closer together. The most common error-correcting code used for DRAM is inadequate to protect against or even detect the attack, because it can only correct one-bit errors and detect two-bit errors within any 64-bit word, but under the attack, several nearby bits often flip at the same time. (You should still make a point of buying error-correcting RAM, plus a motherboard and CPU that actually take advantage of it. One-bit errors are the normal case and they can happen as frequently as once a day. See Updates on the need to use error-correcting memory for more information.)

Fortunately, the authors also present a simple fix: the memory controller should automatically refresh nearby rows with some low, fixed probability whenever a row is loaded. This will have no visible effect in normal operation, but if a row is being hammered, nearby rows will get refreshed sufficiently more often that bit flips will not happen. It’s not clear to anyone whether this has actually been implemented in the latest motherboards, vendors being infamously closemouthed about how their low-level hardware actually works. However, several of the authors here work for Intel, and I’d expect them to take the problem seriously, at least.
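A back-of-the-envelope sketch of why that probabilistic refresh works, with invented constants: T and W below are rough stand-ins for how many activations it takes to disturb a neighboring row and how many activations fit between two periodic refreshes, not measurements from the paper.

```python
# Rough estimate of bit-flip probability per refresh window under
# probabilistic neighbor refresh; all constants are assumptions.
T = 139_000        # assumed activations needed to disturb a victim row
W = 1_300_000      # assumed activations possible per refresh window

def flip_probability(p: float) -> float:
    """Chance of at least one flip per window if each activation also refreshes
    the neighboring rows with probability p."""
    survive_one_run = (1 - p) ** T      # no neighbor refresh for T straight activations
    runs_per_window = W // T            # independent tries the attacker gets per window
    return 1 - (1 - survive_one_run) ** runs_per_window

for p in (0.0, 1e-5, 1e-4, 1e-3):
    print(f"p = {p:g}: flip probability per window ~ {flip_probability(p):.2e}")
```

Even a refresh probability of one in a thousand per activation drives the per-window flip probability to essentially zero, which is why the fix costs nothing noticeable in normal operation.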

What Deters Jane from Preventing Identification and Tracking on the Web?

If you do a survey, large majorities of average people will say they don’t like the idea of other people snooping on what they do online. [1] [2] Yet, the existing bolt-on software that can prevent such snooping (at least somewhat) doesn’t get used by nearly as many people. The default explanation for this is that it’s because the software is hard to install and use correctly. [3] [4]

This paper presents a complementary answer: maybe people don’t realize just how ubiquitous or invasive online snooping is, so the benefit seems not worth the hassle. The authors interviewed a small group about their beliefs concerning identification and tracking. (They admit that the study group skews young and technical, and plan to broaden the study in the future.) Highlights include:

  • People are primarily concerned about data they explicitly provide to some service—social network posts, bank account data, buying habits—and may not even be aware that ad networks and the like can build up comprehensive profiles of online activity even if all they do is browse.

  • They often have heard a bunch of fragmentary information about cookies and supercookies and IP addresses and so on, and don’t know how this all fits together or which bits of it to worry about.

  • Some people thought that tracking was only possible for services with which they have an account, while they are logged in (so they log out as soon as they’re done with the service).

  • There is also general confusion about which security threats qualify as identification and tracking—to be fair, just about all of them can include some identification or tracking component.

  • The consequences of being tracked online are unclear, leading people to underestimate the potential harm.

  • Finally, many of the respondents assume they are not important people and therefore no one would bother tracking them.

All of these observations are consistent with earlier studies in the same vein, e.g. Rick Wash’s Folk Models of Home Computer Security.

The authors argue that this means maybe the usability problems of the bolt-on privacy software are overstated, and user education about online security threats (and the mechanism of the Internet in general) should have higher priority. I think this goes too far. It seems more likely to me that because people underestimate the risk and don’t particularly understand how the privacy software would help, they are not motivated to overcome the usability problems. I am also skeptical of the effectiveness of user education. The mythical average users may well feel, and understandably so, that they should not need to know exactly what a cookie is, or exactly what data gets sent back and forth between their computers and the cloud, or the internal structure of that cloud. Why is it that the device that they own is not acting in their best interest in the first place?