Diffie-Hellman key exchange is a cryptographic primitive used in nearly all modern security protocols. Like many cryptographic primitives, it is difficult to break because it is difficult to solve a particular mathematical problem; in this case, the discrete logarithm problem. Generally, when people try to break a security protocol, they either look at the pure math of it—searching for an easier way to solve discrete logarithms—or they look for mistakes in how the protocol is implemented—something that will mean you don’t have to solve a discrete logarithm to break the protocol.

This paper does a bit of both. It observes something about the way Diffie-Hellman is used in Internet security protocols that makes it unusually easy to solve the discrete logarithms that will break the protocol. Concretely: to break an instance of Diffie-Hellman as studied in this paper, you have to solve for $x$ in the equation

$g^x \equiv y \;(\text{mod}\,p)$

where all numbers are positive integers, $g$ and $p$ are fixed in advance, and $y$ is visible on the wire. The trouble is with the fixed in advance part. It turns out that the most efficient known way to solve this kind of equation, the general number field sieve, can be broken into two stages. The first stage is much more expensive than the second, and it depends only on $g$ and $p$. So if the same $g$ and $p$ were reused for many communications, an eavesdropper could do the first stage in advance, and then breaking individual communications would be much easier—perhaps easy enough to do on the fly, as needed.

At least three common Internet security protocols (TLS, IPsec, and SSH) do reuse $g$ and $p$, if they are not specifically configured otherwise. As the paper puts it, if the attacker can precompute for one 1024-bit group, they can compromise 37% of HTTPS servers in the Alexa Top 1 million, 66% of all probeable IPsec servers, and 26% of all probeable SSH servers. A group is a specific pair of values for $g$ and $p$, and the number of bits essentially refers to how large $p$ is. 1024 bits is the smallest size that was previously considered secure; this paper demonstrates that the precomputation for one such group would cost only a little less than a billion dollars, most of which goes to constructing the necessary supercomputer—the incremental cost for more groups is much smaller. As such we have to move 1024 bits onto the not secure anymore pile. (There’s an entire section devoted to the possibility that the NSA might already have done this.)

(Another section of the paper demonstrates that 512-bit groups can be precomputed by a small compute cluster, and 768-bit groups by a substantial (but not enormous) cluster: 110 and 36,500 core-years of computation, respectively. The former took one week of wall-clock time with the equipment available to the authors. We already knew those groups were insecure; unfortunately, they are still accepted by ~10% of servers.)

What do we do about it? If you’re running a server, the thing you should do right now is jump to a 2048-bit group; the authors have instructions for common TLS servers and SSH, and generic security configuration guidelines for HTTPS servers and SSH also cover this topic. (If you know where to find instructions for IPsec, please let me know.) 2048 bits is big enough that you probably don’t need to worry about using the same group as anyone else, but generating your own groups is also not difficult. It is also important to make sure that you have completely disabled support for export ciphersuites. This eliminates the 512- and 768-bit groups and also several other primitives that we know are insecure.

At the same time, it would be a good idea turn on support for TLSv1.2 and modern ciphersuites, including elliptic curve Diffie-Hellman, which requires an attacker to solve a more complicated equation and is therefore much harder to break. It’s still a discrete logarithm problem, but in a different finite field that is harder to work with. I don’t understand the details myself, but an elliptic curve group only needs 256 bits to provide the same security as a 2048-bit group for ordinary Diffie-Hellman. There is a catch: the general number field sieve works for elliptic curves, too. I’m not aware of any reason why the precomputation attack in this paper wouldn’t apply, and I wish the authors had estimated how big your curve group needs to be for it to be infeasible. 256 bits is almost certainly big enough; but how about 128 bits, which is usually considered equivalent to 1024 bits for ordinary DH?

In the longer timeframe, where we think about swapping out algorithms entirely, I’m going to say this paper means cryptographers should take precomputation should not help the attacker as a design principle. We already do this for passwords, where the whole point of salt is to make precomputation not help; it’s time to start thinking about that in a larger context.

Also in the longer timeframe, this is yet another paper demonstrating that old clients and servers, that only support old primitives that are now insecure, hurt everyone’s security. Just about every time one peer continues to support a broken primitive for backward compatibility, it has turned out that a man in the middle can downgrade it—trick it into communicating insecurely even with counterparties that do support good primitives. (There’s a downgrade attack to 512-bit Diffie-Hellman groups in this paper.) This is one of those thorny operational/incentive alignment problems that hasn’t been solved to anyone’s satisfaction. Windows XP makes an instructive example: it would have been technically possible for Microsoft to backport basically all of the security improvements (and other user-invisible improvements) in later versions of Windows, and there are an enormous number of people who would have benefited from that approach. But it wouldn’t have been in Microsoft’s own best interest, so instead they did things geared to force people onto newer versions of the OS even at the expense of security for people who didn’t want to budge, and people who needed to continue interoperating with people who didn’t want to budge (schannel.dll in XP still doesn’t support TLS 1.2). I could spin a similar story for basically every major player in the industry.

The Internet Measurement Conference brings us an attempt to figure out just how X.509 server certificates are being used in the wild, specifically for HTTPS servers. Yet more specifically, they are looking for endemic operational problems that harm security. The basic idea is to scan the entire IPv4 number space for servers responding on port 443, make note of the certificate(s) presented, and then analyze them.

This research question is nothing new; the EFF famously ran a similar study back in 2010, the SSL Observatory. And operational concerns about the way certificates are used in the wild go back decades; see Peter Gutmann’s slide deck Everything you Never Wanted to Know about PKI but were Forced to Find Out (PDF). What makes this study interesting is, first, it’s three years later; things can change very fast in Internet land (although, in this case, they have not). Second, the scale: the authors claim to have successfully contacted 178% more TLS hosts (per scan) and harvested 736% more certificates (in total, over the course of 110 scans spanning a little more than a year) than any previous such study.

What do we learn? Mostly that yeah, the TLS PKI is a big mess, and it hasn’t gotten any better since 2010. There are too many root CAs. There are far too many unconstrained intermediate certificates, and yet, at the same time, there are too few intermediates! (The point of intermediates is that they’re easy to replace, so if they get compromised you don’t have a catastrophe on your hands. Well, according to this paper, some 26% of all currently valid HTTPS server certificates are signed by one intermediate. No way is that going to be easy to replace if it gets compromised.) Lots of CAs ignore the baseline policies for certificate issuance and get away with it. (Unfortunately, the paper doesn’t say whether there are similar problems with the EV policies.)

Zoom out: when you have a piece of critical infrastructure with chronic operational issues, it’s a safe bet that they’re symptoms and the real problem is with operator incentives. The paper doesn’t discuss this at all, unfortunately, so I’ll throw in some speculation here. The browser vendors are notionally in the best position to Do Something about this mess, but they don’t: because the only real option they have is to delete root certificates from the Official List. Not only does this tend to put the offending CA out of business, it also causes some uncertain-but-large number of websites (most or all of which didn’t do anything wrong) to stop working. Such a drastic sanction is almost never seen to be appropriate. Browsers have hardly any good positive incentives to offer the CAs to do things right; note that EV certificates, which get special treatment in the browser UI and can therefore be sold at a premium, do come with a tighter set of CA requirements (stronger crypto, reliable OCSP, that sort of thing) which are, as far as I’m aware, followed.

Zoom out again: there’s no shortage of technical suggestions that could turn into less drastic sanctions and incentives for the CAs, but they never get implemented: why? Well, you ask me, I say it’s because both OpenSSL and NSS are such terrible code that nobody wants to hack on them, and the brave souls who do it anyway are busy chipping away at the mountain of technical debt and/or at features that are even more overdue. This, though, we know how to fix. It only takes money.