Papers tagged ‘HTTPS’

Scandinista! Analyzing TLS Handshake Scans and HTTPS Browsing by Website Category

Today’s paper is a pilot study looking into differences in the rate of HTTPS adoption across the broad categories of websites defined by Alexa. The authors looked at raw availability of TLS service on port 443, and they also checked whether an HTTP request for the front page of each Alexa domain would redirect to HTTPS, or vice versa. This test was conducted only once, and supplemented with historical data from the University of Michigan’s HTTPS Ecosystem project.
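To make the methodology concrete, here is a minimal sketch in Python of the two probes described above. This is my own illustration, not the authors’ code; the timeouts and error handling are guesses.

    # Two probes per domain: (1) does it complete a TLS handshake on port 443,
    # and (2) does a plain-HTTP request for the front page end up on HTTPS?
    import socket
    import ssl
    from urllib.request import urlopen

    def tls_available(domain, timeout=10):
        """True if we can complete a TLS handshake with domain:443."""
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE   # testing availability, not validity
        try:
            with socket.create_connection((domain, 443), timeout=timeout) as sock:
                with ctx.wrap_socket(sock, server_hostname=domain):
                    return True
        except OSError:
            return False

    def redirects_to_https(domain, timeout=10):
        """True if http://domain/ ultimately lands on an https:// URL."""
        try:
            with urlopen("http://" + domain + "/", timeout=timeout) as resp:
                return resp.geturl().startswith("https://")
        except OSError:
            # Note: a redirect to an HTTPS site with a broken certificate
            # raises here and counts as False.
            return False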

As you would expect, there is a significant difference in the current level of HTTPS availability from one category to another. They only show this information for a few categories, but the pattern is not terribly surprising: shopping 82%, business 70%, advertisers 45%, adult 36%, news 30%, arts 26%. (They say “The relatively low score for Adult sites is surprising given that the industry has a large amount of paid content,” but I suspect this is explained by that industry’s habit of outsourcing payment processing, plus the ubiquitous (not just in the adult category) misapprehension that only payment processing is worth bothering to encrypt.) It is also not surprising to find that more popular sites are more likely to support HTTPS. And the enormous number of sites that redirect their front page away from HTTPS is depressing, but again, not surprising.

What’s more interesting to me are the trendlines, which show a steady, slow, category-independent, linear uptake rate. There’s a little bit of a bump in adult and news around 2013, but I suspect it’s just noise. (The response-growth-over-time figure (figure 2), which appears to show a category dependence, is improperly normalized and therefore misleading; you want to look only at figure 1.) The paper looks for a correlation with the Snowden revelations; I’d personally expect the dominant causal factor here to be the difficulty of setting up TLS, and I’d like to see them look for correlations with major changes in that: for instance, Cloudflare’s offering no-extra-cost HTTPS support [1], Mozilla publishing a server configuration guide [2], or the upcoming Let’s Encrypt no-hassle CA [3]. It might also be interesting to look at uptake rate as a function of ranking, rather than category; it seems like the big names have been flocking to HTTPS lately, and it would be nice to know for sure.

The study has a number of methodological problems, which is OK for a pilot study, but they need to be fixed before anyone draws serious conclusions. I already mentioned the normalization problem in figure 2: I think they took percentages of percentages, which doesn’t make sense (I work through a toy example below). The right thing would have been to simply subtract the initial level seen in figure 1 from each line, which (eyeballing figure 1) would have demonstrated an increase of about 5% in each category over the time period shown, with no evidence for a difference between categories.

But before we even get that far, there’s the question of the difference between an IP address (the unit of the UMich scans), a website (the unit of TLS certificates), and a domain (the unit of Alexa ranking). To take some obvious examples: there are hundreds, if not thousands, of IP addresses that will all answer to the name of www.google.com. Conversely, Server Name Indication permits one IP address to answer for dozens or even hundreds of encrypted websites, and that practice is even more common over unencrypted HTTP. And hovering around #150 in the Alexa rankings is amazonaws.com, which is the backing store for at least tens of thousands of different websites, each of which has its own subdomain and may or may not have configured TLS. The correct primary data sources for this experiment are not Alexa and IPv4 scanning, but DNS scanning and Certificate Transparency logs. (A major search engine’s crawl logs would also be useful, if you could get your hands on them.) Finally, they should pick one set of 10-20 mutually exclusive, exhaustive categories (one of which would have to be Other) and use them consistently throughout the paper.
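To spell out the normalization issue with made-up numbers (these are not from the paper): two categories that gain the same five percentage points of HTTPS support look wildly different if you divide each gain by the category’s starting level.

    initial = {"shopping": 0.25, "news": 0.10}   # hypothetical starting HTTPS support
    final   = {"shopping": 0.30, "news": 0.15}   # hypothetical level a year later

    for cat in initial:
        absolute = final[cat] - initial[cat]                    # what figure 1 plots
        relative = (final[cat] - initial[cat]) / initial[cat]   # a percentage of a percentage
        print(f"{cat}: +{absolute:.0%} absolute, +{relative:.0%} relative")

    # shopping: +5% absolute, +20% relative
    # news:     +5% absolute, +50% relative

The relative numbers make it look as if news sites are adopting HTTPS far faster than shopping sites, when the underlying change is identical; that is the kind of artifact I suspect is behind the apparent category dependence in figure 2.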

Analysis of the HTTPS Certificate Ecosystem

The Internet Measurement Conference brings us an attempt to figure out just how X.509 server certificates are being used in the wild, specifically for HTTPS servers. Yet more specifically, they are looking for endemic operational problems that harm security. The basic idea is to scan the entire IPv4 number space for servers responding on port 443, make note of the certificate(s) presented, and then analyze them.
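Here is a rough sketch, in Python, of the per-host harvesting step as I understand it (my own illustration, not the authors’ tooling): connect to port 443, complete a handshake without validating anything, and record the certificate the server presents.

    import socket
    import ssl

    def grab_certificate(address, timeout=5):
        """Return the server's leaf certificate (PEM) or None if the host
        doesn't speak TLS on port 443."""
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE   # we want the bad certs too, so don't validate
        try:
            with socket.create_connection((address, 443), timeout=timeout) as sock:
                with ctx.wrap_socket(sock) as tls:
                    der = tls.getpeercert(binary_form=True)
                    return ssl.DER_cert_to_PEM_cert(der)
        except OSError:
            return None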

This research question is nothing new; the EFF famously ran a similar study back in 2010, the SSL Observatory. And operational concerns about the way certificates are used in the wild go back decades; see Peter Gutmann’s slide deck Everything you Never Wanted to Know about PKI but were Forced to Find Out (PDF). What makes this study interesting is, first, it’s three years later; things can change very fast in Internet land (although, in this case, they have not). Second, the scale: the authors claim to have successfully contacted 178% more TLS hosts (per scan) and harvested 736% more certificates (in total, over the course of 110 scans spanning a little more than a year) than any previous such study.

What do we learn? Mostly that yeah, the TLS PKI is a big mess, and it hasn’t gotten any better since 2010. There are too many root CAs. There are far too many unconstrained intermediate certificates, and yet, at the same time, there are too few intermediates! (The point of intermediates is that they’re easy to replace, so if they get compromised you don’t have a catastrophe on your hands. Well, according to this paper, some 26% of all currently valid HTTPS server certificates are signed by one intermediate. No way is that going to be easy to replace if it gets compromised.) Lots of CAs ignore the baseline policies for certificate issuance and get away with it. (Unfortunately, the paper doesn’t say whether there are similar problems with the EV policies.)
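For a sense of how one might check the intermediate-concentration claim against a pile of harvested certificates, here is a sketch using the third-party Python cryptography package (again my illustration, not the paper’s method), fed with PEM strings like the ones the sketch above collects: count leaf certificates by issuer name.

    from collections import Counter
    from cryptography import x509

    def issuer_histogram(pem_certs):
        """Map each issuer DN to the number of leaf certificates it signed.
        Grouping by issuer DN is a simplification: one DN can correspond to
        more than one intermediate key."""
        counts = Counter()
        for pem in pem_certs:
            cert = x509.load_pem_x509_certificate(pem.encode())
            counts[cert.issuer.rfc4514_string()] += 1
        return counts

    # issuer_histogram(certs).most_common(10) would show whether one
    # intermediate really accounts for a quarter of all the leaves.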

Zoom out: when you have a piece of critical infrastructure with chronic operational issues, it’s a safe bet that they’re symptoms, and the real problem is with operator incentives. The paper doesn’t discuss this at all, unfortunately, so I’ll throw in some speculation here. The browser vendors are notionally in the best position to Do Something about this mess, but they don’t, because the only real option they have is to delete root certificates from the Official List. Not only does this tend to put the offending CA out of business, it also causes some uncertain-but-large number of websites (most or all of which did nothing wrong) to stop working. Such a drastic sanction is almost never seen as appropriate. And browsers have hardly any positive incentives to offer CAs for doing things right; note that EV certificates, which get special treatment in the browser UI and can therefore be sold at a premium, do come with a tighter set of CA requirements (stronger crypto, reliable OCSP, that sort of thing) which are, as far as I’m aware, actually followed.

Zoom out again: there’s no shortage of technical suggestions that could turn into less drastic sanctions and incentives for the CAs, but they never get implemented. Why? Well, if you ask me, it’s because both OpenSSL and NSS are such terrible code that nobody wants to hack on them, and the brave souls who do it anyway are busy chipping away at the mountain of technical debt and/or at features that are even more overdue. This, though, we know how to fix. It only takes money.