Over the years there have been several attempts to build anonymous publication or distributed anonymous storage systems—usually they start with a peer-to-peer file sharing protocol not entirely unlike BitTorrent, and then build some combination of indexing, replication, encryption, and anonymity on top. All have at least one clever idea. None has achieved world domination. (We’ll know someone’s finally gotten it right when the web browsers start shipping native support for their protocol.)

Tangler is a relatively old example, and its one clever idea is what they call document entanglement. To understand document entanglement you have to know about something called $k$-of-$n$ secret sharing. This is a mathematical technique that converts a secret into $n$ shares. Each share is the same size as the original secret, possibly plus a little overhead. Anyone who has a copy of $k$ of those $n$ shares can reconstruct the original secret, but if they have even just one fewer, they can’t. $k$ and $n$ can be chosen arbitrarily. Secret sharing is normally not used for large secrets (like an entire document) because each share is the same size as the original, so you’ve just increased your overall storage requirement $n$ times—but in a distributed document store like Tangler, you were going to do that anyway, because the document should remain retrievable even if some of the peers holding shares drop out of the network.

Document entanglement, then, is secret sharing with a clever twist: you arrange to have some of the $n$ shares of your document be the same bitstring as existing shares for other documents. This is always mathematically possible, as long as fewer than $k$ existing shares are used. This reduces the amount of data added to the system by each new document, but more importantly, it makes the correspondence between shares and documents many-to-many instead of many-to-one. Thus, operators can honestly say they do not know which documents are backed by which shares, and they have an incentive not to cooperate with deletion requests, since deleting one document may render many other documents inaccessible.

I am not convinced entanglement actually provides the security benefit claimed; deleting all $n$ of the shares belonging to one document should cause other documents to lose no more than one share and thus not be permanently damaged. (The originators of those documents would of course want to generate new shares to preserve redundancy.) It is still probably worth doing just because it reduces the cost of adding new documents to the system, but security-wise it’s solving the wrong problem. What you really want here is: server operators should be unable to determine which documents they hold shares for, even if they know the metadata for those documents. (And yet, somehow, they must be able to hand out the right shares on request!) Similar things are possible, under the name private information retrieval, and people are trying to apply that to anonymous publication, but what I said one really wants here is even stronger than the usual definition of PIR, and I’m not sure it’s theoretically possible.

Commercial intrusion detection systems are designed for corporate networks; they almost always assume a small number of choke points between internal and external networks, and often they also assume centralized control of all the devices on the internal network. Neither assumption is valid for a peer-to-peer overlay network, where there are typically a large number of mutually distrusting human agencies operating a small number of network peers each, and the routes between them are diverse.

It might seem that in the peer-to-peer environment, each node would have no choice but to run its own IDS. However, if we are willing to assume some degree of trust vested in other node operators, perhaps the task could be delegated. That’s the germ of this paper. For an idealized peer-to-peer network, they derive a game-theoretically optimal strategy for rotating the job of running the IDS around all the super-peers (long-lived nodes with extra responsibilities; many real P2P networks have such nodes).

I like the concept, but the idealized scenario they used may be too idealized to be applicable in real life. Key assumptions which probably don’t hold include:

• The attacker does not already control any super-peers.
• The IDS is perfect: that is, if attack traffic passes through a node running an IDS, the attack will be detected and blocked.
• The attacker’s goal is to take control of, or deny availability of, a specific set of super-peers.
• The defender can predict in advance which nodes will be attacked. (I would accept this if it were probabilistic, e.g. assuming that the attacker is more likely to target nodes that contribute more to to the overall network capacity.)

I think a more realistic model would go something like this: The attacker is assumed already to control some fraction of the super-peers. (The attacker may also mount attacks from other computers, either independently or in collaboration with malicious super-peers.) The attacker seeks to avoid detection, and so does not mount overt attacks on other super-peers; instead, it has some strategy for violating the protocol to achieve an adversarial goal (e.g. forging blockchain transactions, deanonymizing users, delivering false data to users) The malicious peers execute the protocol honestly most of the time, but sometimes break the rules. The defender’s goal is to detect peers that are violating the protocol often enough that this can’t be an accident, while not wasting too many resources on monitoring overhead.

Note: This paper is said to have been published in the International Conference on Secure Knowledge Management in Big-data era, 2014 but I cannot confirm this, as the conference website no longer exists and the Internet Archive’s copy does not include a program.