Background
I’ve had a long-standing interest in the state of the Tor network. In 2015 I started OrNetRadar to help detect new relay groups and possible Sybil attacks that could pose a risk to Tor users. In 2017 I was asked to join a closed Tor Project mailing list to help confirm reports of malicious Tor relays — a list to which I had previously submitted suspicious relays. Soon after joining that list I suggested some improvements, but nothing has changed since then. Even though I have been on that list ever since, the decision process to actually get relays removed happens elsewhere (among the directory authorities) and remains opaque to me.
In April 2018 a Tor core member — the most active Tor Project person on that closed mailing list — made an attempt to initiate a “do not do” relay requirements list to improve and streamline the handling of malicious Tor relay reports. (I’m not mentioning his name since he does not want to be publicly associated with bad-relays handling for safety reasons.)
Unfortunately this attempt also failed, since no Tor directory authority operator responded. (Tor directory authorities have to enforce any network-wide rule that is not part of the tor code itself.)
Starting in June 2019, after multiple reports about suspicious relays went unanswered, I stopped sending them to the list. Occasionally I sent some suspicious relay groups to the public tor-talk mailing list instead — which, ironically, was more fruitful.
The Discovery
In autumn 2019 I stumbled upon something odd: Tor relays doing something that the official tor software is unable to do. (This is intentionally vague to avoid giving away the detection methodology to the adversary.) This new detection technique, combined with other methods, resulted in the discovery of a large, long-running suspicious relay group. For the first time since I started looking for malicious relays, a genuinely worrisome and non-trivially discoverable group was uncovered, one that has been running relays since before 2017 (initially, at least some of their relays were found).
At their peak they reached >10% of the Tor network’s guard capacity. A guard relay is the first relay in the chain of three Tor relays forming a circuit; it is the only relay that sees the Tor user’s real IP address, but it does not see the destination the user accesses.
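The asymmetry of what each hop learns can be shown with a small sketch (purely illustrative, not Tor code; all names and addresses are made up):

```python
# Illustrative sketch (not Tor code): which addresses each relay in a
# 3-hop circuit can observe. The guard learns the client's IP but not
# the destination; the exit learns the destination but not the client.

def circuit_view(client_ip, guard, middle, exit_relay, destination):
    """Return, per relay, the previous and next hop it sees."""
    return {
        guard:      {"previous": client_ip, "next": middle},
        middle:     {"previous": guard,     "next": exit_relay},
        exit_relay: {"previous": middle,    "next": destination},
    }
```

The point of the three-hop design is that no single relay sees both the user’s IP address and the destination.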
To give you a feeling about their size in relation to other known operators:
The biggest known guard relay operator as of 2019–12–08 is below 2% of guard capacity.
After reporting them to the Tor Project, the relays I initially knew about got removed, but it did not take the operators long to set up new relays soon after.
To this day (2019–12–08) they are actively running high-bandwidth relays on the Tor network. Due to the sheer size of this particular adversary I had some hope that this discovery would act as a wake-up call and finally spark some improvements; unfortunately it has not so far.
Why didn’t we detect them earlier?
Initially their capacity was somewhat limited, and most of it was added over the course of the past year. A year, however, is still a very long time to go undetected.
To avoid detection they spread their relays across multiple hosting providers and added them relatively slowly over a long period of time.
They make use of the biggest Tor hosters (OVH and Hetzner) to blend in with the rest, but they also use hosters rarely seen before they joined (e.g. AS20860). In fact their relays made the autonomous system “Iomart Cloud Services” (AS20860) so big that it is now the 6th biggest AS by guard capacity on the Tor network.
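For readers who want to check such numbers themselves, rankings like this can be derived from Onionoo relay data. A minimal sketch, assuming relay records shaped like Onionoo “details” documents (the field names `as` and `guard_probability` exist in Onionoo; the sample data here is invented):

```python
from collections import defaultdict

# Sketch: rank autonomous systems by the total guard probability of
# their relays, given Onionoo-style relay records.

def guard_share_by_as(relays):
    totals = defaultdict(float)
    for relay in relays:
        totals[relay.get("as", "unknown")] += relay.get("guard_probability", 0.0)
    # sort descending by accumulated guard probability
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

In practice one would feed this the relay list from Onionoo’s details endpoint rather than hand-built records.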
Whenever I retroactively find out about malicious relays, I check the OrNetRadar archives to see whether it triggered on them when they initially joined the Tor network. In multiple cases it actually did; here are a few examples:
- 2017–09–15 OVH
- 2018–03–05 OVH
- 2018–12–19 Hetzner (the last 4 entries in the table)
- 2018–12–30 Iomart Cloud Services
Unnatural growth
Everyone can add as many relays as they like; technically there are no restrictions. This helps the Tor network grow, and the design of the Tor network even tolerates (some) malicious relays without a total loss of Tor’s privacy properties.
The Tor network is basically run under the assumption that no one actually exploits the unrestricted ability to add new relays, or that we can at least detect Sybil attacks and remove them before they cause (too much) harm. The past year has shown that exploiting this openness is not only possible but is actually happening, without sufficient detective countermeasures in place.
The open mode of operation was fine until probably somewhere around 2017, but after that so much non-attributable guard capacity was added that the growth no longer appeared natural. Two and a half years ago Roger Dingledine gave a talk at DEFCON 25 in which he mentioned that he knew 2/3 of the Tor network by capacity. That was probably somewhat true back then, but the Tor network has changed significantly since.
The graph below shows the significant guard-only capacity growth in Gbit/s during the past years, while the number of relays actually decreased:
While the Tor network’s capacity increased significantly, the ability to attribute relays to operators decreased significantly: in the last year the amount of guard capacity with no ContactInfo increased from <30% to >45%. Most of this can probably be attributed to the discovered Sybil group, since their relays had no ContactInfo. The graph below shows the share of guard probability with no ContactInfo over the past 3 years.
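The no-ContactInfo share can be computed the same way from Onionoo-style records. A minimal sketch (the `contact` and `guard_probability` fields exist in Onionoo details documents; everything else here is illustrative):

```python
# Sketch: fraction of total guard probability held by relays that
# publish no ContactInfo, given Onionoo-style relay records.

def no_contact_guard_share(relays):
    total = sum(r.get("guard_probability", 0.0) for r in relays)
    missing = sum(r.get("guard_probability", 0.0)
                  for r in relays if not r.get("contact"))
    return missing / total if total else 0.0
```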
Reducing Risk
There is no real “solution” to malicious relays, due to Tor’s open design, but risk reduction is still possible. Tor directory authorities could treat new relay groups without any MyFamily and ContactInfo as “do not do” violations (per the April 2018 discussions) and thereby make it at least somewhat more time-consuming for adversaries to add large amounts of Tor capacity.
The tor manual also states:
ContactInfo must be set to a working address if you run more than
one relay or bridge. If you run more than one relay, the MyFamily option
on each relay must list all other relays.
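In torrc terms this means (the email address and fingerprint below are placeholders):

```
# torrc excerpt for an operator running more than one relay
ContactInfo operator@example.com
# MyFamily lists the fingerprint(s) of the operator's other relays
MyFamily $0123456789ABCDEF0123456789ABCDEF01234567
```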
Due to the asymmetry involved (the attacker can automate everything and needs no manual effort, while detection is hard and time-consuming), the Tor Project could consider new requirements for the guard, HSDir and exit flags. Some ideas:
- require a non-empty ContactInfo to start up as a relay
- require a verified ContactInfo for the guard, HSDir and exit flags
- design an automated system to verify ContactInfo once a year (it should allow for non-public/encrypted ContactInfo)
- require all guard relays added after 2017–01–01 to go through the automated verification process
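To make the verification idea above more concrete, here is a hypothetical sketch of such a system (nothing like this exists in Tor today; the class, the token scheme and the one-year expiry are all my own assumptions, and the actual e-mailing of tokens is left out):

```python
import secrets
import time

# Hypothetical sketch: annual ContactInfo verification via mailed tokens.
# Nothing here is real Tor infrastructure; names and the token scheme
# are invented for illustration.

TOKEN_TTL = 365 * 24 * 3600  # verifications expire after one year

class ContactVerifier:
    def __init__(self):
        self.pending = {}   # relay fingerprint -> outstanding token
        self.verified = {}  # relay fingerprint -> time of last verification

    def issue_token(self, fingerprint):
        # In a real system the token would be e-mailed to the relay's
        # ContactInfo address; here it is simply returned.
        token = secrets.token_urlsafe(16)
        self.pending[fingerprint] = token
        return token

    def confirm(self, fingerprint, token):
        # The operator proves control of the ContactInfo address by
        # echoing the token back.
        if self.pending.get(fingerprint) == token:
            self.verified[fingerprint] = time.time()
            del self.pending[fingerprint]
            return True
        return False

    def is_verified(self, fingerprint, now=None):
        # Eligible for the guard/HSDir/exit flags only while the last
        # verification is younger than one year.
        ts = self.verified.get(fingerprint)
        if ts is None:
            return False
        now = time.time() if now is None else now
        return now - ts < TOKEN_TTL
```

Encrypted or non-public ContactInfo would require a different delivery channel, but the expiry logic would stay the same.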
I have no illusions that this will deter determined attackers, since they have already demonstrated that they are willing to put in the extra effort of providing usable email addresses in their ContactInfo field, but it does increase the effort required of them.
So why am I writing this?
I believe the first step towards improvement is realizing that there is a problem, and this blog post is trying to raise awareness of malicious relay activity on the Tor network. I also think it is not sufficient to throw more resources at the problem without changing the rules of the game (namely new relay requirements for certain flags, especially the guard flag).