How effective is blacklisting?

How effective is blacklisting?
Which blacklists are the best?
Which ports are scanned on the internet?

These were some of my questions before setting out on this research project.

I decided to compare 3 of my favorite free blacklists:

abuseipdb.com’s top 10,000 IPs. (Requires free registration)
isc.sans.edu’s Research netset. (No registration) SANS intends its lists for research rather than for blacklisting, so DNS servers that were abused in DDoS attacks frequently appear on this list. Use with caution.
isc.sans.edu’s top 10,000 IPs. (No registration)

Methodology

I set up a fresh network telescope facing the public internet. A network telescope is a computer that does not respond to any requests that are sent to it, but allows traffic to enter so that it can be analysed, similar to what an actual telescope does with light.

I used iptables combined with ipsets and custom log file entries to gather the data.

# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -m set --match-set exclude src -j ACCEPT
-A INPUT -j LOG --log-prefix "[Entry-r3kt] "
-A INPUT -m set --match-set sansResearch src -j LOG --log-prefix "[r3kt-sansResearch] "
-A INPUT -m set --match-set sans10k src -j LOG --log-prefix "[r3kt-sans10k] "
-A INPUT -m set --match-set abuseIpDb src -j LOG --log-prefix "[r3kt-abuseIpDb] "
-A INPUT -m set --match-set sansResearch src -j DROP
-A INPUT -m set --match-set sans10k src -j DROP
-A INPUT -m set --match-set abuseIpDb src -j DROP
-A INPUT -j LOG --log-prefix "[Exit-r3kt] "
-A INPUT -j DROP

The very first rule ACCEPTs any traffic originating from: (1) my remote client, (2) local networks, and (3) multicasts from the Cloud Service Provider. Since traffic from these sources was expected/wanted, it is accepted immediately and never evaluated by the rules further down the chain. As a result, this wanted traffic is not captured in the log file. Since there were no links on the internet advertising services on this IP (to my knowledge), all other traffic entering the telescope was assumed to be undesired.

Then, all of the traffic that was not expected was immediately logged with the tag “[Entry-r3kt]”.

After the initial “check-in”, every packet’s source address was compared to the 3 blacklist ipsets. For every match, a log entry was written with the tag “[r3kt-<setname>]”. Packets that matched any of the sets were dropped only after they had been evaluated against all 3 sets.

The only packets still left were the ones not caught by any of the 3 blacklists. All of these packets were logged with the tag “[Exit-r3kt]”, indicating packets that would have been able to make a connection, and then they were also dropped. (Successful connection attempts regularly trigger more aggressive scanning.)

The result of this setup is that:
– (from the perspective of the public internet) this machine is completely unresponsive
– expected/desired traffic will not poison the data
– every blacklist has a fair chance to evaluate incoming packets.
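
The ruleset above matches against ipsets whose creation is not shown. As a rough sketch only, assuming each blacklist has been saved as a plain-text file with one IPv4 address or CIDR range per line (the file layout and the helper name list_to_ipset_restore are my assumptions, not part of the original setup), a set could be built like this:

```shell
# list_to_ipset_restore NAME FILE  (hypothetical helper, not from the original setup)
# Prints `ipset restore` input for a plain-text list with one IPv4 address or
# CIDR range per line; '#' comments and blank lines are ignored.
list_to_ipset_restore() {
    local name="$1" file="$2"
    # hash:net stores both single addresses and CIDR ranges.
    echo "create ${name} hash:net"
    awk -v name="$name" '
        # Strip comments and whitespace from every line.
        { sub(/#.*/, ""); gsub(/[ \t\r]/, "") }
        # Skip blank lines and duplicates, emit one "add" per entry.
        $0 != "" && !seen[$0]++ { print "add", name, $0 }
    ' "$file"
}
```

Piping the output into ipset restore as root (for example with a set name of sans10k and a hypothetical file sans10k.txt) would create and fill the set in one step, after which the --match-set rules can reference it.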

A copy of the raw log file can be obtained here.

The results

I let the machine ‘listen’ for exactly 24 hours and here are the results:

The total number of undesired packets received:

# cat r3kt.log | grep "Entry" | wc -l
14697

A total of 14697 undesired packets in 24 hours equates to roughly 10 packets per minute, or one every 6 seconds.
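
For anyone who wants to redo this arithmetic on their own numbers, here is a small sketch. The helper name packet_rate is hypothetical; it only divides the packet total by the length of the capture window.

```shell
# packet_rate TOTAL HOURS  (hypothetical helper)
# Prints the average packets per minute and the average gap between packets.
packet_rate() {
    awk -v total="$1" -v hours="$2" 'BEGIN {
        printf "%.1f packets/min, one every %.1f s\n",
               total / (hours * 60), (hours * 3600) / total
    }'
}

packet_rate 14697 24
# prints: 10.2 packets/min, one every 5.9 s
```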

The top 20 ports scanned for:

# cat r3kt.log | grep "Entry" | egrep -o 'DPT=\S*' | sort | uniq -c | sort -r | head -n 20 | cat -n
     1     587  DPT=22
     2     273  DPT=23
     3     204  DPT=6379
     4     200  DPT=80
     5     164  DPT=8088
     6      95  DPT=443
     7      86  DPT=445
     8      82  DPT=5060
     9      76  DPT=2375
    10      72  DPT=2376
    11      61  DPT=8080
    12      56  DPT=3389
    13      52  DPT=389
    14      48  DPT=1433
    15      44  DPT=123
    16      40  DPT=81
    17      39  DPT=53
    18      33  DPT=9530
    19      32  DPT=9200
    20      32  DPT=8443

Of the requests, 587 (4.0%) were sent to port 22. This is more than double the count for the second most popular port, port 23. A surprise entry, port 6379 (the default Redis port), came in 3rd place.
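
The percentage quoted above is a port's count divided by all “Entry” packets, including packets such as ICMP that carry no destination port. A sketch of computing that share for every port in one pass follows. The function name port_share is hypothetical, and it assumes the usual iptables LOG layout in which DPT=<port> appears as a whitespace-separated token.

```shell
# port_share LOGFILE  (hypothetical helper)
# Prints "count port percent" per destination port, busiest first.
# The percentage is relative to ALL Entry packets, with or without a port.
port_share() {
    awk '
        /Entry-r3kt/ {
            total++
            for (i = 1; i <= NF; i++)
                if ($i ~ /^DPT=/) count[substr($i, 5)]++
        }
        END {
            for (p in count)
                printf "%d %s %.1f\n", count[p], p, 100 * count[p] / total
        }
    ' "$1" | sort -k1,1nr -k2,2n
}
```

Running port_share r3kt.log | head -n 20 should list the same ports as the table above, with the share of traffic in a third column.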

The top 20 IPs generating unwanted traffic:

# cat r3kt.log | grep "Entry" | egrep -o 'SRC=\S*' | sort | uniq -c | sort -r | head -n 20 | cat -n
    1     6149     SRC=79.124.62.86
    2      130     SRC=162.142.125.159
    3      127     SRC=162.142.125.152
    4      124     SRC=162.142.125.149
    5      122     SRC=162.142.125.155
    6      121     SRC=162.142.125.147
    7      120     SRC=162.142.125.157
    8      111     SRC=162.142.125.153
    9      111     SRC=162.142.125.144
    10     109     SRC=162.142.125.150
    11     108     SRC=162.142.125.145
    12     105     SRC=162.142.125.151
    13     104     SRC=162.142.125.154
    14     104     SRC=162.142.125.146
    15     102     SRC=162.142.125.148
    16     100     SRC=162.142.125.156
    17      83     SRC=162.142.125.158
    18      66     SRC=45.61.186.148
    19      55     SRC=45.143.203.15
    20      53     SRC=92.63.197.86

One IP was particularly aggressive, sending 41.8% of all requests. A closer look at the log file showed evidence of a SYN sweep. Probing started immediately and continued until the end, suggesting that the sweep was already in progress before I started logging. The only blacklist that matched this IP was the sans10k ipset.

# whois 79.124.62.86
...
inetnum: 79.124.62.0 - 79.124.62.255
netname: CLOUDVPS-NET
...
org-name: Internet Solutions & Innovations LTD.
...
address: National Cultural Centre 865 P.O. Box 1494, Victoria Mahe, Seychelles
...
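
One way to check a suspected SYN sweep like the one from 79.124.62.86 is to count how many distinct destination ports a single source probed with bare SYN packets. A sweep touches many ports, while a brute-force attempt hammers one. The sketch below is not the method used for this post. The function name syn_ports_for is hypothetical, and it assumes the standard iptables LOG layout in which TCP flags such as SYN and ACK appear as separate tokens.

```shell
# syn_ports_for SRC_IP LOGFILE  (hypothetical helper)
# Prints the number of distinct destination ports that SRC_IP probed with
# SYN-only packets (SYN set, ACK not set) among the Entry log lines.
syn_ports_for() {
    awk -v want="SRC=$1" '
        /Entry-r3kt/ {
            hit = 0; syn = 0; ack = 0; port = ""
            for (i = 1; i <= NF; i++) {
                if ($i == want)          hit = 1
                else if ($i == "SYN")    syn = 1
                else if ($i == "ACK")    ack = 1
                else if ($i ~ /^DPT=/)   port = substr($i, 5)
            }
            if (hit && syn && !ack && port != "" && !seen[port]++) n++
        }
        END { print n + 0 }
    ' "$2"
}
```

A call such as syn_ports_for 79.124.62.86 r3kt.log returns a single number. A value in the thousands would be consistent with a sweep across the port range.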

Then there is also a specific subnet, 162.142.125.0/24, that was very aggressive. It belongs to Censys, which claims to be a group of “security researchers” based in the USA. Unlike their “research”, mine generates no unwanted log file entries on unsuspecting machines.

# whois 162.142.125.0
...
NetRange: 162.142.125.0 - 162.142.125.255
...
Organization: Censys, Inc. (CENSY)
...

Since it is clear that some sources generate a lot more traffic than others, it is best to look at the data from a “per source address” rather than a “per connection attempt” perspective.

The total number of sources that sent undesired traffic:

# cat r3kt.log | egrep -o 'SRC=\S*' | sort | uniq | wc -l
2399

Number of sources not caught by any blacklist:

# cat r3kt.log | grep 'Exit' | egrep -o 'SRC=\S*' | sort | uniq | wc -l
1156

Number of sources caught by all the blacklists combined:

# cat r3kt.log | grep 'abuseIpDb\|sans10k\|sansResearch' | egrep -o 'SRC=\S*' | sort | uniq | wc -l
1243

1243 out of a total of 2399 is an effective block rate of 51.8%. Not bad, although I was expecting more. Had I measured per packet instead of per source, the host that sent 6000+ requests would have made this block rate disproportionately high and not a fair representation of the effectiveness of these blacklists.
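
The per-source block rate can also be computed in a single pass over the log rather than with separate pipelines. This is a sketch under the same log-format assumptions as the commands above. The function name block_rate is hypothetical.

```shell
# block_rate LOGFILE  (hypothetical helper)
# Prints "blocked/total percent%", counting each source address only once.
block_rate() {
    awk '
        {
            src = ""
            for (i = 1; i <= NF; i++) if ($i ~ /^SRC=/) src = $i
            if (src == "") next
            all[src] = 1
            # A source counts as blocked if any blacklist tag logged it.
            if ($0 ~ /r3kt-(abuseIpDb|sans10k|sansResearch)/) hit[src] = 1
        }
        END {
            for (s in all) total++
            for (s in hit) blocked++
            if (total)
                printf "%d/%d %.1f%%\n", blocked, total, 100 * blocked / total
        }
    ' "$1"
}
```

Because each source is stored as an array key, a host that sends thousands of packets still counts only once, which is the point of the per-source view.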

Number of sources caught by each blacklist individually:

# cat r3kt.log | grep 'sansResearch' | egrep -o 'SRC=\S*' | sort | uniq | wc -l
415

# cat r3kt.log | grep 'sans10k' | egrep -o 'SRC=\S*' | sort | uniq | wc -l
567

# cat r3kt.log | grep 'abuseIpDb' | egrep -o 'SRC=\S*' | sort | uniq | wc -l
986

– The sansResearch blacklist managed to detect 17.3% of unwanted sources and came in third place. In its defense, it is the smallest blacklist, containing only the netsets of the networks that operate under the guise of “security research” and bothered to register.

– The sans10k blacklist managed to catch a respectable 23.6% of unwanted sources and came in second place.

– The clear winner (in this case) was abuseIpDb, which managed to catch 41.1% of unwanted sources. Well done.
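
Note that the three individual counts add up to 415 + 567 + 986 = 1968, which is more than the 1243 sources caught by the lists combined, so some sources must appear on more than one list. To see what each list contributes on its own, one could count the sources that only a single list matched. The sketch below is one way to do that. The function name exclusive_hits is hypothetical, and I have not run it against the original log.

```shell
# exclusive_hits LOGFILE  (hypothetical helper)
# Prints "listname count" for sources matched by exactly one blacklist.
# Lists with no exclusive sources are not printed.
exclusive_hits() {
    awk '
        match($0, /r3kt-(abuseIpDb|sans10k|sansResearch)/) {
            # Skip the 5-character "r3kt-" prefix to get the list name.
            list = substr($0, RSTART + 5, RLENGTH - 5)
            src = ""
            for (i = 1; i <= NF; i++) if ($i ~ /^SRC=/) src = $i
            if (src != "" && !((src, list) in pair)) {
                pair[src, list] = 1
                nlists[src]++
                last[src] = list
            }
        }
        END {
            for (s in nlists) if (nlists[s] == 1) only[last[s]]++
            for (l in only) print l, only[l]
        }
    ' "$1" | LC_ALL=C sort
}
```

A list with a high exclusive count adds coverage the others lack, while a list with a low one is mostly redundant when the others are already in place.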