reederz

Sun 23 September 2018

Blacklist Referer Spam Bots with NGINX

Posted by Justas Ažna in archive   

Originally posted on 2015-04-22 at fadeit.dk/blog which is no longer available

Some Background

Recently, we submitted fadeit.dk to AWWWARDS - the awards for design, creativity and innovation on the Internet. It gave us a nice little boost of website visitors. However, not all of that attention was positive…

Referer Spam Bots

While the majority of our traffic came from genuine sources, we started noticing a pattern in our referral traffic.

[Figure: Google Analytics dashboard, Acquisition / Referrals]

What’s up with social-buttons.com? We didn’t sign up for that… Apparently, this is something called Referer Spam*:

Referrer spam (also known as log spam or referrer bombing) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referer URL to the site the spammer wishes to advertise. Sites that publish their access logs, including referer statistics, will then inadvertently link back to the spammer’s site. These links will be indexed by search engines as they crawl the access logs. - Wikipedia

This is not cool, Mr. social-buttons.com. P**s off!

* No, I didn’t misspell it. The misspelling actually made it into the HTTP/1.0 standard and now it’s there forever :)

NGINX Solution

Since we’re using NGINX to serve our site, the solution is going to be described for NGINX. Apache people can take a look at this excellent article.

The official NGINX wiki does mention a solution to this problem. Basically, you just use ngx_http_referer_module* and add something like this to your location or server block:

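# valid_referers is a whitelist: $invalid_referer is "" when the Referer
# matches the list and "1" otherwise. Listing only the spam domains and
# testing for "" turns it into a blacklist.
valid_referers *.social-buttons.com social-buttons.com badreferer2.com;

if ($invalid_referer = "") {
  return   444;
}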

This works, but what if we want to maintain a larger blacklist of referers? Our valid_referers directive would get crazy long. If that’s fine with you, you can stop reading here. It sure isn’t fine with me :).

To make our blacklist more maintainable, we can use ngx_http_map_module. Let’s save a file at /etc/nginx/conf.d/blacklist.conf with the following content:

# /etc/nginx/conf.d/blacklist.conf

map $http_referer $bad_referer {
    hostnames;

    default                           0;

    # Put regexes for undesired referers here.
    # "~*" makes the match case-insensitive; dots are escaped so they
    # match a literal "." and not any character.
    "~*social-buttons\.com"             1;
    "~*semalt\.com"                     1;
    "~*kambasoft\.com"                  1;
    "~*savetubevideo\.com"              1;
    "~*descargar-musica-gratis\.net"    1;
    "~*7makemoneyonline\.com"           1;
    "~*baixar-musicas-gratis\.com"      1;
    "~*iloveitaly\.com"                 1;
    "~*ilovevitaly\.ru"                 1;
    "~*fbdownloader\.com"               1;
    "~*econom\.co"                      1;
    "~*buttons-for-website\.com"        1;
    "~*buttons-for-your-website\.com"   1;
    "~*srecorder\.co"                   1;
    "~*darodar\.com"                    1;
    "~*priceg\.com"                     1;
    "~*blackhatworth\.com"              1;
    "~*adviceforum\.info"               1;
    "~*hulfingtonpost\.com"             1;
    "~*best-seo-solution\.com"          1;
    "~*googlsucks\.com"                 1;
    "~*theguardlan\.com"                1;
    "~*i-x\.wiki"                       1;
    "~*buy-cheap-online\.info"          1;
    "~*Get-Free-Traffic-Now\.com"       1;
}
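
The patterns in this map are unanchored, so they match anywhere in the Referer URL, including the path or query string. If that turns out to be too broad, each pattern could be anchored to the host part instead. The following is only a sketch of that idea using two of the domains. It is meant as an alternative form of the same map, not something to load alongside it, and the exact regex is my assumption, not part of the original setup:

```nginx
# /etc/nginx/conf.d/blacklist.conf (hypothetical stricter variant)

map $http_referer $bad_referer {
    default 0;

    # ~*          case-insensitive regex match
    # ^https?://  the Referer must start with a scheme
    # ([^/]+\.)?  optional subdomain(s), e.g. "www."
    # ([/:]|$)    the host must end here (path, port, or end of string)
    #
    # Matches http://social-buttons.com/ and http://www.social-buttons.com/x,
    # but not http://example.com/?q=social-buttons.com
    "~*^https?://([^/]+\.)?social-buttons\.com([/:]|$)"   1;
    "~*^https?://([^/]+\.)?semalt\.com([/:]|$)"           1;
}
```

The trade-off is readability: the simple substring-style patterns are easier to maintain, while the anchored ones are less likely to block a legitimate visitor whose referring URL merely mentions a spam domain.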

Now add the condition to each site for which you want to block referer spam bots:

# /etc/nginx/sites-enabled/mysite.conf

server {
  # ...

  if ($bad_referer) {
    return 444;
  }

  # ...
}
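
One thing these snippets take for granted is that blacklist.conf gets loaded in the http context, which is the only place NGINX allows a map block. Many packaged configurations already include conf.d/*.conf there, but that is an assumption about your installation. A minimal sketch of the relevant part of nginx.conf, with assumed paths:

```nginx
# /etc/nginx/nginx.conf (minimal sketch, paths are assumptions)

events {}

http {
    # map is only valid at the http level, so blacklist.conf has to be
    # pulled in here and not from inside a server block.
    include /etc/nginx/conf.d/*.conf;

    # Per-site server blocks, where the "if ($bad_referer)" check lives.
    include /etc/nginx/sites-enabled/*;
}
```

After editing, `nginx -t` checks the configuration for errors and `nginx -s reload` applies it.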

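OK, now let’s test if this thing works. Since 444 tells NGINX to close the connection without sending a response, curl should report an empty reply: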

# with subdomain
$ curl --referer http://www.social-buttons.com https://fadeit.dk/en
curl: (52) Empty reply from server

# without subdomain
$ curl --referer http://social-buttons.com https://fadeit.dk/en
curl: (52) Empty reply from server

Sweet! It worked.

* Both ngx_http_referer_module and ngx_http_map_module are included in the standard NGINX distribution, so you don’t need to recompile your server.

That’s it!

What’s your experience with Referer Spam? Don’t hesitate to use the comment section :)