Ignoring Google Analytics Ghost Spam
One of the first things I usually do when I start working with a client is make sure their site is set up for both Google Webmaster Tools and Google Analytics. The former is a tool for monitoring your site’s index status and getting notified when Google has problems, and the latter is a tool for tracking visitors. In this post I want to touch on the latter and one of my major pet peeves with the service: Google Analytics ghost spam.
What Is Google Analytics?
As a user clicks links on your site, Google Analytics accumulates a lot of useful data about it:
- Which pages are requested most often.
- The source of your visitors (direct, search engine, social media, links from other sites, etc.).
- Breakdown of visitors by device, browser, country, etc.
All of this taken in aggregate can be incredibly valuable for webmasters who want to know how their site is performing and which content is working or needs attention. For example, knowing that 70% of your visitors use a mobile device is a good indication that your site should be responsive. Or knowing that 80% of your visitors stop browsing when they reach your pricing page may be a sign that your services or products are priced too high.
What Is Google Analytics Ghost Spam?
And that’s exactly how ghost spammers work. They randomly generate tracking IDs, hoping that at least a few of them are legitimate IDs for real sites that Google knows about, and pummel Google with false data. In particular, the false data includes a fake referrer that points to a site they want you to visit. Take a look at the screen snap below for my own site:
You can see a variety of junk URLs that claim to be sending traffic to my site. In reality, these sites have never even heard of my business or visited my page. They’re just spammers sending fake tracking data to Google, and they happen to have randomly generated the same tracking ID Google assigned to me. All of this is done in the hope that I, as the admin, will see these URLs and my natural human curiosity will take over and make me click those links. Most of these sites are probably benign and just trying to sell something, but some could be a little more scary and actually try to harm your computer. As with most things in life, don’t touch it if you’re not 100% sure what it is you’re getting into.
And of course the worst part of this is that it makes it hard to truly track your real visitors and see how they’re using your site. Most of the spam page views are likely for your home page (i.e. “/”) because the spammers don’t really visit your site and therefore don’t know your links. But it still gives you a false view of page views, sessions, users, etc.
How To Fix It
The good news is that you don’t need to worry that your site has been hacked or compromised. These spammers never visit your site, and they haven’t taken it over. The bad news is that you cannot prevent this. You have no control over what some computer on the internet sends to Google. Your only recourse is to ignore it, which turns out to be pretty simple with filters in Google Analytics.
Notice the hostname column in the image above. Only one of them is for my site, and the rest are either junk or “not set”. You can be pretty much sure that if the data didn’t come from your own hostname, it’s safe to ignore. There could be some exceptions with third-party sites you use, such as shopping carts. If the hostname looks like a legitimate service you have set up for your site, you probably don’t want to ignore it.
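To make the hostname rule concrete, here is a minimal Python sketch of the same idea applied outside Google Analytics. Everything in it is made up for illustration: the rows, the session counts, the `shop-example.invalid` shopping cart domain, and the helper names `is_trusted` and `split_sessions` are all assumptions, not anything Google provides.

```python
# Hypothetical rows, shaped like a report that includes a hostname column.
ROWS = [
    {"hostname": "www.elvtn.com", "referrer": "google.com", "sessions": 40},
    {"hostname": "(not set)", "referrer": "spam-example.invalid", "sessions": 25},
    {"hostname": "junk-host.invalid", "referrer": "other-spam.invalid", "sessions": 15},
    {"hostname": "checkout.shop-example.invalid", "referrer": "www.elvtn.com", "sessions": 20},
]

# Domains you expect real traffic from: your own site plus any
# third-party services (the shopping cart domain here is invented).
TRUSTED_DOMAINS = ("elvtn.com", "shop-example.invalid")


def is_trusted(hostname: str) -> bool:
    """True if hostname is a trusted domain or a subdomain of one."""
    host = hostname.lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def split_sessions(rows):
    """Return (real, ghost) session totals based on the hostname column."""
    real = sum(r["sessions"] for r in rows if is_trusted(r["hostname"]))
    ghost = sum(r["sessions"] for r in rows if not is_trusted(r["hostname"]))
    return real, ghost


real, ghost = split_sessions(ROWS)
print(f"real sessions: {real}, likely ghost sessions: {ghost}")
```

With these invented numbers, 40 of the 100 sessions come from hostnames that are not mine, which is exactly the kind of distortion the filter is meant to remove.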
To set up the filter, you’ll use a regular expression (RegEx for the cool kids) to tell Analytics which data to ignore based on that column. Remember to always include your own site and any third-party sites you expect traffic from. Below is a sample image of what the filter looks like for my site:
A few things to take note of. First, I only include my domain, and not the “www” portion. If I had multiple subdomains, that regular expression would match them all (e.g. www.elvtn.com, blog.elvtn.com, etc.). Second, notice that right before the dot character I put a backslash. This is necessary because the dot character normally has special meaning in regular expressions (it matches any single character), so the backslash indicates you want to literally match a “.” character in the hostname.
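If you want to try the pattern before saving the filter, a quick Python sketch can help. It assumes the filter does a partial match anywhere in the hostname, which Python's `re.search` imitates. The pattern `elvtn\.com` is my reading of the filter described above, and the junk hostnames are invented for the test.

```python
import re

# Escaped dot: matches a literal "." between "elvtn" and "com".
ESCAPED = re.compile(r"elvtn\.com")
# Unescaped dot: "." matches any single character, so this is too loose.
UNESCAPED = re.compile(r"elvtn.com")


def matches(pattern: re.Pattern, hostname: str) -> bool:
    """Partial match anywhere in the hostname (an assumption about the filter)."""
    return pattern.search(hostname) is not None


for host in ["www.elvtn.com", "blog.elvtn.com", "(not set)", "elvtnxcom.invalid"]:
    print(f"{host!r}: escaped={matches(ESCAPED, host)}, "
          f"unescaped={matches(UNESCAPED, host)}")
```

Both patterns accept my real subdomains and reject “(not set)”, but only the unescaped one also lets `elvtnxcom.invalid` through, which is why the escaped dot matters.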
That’s it. Apply the filter to all your views and data and you’ll get a nice clean view of real, actual human visitors to your site.
Ghost spam can be a nuisance when it comes to tracking visitors and behavior, but rest assured your site has not been hacked or compromised. You’ve just fallen victim, like millions of other Google Analytics users, to spammers trying to get you to visit their site. Now you know how to safely ignore them and get a true view of who is visiting your site.