Confirm that you’re using Analytics on all pages
Here’s something from my mailbox: someone wanted to know how he could crawl his site and confirm that all of his pages really have the Google Analytics tracking code on them. WordPress users have it easy, since there are plugins that handle it automatically. Sometimes it’s worth asking nicely, so let me show you how I did it. As a bonus, I’ll also show how you can check the AdSense ID on your pages, if you’re worried that you copy/pasted it incorrectly.
This is pretty much cross-platform, but as a Windows user you’ll have to grab and install two packages first:
- wget - a tool to download copies of web pages
- UnxTools - a collection of popular Unix/Linux tools for the hacker in you
Extract the ZIP files, copy the contents somewhere you can find them and make sure that the appropriate folders are in your “path” (the files you’ll need for UnxTools are in “…\usr\local\wbin”). We’ll need to access these tools through the command line. I have a feeling I may need to elaborate on that for Windows users, so let me know if that’s the case.
First, we’ll mirror our site on our local machine (this assumes that your site is crawlable; if it isn’t, then fix it first):
- Open a command box or terminal window (on Windows, hit Start / Run … and enter “cmd”)
- Go to or create a temporary folder
- Run the following command to mirror your site:
wget --mirror --accept=html,htm,php,asp,aspx http://domain.com/
This command mirrors pages with .html, .htm, .php, .asp and .aspx extensions on http://domain.com/. It’ll create a folder for the domain and put all the files in it. Dynamic URLs will get adjusted so that they can be used as file names.
- Wait … until it’s all downloaded … if it feels endless, you might have endless URLs, perhaps an infinite calendar script or something similar? It’s worth fixing!
Alrighty, now that we have a copy of your site, let’s check things out.
Finding pages without Analytics
We can find pages without the Analytics tracking code by listing all pages which do not have certain content in them:
grep -r -L "google-analytics.com" *.*
This command goes through all subfolders (the “-r” option) and lists the files that do not contain a match (“-L”) for “google-analytics.com”. That could be extended to just about anything :).
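If you’d like to see what “-L” does before running it against a real mirror, here is a minimal sketch. It assumes a Unix-style shell with mktemp and a GNU-style grep; the folder and file names are made up for illustration. Two things differ from the command above, both by choice rather than necessity: the search starts at “.” (the current folder) instead of “*.*”, and “-F” is added so that the search text is taken literally.

```shell
# Build a tiny fake "mirror" in a temporary folder (hypothetical file names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/domain.com/blog"
printf '%s\n' '<html><script src="http://www.google-analytics.com/urchin.js"></script></html>' \
  > "$demo_dir/domain.com/index.html"
printf '%s\n' '<html><p>No tracking code here</p></html>' \
  > "$demo_dir/domain.com/blog/post.html"

# -r: recurse into subfolders
# -L: list only the files that do NOT contain a match
# -F: treat the search text literally, so "." only matches a real period
missing=$(cd "$demo_dir" && grep -r -L -F "google-analytics.com" . | sort)
echo "$missing"

# Clean up the temporary folder.
rm -rf "$demo_dir"
```

Only the page without the tracking snippet should be listed; the page that has it stays out of the output.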
How about pages that don’t have a “description” meta tag?
grep -r -L "meta name=.description" *.*
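The “.” (period) matches any single character; in this case, it stands in for the double quote (") in front of “description”, which would otherwise have to be escaped inside the quoted search text.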
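A small sketch of why the period is handy here: it matches a double quote, but also a single quote, so both ways of writing the tag are found. This assumes a Unix-style shell and GNU-style grep; the three file names are hypothetical, and the “-i” option (ignore upper/lower case) is an addition of this sketch, not part of the command above.

```shell
# Three hypothetical pages: double quotes, single quotes with an
# upper-case tag name, and no description meta tag at all.
demo_dir=$(mktemp -d)
printf '%s\n' '<meta name="description" content="Double quotes">' > "$demo_dir/a.html"
printf '%s\n' "<META NAME='description' content='Single quotes'>"  > "$demo_dir/b.html"
printf '%s\n' '<title>No description at all</title>'               > "$demo_dir/c.html"

# The "." in the pattern matches either kind of quote; -i ignores case;
# -L lists the files that have no matching line.
no_description=$(cd "$demo_dir" && grep -r -i -L "meta name=.description" . | sort)
echo "$no_description"

# Clean up the temporary folder.
rm -rf "$demo_dir"
```

Only c.html should show up, since it is the one page without the tag.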
Finding pages with AdSense (and the ID used)
Finding pages that contain a certain text is even easier:
grep -r "google_ad_client" *.*
Note that all we did was drop the “-L” (and change the text, obviously). It will show the lines that match this pattern in all of your pages, which includes the AdSense ID.
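If the list of matching lines gets long, it can help to count how often each ID appears, so that a mistyped one stands out. Here is a rough sketch, assuming a Unix-style shell with GNU-style grep (for “-o” and “-h”), sort, uniq and awk. The page names and the “pub-…” IDs are invented for the demo, and the pattern assumes the ID sits in double quotes right after google_ad_client.

```shell
# Three hypothetical pages; the last one carries a mistyped ID.
demo_dir=$(mktemp -d)
printf '%s\n' 'google_ad_client = "pub-1111111111111111";' > "$demo_dir/index.html"
printf '%s\n' 'google_ad_client = "pub-1111111111111111";' > "$demo_dir/about.html"
printf '%s\n' 'google_ad_client = "pub-1111111111111112";' > "$demo_dir/typo.html"

# -h: leave out the file names, -o: print only the matching part of a line.
# sort | uniq -c counts identical matches; awk keeps the count and the ID.
id_counts=$(grep -r -h -o 'google_ad_client *= *"[^"]*"' "$demo_dir" \
  | sort | uniq -c | awk '{print $1, $NF}')
echo "$id_counts"

# Clean up the temporary folder.
rm -rf "$demo_dir"
```

An ID that shows up on only a handful of pages while another one covers the rest is a good candidate for a copy/paste error.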
Similar to the earlier check for missing “description” meta tags, assuming you have the contents of that tag all in one line, you can easily find all of these meta tags with:
grep -r "meta name=.description" *.*
What would you like to search for today?
What’s a “command line”?
Neat! Thanks for posting! I think I’ll just go the non-Windows route for this one. For some reason, I’ve got it in my head that tools like cygwin aren’t good for Windows. I base that comment on no factual information or experience using them; it’s just one of those mental hurdles, I guess. Although wget is like lightning compared to FTP clients (it seemed like that at least just now when I tried it), so maybe I’ll give it a try on Windows.
P.s. Post more, John - please.
Great Post, John. Thank you.
I tried the download link you have for wget and it returns a blank page. Do you have an alternate link?
Of course I do use Analytics, but I never thought about this.
Hi John,
Thanks for your advice on the webmaster forums a couple of days ago.
Do you ever update this blog?
Cheers,
Colin