First result of our sitemaps study (old, probably outdated)

Warning: This page is pretty old and was imported from another website. Take any content here with a grain of salt.

Today we’re going to take a look at one of our sites and see some of the first results from the test sites Tobias Kluge and I started. We’re going to take a look at our site “C”, which was set up with the same general content as the other sites and promoted to Google using only Adsense and a simulated user clicking around the site in Internet Explorer with the Googlebar installed. I like this site because it did something I really didn’t expect. Here’s what a user in a forum said about this site (remember to check your referrers, folks :-)):

_

Which site made it first to the index? Site C: Same as Site B but without Google Sitemaps (just Adsense + remote user with Google-Bar)

In which time frame? No more than a day or 2.

Did all sites get indexed? Yes

Which factors were most important? Adsense - Google wants its ads to be clicked on, hence they index pages with Adsense ASAP.

Which factors the least? a remote user with a Google-Bar (ok, not a real user, but rather a remote-controlled IE-user, clicking on a schedule - 6 times daily to random URLs within the site)

Did all factors work towards getting the sites indexed? Yes.

_

Adsense is contextual advertising by Google, for those who do not know. The website owner can earn up to 50% of the income generated by the ads. In order to make sure that the displayed ads are interesting for the user, Google first checks the site. It then displays ads that match the content of the pages where they are displayed. So if you have a site about cars, you’ll mostly have car-related ads shown. When a site displays Adsense blocks, Google will usually have looked at the site beforehand. Google wants to make more money (who doesn’t?) and, as the user mentioned, perhaps Google will use that to help push the site? Could that be a trick to get indexed faster?

The GoogleBar is a plugin / extension made by Google for some browsers. It offers the user many functions that are based on Google’s sites. Some people whisper that the GoogleBar is also used to measure the importance of sites, i.e. those sites that are visited more often (by users using the GoogleBar) will also get indexed faster and listed higher in the search results. However, that is unconfirmed. (A similar system, from Alexa, does measure the “importance” and the number of visitors of a website.)

The site did not use Google Sitemaps and did not have any inbound links at all. It did have several outbound links, though. The site was not submitted to any search engines.

So, did it work?

No.

At the end of our study (after 3 months), the site was not indexed at all on Google.

Bummer.

The site was visited often by the Adsense-Bot, which uses the user agent “Mediapartners-Google/2.1”. Even though that bot probably runs on the same servers as Google’s other crawlers, it has its own database. It visited all of the 911 URIs within the site, which means our virtual visitor was busy and accessed all the URIs. It also checked the robots.txt several times. No other Google-Bot even came to take a look.

Other visitors of interest (there were no “human” visitors other than our simulated one) were:

  • a spambot testing for commonly known pages: it checked default.aspx (nonexistent).

  • 2 rounds of visits by “Cyveillance” using the user agent “Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+XP)” - a bot acting like a human visitor, found by checking the IP address against whois using samspade.org’s super service (archive.org). You can find more about Cyveillance at http://cyveillance.linuxgod.net/ (archive.org) (among lots of others); apparently it’s a bot used by the RIAA and MPAA to search for copyrighted digital audio files. They came at the end of September and at the end of October. They crawled two layers of the site.

  • twice by the Netcraft Web server survey (user agent “Mozilla/4.0+(compatible;+Netcraft+Web+Server+Survey)”) - I assume they automatically check each domain to see which type of server it’s running on.

  • once by totaldomaindata.com (no idea why), using user agent “Mozilla/5.0+(X11;+U;+Linux+i686;+en-US;+rv:1.7.5)+Gecko/20041107+Firefox/1.0”

  • 8 times by the user agent “SurveyBot/2.3+(Whois+Source)” coming from www.whois.sc (archive.org) - again, I assume to check the server status; it checked the “/” (home page) and the robots.txt file.
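
For anyone who wants to sort visitors like these out of their own logs, here is a rough sketch in Python. It only matches substrings of the user agents quoted above; the labels, the `KNOWN_AGENTS` list and the `classify_agent` function are names made up for this example. Note that it cannot catch a bot like Cyveillance, which pretends to be a normal browser and only showed up through an IP address lookup.

```python
# Substring markers taken from the user agents quoted in this post.
# The labels on the right are made-up names for this example.
KNOWN_AGENTS = [
    ("Mediapartners-Google", "Adsense bot"),
    ("Yahoo!+Slurp", "Yahoo! crawler"),
    ("Netcraft+Web+Server+Survey", "Netcraft survey"),
    ("SurveyBot", "whois.sc SurveyBot"),
]


def classify_agent(user_agent: str) -> str:
    """Return a label for a logged user agent string.

    Anything that matches no marker is reported as unidentified; a bot
    posing as a browser lands here and needs an IP address check instead.
    """
    ua = user_agent.strip()
    for marker, label in KNOWN_AGENTS:
        if marker in ua:
            return label
    return "unidentified (check the IP address)"


if __name__ == "__main__":
    for ua in [
        "Mediapartners-Google/2.1",
        "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+XP)",
    ]:
        print(ua, "->", classify_agent(ua))
```

Plain substring matching is enough here because there were only a handful of distinct visitors; a busier site would probably want something more careful.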

Then we come to one interesting visitor:

User agent “Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp)” - yep: the Yahoo! crawler.

It made a total of 828 requests in 9 weeks, starting about 3 weeks after the site went live. It crawled parts of all three levels (but only 46 unique URIs + robots.txt several times). 41 URIs are indexed on Yahoo! after our test period. The site hit position 6 in the search results on Yahoo! (preceded by our other test sites… oops, did I give anything away?). Other than the first visit by whois.sc (a day after the site went live, i.e. a day after the domain name was bought), all the other bots came after the Yahoo! crawler.
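
The difference between total requests and unique URIs is easy to tally from a server log. Below is a minimal sketch in Python, assuming a W3C extended style log with a `#Fields:` header line and the fields `cs-uri-stem` and `cs(User-Agent)`; that format is an assumption on my part, and the `crawl_stats` function and the sample lines are invented for illustration, not taken from our real logs.

```python
from collections import defaultdict


def crawl_stats(log_lines):
    """Return {user_agent: (total_requests, unique_uri_count)}.

    Assumes space-separated log lines preceded by a "#Fields:" line
    naming the columns (W3C extended log style).
    """
    fields = []
    hits = defaultdict(int)
    uris = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Remember the column names for the data lines that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        agent = row.get("cs(User-Agent)")
        uri = row.get("cs-uri-stem")
        if agent is None or uri is None:
            continue
        hits[agent] += 1
        uris[agent].add(uri)
    return {agent: (hits[agent], len(uris[agent])) for agent in hits}


# Invented sample lines, only to show the shape of the input.
sample = [
    "#Fields: date time cs-uri-stem cs(User-Agent)",
    "2000-01-01 10:00:00 /robots.txt Yahoo!+Slurp",
    "2000-01-01 10:00:05 /page1.htm Yahoo!+Slurp",
    "2000-01-02 11:00:00 /robots.txt Yahoo!+Slurp",
    "2000-01-02 12:00:00 /page1.htm Mediapartners-Google/2.1",
]

if __name__ == "__main__":
    for agent, (total, unique) in crawl_stats(sample).items():
        print(agent, "requests:", total, "unique URIs:", unique)
```

A crawler that keeps re-fetching robots.txt and a few pages will show a high request count but a low unique count, which is the pattern described above for Yahoo!.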

10 Points and a virtual beer (or whatever the personal equivalent would be) to the first person who can tell us why Yahoo! indexed this site and where it got the information from. Anyone?

So, what did we learn from this test site? Adsense and Googlebar don’t make a difference when it comes to indexing in Google’s search engine. I expected that; I think it is already well known. But as the comment on that other forum shows, some things that are well known about search engines aren’t always believed to be true :-).
