… or examining an automated Google spam penalty
Matt Cutts, a Google engineer, explained on his personal blog how off-topic and affiliate links can change the Google crawler's "priority" for a site, and can even get it deindexed. The examples shown were quite extreme, but what surprised me was this part:
The person said that every page has original content, but every link that I clicked was an affiliate link that went to the site that actually sold the T-shirts.
Google and the other search engines are constantly changing their software and infrastructure. Google apparently switched to a new infrastructure at the beginning of 2006 and is currently working on optimizing the "settings".
How does all of this show up in the test sites? How does it show up in a normal site? Does the number of indexed pages for a "spammy" site go down? Does the activity of the crawlers change? What are the other engines doing?
Site D is a normal website, with a little startup funding in the form of deep links from several external sites. It uses neither Google Sitemaps nor anything else special.
There were 4 links, one to each of the 4 levels, placed in different parts of the site. The site structure is strictly top-down: each parent links to about 10 children, and every page links to the main URL. There are no cross-links and no links from the children back to their parent (only to the main URL).
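To make the topology concrete, here is a small sketch of the link structure described above. This is purely illustrative (the function and URL scheme are my own invention, not the actual test-site code), assuming 4 levels with about 10 children per page:

```python
# Hypothetical sketch of the test site's link structure:
# strictly top-down, ~10 children per page, 4 levels deep,
# every page linking back only to the main URL.

def build_site(depth=4, children_per_page=10):
    """Return a {url: [outgoing links]} map for the described structure."""
    links = {}

    def add_page(url, level):
        out = [] if url == "/" else ["/"]  # every page links to the main URL
        if level < depth:
            for i in range(children_per_page):
                child = f"{url.rstrip('/')}/{level}-{i}/"
                out.append(child)          # parent links down to each child
                add_page(child, level + 1)
        links[url] = out                   # no cross-links, no parent links
        return out

    add_page("/", 0)
    return links

# Small demo so the structure is easy to inspect:
site = build_site(depth=2, children_per_page=3)
```

With `depth=2` and 3 children per page, the demo produces 13 pages (1 + 3 + 9); leaf pages link only to `/`, exactly the "no cross-links" rule described above.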
How long would you guess it takes for a new webpage to get indexed by Google?
You might say it depends, and you'd be right. But you can help your webpages get indexed better. One approach is to participate in Google Sitemaps and give Google the URLs to add. People say it takes very long until you see new webpages appearing in the SERPs.
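For readers who haven't seen one, the file you hand to Google Sitemaps is a plain XML list of URLs following the sitemaps.org protocol. Here is a minimal sketch that generates one with the Python standard library; the URLs and dates are placeholders, not from the actual test sites:

```python
# Minimal sketch of a sitemaps.org-protocol file, built with the
# Python standard library. All URLs and dates are placeholders.
from xml.etree import ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return a sitemap XML string for a list of (url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date format
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([
    ("http://www.example.com/", "2006-03-01"),
    ("http://www.example.com/new-article.html", "2006-03-02"),
])
```

You would upload the resulting file to your server and point Google Sitemaps at its URL; Google then knows about new pages without having to discover them through links.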
This article walks through an example of adding a new article to enarion.
Site B is a mixture of Site A (Google Sitemaps, no links) and Site C (Adsense, GoogleBar, no Sitemaps, no links). Site B uses Google Sitemaps along with Adsense blocks, and is visited regularly by a virtual visitor running Microsoft Internet Explorer with the GoogleBar plug-in.
Seeing that neither Site A nor Site C was indexed properly by Google, we can only assume that Site B will not be indexed either.
People who are new to the web and want to start a website usually just put it online and hope that visitors come. With Google Sitemaps, the webmaster has a way to let Google know about the site and to help Google find all of its pages.
I’ll go through the other sites in the order we had them: Site A now, Site B next, then Site D (we already covered Site C), and finally Site E.
Today we’re going to take a look at one of our sites and see some of the first results from the test sites Tobias Kluge and I started. We’re going to look at our Site “C”, which was set up with the same general content as the other sites and promoted to Google using only Adsense and a simulated user browsing the site with Internet Explorer and the Googlebar installed.
I know - everyone just wants the results of our small study - but let me first give you a small insight into the way we tested, how we set up our sites, and what we logged.
Domain and server setup
Our domains were 6 characters, the same for all test sites, followed by an identifier for each site - something like “kwekuqA.com”, “kwekuqB.com”, “kwekuqC.com”, “kwekuqD.com”, etc. We tested several variants of the starting characters to make sure we used one that doesn’t have anything associated with it in Google (or at least not much).
There are lots of ways to get indexed by Google. Using Google Sitemaps is only one of them - the one that seems to be a bit trendy at the moment. “In the beginning” (June / July 2005), when Google first introduced Google Sitemaps, it was a sure-fire way to get indexed within hours. It really worked. I bet it worked not only for us but also for lots of spammer sites, so Google had to tighten it down a bit.