Now it’s official: the top three search engines support the Sitemaps format. Great going, Vanessa and the Sitemaps team!! You’ve done great work since summer 2005; it’s come a long way. A new standard after little more than a year, congratulations!

Google: Search engines united (archive.org)
MSN/Live: Microsoft, Google, Yahoo! Unite to Support Sitemaps (archive.org)
Yahoo!: Yahoo, Google and Microsoft join forces (really!!) behind Sitemaps (archive.org)
While playing with the AOL search data I came to look at the specific queries that were used to reach any particular URL. This information is similar to what you have when you look at your referrer statistics: which queries were used to gain access to your site? Some background on the AOL search data (archive.org): the AOL database simplifies the queries a bit and in general contains only the words used in the query, none of the formatting or the operators (e.g., “+”, “-”, etc.).
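As a rough sketch, aggregating the queries per clicked URL from data like this only takes a few lines. This assumes AOL-style tab-separated rows of (AnonID, Query, QueryTime, ItemRank, ClickURL); the sample rows and URLs below are made up for illustration:

```python
from collections import Counter, defaultdict
import csv

def queries_per_url(lines):
    """Map each clicked URL to a Counter of the queries that led to it."""
    per_url = defaultdict(Counter)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 5 or not row[4]:
            continue  # query without a click: no URL to attribute it to
        anon_id, query, qtime, rank, url = row[:5]
        per_url[url][query.lower()] += 1
    return per_url

# Illustrative sample rows in the assumed format
sample = [
    "1\tred t-shirts\t2006-03-01 10:00:00\t1\thttp://example.com/shirts",
    "2\tbuy t-shirts\t2006-03-02 11:00:00\t2\thttp://example.com/shirts",
    "3\tweather\t2006-03-03 12:00:00\t\t",
]
stats = queries_per_url(sample)
print(stats["http://example.com/shirts"].most_common())
```

This is essentially the referrer-statistics view turned around: instead of starting from the query logs, you start from the landing URL and ask which queries brought visitors there.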
Yahoo! now also offers some more information about owned websites in its SiteExplorer, much as Google already does. Playing a bit with the Yahoo! SiteExplorer showed a nice new button: MySites (archive.org). Hmmm… sounds interesting, doesn’t it? So I just added one of my websites, and what happened? A verification file (ever heard of that from Google, eh?!) - ok, installed. Now the verification process is pending; I’m waiting for the next steps and the information that might become available.
… or examining a Google automated spam penalty. Matt Cutts, a Google engineer, explained on his personal blog how off-topic and affiliate links can change the Google crawlers’ “priority” for a site, even leading to deindexing. The examples shown were quite extreme, but what surprised me was this part: The person said that every page has original content, but every link that I clicked was an affiliate link that went to the site that actually sold the T-shirts.
The Google Sitemaps system just turned one year old! The “Big Daddy” infrastructure and the “Crawl Caching Proxy” look like they were made to be a perfect match for Google Sitemaps (though it is more likely the other way around). In theory, Google Sitemaps can tell the Google crawlers more about a website, even without having to crawl it. The attributes can be used to help the proxy determine when actual accesses are necessary, keeping bandwidth use on all sides to a minimum.
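The attributes in question are the optional `<lastmod>`, `<changefreq>` and `<priority>` elements of the Sitemaps protocol; in theory those are exactly the hints a crawl caching proxy could use to decide whether a fresh fetch is needed. A minimal sketch of a sitemap with those attributes, with made-up URLs:

```python
import xml.etree.ElementTree as ET

# Sitemaps protocol namespace (sitemaps.org)
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a sitemap from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod       # last change date
        ET.SubElement(url, "changefreq").text = changefreq # expected change rate
        ET.SubElement(url, "priority").text = priority     # relative priority
    return ET.tostring(urlset, encoding="unicode")

# Illustrative entries: a frequently changing homepage, a static archive
xml = build_sitemap([
    ("http://example.com/", "2006-06-01", "daily", "1.0"),
    ("http://example.com/archive/", "2005-12-31", "yearly", "0.3"),
])
print(xml)
```

A proxy that trusts those hints could, for example, skip refetching the archive page for a long time while still picking up the homepage daily.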
Google and the other search engines are constantly changing their software and infrastructure. Google apparently switched to a new infrastructure at the beginning of 2006 and is currently working on optimizing the “settings”. How does all of this show up in the test sites? How does it show up in a normal site? Does the number of indexed pages for a “spammy” site go down? Does the activity of the crawlers change? What are the other engines doing?
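One way to watch the crawler-activity side of these questions is simply to count bot hits per day in the server access logs. A rough sketch, assuming combined-format log lines; the sample lines and the exact bot substrings are illustrative:

```python
import re
from collections import Counter

# Matches the date part of a combined-log timestamp, e.g. [15/Mar/2006:...]
LOG_DATE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')
BOTS = ("Googlebot", "msnbot", "Slurp")  # Google, MSN, Yahoo! user-agents

def crawler_hits(log_lines):
    """Count hits per (bot, day) by substring-matching the user-agent."""
    hits = Counter()
    for line in log_lines:
        m = LOG_DATE.search(line)
        if not m:
            continue
        for bot in BOTS:
            if bot in line:
                hits[(bot, m.group(1))] += 1
    return hits

# Illustrative log lines
log = [
    '66.249.65.1 - - [15/Mar/2006:06:25:24 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.65.1 - - [15/Mar/2006:09:10:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '72.30.0.5 - - [15/Mar/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Yahoo! Slurp"',
]
hits = crawler_hits(log)
print(hits)
```

Plotting these per-day counts over a few months is enough to see an infrastructure change show up as a shift in crawl frequency.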
Google “Related Links” looks to be the poor man’s version of Google AdSense (meaning you don’t get money for publishing it, ha ha). Let’s take a quick first look at how they compare (an in-depth comparison will take some time, especially since AdSense is known to adapt over a period of a few days to a week; Related Links might do the same). How well does it work compared to AdSense?
Google Labs has released a new service: “Related Links (archive.org)”. According to Google: Google Related Links use the power of Google to automatically bring fresh, dynamic and interesting content links to any website. Webmasters can place these units on their site to provide visitors with links to useful information related to the site's content, including relevant news, searches, and pages. Wow! This is great: finally AdSense for the publishers who don’t want the hassle of specifying a bank account for the payout.
I like watching the traffic my sites get from Google’s internal network. It’s a bit of an ego-thing, I guess :-). Looking at the statistics (Google Analytics is fun) for yesterday’s joke, I noticed a bit of traffic from Google that was interesting. It’s normal to see them come by (and a little bit of traffic comes through the Google Web Accelerator proxy with its prefetch commands), but this time it was interesting because they came with a referrer.
Site D is a normal website, with a little startup funding in the form of deep links from several external sites. It does not use Google Sitemaps, nor anything else special. There were 4 links, one to each of the 4 levels, in different parts of the site. The site structure is strictly top-down, with links from the parent to about 10 children and a link to the main URL. There are no cross-links and no links from the children to the parent (just to the main URL).
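The structure described above can be sketched as a small link-graph generator: a strict top-down tree where each page links to its children and back only to the main URL, never to its parent and never across branches. The page names are made up; the real site’s URLs are not shown here:

```python
def build_links(depth=4, fanout=10, main="/"):
    """Build a {page: [outgoing links]} map for a strict top-down tree."""
    links = {main: []}
    def expand(page, level):
        if level >= depth:
            return
        for i in range(fanout):
            child = f"{page.rstrip('/')}/{level}-{i}/"
            links[page].append(child)  # parent -> child (top-down only)
            links[child] = [main]      # child links back to the main URL only
            expand(child, level + 1)
    expand(main, 0)
    return links

# A small tree for illustration (the described site uses 4 levels, ~10 children)
site = build_links(depth=2, fanout=3)
print(len(site), site["/"])
```

Because there are no cross-links, a crawler’s only way down is through the parents, which makes such a tree useful for observing how deep the crawlers actually go.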