Bots that impersonate Googlebot

Anyone can act like a bot just by using the Googlebot user-agent in a request. Sometimes crawlers do that to see what other bots might see. Sometimes it’s to circumvent robots.txt directives that apply to them, but not to Googlebot. Sometimes people hope to get a glimpse of cloaking. Whatever the reason, these kinds of requests can be annoying, since they make log file analysis much harder. That’s the motivation for this excursion.
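To separate real Googlebot requests from impostors, Google’s documented approach is a reverse DNS lookup on the requesting IP, a check that the hostname is under googlebot.com or google.com, and then a forward lookup to confirm it round-trips. A minimal sketch in Python (the function names here are my own, not part of any API):

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(host):
    """Check whether a reverse-DNS hostname is under Google's crawler domains."""
    return host.rstrip(".").endswith(GOOGLE_SUFFIXES)

def is_real_googlebot(ip):
    """Reverse-DNS the IP, check the domain, then confirm with a forward lookup."""
    try:
        host = socket.gethostbyaddr(ip)[0]              # reverse DNS
    except OSError:
        return False
    if not hostname_is_google(host):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward DNS
    except OSError:
        return False
    return ip in forward_ips                            # must round-trip to the same IP
```

The suffix check matters: a spoofer can point reverse DNS at a hostname like `googlebot.com.example.com`, which the forward lookup and suffix match together reject.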

Staging site indexing

So you got your staging site indexed? Happens to everyone. Here’s a rough guide to fixing it, and some suggestions for preventing it. (I thought I’d write this up somewhere.) The fastest way to get the staging site removed from search is to remove it via Search Console. For that, you need to verify ownership via Search Console [1] (ironically, this means you’ll likely have to make it accessible to search engines again, or figure out DNS verification, which isn’t that common but also not that hard).
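For prevention, putting the staging site behind HTTP authentication keeps crawlers out entirely (unlike robots.txt, which blocks crawling but not indexing of the URLs themselves). A toy sketch with Python’s stdlib server — the credentials and handler are hypothetical, purely for illustration:

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials for the staging server.
USER, PASSWORD = "staging", "s3cret"

def is_authorized(header, user, password):
    """Check an Authorization header against the expected Basic credentials."""
    expected = "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()
    return header == expected

class StagingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization"), USER, PASSWORD):
            # No/bad credentials: challenge the client; crawlers stop here.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="staging"')
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"staging content")

# HTTPServer(("", 8080), StagingHandler).serve_forever()
```

In practice you’d configure this in your web server or reverse proxy rather than in application code, but the effect is the same: every response is a 401 until credentials are supplied.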

Trailing slash or not

The “trailing-slash or not” question comes up from time to time, so I thought I’d write up something short. tl;dr: the slash right after a hostname or domain name is irrelevant; you can use it or not when referring to the URL, and it ends up being the same thing. However, a slash anywhere else is a significant part of the URL, and its presence or absence changes the URL. This is not SEO-specific, but just how websites work :).

Practically dealing with recurring/updated items

“When people search for our event, they find last year’s listing; help!” Whether it’s an event (FooBarConf 2017), a recurring report (FooBar Earnings Q1 2017), an updated product (FooBarPhone 23), or anything else that has a current version and previous versions, here’s a really simple way to help make sure that search can easily find the current version:

- Place the current version on a generic, non-versioned URL (/foobarconf)
- Copy last year’s version onto a versioned URL (/foobarconf/2016)
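The URL scheme above can be sketched as a tiny router — the paths and the current-year constant are the hypothetical examples from the post, not a real API:

```python
# The generic URL always serves the current edition; past editions
# live on versioned URLs, so their links and history are preserved.
CURRENT_YEAR = 2017

def route(path):
    """Return which edition of FooBarConf a URL path refers to (None if unknown)."""
    if path == "/foobarconf":
        return CURRENT_YEAR                  # generic URL -> current version
    prefix = "/foobarconf/"
    if path.startswith(prefix) and path[len(prefix):].isdigit():
        return int(path[len(prefix):])       # versioned URL -> archived year
    return None
```

The point of the design: external links and search signals accumulate on the stable generic URL over the years, while each past edition stays reachable at its own address.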


JavaScript sites & search

How to make JavaScript-based sites work well with search is something I’ve been getting asked about a bit, so in the spirit of having open discussions, I set up a public working group to discuss how things are working out :) Feel free to join and/or send folks there! (!forum/js-sites-wg) My goal is to figure out how things are working out for sites at the moment, what tricks they’re doing to “fix” search, and what we need to change or document on our side.

HTML validation & SEO

HTML validation and Google’s web-search … validation isn’t necessary for crawling, indexing, or ranking, as we’ve said [1] many times in the past. However, if you have problems with meta tags, structured data, or link elements (for example, if your hreflang markup isn’t being picked up properly), then an HTML validator can sometimes point you at problems in your markup. In a recent example, a site was accidentally including a banner in the HEAD of a page.
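The banner-in-HEAD case is worth spelling out: browsers (and parsers) implicitly close `<head>` when they hit a non-metadata element, so any meta tags or link elements after the banner can end up in `<body>` and get ignored. A rough self-check along those lines, assuming a simplified allow-list of head elements (this is a sketch, not what any validator actually runs):

```python
from html.parser import HTMLParser

# Elements that belong inside <head>; anything else (like a banner <div>)
# implicitly closes <head> in browsers, orphaning later meta/link tags.
HEAD_ALLOWED = {"title", "meta", "link", "style", "script", "base", "noscript", "template"}

class HeadChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in HEAD_ALLOWED:
            self.problems.append(tag)  # flag stray content elements in <head>

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def check_head(html):
    """Return a list of element names that don't belong inside <head>."""
    checker = HeadChecker()
    checker.feed(html)
    return checker.problems
```

A real validator catches much more than this, of course — the sketch only illustrates the specific failure mode from the example.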

HTTP/2 & You

HTTP/2 & You? I occasionally hear webmasters ask about HTTP/2 and their site + web-search. The good news is that HTTP/2 doesn’t change the core concepts of HTTP: it doesn’t change the URLs, and it’s transparently supported for users & crawlers that ask for it. Its primary differences focus on improved performance. Your hosting provider could add support for HTTP/2 and you might not even notice – who knows, maybe they support it already :).

crawl budget & 404s

I have a large site and removed lots of irrelevant pages for good. Should I return 404 or 410? What’s better for my “crawl budget”? (more from the depths of my inbox) The 410 (“Gone”) HTTP result code is a clearer sign that these pages are gone for good, and generally Google will drop those pages from the index a tiny bit faster. However, 404 vs 410 doesn’t affect the recrawl rate: we’ll still occasionally check to see if these pages are still gone, especially when we spot a new link to them.
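In server terms, the distinction is just which status code you pick for permanently removed paths. A hypothetical sketch of that decision (the path sets are made-up examples):

```python
# Hypothetical URL sets: pages removed for good vs. pages that still exist.
GONE_FOR_GOOD = {"/old-widget", "/retired-category"}
LIVE_PAGES = {"/", "/widgets"}

def status_for(path):
    """Pick an HTTP status: 410 for pages removed for good, 404 for unknowns."""
    if path in LIVE_PAGES:
        return 200
    if path in GONE_FOR_GOOD:
        return 410  # "Gone": a slightly clearer signal than 404
    return 404      # unknown URL: plain not-found
```

Either way, per the post, the recrawl behavior is the same — the 410 only nudges the initial drop from the index to happen a bit sooner.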


HTTPS & HSTS

HTTPS & HSTS: 301, 302, or 307? If the combination of these letters & numbers means anything to you, you might be curious why Chrome shows you a 307 redirect for HSTS pages. In the end, it’s pretty easy. After seeing the HTTPS URL with the HSTS header (for example, via any redirect from the HTTP version), Chrome will act like it’s seeing a 307 redirect the next time you try to access the HTTP page.
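The key point is that this 307 is synthesized by the browser, not sent by your server — no network request to the HTTP URL is made at all. A toy simulation of that browser-side behavior (all names here are illustrative, not any real browser API):

```python
# Simulated browser-side HSTS cache: after one response carries the
# Strict-Transport-Security header, later http:// navigations to that
# host are upgraded internally and reported as a 307.
hsts_hosts = set()

def remember_hsts(host, headers):
    """Record a host once a response carries Strict-Transport-Security."""
    if "Strict-Transport-Security" in headers:
        hsts_hosts.add(host)

def navigate(host, scheme):
    """Return (status, scheme actually used) for a simulated navigation."""
    if scheme == "http" and host in hsts_hosts:
        return 307, "https"  # internal redirect; the HTTP URL is never fetched
    return 200, scheme
```

So in DevTools, the 307 you see for an HSTS host is this internal upgrade; the real server-side redirect (301/302) only happened on the very first visit.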

Mobile sites suck

I was using my phone more over the weekend, and your mobile-friendly sites blew me away. Way too many of them are just horrible. Subscription interstitials, app interstitials, browser popups asking for my location, search forms that are impossible to fill out, login interstitials, tiny UI elements, cookie & age interstitials, “you’re in the wrong country, idiot” interstitials, full-screen ads, “add to homescreen” overlays, etc. One (popular & well-known) site had four levels of popups/overlays on a single page.