robots.txt

I noticed there’s a bit of confusion about how to tweak a complex robots.txt file (aka longer than two lines :)). We have awesome documentation (of course :)), but let me pick out some of the parts that are commonly asked about:

Disallowing crawling doesn’t block indexing of the URLs. This is pretty widely known, but worth repeating.

More-specific user-agent sections replace less-specific ones. If you have a section with “user-agent: *” and one with “user-agent: googlebot”, then Googlebot will only follow the Googlebot-specific section.
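
To make the last point concrete, here’s a small sketch using Python’s built-in urllib.robotparser, which handles user-agent groups similarly for this simple case; the rules and URLs below are made up:

    import urllib.robotparser

    rules = """
    User-agent: *
    Disallow: /private/

    User-agent: Googlebot
    Disallow: /drafts/
    """

    parser = urllib.robotparser.RobotFileParser()
    parser.parse(rules.splitlines())

    # Googlebot only follows its own group, so the "*" rules don't apply to it:
    print(parser.can_fetch("Googlebot", "https://example.com/private/page"))  # True
    print(parser.can_fetch("Googlebot", "https://example.com/drafts/page"))   # False

    # Crawlers without their own group fall back to the "*" group:
    print(parser.can_fetch("Otherbot", "https://example.com/private/page"))   # False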

robots.txt »

hreflang canonical

For those of you using hreflang for international pages: Make sure any rel=canonical you specify matches one of the URLs you use for the hreflang pairs. If the specified canonical URL is not a part of the hreflang pairs, then the hreflang markup will be ignored. In this case, if you want to use the “?op=1” URLs as canonicals, then the hreflang URLs should refer to those URLs. Alternatively, if you want to use just “/page”, then the rel=canonical should refer to that too.
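
As a rough self-check, here’s a sketch that collects the rel=canonical and hreflang link targets from a page’s HTML and warns when the canonical isn’t part of the hreflang set; the example.com URLs are made up:

    from html.parser import HTMLParser

    class LinkCollector(HTMLParser):
        """Collects the rel=canonical target and all hreflang alternate targets."""
        def __init__(self):
            super().__init__()
            self.canonical = None
            self.hreflang_urls = set()

        def handle_starttag(self, tag, attrs):
            if tag != "link":
                return
            attrs = dict(attrs)
            rel = (attrs.get("rel") or "").lower()
            if rel == "canonical":
                self.canonical = attrs.get("href")
            elif rel == "alternate" and attrs.get("hreflang"):
                self.hreflang_urls.add(attrs.get("href"))

    page = """
    <link rel="canonical" href="https://example.com/page?op=1">
    <link rel="alternate" hreflang="en" href="https://example.com/page?op=1">
    <link rel="alternate" hreflang="de" href="https://example.com/de/page?op=1">
    """

    collector = LinkCollector()
    collector.feed(page)
    if collector.canonical not in collector.hreflang_urls:
        print("canonical is not in the hreflang set; hreflang may be ignored")
    else:
        print("canonical matches an hreflang URL")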

hreflang canonical »

Add rows to a Google Spreadsheet with Python, without API

Sometimes it’s a hassle to track auth data for the Google Spreadsheet API. Here’s a quick hack using Google Forms to post data to a Spreadsheet (similar to the previous post that uses Curl). You can use it as a function in your code, or as a simple command-line tool. Gist (archive.org)

    #!/usr/bin/python
    """Posts to a Google Sheet using a Form"""
    import re
    import sys
    import urllib
    import urllib2

    def get_field_ids(form_url):
        """Returns list of field IDs on the form.
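
Stripped down, the core of the trick is just a form POST; here’s a Python 3 sketch (the form URL and the entry.NNNN field names are placeholders you’d copy from your own form’s HTML source):

    import urllib.parse
    import urllib.request

    # Placeholder endpoint: replace FORM_ID with your form's ID.
    FORM_URL = "https://docs.google.com/forms/d/FORM_ID/formResponse"

    def append_row(values):
        """POSTs one form response, which shows up as a new row in the sheet."""
        data = urllib.parse.urlencode(values).encode("utf-8")
        with urllib.request.urlopen(FORM_URL, data) as response:
            return response.status

    # Each entry.NNNN field maps to one column in the spreadsheet.
    append_row({"entry.1111111111": "first column", "entry.2222222222": "second column"})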

Add rows to a Google Spreadsheet with Python, without API »

Using Curl to add rows to a Google Spreadsheet without using an API

Adding content to a Google Spreadsheet usually requires using the Spreadsheet API (archive.org), getting auth tokens, and tearing out 42 pieces of hair or more. If you just want to use Google Spreadsheets to log some information for you (append-only), a simple solution is to use a Google Form (archive.org) to submit the data. To do that, you just need to POST data using the field names, and you’re done. The data is stored in your spreadsheet, and you even get a timestamp for free.
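
The request itself is just a form POST; roughly like this (the form ID and entry.NNNN field names are placeholders, copied from your form’s HTML source):

    curl -d "entry.1111111111=first+column" \
         -d "entry.2222222222=second+column" \
         https://docs.google.com/forms/d/FORM_ID/formResponse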

Using Curl to add rows to a Google Spreadsheet without using an API »

Totally simple syntax highlighting in Blogger

Sometimes you don’t need to host code; you just want to post it in a blog post. Google Code Prettify (archive.org) does this really well, either per post, or across the blog.

1. Copy the script tag. Here’s what you need to copy into either your template, or into your post:

    <script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js"></script>

2. HTML-encode your code. There are a bunch of HTML encoders (archive.org) online; I haven’t found one that I’m really a fan of.
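
If you’d rather not paste your code into a random online encoder, a tiny local sketch does the job (Python; snippet.js is a placeholder filename):

    import html

    with open("snippet.js") as f:      # placeholder: the file holding the code to post
        print(html.escape(f.read()))   # escapes &, <, > (and quotes) for embedding in HTML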

Totally simple syntax highlighting in Blogger »

429 or 503

Here’s one for fans of the hypertext HTTP protocol – should I use 429 or 503 when the server is overloaded? It used to be that we’d only see 503 as a temporary issue, but nowadays we treat them both about the same. We see both as a temporary issue, and tend to slow down crawling if we see a bunch of them. If they persist for longer and don’t look like temporary problems anymore, we tend to start dropping those URLs from our index (until we can recrawl them normally again).
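
For illustration, a minimal sketch of the server side using Python’s built-in http.server; the overload check is a placeholder, and a Retry-After header gives well-behaved clients a hint:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    OVERLOADED = True  # placeholder for a real load check

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if OVERLOADED:
                self.send_response(503)                 # 429 is treated much the same
                self.send_header("Retry-After", "120")  # hint: try again in two minutes
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok")

    if __name__ == "__main__":
        HTTPServer(("", 8000), Handler).serve_forever()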

429 or 503 »

TLDs & ranking

It feels like it’s time to reshare this again. There still is no inherent ranking advantage to using the new TLDs. They can perform well in search, just like any other TLD can perform well in search. They give you an opportunity to pick a name that better matches your web presence. If you see posts claiming that early data suggests they’re doing well, keep in mind that this is not due to any artificial advantage in search: you can make a fantastic website that performs well in search on any TLD.

TLDs & ranking »

Mobile friendly

I’ve been asked about the mobile-friendly tag in search and noticed two common mistakes that I wanted to share. Both of these result in the Mobile-Friendly Test showing that a page isn’t mobile-friendly, but the PageSpeed Insights tool showing that it’s ok:

Too much blocked by robots.txt. Googlebot needs to be able to recognize the mobile-friendliness through crawling. If a JavaScript file that does a redirect is blocked, if a CSS file that’s necessary for the mobile version of the page is blocked, or if you use separate URLs and block those, then Googlebot won’t be able to see your mobile site.
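
One rough way to catch the robots.txt mistake is to run the resource URLs a mobile page depends on through your robots.txt rules; a sketch (the rules and URLs are made up, and Python’s parser is only an approximation of Googlebot’s behavior):

    import urllib.robotparser

    rules = """
    User-agent: *
    Disallow: /assets/
    Disallow: /m/
    """

    parser = urllib.robotparser.RobotFileParser()
    parser.parse(rules.splitlines())

    # A redirect script, a mobile stylesheet, and a separate mobile URL:
    for url in ("https://example.com/assets/redirect.js",
                "https://example.com/assets/mobile.css",
                "https://example.com/m/page"):
        if not parser.can_fetch("Googlebot", url):
            print("Blocked for Googlebot:", url)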

Mobile friendly »

Authorship

I’ve been involved since we first started testing authorship markup and displaying it in search results. We’ve gotten lots of useful feedback from all kinds of webmasters and users, and we’ve tweaked, updated, and honed recognition and displaying of authorship information. Unfortunately, we’ve also observed that this information isn’t as useful to our users as we’d hoped, and can even distract from those results. With this in mind, we’ve made the difficult decision to stop showing authorship in search results.

Authorship »