It’s useful to understand the differences between the common kinds of redirects, so that you know where to use them (and can recognize when they’re used incorrectly). Luckily, when it comes to Google, we’re pretty tolerant of mistakes, so don’t worry too much :). In general, a redirect is between two pages, here called R & S (it also works for pages called https://example.com/filename.asp, or pretty much any URL). Very simplified, when you call up page R, it tells you that the content is at S, and browsers show the content of S right away.
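To make the R-to-S idea a bit more concrete, here is a minimal Python sketch that follows redirects in a lookup table until it reaches a page that no longer redirects. The `follow_redirects` function, the `REDIRECTS` table, and the example URLs are all hypothetical; real browsers and crawlers do this over HTTP, so treat it as an illustration only.

```python
# Hypothetical redirect table: each key redirects to its value.
REDIRECTS = {
    "https://example.com/R": "https://example.com/S",
}


def follow_redirects(url, redirects, max_hops=10):
    """Return the list of URLs visited, ending at the final destination."""
    chain = [url]
    while url in redirects:
        url = redirects[url]
        if url in chain:
            # Page A points to B and B points back to A (or similar).
            raise ValueError("redirect loop detected at " + url)
        chain.append(url)
        if len(chain) > max_hops + 1:
            raise ValueError("too many redirects")
    return chain


# Calling up R ends at S, which is the page whose content gets shown.
print(" -> ".join(follow_redirects("https://example.com/R", REDIRECTS)))
```

The loop check and the hop limit are design choices of this sketch, added so that a misconfigured table fails loudly instead of spinning forever.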
Planning on moving to HTTPS? Here are 13 FAQs! What’s missing? Let me know in the comments and I’ll expand this over time; perhaps it’s even worth a blog post or help center article. Note that these are specific to moving an existing site from HTTP to HTTPS on the same hostname. Also remember to check out our help center at https://support.google.com/webmasters/answer/6073543

# Do I need to set something in Search Console?
We call a URL a soft-404 when it is essentially a placeholder for URLs that no longer exist, but doesn’t return 404. Using soft-404s instead of real 404s is a bad practice, and it makes things harder for our algorithms – and often confuses users too. We’ve been talking about soft-404s since “forever”; for example, here’s a post from 2008: http://googlewebmastercentral.blogspot.com/2008/08/farewell-to-soft-404s.html. In 2010 we added information about soft-404s to Webmaster Tools ( http://googlewebmastercentral.
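As a rough illustration of the soft-404 idea, here is a small Python sketch that labels a response based on its status code and body text. The `classify_response` function and the phrase list are assumptions made up for this example; this is a simple heuristic for checking your own pages, not how Google detects soft-404s.

```python
# Phrases that often show up on "not found" placeholder pages (an assumed list).
NOT_FOUND_PHRASES = ("page not found", "no longer exists", "no longer available")


def classify_response(status_code, body):
    """Label a response as 'hard-404', 'possible-soft-404', 'ok', or 'other'."""
    if status_code in (404, 410):
        # A real "gone" signal: this is what removed URLs should return.
        return "hard-404"
    if status_code == 200:
        text = body.lower()
        if any(phrase in text for phrase in NOT_FOUND_PHRASES):
            # The page says "not found" but the server says "all good".
            return "possible-soft-404"
        return "ok"
    return "other"


print(classify_response(200, "<h1>Page not found</h1>"))
print(classify_response(404, "<h1>Page not found</h1>"))
```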
Every now and then I hear from someone who accidentally got a bunch of email addresses indexed as parameters to some script on their site. There’s sometimes an easy solution to that, which will (temporarily!) take care of it fairly quickly: If there’s a common part of the path that identifies these URLs, use a “directory” URL removal request in Search Console (verify ownership first). For example, you can submit “email.
Looking for something simple & easy to do during the holidays? Double-check your site’s contact forms to see that they actually work. We occasionally reach out to webmasters who have technical issues that aren’t easily visible in Webmaster Tools. Sometimes not fixing those issues will result in us dropping your website completely from our search results. If your website doesn’t have an email address on it, and your contact form is an error-page, then you’re gonna have a bad time, and we’re not going to be able to warn you of those issues.
I see a bunch of posts about the robotted resources message that we’re sending out. I haven’t had time to go through & review them all (so include URLs if you can :)), but I’ll spend some time double-checking the reports tomorrow. Looking back a lot of years, blocking CSS & JS is something that used to make sense when search engines weren’t that smart, and ended up indexing & ranking those files in search.
I noticed there’s a bit of confusion on how to tweak a complex robots.txt file (aka longer than two lines :)). We have awesome documentation (of course :)), but let me pick out some of the parts that are commonly asked about: Disallowing crawling doesn’t block indexing of the URLs. This is pretty widely known, but worth repeating. More-specific user-agent sections replace less-specific ones. If you have a section with “user-agent: *” and one with “user-agent: googlebot”, then Googlebot will only follow the Googlebot-specific section.
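The user-agent point can be tried out with a small Python sketch. It uses the standard library's `urllib.robotparser`, which is not Googlebot's parser and picks a section in its own way (first matching section in file order, with the `*` section as fallback), but for a simple two-section file like this it gives the same result as described above. The robots.txt content, paths, and the `OtherBot` name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt with a generic section and a Googlebot-specific one.
ROBOTS_TXT = """\
user-agent: *
disallow: /private/

user-agent: googlebot
disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot only follows its own section, so the "*" rule for /private/
# does not apply to it.
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/drafts/page"))   # False

# A crawler without its own section falls back to the "*" section.
print(parser.can_fetch("OtherBot", "https://example.com/private/page"))   # False
```

If you want Googlebot to also stay out of /private/ in this example, the rule has to be repeated in the Googlebot section.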
For those of you using hreflang for international pages: Make sure any rel=canonical you specify matches one of the URLs you use for the hreflang pairs. If the specified canonical URL is not a part of the hreflang pairs, then the hreflang markup will be ignored. In this case, if you want to use the “?op=1” URLs as canonicals, then the hreflang URLs should refer to those URLs. Alternately, if you want to use just “/page”, then the rel=canonical should refer to that too.
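A quick way to check this on your own pages could look like the following Python sketch. The `canonical_matches_hreflang` function and the example URLs are hypothetical, and the comparison is a plain string match, which is an assumption of this sketch (it does no URL normalization).

```python
def canonical_matches_hreflang(canonical_url, hreflang_urls):
    """Return True if the canonical URL is one of the hreflang URLs.

    hreflang_urls maps a language code to the URL used in the hreflang pair.
    """
    return canonical_url in set(hreflang_urls.values())


# Hypothetical hreflang pairs that use the "?op=1" URLs.
hreflang_urls = {
    "en": "https://example.com/page?op=1",
    "de": "https://example.com/de/page?op=1",
}

# The canonical points to one of the hreflang URLs: fine.
print(canonical_matches_hreflang("https://example.com/page?op=1", hreflang_urls))

# The canonical points to "/page", which is not in the pairs: mismatch.
print(canonical_matches_hreflang("https://example.com/page", hreflang_urls))
```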
Sometimes it’s a hassle to track auth data for the Google Spreadsheet API. Here’s a quick hack using Google Forms to post data to a Spreadsheet (similar to the previous post that uses Curl). You can use it as a function in your code, or as a simple command-line tool. Gist (archive.org)

    #!/usr/bin/python
    """Posts to a Google Sheet using a Form"""
    import re
    import sys
    import urllib
    import urllib2

    def get_field_ids(form_url):
        """Returns list of field IDs on the form.