So you got your staging site indexed? Happens to everyone. Here’s a rough guide to fixing it, plus some suggestions for preventing it.
The fastest way to get the staging site out of search is to remove it via Search Console. For that, you need to verify ownership in Search Console [1] (ironically, this means you’ll likely have to make the site accessible to search engines again, or figure out DNS verification, which isn’t that common but also not that hard). From there, you can submit a site-removal request [2], which takes the whole hostname out of Google’s search results for about six months. During this time, you can figure out and implement your general plan to block the staging site from search.

My recommendation for staging sites is to block access on the server side, either with server-side / HTTP authentication [3] or with IP address whitelisting. Be cautious with the latter: IP addresses can change, and a whitelist would block you from using tools from home, etc.
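Server-side blocking usually lives in the web server, CDN, or hosting config rather than in application code. As a minimal sketch of the same idea, here is a hypothetical Python WSGI middleware that lets a request through only if it comes from a whitelisted network or carries valid HTTP Basic credentials, and otherwise answers 401. The network range, user name, and password are placeholder assumptions, not recommended values.

```python
import base64
import hmac
import ipaddress

# Hypothetical placeholder values: replace with your own.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # a documentation range
STAGING_USER = "staging"
STAGING_PASSWORD = "change-me"


def _ip_allowed(remote_addr):
    """True if the client address is inside one of the whitelisted networks."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:  # empty or malformed address
        return False
    return any(ip in net for net in ALLOWED_NETWORKS)


def _credentials_ok(auth_header):
    """True if an 'Authorization: Basic ...' header carries the staging credentials."""
    scheme, _, encoded = auth_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    user, sep, password = decoded.partition(b":")
    if not sep:
        return False
    # compare_digest on bytes avoids leaking timing information about the secrets.
    user_ok = hmac.compare_digest(user, STAGING_USER.encode("utf-8"))
    password_ok = hmac.compare_digest(password, STAGING_PASSWORD.encode("utf-8"))
    return user_ok and password_ok


def staging_guard(app):
    """Wrap a WSGI app so only whitelisted IPs or authenticated users reach it."""
    def guarded(environ, start_response):
        if _ip_allowed(environ.get("REMOTE_ADDR", "")) or _credentials_ok(
            environ.get("HTTP_AUTHORIZATION", "")
        ):
            return app(environ, start_response)
        # Everyone else, crawlers included, gets a 401 and never sees the content.
        start_response(
            "401 Unauthorized",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("WWW-Authenticate", 'Basic realm="staging"'),
            ],
        )
        return [b"Authentication required\n"]

    return guarded
```

You would wrap your existing WSGI application with it, e.g. application = staging_guard(application). Behind a reverse proxy or CDN, REMOTE_ADDR is the proxy’s address, so the IP check would need adapting there; and Basic auth should only be used over HTTPS, since the credentials are merely base64-encoded.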

I don’t like the alternatives. Using a page-level or HTTP-response noindex [4] means the pages need to be accessible (open to competitors, scrapers, etc.). Using robots.txt [5] [6] means you need to remember to change the robots.txt when moving from staging to production (another common source of problems), and it can result in URLs being indexed without their content, since URLs blocked by robots.txt may still be indexed even though their content isn’t known.
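For comparison, the HTTP-response variant of noindex is the X-Robots-Tag header. Here is a minimal sketch, again as hypothetical Python WSGI middleware, assuming the staging deployment is signalled by an APP_ENV environment variable (an assumption for illustration; use whatever flag your deployment actually provides).

```python
import os


def is_staging():
    """Hypothetical convention: APP_ENV=staging marks the staging deployment."""
    return os.environ.get("APP_ENV", "production") == "staging"


def noindex_header(app, enabled=None):
    """Wrap a WSGI app so every response carries 'X-Robots-Tag: noindex'.

    If enabled is None, it is decided once, at wrap time, via is_staging().
    """
    if enabled is None:
        enabled = is_staging()

    def wrapped(environ, start_response):
        if not enabled:
            return app(environ, start_response)

        def start_with_noindex(status, headers, exc_info=None):
            # Drop any X-Robots-Tag the app set itself, then add noindex.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_noindex)

    return wrapped
```

Note that the pages still return 200 to anyone who asks, which is exactly the accessibility drawback described above. Also, if robots.txt disallowed these URLs, crawlers would never fetch them and so would never see the header.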

Regardless of the method, if you’re not using the site-removal request, keep in mind that crawling happens at the page level and can take time, especially if we’re not sure about the importance of your staging site (which is usually the case). It’s normal for some URLs not to be recrawled for months, so if you add any block on the URL level, it can easily take half a year or longer for it to be processed for all URLs. The site-removal request covers most of that time, and you can submit another one should you need to extend it.
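Since each URL only drops out once it’s been recrawled with the block in place, it can be worth spot-checking that URLs really serve the block you intended. Here is a rough sketch using only the Python standard library. The labels it returns are made up for illustration, and it only inspects the status code and the X-Robots-Tag header, not robots.txt or a robots meta tag in the HTML.

```python
import urllib.error
import urllib.request


def classify(status, headers):
    """Label a response by the kind of block it shows (header-level check only)."""
    if status in (401, 403):
        return "blocked-server-side"
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for directive in lowered.get("x-robots-tag", "").split(","):
        # Handles "noindex" and "googlebot: noindex"; "none" implies noindex too.
        rule = directive.split(":")[-1].strip().lower()
        if rule in ("noindex", "none"):
            return "noindex"
    return "no-block-in-headers"


def check_url(url, timeout=10.0):
    """Fetch url and classify the response; urllib raises HTTPError for 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, dict(resp.headers))
    except urllib.error.HTTPError as err:
        return classify(err.code, dict(err.headers))
```

For example, check_url("https://staging.example.com/some-page") on a hypothetical staging hostname should return "blocked-server-side" once authentication is in place. Collapsing the headers into a dict keeps only one X-Robots-Tag value, which is a simplification.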

What did I miss? Which is your favorite setup?


