Check if a static site was moved correctly

The lazy person’s guide to confirming that a move to a static site worked.

Overview:

  • Download all relevant URLs from Search Console
  • Convert download to a URL list
  • Check for HTTP-to-HTTPS redirects
  • Check for valid final URLs

Download all relevant URLs

I’m picking one approximate source of truth: the URLs that received impressions in Google Search. This list doesn’t need to be comprehensive, just more than I’d pick by hand. Any reasonable sample will usually include URLs from a variety of templates and sections of the site, and problems tend to affect whole templates or sections rather than individual URLs. You could also use a Google Analytics export, for example; I use Search Console.

  1. Verify ownership (if necessary; then wait a few days for the data to appear).
  2. Go to the Performance report and pick the full date range (16 months).
  3. Export to a CSV file.
  4. Done.

Convert download into a URL list

Search Console exports a ZIP file with the data split across several CSV files. We’ll unzip it and take the URLs from Pages.csv. We’ll drop the rest (keep it if you like; I don’t need it).

$ unzip yoursite.com-Performance-on-Search-2021-04-24.zip 

Archive:  yoursite.com-Performance-on-Search-2021-04-24.zip
  inflating: Queries.csv             
  inflating: Pages.csv               
  inflating: Countries.csv           
  inflating: Devices.csv             
  inflating: Search appearance.csv   
  inflating: Dates.csv               
  inflating: Filters.csv             

$ rm Queries.csv Countries.csv Devices.csv "Search appearance.csv" \
  Dates.csv Filters.csv

$

Outcome: we have Pages.csv
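If you only ever need Pages.csv, unzip can extract a single member from the archive, which makes the rm step unnecessary. A minimal sketch as a bash function; the name extract_pages is hypothetical, and it assumes the export still names the file Pages.csv.

```shell
# Hypothetical helper: pull only Pages.csv out of a Search Console export.
# unzip accepts member names after the archive name; -o overwrites an
# existing Pages.csv without prompting, -q suppresses the file listing.
extract_pages() {
  local zipfile=$1
  unzip -o -q "$zipfile" Pages.csv
}

# Usage:
#   extract_pages yoursite.com-Performance-on-Search-2021-04-24.zip
```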

Extract URL list

First, fix Search Console's multi-line CSV records. URLs with spaces in them may be wrapped onto a second line, which makes it impossible to parse the CSV file one line at a time.

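# A line starting with a quote is the first half of a wrapped URL:
# hold it, then print it joined (with a space) to the next line.
prev="";
while IFS= read -r line || [[ -n $line ]]; do
  if [[ $line == \"* ]]; then
    prev="$line ";
  else
    printf '%s\n' "$prev$line";
    prev="";
  fi;
done < Pages.csv > PagesClean.csv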
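The loop above assumes a wrapped record spans exactly two lines and that every line starting with a quote is unfinished, so a quoted URL that fits on one line (for example, one containing a comma) would be glued to the next record. A slightly more careful sketch, assuming standard CSV quoting where a literal quote inside a field is written as "", keeps joining lines until the record contains an even number of double quotes. The function name join_csv_records is hypothetical; it keeps the original choice of re-joining the pieces with a space.

```shell
# Hypothetical helper: rebuild CSV records that were wrapped across lines.
# A record is complete once it holds an even number of double quotes
# (escaped quotes inside a field come in pairs, so they don't upset the count).
join_csv_records() {
  local buf="" line stripped quotes
  while IFS= read -r line || [[ -n $line ]]; do
    # Re-join wrapped pieces with a space, as the loop above does
    buf=${buf:+"$buf "}$line
    stripped=${buf//\"/}                    # buf without any double quotes
    quotes=$(( ${#buf} - ${#stripped} ))    # number of double quotes in buf
    if (( quotes % 2 == 0 )); then
      printf '%s\n' "$buf"
      buf=""
    fi
  done
  # Don't silently drop a record whose closing quote never arrived
  if [[ -n $buf ]]; then printf '%s\n' "$buf"; fi
}

# Usage:
#   join_csv_records < Pages.csv > PagesClean.csv
```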

If you have csvtool installed, this step is trivial. If you don’t, you can simplify: drop every line that contains a quote (those are the URLs that need real CSV parsing) and use the remaining URLs as your sample. Both commands take the first column of the CSV file, drop the header line, and write the result to urls.txt.

$ # With csvtool
$ csvtool format "%(1)\n" PagesClean.csv | tail -n +2 >urls.txt

$ # Without csvtool (drop lines with quotes)
$ grep -v \" PagesClean.csv | awk -F',' '{print $1}' | tail -n +2 >urls.txt
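Dropping every quoted line loses exactly the URLs Search Console had to quote. A rough alternative is to handle just the one quoted column this file needs: if a line starts with a quote, take everything up to the closing quote. This is a sketch, not a general CSV parser; the function name first_field is hypothetical, and it assumes the URL is the first column and never contains the two characters ", (quotes in URLs are normally percent-encoded as %22).

```shell
# Hypothetical helper: print the first CSV field of each input line.
# Handles a quoted first field (which may contain commas or spaces) and
# turns escaped "" back into ". Not a general CSV parser.
first_field() {
  local line field q='"'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "$q"* ]]; then
      field=${line#"$q"}              # drop the opening quote
      field=${field%%"$q,"*}          # cut at the first closing quote + comma
      field=${field//$q$q/$q}         # unescape doubled quotes
    else
      field=${line%%,*}               # unquoted: everything before the first comma
    fi
    printf '%s\n' "$field"
  done
}

# Usage (skip the header line, then extract the URLs):
#   tail -n +2 PagesClean.csv | first_field > urls.txt
```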

Outcome: we have urls.txt

Check for HTTP-to-HTTPS redirects (if needed)

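If, like me, you were too lazy to move to HTTPS until now, your old http:// URLs need to redirect to their https:// versions. Here’s a way to check those redirects. It creates a tab-separated file with each URL, its HTTP status line, and any redirect target from the Location header.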

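while IFS= read -r line ; do
    printf '%s\t' "$line";
    # HEAD request; keep the status line and Location header (any case),
    # strip carriage returns, and put everything on one tab-separated line
    curl -sI "$line" | grep -iE "^(HTTP|Location)" | tr -d '\r' | tr '\n' '\t';
    echo "";
done < urls.txt > urls-result.txt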

(The loop fetches the headers for each URL and writes the URL, the HTTP status line, and the Location header on a single line.)
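curl can also report the status code and redirect target itself through its --write-out (-w) option, which avoids parsing headers and their capitalization, which differs between servers. %{redirect_url} is the URL curl would have redirected to. A sketch of the same tab-separated report; the function name redirect_report is hypothetical.

```shell
# Hypothetical helper: print "URL<TAB>status<TAB>redirect target" per URL.
# -s silences progress, -I sends a HEAD request like the loop above,
# -o /dev/null discards the headers, and -w prints only the fields we want.
redirect_report() {
  local url
  while IFS= read -r url || [[ -n $url ]]; do
    printf '%s\t' "$url"
    curl -sI -o /dev/null -w '%{http_code}\t%{redirect_url}\n' "$url"
  done
}

# Usage:
#   redirect_report < urls.txt > urls-result.txt
```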

We can also list only the URLs that are missing a redirect to their HTTPS version:

while IFS= read -r line ; do
  # Status line plus Location header (grep -i: some servers send "location:")
  result=$(curl -sI "$line" | grep -iE "^(HTTP|Location)" | tr '\n\r' '\t');
  if [[ ${result} != *" 301 "* ]]; then
    echo "$line - Missing 301: $result";
  else
    httpsurl=$(echo "$line" | sed "s/http:\/\//https:\/\//");
    if [[ ${result} != *"$httpsurl"* ]]; then
      echo "$line - Wrong redirect: $result";
    fi;
  fi;
done < urls.txt

(For each URL, the loop fetches the headers, looks for a 301 status, and checks whether the HTTPS version of the URL appears in the result.)
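The substring test above also accepts a redirect to, say, https://example.com/page-old when checking http://example.com/page. A stricter sketch compares the redirect target exactly, again using curl's -w fields instead of parsing headers. The function name check_https_redirect is hypothetical, and like the loop above it treats 301 as the only acceptable status.

```shell
# Hypothetical helper: report an http:// URL whose redirect is missing or wrong.
# Prints nothing when the URL answers 301 with exactly its https:// version.
check_https_redirect() {
  local url=$1 expected code target
  expected=$(printf '%s\n' "$url" | sed 's|^http://|https://|')
  # %{redirect_url} is empty when the response is not a redirect
  read -r code target < <(curl -sI -o /dev/null \
    -w '%{http_code} %{redirect_url}\n' "$url")
  if [[ $code != 301 ]]; then
    printf '%s - Missing 301: %s\n' "$url" "$code"
  elif [[ $target != "$expected" ]]; then
    printf '%s - Wrong redirect: %s\n' "$url" "$target"
  fi
}

# Usage:
#   while IFS= read -r url; do check_https_redirect "$url"; done < urls.txt
```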

Check that HTTPS URLs are accessible with a 200 result code

while IFS= read -r line ; do
  # Follow redirects (-L) from the https:// version; collect every status line
  httpsurl=$(echo "$line" | sed "s/http:\/\//https:\/\//");
  result=$(curl -sIL "$httpsurl" | grep -E "^HTTP");
  if [[ ${result} != *" 200 "* ]]; then
    echo "$httpsurl - Missing 200: $result";
  fi;
done < urls.txt
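With -L, curl's %{http_code} is the status of the last response in the redirect chain, so the final status can be checked without grepping status lines, whose format differs between HTTP/1.1 and HTTP/2. A sketch; the function name check_final_200 is hypothetical, and it also reports %{url_effective}, the URL curl ended up at.

```shell
# Hypothetical helper: report the https:// version of a URL unless its
# redirect chain ends in a 200. -L follows redirects; with -L,
# %{http_code} is the status of the last response.
check_final_200() {
  local httpsurl code final
  httpsurl=$(printf '%s\n' "$1" | sed 's|^http://|https://|')
  read -r code final < <(curl -sIL -o /dev/null \
    -w '%{http_code} %{url_effective}\n' "$httpsurl")
  if [[ $code != 200 ]]; then
    printf '%s - Missing 200: %s (ended at %s)\n' "$httpsurl" "$code" "$final"
  fi
}

# Usage:
#   while IFS= read -r url; do check_final_200 "$url"; done < urls.txt
```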

Well, it looks like I still have work to do. :-)

Comments / questions

There's currently no commenting functionality here. If you'd like to comment, please use Twitter and @me there. Thanks!