A set of command-line Windows website tools (old, probably outdated)

Warning: This page is pretty old and was imported from another website. Take any content here with a grain of salt.

If you have to do things over and over again, it’s a good idea to use a tool to make them easier. Windows is somewhat limited compared to Linux when it comes to batch scripting, and “wget” can only do so much out of the box, so I sat down and wrote a few command-line tools to help me with some of the website checks that I like to do.

The tools I included in this set can do the following:

  • Check the result codes for a URL (and follow in the case of a redirect) - or for a list of URLs
  • Create a list of the links found on a URL (or just particular ones)
  • Create a list of the links and anchor texts found on a URL (or just particular ones)
  • Create a simple keyword analysis of the indexable content on a URL

You can get the download from here (requires the Windows .NET runtime v1.1):

WebResult

This tool accesses a URL and shows the result code that was returned. If the status is a redirect, it will display the redirection location and optionally follow it to check the final result code. It may be used with a list of URLs. The output is tab-delimited.

Usage:

WebResult [options] (URL|urllist.txt)
Options:
 --referer|-r [referrer] (default: none)
 --user-agent|-u [user-agent] (default: "WebResult")
 --follow-redirect|-f (default: off)
 --headers|-h (displays the full response headers)
 --verbose|-v

Example: Check for correct canonical redirect:

WebResult http://johnmu.com/
WebResult http://www.johnmu.com/
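
Since WebResult accepts a plain-text file of URLs and writes tab-delimited output, a whole batch can be checked in one go and the results opened in a spreadsheet. A minimal sketch (urls.txt and results.tsv are just example file names):

echo http://johnmu.com/ >urls.txt
echo http://www.johnmu.com/ >>urls.txt
WebResult -f urls.txt >results.tsv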

WebLinks

This tool lists the links found on a URL. Note that it has an integrated HTML/XHTML parser - if the code on the page is not fully compliant, there is a chance that the parser will not recognize all links (it is fairly fault-tolerant, though).

This tool can use a cached version of the URL (saved earlier by this tool or one of the others) to save bandwidth. The cached versions are stored in the user’s temp folder.

You have the choice of listing only domain-outbound or in-site links (to help simplify the output). Additionally, links with the “rel=nofollow” microformat can be marked as such. The output is in alphabetical order.

Usage:

WebLinks [options] (URL|urllist.txt)
Options:
 --referer [referrer] (default: none)
 --user-agent [user-agent] (default: "WebLinks")
 --insite-only|-i (default: both in + out)
 --outbound-only|-o (default: both in + out)
 --ignore-nofollow|-n (default: off)
 --cache|-c (default: off)
 --verbose|-v (default: off)

Example: Check the outbound links on a site.

WebLinks -o http://johnmu.com/
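
If you only need a quick count of the outbound links rather than the list itself, the output can be piped through the built-in Windows find command, which counts lines with /c /v "" (this assumes the tool prints one link per line):

WebLinks -o http://johnmu.com/ | find /c /v ""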

WebAnchors

This tool lists the links and anchor text as found on a URL. It uses the same HTML/XHTML parser as WebLinks. It can be used to find certain links (based on the URL, domain name, URL-snippets, or even parts of the anchor text). If the anchor for a link is an image, it will use the appropriate ALT-text, etc.

Usage:

WebAnchors [options] (URL|urllist.txt)
Options:
 --referer|-r [referrer] (default: none)
 --user-agent|-u [user-agent] (default: "WebLinks")
 --find-url|-f http://URL
 --find-domain|-d DOMAIN.TLD
 --find-anchor|-a TEXT
 --find-url-snippet|-s TEXT
 --url-only|-o (default: show anchor text as well)
 --skip-nofollow|-n (default: off)
 --cache|-c (default: off)
 --verbose|-v (default: off)

Example: Check the links with “Google” in the anchor text:

WebAnchors -a "Google" http://johnmu.com/
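
The URL-snippet option works the same way when you care about the link target rather than the anchor text; for example, to list all links whose URL contains “google” (just an illustrative pattern):

WebAnchors -s "google" http://johnmu.com/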

WebKeywords

This tool does a simple keyword analysis on the indexable content of a URL. It also uses the above HTML/XHTML parser to extract the indexable text. It can extract single-word keywords or multi-word phrases. The output is tab-delimited for re-use.

Usage:

WebKeywords [options] (URL|urllist.txt)
Options:
 --referer|-r [referrer] (default: none)
 --user-agent|-u [user-agent] (default: "WebLinks")
 --verbose|-v (default: off)
 --words|-w [NUM] (phrases with number of words, default: 1)
 --ignore-numbers|-n (default: off)
 --cache|-c (cache web page, default: off)

Example: Extract 3-word keyphrases from a page:

WebKeywords -w 3 http://johnmu.com/
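
Since the output is tab-delimited, it can be post-processed with standard Windows commands; for example, piping it through the built-in sort command produces an alphabetical phrase list (keywords.tsv is just an example file name):

WebKeywords -w 2 http://johnmu.com/ | sort >keywords.tsv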

Combined usage of these tools

Find common keyphrases on sites linked from a page (uses a temporary file to store the URLs):

webanchors -c -o -a "Google" http://johnmu.com >temp.txt
webkeywords -c -w 3 temp.txt

Check result codes of all URLs linked from a page:

weblinks -c http://johnmu.com >temp.txt
webresult temp.txt >links.tsv
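
To narrow that list down to just the broken links, the output can be filtered with the built-in findstr command (this assumes the status code appears somewhere on each output line):

webresult temp.txt | findstr "404" >broken.tsv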

Compare result codes for multiple accesses:

echo. >results.tsv
for /L %i IN (1,1,100) DO webresult http://johnmu.com/ >>results.tsv

or, a bit more complicated, to test for a hack that is triggered by the referrer (cmd uses “^” as its line-continuation character - or simply type everything on one line):

for /L %i IN (1,1,100) DO ^
    webresult -u "Mozilla/5.0 (Windows; U) Gecko/20070725 Firefox/2.0.0.6" ^
    -r http://www.google.com/search?q=johnmu http://johnmu.com/ ^
    >>results.tsv
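
Note that the %i loop-variable syntax only works when typed directly at the command prompt; inside a .bat or .cmd file, the percent signs have to be doubled:

for /L %%i IN (1,1,100) DO webresult http://johnmu.com/ >>results.tsv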

I’d love to hear about your usage of these tools :) .
