Link checking as cache warmup or integration testing

Sometimes I use wget's awesome recursive spidering (crawling) feature (an alternative to linkchecker) not only for broken link checking, but also for cache warmup or for spotting PHP errors after a commit.

In production, for some projects, I log PHP errors into a dedicated log directory with a separate file per day:

// Create the log directory if it does not exist yet
if (!is_dir(DOC_ROOT.'/log')) {
    mkdir(DOC_ROOT.'/log');
}

// Hide errors from visitors and write them to a log file instead
ini_set('display_errors', 0);
ini_set('display_startup_errors', 0);
ini_set('log_errors', 1);
// date('d') is the day of month, so the file is e.g. log/php-error-07.log
ini_set('error_log', DOC_ROOT.'/log/php-error-'.date('d').'.log');

So after the crawl I just check whether a log file for today has appeared.
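
A minimal sketch of that check, assuming the document root is /var/www/example (a hypothetical path, adjust it to your DOC_ROOT):

# Report if today's PHP error log exists and is non-empty
LOG="/var/www/example/log/php-error-$(date +%d).log"
if [ -s "$LOG" ]; then
    echo "PHP errors were logged during the crawl:"
    cat "$LOG"
fi
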
The wget shell script that crawls the site:

#!/bin/bash

timestamp=$(date +"%Y%m%d%H%M%S")

cd /tmp
time wget -4 --spider -r --delete-after \
     --no-cache --no-http-keep-alive --no-dns-cache \
     -U "Wget" \
     -X/blog \
     http://www.example.com -o"/tmp/wget-example-com-$timestamp.log"

During the crawl every page of your site gets hit, so any cache you have implemented (for example phpFastCache) gets warmed up.
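
A quick way to see the effect (not part of the original setup, just an illustration) is to time the same URL before and after the crawl using curl's built-in timing:

# Run once before the crawl (cold cache) and once after (warm cache)
curl -s -o /dev/null -w 'total time: %{time_total}s\n' http://www.example.com/
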
I crawl via IPv4 only and with HTTP keep-alive disabled (purely as a performance tweak). Setting a distinctive user agent is also handy for filtering the crawl out of access_log.
-X excludes the given directories from the crawl, which is also handy for keeping crawl times down on large sites.
-o writes the wget report to a file, which you can then search for HTTP status codes such as 404, 403, 500, etc.
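
As a rough sketch (the exact message text may differ between wget versions), the interesting lines can be pulled out of that report with grep:

# Show responses that came back as 403, 404 or 500
grep -E 'awaiting response\.\.\. (403|404|500)' /tmp/wget-example-com-*.log

# In spider mode wget also prints a summary of broken links at the end of the report
tail -n 20 /tmp/wget-example-com-*.log
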
Keep the excluded path in mind: if you need to, you can crawl it with a second wget instance running in parallel, since wget itself unfortunately can't run parallel threads as of this writing; a rough sketch follows.
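
This sketch reuses the flags from the script above and assumes the excluded section is /blog (-I is wget's include-directories counterpart to -X):

#!/bin/bash

timestamp=$(date +"%Y%m%d%H%M%S")

cd /tmp

# Main crawl excluding /blog, pushed to the background with &
wget -4 --spider -r --delete-after \
     --no-cache --no-http-keep-alive --no-dns-cache \
     -U "Wget" \
     -X/blog \
     http://www.example.com -o"/tmp/wget-example-com-$timestamp.log" &

# Second crawl restricted to the /blog section only
wget -4 --spider -r --delete-after \
     --no-cache --no-http-keep-alive --no-dns-cache \
     -U "Wget" \
     -I/blog \
     http://www.example.com/blog/ -o"/tmp/wget-example-com-blog-$timestamp.log" &

# Wait for both crawls to finish
wait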