Updating WordPress XML Sitemaps Offline

I love Arne Brachhold’s Google XML Sitemap plugin for WordPress and use it on every WordPress install I do. On larger blogs, or blogs where you’re using automated content generators (i.e. posting content in an automated way through XML-RPC), the default build mode will slow down your blog because it rebuilds the entire XML sitemap from scratch every time you create or update a post or page.

The plugin supports a second build mode: building via a GET request. For sites that have a lot of posts or do automated posting, this is a great option. You can schedule the XML sitemap updates to happen at specific times of the day with a simple script that uses an HTTP GET request to refresh them. This speeds up posting, because the sitemap is no longer rebuilt on every save.

Here’s a simple PHP script that you can schedule via cron to update your sitemaps and email you when it is done. Just set the $admin_email variable to the address where you want the email to go and the $sitemap_link variable to the URL the XML Sitemaps plugin shows you when you change the build mode.

Note that you may need to change the link to include wp-admin, especially if you’re on WordPress MU, where the link the plugin gives doesn’t work (i.e. change http://myblog.com/?sm_command… to http://myblog.com/wp-admin/?sm_command…).
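If you do need the wp-admin form of the link, the rewrite can be sketched as a small helper. Note that toAdminLink is a hypothetical name of my own, not something the plugin or WordPress provides, and the sketch assumes the link carries its parameters in a query string.

```php
<?php
// Hypothetical helper (not part of the plugin or WordPress): inserts
// "wp-admin/" in front of the query string of a sitemap build link.
function toAdminLink( $link ) {
    $pos = strpos( $link, '?' );
    if ( $pos === false ) {
        return $link; // no query string, nothing to rewrite
    }
    $base  = rtrim( substr( $link, 0, $pos ), '/' );
    $query = substr( $link, $pos );
    // Leave links that already point at wp-admin untouched.
    if ( substr( $base, -9 ) === '/wp-admin' ) {
        return $link;
    }
    return $base . '/wp-admin/' . $query;
}

// prints http://myblog.com/wp-admin/?sm_command=build&sm_key=90210
echo toAdminLink( 'http://myblog.com/?sm_command=build&sm_key=90210' ), "\n";
```

Try the link the plugin gives you first, and only fall back to the rewritten one if the build request doesn’t work.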

Here’s the script:

<?php

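// Address that receives the status email.
$admin_email = 'info@myblog.com';
// Build URL shown by the XML Sitemaps plugin after you change the build mode.
$sitemap_link = 'http://myblog.com/?sm_command=build&sm_key=90210';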

function getURIContents( $uri ) {
    return file_get_contents( $uri );
}

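// Returns '' on success, or a description of the problem on failure.
function generateSitemap( $link ) {
    $result = getURIContents( $link );
    // A failed request yields false or an empty string; report it as an error
    // rather than letting it pass as a successful build.
    if( $result === false || $result === '' ) {
        return 'No response from the sitemap build URL.';
    }
    // A response containing "DONE" is treated as a successful build.
    if( !preg_match( '/DONE/', $result ) ) {
        return $result;
    }
    return '';
}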

$result = generateSitemap( $sitemap_link );
if( $result != '' ) {
        mail( $admin_email, 'Sitemap Generator Failed',
                "Sitemap generation failed: $result" );
} else {
        mail( $admin_email, 'Sitemap Generator Complete',
                "Completed sitemap generation." );
}
?>

If you get errors about file_get_contents not being able to load a remote URI (typically because allow_url_fopen is disabled in php.ini), replace the getURIContents function above with this version, which uses libcurl:

function getURIContents( $uri ) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $uri);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, 'IE 6 - Mozilla/4.0' );
    $ret = curl_exec( $ch );
    // On a cURL error, return the error text so the caller reports a failure.
    if( curl_errno( $ch ) ) {
        $ret = 'cURL error: ' . curl_error( $ch );
    }
    curl_close( $ch );
    return $ret;
}
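
If you’re not sure which of the two getURIContents versions your host can run, a quick check along these lines may help. This is only a sketch: pickTransport is a hypothetical helper name, and it only looks at whether the cURL extension is loaded and whether allow_url_fopen is enabled, not at firewalls or other host restrictions.

```php
<?php
// Hypothetical helper: reports which fetch method this PHP install
// appears able to use for remote URLs.
function pickTransport() {
    // The libcurl version needs the cURL extension to be loaded.
    if ( function_exists( 'curl_init' ) ) {
        return 'curl';
    }
    // file_get_contents can only open remote URLs when allow_url_fopen is on.
    if ( ini_get( 'allow_url_fopen' ) ) {
        return 'file_get_contents';
    }
    return 'none';
}

echo pickTransport(), "\n";
```

Run it from the same environment cron will use, since the command-line and web server PHP configurations can differ.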