
Dody,
Although your solution works, it is terribly inefficient, technically incorrect in how it implements the Google Sitemap API, and does not hold to the intent of the service in general.
The intent of the Google Sitemap service is to give Google a quick list of pages that need to be indexed, or that have changed content and need to be indexed again. Your code assigns an arbitrary number to the change frequency as well as the priority.
In the spirit of the service, the detail and category listings should be sorted by date added, then date modified.
Another technical point: creating/writing the sitemap files requires either PHP running in CGI mode or, if PHP runs as a SAPI module, files that are CHMOD 777 so the daemon can write them.
A more elegant solution (provided you don't have more than 50K listings) is to create a PHP script that aggregates the data in real time. Thus, no need to create nasty CRON jobs or mess with file permissions. The API says the sitemap needs to be in a certain format...but it does not necessarily need to have an XML extension.
Here is an example of my script that I use for the detail listing sitemap. There are a few points to notice:
First, I use an external DB connector, as the ADO abstraction layer you chose to incorporate has almost 1 MB (RAM) of overhead for each initialization. In future projects I highly encourage you to look at incorporating ADOdb Lite, which uses about 90K.
Second, I send the text/xml header to remain in strict compliance with the Google API.
I also use a GZIP output buffer to decrease the transfer file size.
Example googleSitemapListings.php
PHP Code:
<?php
/**
 * Google Sitemap Generator, Listings
 *
 * Creates detail listing sitemap for IndexU DB architecture
 *
 * @license http://opensource.org/licenses/gpl-license.php GNU Public License
 * @version 1.0
 * @link http://www.oscommerce-freelancers.com/ osCommerce-Freelancers
 * @copyright Copyright 2005, Bobby Easland
 * @author Bobby Easland
 * @filesource
 */
// Database host - i.e. localhost
define('DB_HOST', '');
// Database username
define('DB_USER', '');
// Database password
define('DB_PASS', '');
// Database name
define('DB_NAME', '');
// Base URL of the directory
define('BASE_URL', 'http://yourdomain.com/');
// Include the lightweight DB abstraction layer
include_once('lib/database.class.php');
// Initialize DB object
$db = new MySQL_DataBase(DB_HOST, DB_USER, DB_NAME, DB_PASS);
//// Function to format URL in SEO format
// Lowercases the title and replaces spaces and punctuation with underscores
function formatNice($string){
    $mod_rewrite_replacement_str_arr =
        array(' ','-','/','\\',',','#',':',';','\'','"','[',']',
              '{','}','|','`','~','!','@','%','$','^','&','*','=','+');
    $title_mod = strtolower($string);
    return str_replace($mod_rewrite_replacement_str_arr,'_',$title_mod);
} # end function
// Send the content type header
header('Content-Type: text/xml');
//// Start the output buffer with GZ handler callback
// This is optional but decreases the transfer size dramatically
ob_start('ob_gzhandler');
// Start the XML output
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<?php
// Build one <url> entry from an array with loc, lastmod, changefreq and priority keys
function GenerateNode($data){
    $content = '';
    $content .= "\t" . '<url>' . "\n";
    $content .= "\t\t" . '<loc>'.trim($data['loc']).'</loc>' . "\n";
    $content .= "\t\t" . '<lastmod>'.trim($data['lastmod']).'</lastmod>' . "\n";
    $content .= "\t\t" . '<changefreq>'.trim($data['changefreq']).'</changefreq>' . "\n";
    $content .= "\t\t" . '<priority>'.trim($data['priority']).'</priority>' . "\n";
    $content .= "\t" . '</url>' . "\n";
    return $content;
} # end function
// This is an example node...ones that will always change and should be listed first!
$container = array('loc' => htmlspecialchars(utf8_encode(BASE_URL . 'new.php')),
                   'lastmod' => date("Y-m-d"),
                   'changefreq' => 'daily',
                   'priority' => '1'
                   );
echo GenerateNode($container);
// Initialize the SQL string
$sql = "SELECT link_id,
               title,
               date as date_added,
               last_updated as last_mod
        FROM idx_link
        WHERE suspended != '1'
        ORDER BY date_added DESC, last_mod DESC";
// Execute the query
$query = $db->Query($sql);
// Get the number of returned rows
$total = $db->NumRows($query);
if ( $total > 0 ){
    $container = array();
    $_total = $total;
    while( $result = $db->FetchArray($query, MYSQL_ASSOC) ){
        $location = BASE_URL . 'detail-' . formatNice($result['title']) . '-' . $result['link_id'] . '.html';
        // Use whichever date is more recent as the last modification date
        if ($result['date_added'] > $result['last_mod']){
            $lastmod = $result['date_added'];
        } else {
            $lastmod = $result['last_mod'];
        }
        $changefreq = 'weekly';
        // Priority falls as we move down the date-sorted list, with a floor of 0.1
        $_total--;
        $ratio = $_total/$total;
        $priority = $ratio < .1 ? .1 : number_format($ratio, 1, '.', '');
        $container = array('loc' => htmlspecialchars(utf8_encode($location)),
                           'lastmod' => date("Y-m-d", strtotime($lastmod)),
                           'changefreq' => $changefreq,
                           'priority' => $priority
                           );
        echo GenerateNode($container);
    } # end while
} # end if
echo '</urlset>';
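If you want to see what priorities the loop hands out, here is a rough sketch that pulls the ratio logic into a standalone function. The name priorityForRank is made up for illustration and is not part of the script above; it only assumes the same decrement-then-divide order used in the while loop.

```php
<?php
// Hypothetical helper: mirrors the $ratio / $priority logic from the loop.
// $rank is the zero-based position of the row in the date-sorted result set.
function priorityForRank($rank, $total) {
    $remaining = $total - ($rank + 1);   // same value as $_total after the decrement
    $ratio = $remaining / $total;
    // Floor at 0.1, otherwise round to one decimal place
    return $ratio < .1 ? '0.1' : number_format($ratio, 1, '.', '');
}

// Ten listings: first, middle and last row
foreach (array(0, 4, 9) as $rank) {
    echo $rank . ' => ' . priorityForRank($rank, 10) . "\n";
}
// prints:
// 0 => 0.9
// 4 => 0.5
// 9 => 0.1
```

So with ten listings the newest one gets 0.9 and the oldest bottoms out at 0.1, which keeps the priority tied to how recent the listing is rather than to an arbitrary number.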
Example Lightweight DB Class (very basic)
PHP Code:
<?php
class MySQL_DataBase{
    var $host, $user, $db, $pass, $link_id;
    // Store the credentials, then connect and select the database
    function MySQL_DataBase($host, $user, $db, $pass){
        $this->host = $host;
        $this->user = $user;
        $this->db = $db;
        $this->pass = $pass;
        $this->ConnectDB();
        $this->SelectDB();
    }
    function ConnectDB(){
        $this->link_id = mysql_connect($this->host, $this->user, $this->pass);
    }
    function SelectDB(){
        return mysql_select_db($this->db, $this->link_id);
    }
    function Query($query){
        return mysql_query($query, $this->link_id);
    }
    // $result_type may be MYSQL_ASSOC, MYSQL_NUM or MYSQL_BOTH
    function FetchArray($resource_id, $result_type = MYSQL_BOTH){
        return mysql_fetch_array($resource_id, $result_type);
    }
    function NumRows($resource_id){
        return mysql_num_rows($resource_id);
    }
    function Free($resource_id){
        return mysql_free_result($resource_id);
    }
} # end class
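One caveat if you run this on a newer server: the mysql_* functions were removed in PHP 7, so the class above only works on PHP 5 and earlier. A minimal sketch of the same wrapper on top of mysqli might look like the following. The class name MySQLi_DataBase is hypothetical, and this assumes the mysqli extension is loaded.

```php
<?php
// Sketch only: mirrors the method names of MySQL_DataBase, backed by mysqli.
// MySQLi_DataBase is a hypothetical name, not part of the original script.
class MySQLi_DataBase {
    private $link;

    public function __construct($host, $user, $db, $pass) {
        // mysqli expects (host, user, password, database), so $pass and $db swap places
        $this->link = new mysqli($host, $user, $pass, $db);
    }

    public function Query($query) {
        return $this->link->query($query);
    }

    // $type may be MYSQLI_ASSOC, MYSQLI_NUM or MYSQLI_BOTH
    public function FetchArray($result, $type = MYSQLI_BOTH) {
        return $result->fetch_array($type);
    }

    public function NumRows($result) {
        return $result->num_rows;
    }

    public function Free($result) {
        $result->free();
    }
}
```

To use it, the sitemap script would create `new MySQLi_DataBase(DB_HOST, DB_USER, DB_NAME, DB_PASS)` and pass MYSQLI_ASSOC instead of MYSQL_ASSOC to FetchArray; the rest of the loop should not need to change.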