XML Sitemap Generation Program
XML Sitemap Generator
Copy the code and create the program.

Introduction
A sitemap is a page or file that lists the structure and content of your website, helping users and search engines understand the information on your site and assist with navigation.
Although this page is titled 'XML Sitemap Generation Program', the page itself does not generate sitemaps.
On this page, you will find the code to generate and retrieve a sitemap in XML format. Upload the code below to your site's server and run the program when you need an XML sitemap for your site. You are free to use and modify the code.
Disclaimer
Depending on your site's structure, this program may fail to generate a sitemap.
The program assumes that the site is encoded in UTF-8. Even if all content is in UTF-8, the program may not work if the HTML tags do not match the patterns it looks for.
Risks
This program is designed to be executed in an online environment. Therefore, the following risks are associated with it.
  1. Risk of third-party tampering: Malicious third parties may execute the PHP program, potentially altering files in the document root.
  2. Server performance: On sites with many files and directories, the program may consume considerable time and memory and increase server load.
  3. Impact on Googlebot: If Googlebot fetches the sitemap while the program is still writing it, the bot may receive incomplete information.
Measures taken
We have taken the following measures to reduce risk:
  1. Storage of generated files: For enhanced security of the generated sitemap file, the program follows these steps:
    1. Temporary storage: Sitemap files are initially stored temporarily in a location outside the root directory, preventing direct writing or tampering by malicious third parties on the web server.
    2. File renaming: The generated sitemap file is renamed after temporary storage, reducing the risk of tampering by changing the original file name.
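The write-then-rename idea described above can be sketched as follows. This is only an illustration under stated assumptions: writeSitemapAtomically() and $privateDir (a writable directory outside the document root) are hypothetical names and are not part of the program on this page.

```php
<?php
// Sketch only: $privateDir is a hypothetical writable directory outside the document root.
function writeSitemapAtomically(string $xml, string $privateDir, string $finalPath): bool
{
    // Create a uniquely named temporary file so that concurrent runs do not collide.
    $tempPath = tempnam($privateDir, 'sitemap_');
    if ($tempPath === false) {
        return false;
    }
    // Write the complete document to the temporary file first.
    if (file_put_contents($tempPath, $xml) === false) {
        unlink($tempPath);
        return false;
    }
    // tempnam() creates the file readable by its owner only; open it up for the web server.
    chmod($tempPath, 0644);
    // Move the finished file to its final name in a single step.
    if (!rename($tempPath, $finalPath)) {
        unlink($tempPath);
        return false;
    }
    return true;
}
```

Because the final name only ever points at a finished file, a crawler that requests the sitemap during generation should see either the previous version or the new one. Note that rename() is a single-step replacement only when both paths are on the same filesystem; across filesystems PHP falls back to copying the file.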
Measures to be taken
To reduce your risk, please note the following:
  1. Program file deletion: After sitemap generation, promptly remove the uploaded program file from the server.
  2. Execution in a secure environment: Be cautious about placing files on remote servers, and consider running the program locally if you think the server is at risk.
  3. Load monitoring: If server load increases, cease usage immediately to prevent performance issues.
  4. Measures for Googlebot: After generating the sitemap, verify crawling success in Google Search Console. Consider manually initiating crawling if needed.
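For the load monitoring point above, a rough way to estimate the cost before running the full program is to time a plain directory walk. The sketch below is an assumption-laden illustration: measureWalk() is a hypothetical helper, and it only counts .html and .php files without reading them, so the real program will take longer.

```php
<?php
// Sketch only: counts .html/.php files under a directory and reports time and peak memory.
function measureWalk(string $directory): array
{
    $start = microtime(true);
    $count = 0;
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        $extension = strtolower($fileInfo->getExtension());
        if ($fileInfo->isFile() && in_array($extension, ['html', 'php'], true)) {
            $count++;
        }
    }
    return [
        'files'   => $count,
        'seconds' => microtime(true) - $start,
        'peak_mb' => memory_get_peak_usage(true) / 1048576,
    ];
}
```

Calling measureWalk($_SERVER['DOCUMENT_ROOT']) on the server returns the number of candidate files, the elapsed seconds, and the peak memory in megabytes. An unreadable subdirectory makes the iterator throw an exception, so treat this as a diagnostic aid rather than production code.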
  • Unless your platform provides a sitemap generation plugin, as WordPress does, you will need to generate the sitemap yourself. In that case, please use the code introduced here.
  • Detailed information about sitemap structure and necessity can be found on many websites, so it is omitted here; only the PHP code is provided.
  • Each item in the code has comments; choose the method that suits your preferences.
  • The generated XML file (.xml) looks like the following sample, which contains the minimum items required for registration with Google. Optionally, the program can also output "Page Title," "Change Frequency," and "Priority."
    [Image: Sample XML Sitemap]
    ・The last modification date (<lastmod>) is written in Coordinated Universal Time (UTC).
    ・For example, Japan Standard Time is the displayed time plus 9 hours.
  • Installation Steps:
    1. Prepare a directory (folder) with a name like "sitemap."
    2. Copy the code below and paste it into your web page editor.
    3. Follow the comments in the code to replace each item with content that fits your conditions.
    4. Name the file, for example, 'sitemapgenerator.php', and save it with the '.php' extension, not '.html'.
    5. Upload it to your web server. If you can access the page and see a screen similar to the ones below, the generation was successful.
      * Tested in a development environment with approximately 20,000 pages, the file size is about 6MB (selecting all available items in the settings), and it took approximately 3 seconds.
      [Screenshot: successful run, type including the HTML section]
      [Screenshot: successful run, type with only the PHP section]
    6. Open the saved XML file in your browser, or download the saved ".xml" file, to check it. If it has been generated correctly, register it in Google Search Console, Bing Webmaster Tools, and so on.
      * If the file size is large, you might not be able to open it in a browser. Download the ".xml" file from the web server for confirmation.
      * Depending on the browser, it might not display correctly when opened.
      * If you open it in a browser, the "XML Declaration" on the first line might not appear.
    7. If using it for the first time, ensure proper generation and check for unnecessary items.
    8. If you encounter an "Encoding error" when accessing the page, it could be caused by specific characters.
      [Screenshot: example of an "Encoding error" message]
      For instance, characters like [&] should be rewritten as escape codes, as shown in the table below.
    9. Characters and their escape codes:
      Name           Character   Escape Code
      Ampersand      &           &amp;
      Single Quote   '           &apos;
      Double Quote   "           &quot;
      Greater Than   >           &gt;
      Less Than      <           &lt;
      Common Causes of "Encoding Error":  
      An "Encoding Error" is reported when an XML file cannot be read in its declared character encoding or contains characters that are not allowed where they appear. The following causes are common:
      1. Invalid characters are present
        [Cause] XML requires certain characters ([&][']["][>][<], etc.) to be escaped. An error occurs if these characters are not properly escaped.
        [Fix] Replace each such character with its escape code and confirm that no unescaped occurrences remain.
      2. Correct encoding is not specified
        [Cause] The XML declaration at the beginning of the file, such as <?xml version="1.0"?>, is present, but its encoding attribute is missing or does not name the encoding the file actually uses.
        [Fix] Specify the correct encoding, for example, <?xml version="1.0" encoding="UTF-8"?>.
      3. Declaration section is incorrect
        [Cause] An error occurs if the declaration section is incorrect or if there is an error between "<?xml" and "?>".
        [Fix] Correct the declaration section.
      4. File encoding doesn't match the declaration
        [Cause] An error occurs if the actual encoding of the file does not match the encoding specified in the XML declaration.
        [Fix] Adjust the file encoding to match the declaration.
      5. File is corrupted
        [Cause] Errors occur if the file is not saved correctly or is corrupted.
        [Fix] Resave the file and ensure it can be loaded successfully.
    10. In our test environment, with no exclusions specified, a directory such as "sys", which this site does not use for web pages, appeared in the XML file. If you find such a directory in your own file, specify its name in the 'Exclude directories' item, for example, "$excludeDirectories = ['sys']".
      Alternatively, try specifying something like "$excludeMetaTags = ['NOINDEX']" in the "Exclude Meta Tags" section. The PHP code below uses "NOINDEX" as the default setting.
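Instead of rewriting the characters in the escape code table by hand, PHP's built-in htmlspecialchars() can perform the same replacements. The helper below is a minimal sketch; xmlEscape() is a hypothetical name and does not appear in the program on this page.

```php
<?php
// Sketch only: xmlEscape() is a hypothetical helper name, not part of the program below.
function xmlEscape(string $text): string
{
    // ENT_XML1 selects the XML 1.0 entities; ENT_QUOTES also converts ' and ".
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// A URL with a query string and a title containing all five special characters.
echo xmlEscape('https://example.com/list.php?a=1&b=2'), "\n";
echo xmlEscape('Tom & Jerry\'s "<Best>" page'), "\n";
```

By default htmlspecialchars() also escapes the ampersand of text that is already escaped, so "&amp;" would become "&amp;amp;". Apply the helper exactly once, to raw text.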
* The "font-family" in the program should be adjusted to match your own site if necessary.
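As a rough illustration of the minimal output and the UTC note mentioned earlier, the sketch below builds a one-entry sitemap and shows the same instant in Japan time. The URL, the timestamp, and the function name sampleSitemap() are made-up values for illustration only.

```php
<?php
// Sketch only: the URL and the timestamp are made-up values for illustration.
function sampleSitemap(string $url, int $timestamp): string
{
    // '@<timestamp>' always yields a UTC (+00:00) date, as in the program's <lastmod>.
    $lastMod = (new DateTimeImmutable('@' . $timestamp))->format('c');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
         . " <url>\n"
         . " <loc>{$url}</loc>\n"
         . " <lastmod>{$lastMod}</lastmod>\n"
         . " </url>\n"
         . '</urlset>';
}

$timestamp = 1704240000; // 2024-01-03 00:00:00 UTC
echo sampleSitemap('https://example.com/index.html', $timestamp), "\n";

// The same instant expressed in Japan Standard Time (UTC+9).
$jst = (new DateTimeImmutable('@' . $timestamp))->setTimezone(new DateTimeZone('Asia/Tokyo'));
echo $jst->format('c'), "\n";
```

The <lastmod> value ends in "+00:00", while the Japan time printed on the last line is nine hours later and ends in "+09:00". Both strings describe the same moment.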
《 XML Sitemap Generation PHP Code 》
 Download ZIP File with PHP Part Only   
You can run the program even with only the PHP section.
Save the file with a ".php" extension.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>XML Sitemap Generation PHP Program</title>
<meta name="robots" content="NOINDEX,NOFOLLOW">
<!-- Load Free Icon Fonts - can be removed if not used -->
<link href="https://use.fontawesome.com/releases/v6.2.0/css/all.css" rel="stylesheet">
</head>
<body>
<h2>&nbsp;&nbsp;<i class="fa-regular fa-pen-to-square fa-2x" style="color: crimson"></i>&nbsp;XML Sitemap Creation</h2>
<hr>
<div style="margin:15px 5px 10px 20px;padding: 0 15px 0 0; font-size: 14px;background-color: lavenderblush; border: gray 1px solid; border-radius: 4px;">
<div style="margin:15px 0 0 20px">
    <form method="post">
        <input type="submit" name="downloadLocal" value="Download XML file locally" style="background-color: white;border-radius: 5px;cursor: pointer;">
    </form>
</div>
<ul>
<li>The XML file downloaded from here contains part of the HTML portion of this page.</li>
<li>If you need a pure XML file, download it from the remote server, or delete the HTML portion of the downloaded file. The XML content runs from "&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;" to "&lt;/urlset&gt;".</li>
</ul>
</div>
<hr>
<?php
//************************************************
// XML Sitemap Generation PHP Program
// Program provided by: Everyone's Knowledge A Little Useful Book
//        Minna no Chishiki Chotto Benricho
//        みんなの知識 ちょっと便利帳
// https://www.benricho.org/Tips/sitemapgenerator/
// Released: January 3, 2024
//************************************************

// Document root of the web server. Automatically retrieved.
$sitemapDirectory = $_SERVER['DOCUMENT_ROOT'];

// .xml file name (final filename)
// The .xml file name can be changed. However, many search engines use "sitemap.xml" as the standard naming convention, so using "sitemap.xml" is recommended.
$finalSitemapFilename = 'sitemap.xml';

// Directory for saving the sitemap file (final directory - root)
$finalSitemapPath = $sitemapDirectory . '/' . $finalSitemapFilename;

// Directory to generate the sitemap
$rootDirectory = $_SERVER['DOCUMENT_ROOT'];

// .xml file name (temporary file name)
$tempSitemapFilename = 'temporarysitemap.xml';

// Directory to save the sitemap file (temporary directory)
$tempSitemapPath = $sitemapDirectory . '/' . $tempSitemapFilename;

// Create the directory for the temporary file if it does not exist
$tempDirectory = dirname($tempSitemapPath);
if (!is_dir($tempDirectory)) {
    // 0755 avoids making the directory writable by every user on the server
    mkdir($tempDirectory, 0755, true);
}

// Rotate files left over from a previous run (uses PHP's unlink() and rename(); no shell command is involved)
$oldSitemapFilename = 'old-' . $finalSitemapFilename;
$oldSitemapPath = $sitemapDirectory . '/' . $oldSitemapFilename;

// Delete the old sitemap if it exists
if (file_exists($oldSitemapPath)) {
    unlink($oldSitemapPath);
}

// If a temporary sitemap is left over, keep it under the "old-" name
if (file_exists($tempSitemapPath)) {
    rename($tempSitemapPath, $oldSitemapPath);
}

// Check if the download button is clicked
if (isset($_POST['downloadLocal'])) {
    // Copy the sitemap to the temporary directory
    copy($finalSitemapPath, $tempSitemapPath);

    // Set headers for downloading
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . $finalSitemapFilename . '"');
    header('Content-Length: ' . filesize($tempSitemapPath));

    // Output the file
    readfile($tempSitemapPath);

    // After download, delete the sitemap saved in the temporary directory
    unlink($tempSitemapPath);
    exit;
}

///// Exclusion Settings /////
// Directories to exclude. Specify only the directory names. ['dir-1', 'dir2'] etc.
// Set to an empty array if not needed. $excludeDirectories = [];
$excludeDirectories = [];

// Files to exclude. Specify only the file names. ['aaa.html', 'bbb.php'] etc.
// Set to an empty array if not needed. $excludeFiles = [];
$excludeFiles = [];

// Directories/Files to exclude ['dir-1/dir1/file1.html', 'dir2/file2.php'] etc.
// Exclude paths should not include a leading "/".
// Set to an empty array if not needed. $excludePaths = [];
$excludePaths = [];

// Pages whose robots meta tag contains one of these values are excluded. ['NOINDEX', 'NOFOLLOW', 'REFRESH'], etc.
// Set to an empty array if not needed. $excludeMetaTags = [];
// Recommend specifying ['NOINDEX']
$excludeMetaTags = ['NOINDEX'];

///// Sitemap Generation Conditions /////
// Get page title (1: Yes, 2: No)
// * Not recommended by Google Search Console
// * If registering with Google Search Console, set to "2: No"
// * If set to "1: Yes," Google Search Console may display an alert stating, "Title tags are not recognized. Please make corrections."
$getTitle = 2;
// Strings to remove from the page title
// * Can remove specified strings from the title. ['of', 'is'] etc.
// Set to an empty array if not needed. $removeTitleStrings = [];
$removeTitleStrings = [];

// Get last modification date of the file (1: Yes, 2: No)
// * Recommended by Google Search Console
$getLastMod = 1;

// Page update frequency
// * Ignored by Google. It is advised not to add values.
// Use page update frequency (1: Yes, 2: No)
$useChangeFreq = 2;
// Select elements for update frequency when "1" is chosen
// ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never']
// Choose values that apply to the update frequency of your site
$changefreqValues = ['element suitable for your site'];

// Page priority
// * Ignored by Google. It is advised not to add values.
// Get priority (1: Yes, 2: No)
$getPriority = 2;

// Sitemap generation message
$successMessage = "<p>・ The Sitemap XML file has been generated.</p><p>・ It has been saved in the root of the remote server as '{$finalSitemapFilename}'.<br>・ For security reasons, please delete the program file from the remote server.</p><p>・ <a href='/{$finalSitemapFilename}' target='_blank'>Open the 'XML file' in the browser [new tab].</a></p>";

// Sitemap XML Header (for the final sitemap file)
$xmlFinal = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
XML;

// Recursively process files within the directory (for the final Sitemap file)
function generateSitemap($directory, $excludeDirs, $excludeFiles, $getTitle, $removeTitleStrings, $getLastMod, $excludeMetaTags, $useChangeFreq, $changefreqValues, $getPriority, $excludePaths, &$xmlFinal) {
    $dir = new DirectoryIterator($directory);
    foreach ($dir as $fileInfo) {
        if ($fileInfo->isDot()) continue;

        $filename = $fileInfo->getFilename();
        $filepath = $fileInfo->getPathname();
        $fileExtension = pathinfo($filename, PATHINFO_EXTENSION);

        // Check if it is an excluded directory/file
        $excludePath = str_replace($_SERVER['DOCUMENT_ROOT'], '', $filepath);
        if (in_array(ltrim($excludePath, '/'), $excludePaths)) {
            continue;
        }
        if ($fileInfo->isDir()) {
            if (in_array($filename, $excludeDirs)) {
                continue;
            } else {
                generateSitemap($filepath, $excludeDirs, $excludeFiles, $getTitle, $removeTitleStrings, $getLastMod, $excludeMetaTags, $useChangeFreq, $changefreqValues, $getPriority, $excludePaths, $xmlFinal);
            }
        } else {
            if (in_array($filename, $excludeFiles) || in_array($filename, $excludeDirs)) {
                continue;
            }

            if (in_array($fileExtension, ['html', 'php'])) {
                processFile($filepath, $getTitle, $removeTitleStrings, $getLastMod, $excludeMetaTags, $useChangeFreq, $changefreqValues, $getPriority, $xmlFinal);
            }
        }
    }
}

// Process the file (for the final file)
function processFile($filepath, $getTitle, $removeTitleStrings, $getLastMod, $excludeMetaTags, $useChangeFreq, $changefreqValues, $getPriority, &$xmlFinal) {
    $content = file_get_contents($filepath);
    if (shouldExcludeContent($content, $excludeMetaTags)) {
        return;
    }

    $url = getRelativeUrl($filepath);
    // Set last modification date in UTC (Coordinated Universal Time) format (indicated by "+00:00")
    $lastMod = ($getLastMod == 1) ? getLastModifiedDateUTC($filepath) : '';

    $xmlFinal .= "\n <url>";
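    // Escape &, <, >, ' and " so that URLs with query strings do not break the XML
    $xmlFinal .= "\n <loc>" . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>";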

    if ($getTitle == 1) {
        $title = getTitleFromContent($content);
        if (!empty($title)) {
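            // Remove the configured strings, then escape &, <, >, ' and " for XML
            $title = htmlspecialchars(str_replace($removeTitleStrings, '', $title), ENT_XML1 | ENT_QUOTES, 'UTF-8');
            $xmlFinal .= "\n <title>{$title}</title>";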
        }
    }

    // Display last modification date
    if (!empty($lastMod)) {
        $xmlFinal .= "\n <lastmod>{$lastMod}</lastmod>";
    }

    // Display changefreq
    if ($useChangeFreq == 1) {
        $changefreq = $changefreqValues[array_rand($changefreqValues)];
        $xmlFinal .= "\n <changefreq>{$changefreq}</changefreq>";
    }

    // Display priority
    if ($getPriority == 1) {
        $priority = getPriorityFromDepth($filepath);
        $xmlFinal .= "\n <priority>{$priority}</priority>";
    }

    $xmlFinal .= "\n </url>";
}

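// Calculate priority based on depth below the document root (for the final file)
function getPriorityFromDepth($filepath) {
    // Count directory levels relative to the document root, not from the filesystem root
    $relativePath = str_replace('\\', '/', str_replace($_SERVER['DOCUMENT_ROOT'], '', $filepath));
    $depth = substr_count(ltrim($relativePath, '/'), '/');
    // The sitemap protocol only allows values between 0.0 and 1.0
    return max(0.0, round(1 - ($depth * 0.1), 1));
}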

// Check if meta tags should be excluded (for the final file)
function shouldExcludeContent($content, $excludeMetaTags) {
    foreach ($excludeMetaTags as $tag) {
        if (stripos($content, '<meta name="robots" content="' . $tag) !== false) {
            return true;
        }
    }

    return false;
}

///// Get the last modification date of the file - It is recommended to use UTC (Coordinated Universal Time) in the sitemap XML element
// Set the last modification date in UTC (Coordinated Universal Time) format (indicated by "+00:00")
function getLastModifiedDateUTC($filepath) {
    $lastModTimestamp = filemtime($filepath);
    $lastModDateTime = new DateTimeImmutable('@' . $lastModTimestamp);
    return $lastModDateTime->format('c');
}

// Get title from HTML file (for the final file)
function getTitleFromContent($content) {
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); //Suppress errors during HTML parsing
    $dom->loadHTML($content);

    $titleElements = $dom->getElementsByTagName('title');
    if ($titleElements->length > 0) {
        $title = $titleElements->item(0)->textContent;
        return $title;
    }

    return '';
}

// Get the relative URL of the file (for the final file)
function getRelativeUrl($filepath) {
    $relativeUrl = str_replace($_SERVER['DOCUMENT_ROOT'], '', $filepath);
    return $_SERVER['REQUEST_SCHEME'] . '://' . $_SERVER['SERVER_NAME'] . str_replace('\\', '/', $relativeUrl);
}

// Start sitemap generation (for the final file)
generateSitemap($rootDirectory, $excludeDirectories, $excludeFiles, $getTitle, $removeTitleStrings, $getLastMod, $excludeMetaTags, $useChangeFreq, $changefreqValues, $getPriority, $excludePaths, $xmlFinal);

// Sitemap footer (for the final file)
$xmlFinal .= "\n</urlset>";

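// Save the sitemap under the temporary name first, then rename it to the final name,
// so that a crawler is less likely to read a half-written file (for the final file)
file_put_contents($tempSitemapPath, $xmlFinal);
rename($tempSitemapPath, $finalSitemapPath);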

// Display success message (indicating that the final sitemap file has been generated and saved)
echo $successMessage;
?>
<hr>
<p style="margin-top: 20px">
<!-- Set the file name of this PHP program in the <a> tag. The extension is ".php" -->
<a href="File name of this program.php" style="margin-left: 20px;text-decoration: none;"><i class="fa-solid fa-check" style="color: #005eff;"></i> Regenerate “XML file”&nbsp;&nbsp;<i class="fa-solid fa-rotate fa-spin fa-2x" style="color:crimson"></i>&nbsp;&nbsp;&nbsp;[Reload page]</a>
</p>
<p>
<!-- Open the generated "XML file" in a browser -->
<a href='<?php echo '/' . $finalSitemapFilename; ?>' target='_blank' style='margin-left: 20px;text-decoration: none;'><i class="fa-solid fa-check" style="color: #005eff;"></i> Open "XML file" in browser&nbsp;&nbsp;<i class="fa-solid fa-chalkboard-user fa-beat-fade fa-2x" style="color:green"></i>&nbsp;&nbsp;&nbsp;[Separate tab]</a>
<div style="margin-left: 30px;font-size: 12px;color: gray">* If the file size is large, you may not be able to open it in your browser. If you cannot open it in your browser, please download the ".xml" file from the web server and check it.</div>
</p>
<hr>
<!-- Confirm addition/update of "Google" sitemap -->
<p>
<!-- Set your own URL in the <a> tag -->
<a href="https://search.google.com/search-console/sitemaps?resource_id=your own URL" target="_blank" style="margin-left: 20px;text-decoration: none;"> <i class="fa-solid fa-check" style="color: #005eff;"></i>&nbsp;[<strong>Google</strong>]&nbsp;Confirm addition/update of site map&nbsp;&nbsp;<i class="fa-solid fa-arrow-up-from-bracket fa-bounce fa-2x" style="color: #db0016;"></i></a>
</p>
<!-- [Google Search Console] -->
<p>
<!-- Set the URL of the "Sitemap" page of Google Search Console that you have registered in the <a> tag. -->
<a href="https://search.google.com/search-console?resource_id=your own URL" target="_blank" style="margin-left: 20px;text-decoration: none;"><i class="fa-solid fa-check" style="color: #005eff;"></i>&nbsp;[<strong>Google Search Console</strong>]&nbsp;&nbsp;&nbsp;<i class="fa-solid fa-up-right-from-square fa-beat fa-2x" style="color: blue"></i></a>
<div style="font-size: 13px; margin: 0 40px">* It is assumed that you have already registered with "Google Search Console".</div>
</p>
<hr>
<!-- “Bing” Confirm addition/update of site map -->
<p>
<!-- Set the URL of the Bing "Webmaster Tools" page that you have registered and the XML file name you set in the <a> tag. The file name should match the file name set in "$finalSitemapFilename". -->
<a href="https://www.bing.com/webmasters/sitemaps?siteUrl=your own URL/&sitemap=your own URL/sitemap.xml" target="_blank" style="margin-left: 20px;text-decoration: none;"> <i class="fa-solid fa-check" style="color: #005eff;"></i>&nbsp;[<strong>Bing</strong>]&nbsp;Confirm addition/update of site map&nbsp;&nbsp;<i class="fa-solid fa-arrow-up-from-bracket fa-bounce fa-2x" style="color: #db0016;"></i></a>
</p>
<!-- [Bing webmaster Tools] -->
<p>
<!-- Set the URL of the Bing "Webmaster Tools" page that you have registered in the <a>tag. -->
<a href="https://www.bing.com/webmasters/home?siteUrl=your own URL" target="_blank" style="margin-left: 20px;text-decoration: none;">
<i class="fa-solid fa-check" style="color: #005eff;"></i>&nbsp;[<strong>Bing webmaster Tools</strong>]&nbsp;&nbsp;&nbsp;<i class="fa-solid fa-up-right-from-square fa-beat fa-2x" style="color: blue"></i></a>
<div style="font-size: 13px; margin: 0 40px">* It is assumed that you have registered with "Bing Webmaster Tools" or have completed cooperation with "Google Search Console".</div>
</p>
<hr>
<!-- Displays the site name, etc. If you do not need to display it, please delete it. -->
<h3 align="center"><i class="fa-solid fa-house" style="color: crimson"></i>&nbsp;Your site name etc.&nbsp;<i class="fa-solid fa-house" style="color: crimson"></i></h3>
<!-- You can delete it. -->
<h4 align="center"><a href="https://www.benricho.org/" target="_blank" style="text-decoration: none;"><i class="fa-solid fa-house" style="color: blue"></i>&nbsp;みんなの知識 ちょっと便利帳&nbsp;<i class="fa-solid fa-house" style="color: blue"></i></a></h4>
</body>
</html>
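Before registering the generated file, it can be checked without opening it in a browser. The following is a minimal sketch, assuming the DOM extension is available; checkSitemap() is a hypothetical helper and is not part of the program above. It reports XML parse errors (such as an unescaped "&") and counts the <loc> entries.

```php
<?php
// Sketch only: checkSitemap() is a hypothetical helper for verifying a generated file.
function checkSitemap(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    libxml_clear_errors();
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        $messages = [];
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
        libxml_clear_errors();
        return ['ok' => false, 'urls' => 0, 'errors' => $messages];
    }
    // Count <loc> elements in the sitemap namespace.
    $namespace = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $urls = $doc->getElementsByTagNameNS($namespace, 'loc')->length;
    return ['ok' => true, 'urls' => $urls, 'errors' => []];
}
```

For example, checkSitemap(file_get_contents($_SERVER['DOCUMENT_ROOT'] . '/sitemap.xml')) returns the URL count on success or a list of line-numbered parser messages on failure. The sitemap protocol limits a single file to 50,000 URLs and 50MB uncompressed, so the count is also worth comparing against that limit.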
This concludes our introduction to the PHP code for the XML sitemap generator.
We hope this code helps search engines crawl and index your site effectively and keeps your site running smoothly.
Paid programs offer advanced features such as automatic updates; the focus here is on providing the essential functionality.
Because this page is a translation of the Japanese version, the wording may contain errors. We apologize for the inconvenience; if you find an error in the wording in the code, please correct or adjust it yourself.
Your feedback on using this code is appreciated. Please note that we cannot respond to questions.

Last updated : 2024/06/29