SEO · May 20, 2026 · 4 min read

XML Sitemap Validation: What to Check Before Submitting to Google

Learn how to validate your XML sitemap properly — syntax, schema, URL accessibility, and indexability checks that catch issues before Google does.

A broken sitemap is one of those silent SEO problems. Google won't email you. Your rankings won't crash overnight. But pages quietly stop getting indexed, new content takes weeks to appear in search, and you'll spend hours hunting for the cause. Validating your XML sitemap takes ten minutes and prevents all of that.

Here's exactly what to check, in the order that catches the most issues first.

1. Confirm the sitemap is actually reachable

Before you worry about XML syntax, make sure crawlers can fetch the file. Request the URL directly, typically https://yourdomain.com/sitemap.xml, in a browser's network panel or with curl -I, and verify:

  • It returns HTTP 200 OK, not 301, 302, 403, or 404
  • The Content-Type header is application/xml or text/xml
  • It's not gated behind authentication, a paywall, or a geo-block
  • The file is under 50 MB uncompressed and contains 50,000 URLs or fewer (the official limits)

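If your sitemap is gzipped (sitemap.xml.gz), confirm the server labels it correctly. A .gz file is normally served with Content-Type: application/gzip (or application/x-gzip), while Content-Encoding: gzip applies when the server compresses an XML response in transit. A surprising number of sitemaps fail here because a CDN strips or rewrites compression headers.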
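The reachability checks above can be scripted. Here's a minimal Python sketch using only the standard library. The function names (check_response, fetch_and_check) and the 10-second timeout are illustrative choices, not part of any official tool. It assumes an uncompressed XML response and does not count URLs.

```python
import urllib.error
import urllib.request

MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed limit
XML_TYPES = {"application/xml", "text/xml"}


def check_response(status, headers, body):
    """Return a list of problems for an already-fetched sitemap response.

    `headers` is a plain dict with lower-cased keys; `body` is bytes.
    (Hypothetical helper, named for this example only.)
    """
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if content_type not in XML_TYPES:
        problems.append(f"unexpected Content-Type: {content_type or 'missing'}")
    if len(body) > MAX_BYTES:
        problems.append(f"body is {len(body)} bytes, over the 50 MB limit")
    return problems


def fetch_and_check(url):
    """Fetch `url` without following redirects, then run check_response."""

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # surface 301/302 as errors instead of following them

    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=10) as resp:
            headers = {k.lower(): v for k, v in resp.headers.items()}
            return check_response(resp.status, headers, resp.read())
    except urllib.error.HTTPError as err:
        # Redirects and 4xx/5xx responses all land here.
        return [f"expected HTTP 200, got {err.code}"]
```

Calling fetch_and_check("https://yourdomain.com/sitemap.xml") returns an empty list when these checks pass. Keeping check_response separate from the network call makes it easy to test against saved responses.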

2. Validate the XML syntax and schema

Sitemaps follow the sitemaps.org protocol, which is defined by a strict XML schema. A single unescaped ampersand in a URL makes the file malformed XML, so parsers reject the whole file rather than just that entry.

The required structure

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-11-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Common syntax failures

  • Unescaped characters in URLs: & must be &amp;, ' must be &apos;, " must be &quot;, < must be &lt;, and > must be &gt;
  • Missing XML declaration on line 1, or a stray BOM character before it
  • Wrong namespace: xmlns must point to the sitemaps.org schema exactly
  • Invalid date format in <lastmod>: use W3C datetime (2024-11-15 or 2024-11-15T09:30:00+00:00)
  • Priority out of range — must be between 0.0 and 1.0

The fastest way to catch these is to run the file through AXOX Hub's Sitemap Checker, which parses against the official schema and flags malformed entries with line numbers.
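For a local check, the same failures can be caught with a short Python sketch built on the standard library's xml.etree.ElementTree. This is an assumption-laden illustration, not a full schema validator: validate_sitemap is a made-up name, the lastmod pattern is a simplified take on W3C datetime, and changefreq is not checked.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Simplified W3C datetime: year, optional month and day, optional time with offset.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def validate_sitemap(xml_bytes):
    """Return a list of problems found in a <urlset> sitemap (hypothetical helper)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # Unescaped ampersands, unclosed tags, and similar errors end up here.
        return [f"not well-formed XML: {err}"]

    if root.tag != f"{{{NS}}}urlset":
        return [f"unexpected root element or namespace: {root.tag}"]

    problems = []
    urls = root.findall(f"{{{NS}}}url")
    if len(urls) > 50000:
        problems.append(f"{len(urls)} URLs, over the 50,000 limit")

    for index, url in enumerate(urls, start=1):
        loc = (url.findtext(f"{{{NS}}}loc") or "").strip()
        if not loc:
            problems.append(f"url #{index}: missing <loc>")

        lastmod = url.findtext(f"{{{NS}}}lastmod")
        if lastmod is not None and not W3C_DATETIME.match(lastmod.strip()):
            problems.append(f"url #{index}: bad <lastmod> {lastmod!r}")

        priority = url.findtext(f"{{{NS}}}priority")
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"url #{index}: bad <priority> {priority!r}")
    return problems
```

Entries are reported by position (url #1, url #2, ...) rather than by line number, since ElementTree does not track source lines.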

3. Check every URL is canonical and indexable

A valid XML file isn't enough. Google will fetch your sitemap, then check whether each URL is one it should actually index. Mismatches between sitemap URLs and on-page signals are one of the most common reasons pages stay out of the index.

For each URL in your sitemap, verify:

  1. It returns a 200 status, not a redirect or error
  2. The page does not have <meta name="robots" content="noindex">
  3. It's not blocked by robots.txt
  4. The <link rel="canonical"> tag points to itself, not a different URL
  5. The protocol and subdomain match your preferred version (for example, https://www.example.com vs https://example.com)

If your sitemap lists http://example.com/page but the page canonicalises to https://www.example.com/page, Google treats the sitemap URL as a hint it can ignore. Multiply that across thousands of URLs and your crawl budget evaporates.
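The on-page part of these checks (status, noindex, canonical) can be sketched in Python with the standard library's html.parser. The names below are invented for illustration, and the sketch is deliberately narrow: it compares the canonical URL as an exact string, and it does not look at robots.txt or the X-Robots-Tag response header.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the robots meta directive and canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots = attrs.get("content", "").lower()
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")


def indexability_problems(sitemap_url, status, html):
    """Return on-page conflicts for one sitemap URL (hypothetical helper)."""
    signals = HeadSignals()
    signals.feed(html)
    problems = []
    if status != 200:
        problems.append(f"status {status}, expected 200")
    if "noindex" in signals.robots:
        problems.append("page is marked noindex")
    # A missing canonical tag is not flagged; only a mismatch is.
    if signals.canonical is not None and signals.canonical != sitemap_url:
        problems.append(f"canonical points to {signals.canonical}")
    return problems
```

Fed the http://example.com/page case from the paragraph above, the function reports that the canonical points to https://www.example.com/page.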

4. Reference the sitemap from robots.txt

Submitting through Google Search Console is good, but adding a Sitemap: directive in robots.txt is what other crawlers (Bing, DuckDuckGo, Yandex) rely on.

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Use the full absolute URL, not a relative path. If you have multiple sitemaps, list each on its own Sitemap: line — or better, point to a sitemap index file.
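To confirm the directive is picked up, Python's built-in urllib.robotparser can read it back (site_maps() needs Python 3.8 or later). The sketch below is one way to do it. check_robots is a hypothetical helper, and it tests crawl access only for the generic "*" user agent.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, sitemap_url):
    """Return problems with how robots.txt references `sitemap_url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when empty

    problems = []
    if sitemap_url not in declared:
        problems.append("sitemap is not declared in robots.txt")
    for entry in declared:
        if urlparse(entry).scheme not in ("http", "https"):
            problems.append(f"Sitemap entry is not an absolute URL: {entry}")
    if not parser.can_fetch("*", sitemap_url):
        problems.append("robots.txt blocks crawlers from the sitemap itself")
    return problems
```

Run against the robots.txt example above, check_robots returns an empty list. A relative entry such as Sitemap: /sitemap.xml is flagged.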

5. Validate sitemap index files separately

If your site has more than 50,000 URLs, you'll use a sitemap index that references child sitemaps. The schema is different — note sitemapindex instead of urlset:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-11-15</lastmod>
  </sitemap>
</sitemapindex>

Check that:

  • Every child sitemap URL also returns a valid sitemap (recursive validation)
  • No child sitemap is itself an index — nesting indexes is not allowed
  • The index file itself is under 50 MB and 50,000 child sitemap entries
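The index checks above can be sketched in Python as follows. Both function names are illustrative, and fetch is an assumed callable you supply that returns a child sitemap's bytes for a given URL (an HTTP client in practice, a dictionary lookup in tests). The sketch only classifies each child by its root element and does not validate the child's entries.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_kind(xml_bytes):
    """Return 'urlset', 'sitemapindex', or None for anything else."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None
    for kind in ("urlset", "sitemapindex"):
        if root.tag == f"{{{NS}}}{kind}":
            return kind
    return None


def check_index(index_xml, fetch):
    """Validate a sitemap index; `fetch(url)` must return the child's bytes."""
    if root_kind(index_xml) != "sitemapindex":
        return ["root element is not <sitemapindex>"]

    root = ET.fromstring(index_xml)
    locs = [
        (el.text or "").strip()
        for el in root.findall(f"{{{NS}}}sitemap/{{{NS}}}loc")
    ]

    problems = []
    if len(locs) > 50000:
        problems.append(f"{len(locs)} child sitemaps, over the 50,000 limit")
    for loc in locs:
        kind = root_kind(fetch(loc))
        if kind == "sitemapindex":
            problems.append(f"{loc}: child is itself a sitemap index")
        elif kind is None:
            problems.append(f"{loc}: child is not a valid sitemap")
    return problems
```

Passing the fetcher in as a parameter keeps the validation logic free of network code, so it can run in CI against fixture files.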

6. Cross-check against Google Search Console

After submission, watch the Sitemaps report. Useful signals:

  • Discovered URLs vs Indexed URLs: a large gap usually means Google is choosing not to index what you submitted, often for quality or duplication reasons
  • "Couldn't fetch" — almost always a server, redirect, or robots.txt issue
  • "Sitemap could be a HTML page" — your server is returning HTML (often a 404 page) with a 200 status

The last one is sneaky. Curl the sitemap URL and inspect the raw response body, not just the status code.
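If you already have the raw body in hand, a rough Python heuristic can flag this case automatically. looks_like_html is a made-up helper, and it only inspects how the response begins, so treat a negative result as "probably fine" rather than proof.

```python
def looks_like_html(body):
    """Heuristic: does the response body (bytes) start like an HTML document?"""
    # Skip a UTF-8 BOM and leading whitespace before inspecting the first tag.
    head = body[:1024].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

A real sitemap starts with an XML declaration or a urlset or sitemapindex element, so it returns False for those.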

7. Re-validate after every deployment

Sitemaps drift. A CMS update changes URL slugs, a plugin starts including draft posts, someone toggles noindex on a template. Build sitemap validation into your release checklist:

  • Validate XML syntax in CI before pushing to production
  • Spot-check 10–20 random URLs from the live sitemap weekly
  • Monitor the Search Console Sitemaps report monthly
  • Re-run a full validation any time you migrate domains, change URL structures, or switch CMS platforms
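For the weekly spot-check, picking the URLs can be automated. This is a small Python sketch under a few assumptions: sample_urls is a hypothetical helper, it expects the sitemap as bytes, and the optional seed parameter exists only to make a run repeatable.

```python
import random
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sample_urls(xml_bytes, k=20, seed=None):
    """Pick up to `k` random <loc> values from a sitemap for manual review."""
    root = ET.fromstring(xml_bytes)
    locs = [(el.text or "").strip() for el in root.iter(f"{{{NS}}}loc")]
    rng = random.Random(seed)  # seed=None draws a fresh sample on every run
    return rng.sample(locs, min(k, len(locs)))
```

Each sampled URL can then be checked by hand or passed to whatever per-URL checks you already run.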

Run your current sitemap through the free AXOX Hub Sitemap Checker to catch syntax errors, broken URLs, and indexability conflicts in one pass — no signup, no scan limits.
