XML Sitemap Generators: The Complete Guide for 2026
What an XML sitemap generator does, how Google actually reads sitemaps in 2026, and how to pick the right one for your site.
An XML sitemap generator is a tool that builds, and ideally maintains, the sitemap.xml file that tells Google which pages on your site you want crawled and indexed. Pick the wrong one and you'll spend the next year wondering why pages quietly disappear from Google every time you publish, migrate, or re-deploy.
This guide covers what a sitemap generator actually does, what Google reads in 2026 (hint: not what most tutorials still teach), the five types of generator you can pick from, size limits that will catch you out at scale, and a checklist you can run through today.
Table of contents
- What an XML sitemap is — and isn't — in 2026
- What an XML sitemap generator actually does
- The five types of sitemap generator
- What Google actually reads in your sitemap
- Sitemap size limits: 50,000 URLs and 50MB
- Sitemap best practices checklist
- Why hosting matters more than generation
- How to choose a generator (decision tree)
- FAQ
What an XML sitemap is — and isn't — in 2026
An XML sitemap is a plain text file in a specific format that lists the pages on your site you want search engines to know about. It's hosted at a stable URL — usually https://example.com/sitemap.xml — and linked from your robots.txt so crawlers find it.
Here's a minimal valid sitemap:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2026-04-12</lastmod>
  </url>
</urlset>
That's it. Two required pieces: the <urlset> root with its XML namespace declaration, and a <url> block with a <loc> tag for every page. The <lastmod> lines are optional.
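For readers who want to see the minimal file above produced programmatically, here is a small sketch using only Python's standard library. The function name build_sitemap and the (url, lastmod) input shape are assumptions for illustration, not any particular generator's API.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (absolute_url, lastmod_or_None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text when serialising.
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # <lastmod> is optional, so skip it when unknown
            ET.SubElement(url, "lastmod").text = lastmod
    # Returns bytes, including the <?xml ...?> declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_bytes = build_sitemap([
    ("https://example.com/", "2026-04-15"),
    ("https://example.com/pricing", "2026-04-12"),
])
```

Writing sitemap_bytes to a file named sitemap.xml would give roughly the document shown above, minus the indentation.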
What a sitemap is not:
- It is not a ranking signal. Being listed does not make a page rank.
- It is not a guarantee of indexing. Google still decides whether each page is worth keeping.
- It is not a substitute for internal linking. A page that only lives in your sitemap and has zero internal links is a page Google will likely ignore anyway.
Think of it as a courtesy: "Hey Google, here's my full list of pages in one place, so you don't have to reconstruct it from my navigation."
What an XML sitemap generator actually does
The naïve version is simple: walk a website, collect every URL that returns 200, write them into an XML file.
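As a rough illustration of that naive walk, the sketch below does a breadth-first crawl of a single host and keeps every URL that returns 200. The fetch callable is a hypothetical stand-in for an HTTP client, so the logic can be read (and run) without any network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first walk of one host.

    `fetch` is a hypothetical callable returning (status_code, html_text).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        status, html = fetch(url)
        if status != 200:
            continue  # the naive rule: keep only URLs that return 200
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

Note what this sketch does not do: it ignores robots.txt, noindex, canonicals, and JavaScript-rendered links, which is exactly why the naive version falls short.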
The non-naïve version — the one that keeps working six months after launch — has to solve four problems at once:
- Discovery. Which URLs exist? A sitemap that misses half your blog is a broken sitemap.
- Filtering. Which URLs should be excluded? noindex pages, admin routes, expired products, canonicalised duplicates, paginated archives, search-result URLs: none belong in a sitemap.
- Freshness. When did each page last change? Google uses this to schedule re-crawls. A stale sitemap sends Google to stale pages.
- Hosting. Where does the file live, and how does it update without you touching your app?
Most "free online sitemap generator" tools solve problem 1 and stop there. Plugin-based generators solve 1–3 but not 4. Hosted SaaS generators — the category we built Indexly to occupy — solve all four and keep solving them without you pressing a button.
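The filtering problem in particular is easy to express in code. The sketch below assumes a crawler has already recorded, for each page, its HTTP status, whether it carries noindex, and its canonical URL; the Page class and the excluded path prefixes are hypothetical and would differ per site.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

EXCLUDED_PREFIXES = ("/admin", "/search")  # hypothetical; adjust per site


@dataclass
class Page:
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None  # None: the page declares no canonical


def belongs_in_sitemap(page: Page) -> bool:
    if page.status != 200:
        return False  # 404s, redirects, server errors
    if page.noindex:
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False  # canonicalised duplicate of another URL
    path = urlsplit(page.url).path
    return not path.startswith(EXCLUDED_PREFIXES)


pages = [
    Page("https://example.com/pricing", 200),
    Page("https://example.com/old", 301),
    Page("https://example.com/draft", 200, noindex=True),
    Page("https://example.com/p?ref=x", 200, canonical="https://example.com/p"),
    Page("https://example.com/search?q=shoes", 200),
]
kept = [p.url for p in pages if belongs_in_sitemap(p)]
```

Only the pricing page survives in this example; everything else is one of the exclusion cases listed above.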
The five types of sitemap generator
Not every tool is competing for the same job. Here's the honest landscape:
| Type | Example | Best for | Breaks when... |
|---|---|---|---|
| Online one-off | xml-sitemaps.com | Quick audits, static sites | You publish anything new |
| CMS plugin | Yoast, RankMath | WordPress under ~5k pages | Plugin conflicts, cache layers, DB bloat |
| Code library | spatie/laravel-sitemap | Full control in a codebase | You re-deploy on every content change |
| Build-time generator | next-sitemap, @nuxtjs/sitemap | JAMstack with frequent rebuilds | Content changes between builds |
| Hosted SaaS | Indexly | Any site, any stack, hands-off | Rarely — the whole point is it doesn't |
Online one-off generators
You paste a URL, a site crawls it, you download a file, you upload it to your server. Zero recurring cost. Zero maintenance. Zero automation — which is the problem. The moment you publish a new post, the sitemap is wrong. Fine for a one-off audit. Not a production solution.
CMS plugins
Yoast and RankMath on WordPress, built-in generators on Shopify and Webflow. These ship with your CMS and update automatically when content changes.
The downsides stack up at scale: they generate the sitemap on every request (slow), can conflict with caching layers (broken sitemaps), bloat the database (pagination queries get heavy past ~10k pages), and tie your SEO tooling to a specific CMS. Change platforms and you start over.
Code libraries
Packages like spatie/laravel-sitemap give developers a programmatic API: define routes, generate the file, commit it. Maximum control, zero magic.
The trade-off is time: every change to your content requires a code change or a scheduled job that re-generates the file. You own the infrastructure, which means you also own the on-call pager when it breaks.
Build-time generators
next-sitemap, @nuxtjs/sitemap, Astro's sitemap integration: these run during npm run build and write the file into your deployed output. Perfect for JAMstack sites where every content change triggers a rebuild.
Not perfect if your content updates more often than your builds. A blog that pushes three posts a day but re-deploys weekly has a sitemap that can be up to a week stale.
Hosted SaaS generators
A service that crawls your site on a schedule from its own infrastructure, generates the sitemap, hosts it on a CDN, and gives you a stable URL to submit to Google. No code, no plugin, no deploy.
This is the category Indexly lives in. Paste a URL, pick a schedule, the sitemap stays current forever. If you don't want to think about sitemaps again, this is where to look.
What Google actually reads in your sitemap
Most sitemap tutorials still show you how to carefully tune <priority> and <changefreq> values, as though it matters. It hasn't for years.
Here's Google's official position from Search Central: Google ignores <priority> and <changefreq> values. Google only uses <lastmod> when the date is consistently and verifiably accurate, meaning your CMS isn't just rewriting it on every save.
Google's Gary Illyes went further and publicly called those tags a "bag of noise". That is, more or less, the corporate position.
The only three things that matter
1. <loc> — the URL. Required. Must be the fully-qualified absolute canonical URL, UTF-8 encoded, URL-escaped if it contains special characters.
2. <lastmod> — the last modification date. Optional, but Google uses it if it's accurate. "Accurate" means the date actually reflects when the page's main content changed — not when your CMS last touched the database row. If you lie (or your CMS lies on your behalf), Google stops trusting the field and starts ignoring it.
3. Which pages you include at all. A sitemap full of 404s, redirects, noindex pages, or canonicalised duplicates is a sitemap Google treats as low-signal. Quality beats quantity every time.
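The first two items are mostly formatting rules, so a short sketch may help. It assumes the sitemaps.org conventions (percent-encoded UTF-8 URLs, XML entity escaping, W3C datetime for lastmod); the function names are illustrative, and internationalised hostnames are deliberately not handled.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def sitemap_loc(url: str) -> str:
    """Return a <loc> value: absolute, percent-encoded, entity-escaped."""
    parts = urlsplit(url)
    if not (parts.scheme and parts.netloc):
        raise ValueError(f"not an absolute URL: {url!r}")
    # Keep "%" safe so already-encoded URLs are not double-encoded.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    # Fragments are dropped; escape() turns & into &amp; for XML.
    return escape(urlunsplit((parts.scheme, parts.netloc, path, query, "")))


def sitemap_lastmod(changed_at: datetime) -> str:
    """Format a timezone-aware datetime as a W3C datetime in UTC."""
    if changed_at.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return changed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
```

If the XML is written with a library that escapes text itself (as xml.etree.ElementTree does), the escape() step should be dropped to avoid double escaping.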
What to stop doing
- Stop setting every page to <priority>1.0</priority>. Google doesn't read it.
- Stop writing <changefreq>daily</changefreq> on pages that haven't changed in four years.
- Stop generating <lastmod> from NOW() on every sitemap build. That's exactly the kind of noise that makes Google ignore the field.
The best sitemap is short on metadata and accurate on the metadata it keeps.
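One way to keep <lastmod> honest is to change it only when the page's main content actually changes. The sketch below does that by hashing the content; the store dictionary is a hypothetical stand-in for whatever persistence a real generator uses, and extracting the "main content" from a page is assumed to happen elsewhere.

```python
import hashlib
from datetime import date


def update_lastmod(store, url, main_content, today):
    """Return the lastmod date for `url`, bumping it only on real changes.

    `store` maps url -> (content_hash, lastmod_date) and is mutated in place.
    """
    digest = hashlib.sha256(main_content.encode("utf-8")).hexdigest()
    previous = store.get(url)
    if previous is None or previous[0] != digest:
        store[url] = (digest, today)  # new page, or content changed
    return store[url][1]


store = {}
first = update_lastmod(store, "https://example.com/pricing", "Plans from $9", date(2026, 4, 1))
same = update_lastmod(store, "https://example.com/pricing", "Plans from $9", date(2026, 4, 10))
changed = update_lastmod(store, "https://example.com/pricing", "Plans from $12", date(2026, 4, 15))
```

Here the second run keeps the 1 April date because nothing changed, and only the third run moves the date forward.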
Sitemap size limits: 50,000 URLs and 50MB
Google's hard limits are simple: 50,000 URLs or 50MB uncompressed per sitemap file, whichever you hit first. Exceed either and Google rejects the whole file — not just the overflow.
When you need a sitemap index
Once you pass 50,000 URLs, you split the sitemap into chunks (often by section: /blog-sitemap.xml, /products-sitemap.xml) and reference them from a single sitemap index file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemaps/products-1.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/products-2.xml</loc>
    <lastmod>2026-04-15</lastmod>
  </sitemap>
</sitemapindex>
You submit the index file to Google Search Console; Google follows the references and reads each child sitemap. A sitemap index can contain up to 50,000 <loc> entries, so in theory 50,000 × 50,000 = 2.5 billion URLs per index. You will not hit that ceiling.
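The chunking itself is simple arithmetic, sketched here for a hypothetical catalogue of 120,000 product URLs. The child sitemap URL pattern is an assumption, and this sketch only enforces the URL count; a real generator would also check each file against the 50MB limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive slices of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(children):
    """Build a sitemap index from (child_sitemap_url, lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in children:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


urls = [f"https://example.com/products/{n}" for n in range(120_000)]
parts = chunk(urls)  # three slices: 50,000 + 50,000 + 20,000
index_xml = build_index(
    (f"https://example.com/sitemaps/products-{n}.xml", "2026-04-15")
    for n, _ in enumerate(parts, start=1)
)
```

Each slice in parts would be written out as its own child sitemap, and index_xml is the single file submitted to Search Console.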
Most generators break as a site approaches the 50k limit, either because they can't chunk, or because they can chunk but can't keep the index file coherent when one child sitemap updates. This is where hosted tools earn their keep; we cover sharding and index orchestration in detail in our guide to sitemaps for large sites.
Sitemap best practices checklist
Print this. Pin it to the monitor.
- One sitemap per site, under 50,000 URLs and 50MB. Split into an index beyond that.
- Only canonical, indexable URLs. No noindex, no redirects, no 404s, no duplicates.
- Fully qualified absolute URLs. https://example.com/page, not /page.
- UTF-8 encoded. No exceptions.
- Accurate <lastmod> dates, or none at all. Fake dates are worse than missing ones.
- Listed in robots.txt. One line: Sitemap: https://example.com/sitemap.xml.
- Submitted in Google Search Console. robots.txt listing alone is not enough.
- Gzip-compressed if large. Saves bandwidth. Google accepts .xml.gz.
- Updated automatically when pages are added or removed. Manual upkeep always drifts.
- Monitored for errors. Search Console reports them. Check weekly.
- Monitored for errors. Search Console reports them. Check weekly.
The last two are where 90% of sitemaps quietly rot over twelve months.
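A few of the checklist items above can be checked mechanically. The sketch below covers only the size limits, absolute URLs, and duplicates; whether a URL is canonical, indexable, or live needs a crawl and is out of scope here. The function name check_sitemap is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed


def check_sitemap(xml_bytes: bytes) -> list:
    """Return a list of problems found in one uncompressed sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    root = ET.fromstring(xml_bytes)
    locs = [(el.text or "").strip() for el in root.iter(NS + "loc")]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    for loc in locs:
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {loc}")
    for loc, count in Counter(locs).items():
        if count > 1:
            problems.append(f"duplicate URL: {loc}")
    return problems
```

Running this in CI or on a schedule is one cheap way to catch drift before Search Console reports it.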
Why hosting matters more than generation
Most sitemap guides stop at "how to generate the file." That's the easy part. The hard part is hosting it reliably for years.
A sitemap that's wrong for a week is worse than no sitemap. Google fetches your sitemap on its own schedule; if it lands on a 500 error, a stale file, a redirect loop, or a sitemap with 30% dead links, Google silently down-weights how much it trusts that sitemap going forward.
Things that go wrong with self-hosted sitemaps:
- Your app returns a 500 when the sitemap controller OOMs past 20k URLs.
- Your CDN caches the sitemap for 24 hours, serving yesterday's file to Googlebot.
- A deploy wipes /public/sitemap.xml and you don't notice for two weeks.
- Your cron job silently fails after a dependency upgrade.
- The file grows past 50MB and starts getting rejected.
Offloading hosting to a service purpose-built for it eliminates all five. Indexly serves sitemaps from Cloudflare R2 via the Cloudflare CDN — the URL pattern looks like https://cdn.indexly.dev/sitemaps/{user_id}/{site_id}/{crawl_id}/sitemap.xml, cached for one hour, backed by infrastructure that doesn't share a fate with your app. Whatever happens to your application server, the sitemap keeps serving.
For agency users, that URL can sit behind a custom subdomain — sitemaps.yourclient.com — so clients never see an Indexly URL at all. More on that in the agency white-label guide.
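For teams that do self-host, most of the failure modes listed above can at least be detected with a small periodic check. The sketch below is one possible shape: the caller fetches the sitemap with whatever HTTP client it likes and passes in the status, body, and last-modified time, so the function itself does no network I/O. The 24-hour freshness threshold is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_BYTES = 50 * 1024 * 1024  # 50MB, uncompressed


def sitemap_alerts(status, body, last_modified, now, max_age=timedelta(hours=24)):
    """Return human-readable alerts for one fetch of a sitemap URL."""
    alerts = []
    if status != 200:
        # Covers 500s from the app and 404s after a deploy wiped the file.
        alerts.append(f"unexpected HTTP status {status}")
        return alerts  # body checks are meaningless without a 200
    if not body.strip():
        alerts.append("empty response body")
    if len(body) > MAX_BYTES:
        alerts.append("file exceeds 50MB uncompressed")
    if now - last_modified > max_age:
        # Catches a failed cron job or an over-long CDN cache.
        alerts.append("sitemap older than allowed age")
    return alerts


now = datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc)
print(sitemap_alerts(200, b"<urlset/>", now - timedelta(days=3), now))
```

A check like this only reports problems; it does not fix them, which is the gap a hosted service is meant to close.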
How to choose a generator (decision tree)
Walk this top to bottom. The first "yes" is your answer.
Is your site fully static, under 100 pages, and rarely updated? → Use an online one-off generator. Re-run it when you publish.
Is your site on WordPress, under 5,000 pages, with standard plugins? → Yoast or RankMath. Free, reliable at that scale.
Is your site a JAMstack app (Next.js, Nuxt, Astro, Hugo) that redeploys on every content change? → A build-time generator. next-sitemap or its equivalent.
Are you on a custom stack (Laravel, Rails, Django), comfortable maintaining code, and happy to own the on-call? → A code library. You get control and you take the pager.
Do you want a sitemap that just works — across any stack, with zero code, zero plugin, automatic updates, a hosted URL, and alerts when pages are added or removed? → A hosted SaaS generator. Indexly does exactly this. Paste a URL, pick a schedule, done.
The right generator depends on how much you want to own and how much you want to outsource. There's no universal answer — but there's a wrong answer, which is "the one I set up two years ago and haven't looked at since."
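The same walk-through can be written as a first-match function, which makes the ordering of the questions explicit. The Site fields below are hypothetical simplifications of the questions above, not a formal model.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pages: int
    static_and_rarely_updated: bool = False
    wordpress: bool = False
    rebuilds_on_every_change: bool = False  # JAMstack with deploy-on-publish
    team_maintains_code: bool = False       # custom stack, happy to own on-call


def choose_generator(site: Site) -> str:
    """Return the first matching generator type, top to bottom."""
    if site.static_and_rarely_updated and site.pages < 100:
        return "online one-off"
    if site.wordpress and site.pages < 5000:
        return "CMS plugin"
    if site.rebuilds_on_every_change:
        return "build-time generator"
    if site.team_maintains_code:
        return "code library"
    return "hosted SaaS"


print(choose_generator(Site(pages=40, static_and_rarely_updated=True)))
print(choose_generator(Site(pages=20_000, wordpress=True)))
```

The second example shows the fall-through: a WordPress site past the 5,000-page mark skips the plugin branch and, with no other flags set, lands on the hosted option.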
FAQ
What is an XML sitemap generator?
An XML sitemap generator is a tool that discovers the pages on your website and writes them into an XML file in the format search engines require. Good generators also update the file automatically, host it at a stable URL, and track which pages have been added or removed between runs so you catch problems early.
Do I still need an XML sitemap in 2026?
Yes — especially for sites over a few hundred pages, new sites with few backlinks, and any site where internal linking is incomplete. Google can crawl without a sitemap, but one speeds up discovery, helps find orphaned pages, and unlocks index-coverage diagnostics in Search Console you cannot get otherwise.
Does <priority> or <changefreq> affect my rankings?
No. Google officially ignores both tags and has for years. Setting every page to priority 1.0 does nothing. Focus on making sure your <lastmod> dates are accurate — that is the only metadata field Google actively uses, and only when it trusts your dates are real rather than auto-generated noise.
How often should my sitemap update?
It should update whenever content changes — ideally within 24 hours. Static sites can get away with weekly. News sites, e-commerce stores, and anything publishing daily need daily or faster. A sitemap that lags your content by a week pushes Google toward discovering pages the slow way, through crawling links.
What happens if my sitemap has more than 50,000 URLs?
Google rejects the whole file. You need to split it into multiple sitemaps, each under 50,000 URLs and 50MB, then reference them from a single sitemap index file. The index itself can point to up to 50,000 child sitemaps, so practical scale is effectively unlimited — you just need a tool that handles the chunking.
Can I have a sitemap generator for a JavaScript-heavy site?
Yes, but only if your pages render to HTML for crawlers. Pure client-only SPAs need server-side rendering or pre-rendering before Googlebot — and any sitemap tool — can discover them. Fix the rendering layer first, then point any generator at the site. This is a wider SEO requirement, not a sitemap one.
The bottom line
An XML sitemap is one of the cheapest, most reliable levers you have for getting pages indexed — when it's accurate. The problem is that most sitemaps drift out of sync with reality within months of going live, and nobody notices until organic traffic drops.
The best sitemap generator is the one you don't have to think about after you set it up. If that sounds like what you want, try Indexly free — one site, 500 pages, no credit card. Paste your URL, get a hosted sitemap URL you can submit to Google in sixty seconds. Every page found. Every page indexed.
Indexly Team
Writing about SEO, sitemaps, and how to get every page indexed by Google.