
fix: filter sitemap pathnames from generated sitemap.xml #19

Open
VictorD19 wants to merge 1 commit into multivmlabs:main from VictorD19:fix/sitemap-pathname-filter

Problem

When @astrojs/sitemap (or a similar plugin) is used alongside aeo.js, that plugin generates sitemap-index.xml, sitemap-0.xml, sitemap-1.xml, etc. in the build output. If these pathnames reach config.pages (through manual configuration, a different framework plugin, or a future change in page discovery), generateSitemap() includes them as regular page URLs in the generated sitemap.xml.

Real-world scenario

Discovered on a production Astro site using @astrojs/sitemap. The generated sitemap-index.xml was being indexed as a regular page by external crawlers.

What changed

File: src/core/sitemap.ts

Added an isSitemapPathname() guard that skips any pathname starting with /sitemap (case-insensitive) before it is written to the generated sitemap.xml:

+function isSitemapPathname(pathname: string): boolean {
+  return /^\/sitemap/i.test(pathname);
+}
+
 export function generateSitemap(config: ResolvedAeoConfig): string {
   const urls: string[] = [];

   // Add discovered pages from framework plugin
   if (config.pages && config.pages.length > 0) {
     for (const page of config.pages) {
+      if (isSitemapPathname(page.pathname)) continue;
       urls.push(`${config.url}${page.pathname === '/' ? '' : page.pathname}`);
     }
   }

The regex /^\/sitemap/i matches:

  • /sitemap-index (generated by @astrojs/sitemap)
  • /sitemap-0, /sitemap-1, /sitemap-2, ... /sitemap-N
  • /sitemap.xml (if it ever enters as a pathname)
  • /sitemap itself, and any other pathname beginning with /sitemap, in any letter case (e.g. /Sitemap-Index)
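One way to check the list above, and to see what else a prefix match catches, is to run the predicate over a few sample pathnames. The sketch below copies isSitemapPathname() from the diff. isSitemapFilePathname() is a hypothetical stricter variant, not part of this PR, that only accepts /sitemap, /sitemap-index and /sitemap-N, each with an optional .xml extension.

```typescript
// Predicate as added in this PR: any pathname starting with "/sitemap",
// case-insensitive (the `i` flag), anchored at the start of the string.
function isSitemapPathname(pathname: string): boolean {
  return /^\/sitemap/i.test(pathname);
}

// Hypothetical stricter variant (NOT part of this PR): only the exact file
// names @astrojs/sitemap-style plugins emit, with or without ".xml".
function isSitemapFilePathname(pathname: string): boolean {
  return /^\/sitemap(-index|-\d+)?(\.xml)?$/i.test(pathname);
}

const samples: string[] = [
  '/sitemap-index',
  '/sitemap-0',
  '/sitemap-12',
  '/sitemap.xml',
  '/Sitemap-Index',
  '/sitemaps-guide', // a plausible real page name
  '/blog/sitemap',   // "sitemap" not at the start of the path
  '/about',
];

// Print how each predicate classifies every sample pathname.
for (const p of samples) {
  console.log(`${p} -> prefix=${isSitemapPathname(p)}, strict=${isSitemapFilePathname(p)}`);
}
```

The prefix form also returns true for /sitemaps-guide, so a real page with a pathname like that would be excluded as well. A stricter pattern avoids that, at the cost of having to track the exact naming scheme of each sitemap plugin.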

Test results

Simulated with config.pages containing 5 real pages + 5 sitemap pathnames:

Before fix (10 URLs in sitemap):

  ✅ OK    : https://example.com
  ✅ OK    : https://example.com/about
  ✅ OK    : https://example.com/contact
  ✅ OK    : https://example.com/experience
  ✅ OK    : https://example.com/projects
  ⚠️  LEAKED: https://example.com/sitemap-0
  ⚠️  LEAKED: https://example.com/sitemap-1
  ⚠️  LEAKED: https://example.com/sitemap-2
  ⚠️  LEAKED: https://example.com/sitemap-3
  ⚠️  LEAKED: https://example.com/sitemap-index

  Sitemap files incorrectly mapped as pages: 5

After fix (5 URLs in sitemap):

  ✅ OK    : https://example.com
  ✅ OK    : https://example.com/about
  ✅ OK    : https://example.com/contact
  ✅ OK    : https://example.com/experience
  ✅ OK    : https://example.com/projects

  Sitemap files incorrectly mapped as pages: 0
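The simulation script itself is not included in the PR. The following is a hypothetical reconstruction of the same experiment: it mirrors the URL-building loop from the diff, with the filter switched off and on, over the same ten pathnames. The Page interface is a minimal stand-in for the page objects in config.pages (an assumption; the real ResolvedAeoConfig carries more fields), and buildUrls() / countLeaked() are illustrative helpers, not aeo.js APIs.

```typescript
// Minimal stand-in for the page objects generateSitemap() reads (assumption).
interface Page {
  pathname: string;
}

// Same predicate as in the PR.
function isSitemapPathname(pathname: string): boolean {
  return /^\/sitemap/i.test(pathname);
}

// Mirrors the loop in generateSitemap(); `filter` toggles the new guard.
function buildUrls(baseUrl: string, pages: Page[], filter: boolean): string[] {
  const urls: string[] = [];
  for (const page of pages) {
    if (filter && isSitemapPathname(page.pathname)) continue;
    urls.push(`${baseUrl}${page.pathname === '/' ? '' : page.pathname}`);
  }
  return urls;
}

// Counts emitted URLs whose path part is a sitemap pathname.
function countLeaked(baseUrl: string, urls: string[]): number {
  return urls.filter((u) => isSitemapPathname(u.slice(baseUrl.length))).length;
}

const base = 'https://example.com';
const pages: Page[] = [
  '/', '/about', '/contact', '/experience', '/projects',
  '/sitemap-0', '/sitemap-1', '/sitemap-2', '/sitemap-3', '/sitemap-index',
].map((pathname) => ({ pathname }));

const before = buildUrls(base, pages, false);
const after = buildUrls(base, pages, true);
console.log(`before: ${before.length} URLs, leaked ${countLeaked(base, before)}`);
console.log(`after: ${after.length} URLs, leaked ${countLeaked(base, after)}`);
```

With these inputs the counts match the numbers reported above: 10 URLs with 5 leaked before the guard, 5 URLs with 0 leaked after it.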

Impact

  • No breaking changes for typical sites: the change only adds a filter, so real pages are unaffected unless their own pathname begins with /sitemap (for example a human-readable /sitemap page), which the prefix match would also drop
  • Minimal diff: 5 lines added, 0 lines removed
  • Defense in depth: generateSitemap() consumes pages from multiple sources (framework plugins, manual config), so the filter guards against any source that feeds sitemap pathnames into config.pages

Commit message

When sitemap plugins (e.g. @astrojs/sitemap) generate sitemap-index.xml,
sitemap-0.xml, etc., these pathnames could leak into the aeo.js generated
sitemap.xml as regular page URLs if they reach config.pages.

Adds isSitemapPathname() guard to exclude any pathname starting with
/sitemap from the generated sitemap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 27, 2026

Someone is attempting to deploy a commit to the Cytonic Team on Vercel.

A member of the Team first needs to authorize it.


claude bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

VictorD19 closed this Mar 27, 2026
VictorD19 deleted the fix/sitemap-pathname-filter branch March 27, 2026 12:54
VictorD19 restored the fix/sitemap-pathname-filter branch March 27, 2026 12:55
VictorD19 reopened this Mar 27, 2026