fix: filter sitemap pathnames from generated sitemap.xml #19
Open
VictorD19 wants to merge 1 commit into multivmlabs:main from
When sitemap plugins (e.g. @astrojs/sitemap) generate sitemap-index.xml, sitemap-0.xml, etc., these pathnames could leak into the aeo.js-generated sitemap.xml as regular page URLs if they reach config.pages. Adds an isSitemapPathname() guard to exclude any pathname starting with /sitemap from the generated sitemap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem
When `@astrojs/sitemap` (or a similar plugin) is used alongside aeo.js, it generates `sitemap-index.xml`, `sitemap-0.xml`, `sitemap-1.xml`, etc. in the build output. If these pathnames reach `config.pages` (through manual configuration, a different framework plugin, or a future change in page discovery), `generateSitemap()` includes them as regular page URLs in the generated `sitemap.xml`.

Real-world scenario
Discovered on a production Astro site using `@astrojs/sitemap`. The generated `sitemap-index.xml` was being indexed as a regular page by external crawlers.

What changed
File: `src/core/sitemap.ts`

Added an `isSitemapPathname()` guard that filters out any pathname starting with `/sitemap` before it enters the generated `sitemap.xml`.

The regex `/^\/sitemap/i` matches:

- `/sitemap-index` (generated by `@astrojs/sitemap`)
- `/sitemap-0`, `/sitemap-1`, `/sitemap-2`, ... `/sitemap-N`
- `/sitemap.xml` (if it ever enters as a pathname)
- `/sitemap` (any variation)

Test results
Simulated with `config.pages` containing 5 real pages + 5 sitemap pathnames:

- Before fix: 10 URLs in sitemap
- After fix: 5 URLs in sitemap
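The simulation can be reproduced roughly as follows. This is a hypothetical sketch: the ten pathnames and the `buildUrls` helper are illustrative stand-ins, not the actual test fixture or aeo.js code; only the `/^\/sitemap/i` guard comes from this PR.

```typescript
// Hypothetical reproduction of the "5 real pages + 5 sitemap pathnames" simulation.
// The pathnames and `buildUrls` are illustrative, not taken from aeo.js.

// Guard described in this PR: any pathname beginning with /sitemap (case-insensitive).
function isSitemapPathname(pathname: string): boolean {
  return /^\/sitemap/i.test(pathname);
}

const pages: string[] = [
  // real pages
  "/", "/about", "/blog", "/blog/hello-world", "/contact",
  // sitemap artifacts that leaked into the page list
  "/sitemap-index", "/sitemap-0", "/sitemap-1", "/sitemap-2", "/sitemap.xml",
];

// Hypothetical stand-in for the URL-building step of generateSitemap().
function buildUrls(siteUrl: string, paths: string[], applyGuard: boolean): string[] {
  const base = siteUrl.replace(/\/+$/, ""); // strip trailing slashes from the site URL
  const kept = applyGuard ? paths.filter((p) => !isSitemapPathname(p)) : paths;
  return kept.map((p) => base + p);
}

const before = buildUrls("https://example.com", pages, false);
const after = buildUrls("https://example.com", pages, true);
console.log(`Before fix: ${before.length} URLs`); // Before fix: 10 URLs
console.log(`After fix: ${after.length} URLs`); // After fix: 5 URLs
```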
Impact
`generateSitemap()` consumes pages from multiple sources (framework plugins, manual config), so the filter guards against sitemap pathnames regardless of which source feeds them into `config.pages`.
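A minimal sketch of where such a guard could sit in a sitemap generator. `PageEntry`, `generateSitemapXml`, and the `siteUrl` parameter are hypothetical names chosen for illustration; the real `generateSitemap()` in `src/core/sitemap.ts` may be structured differently. The idea is that filtering happens at the single point where pages become `<url>` entries, so it covers every source of `config.pages`.

```typescript
// Hypothetical sketch; names and structure are illustrative, not the aeo.js source.
interface PageEntry {
  pathname: string;
}

// Prefix match: /sitemap-index, /sitemap-0 ... /sitemap-N, /sitemap.xml, /sitemap,
// and any other pathname that begins with "/sitemap" (case-insensitive).
const SITEMAP_PATHNAME = /^\/sitemap/i;

function isSitemapPathname(pathname: string): boolean {
  return SITEMAP_PATHNAME.test(pathname);
}

// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function generateSitemapXml(siteUrl: string, pages: PageEntry[]): string {
  const base = siteUrl.replace(/\/+$/, "");
  const urls = pages
    // Drop sitemap artifacts no matter which source put them in the page list.
    .filter((page) => !isSitemapPathname(page.pathname))
    .map((page) => `  <url><loc>${escapeXml(base + page.pathname)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

const xml = generateSitemapXml("https://example.com/", [
  { pathname: "/" },
  { pathname: "/docs" },
  { pathname: "/sitemap-index" },
  { pathname: "/sitemap-0" },
]);
console.log(xml);
```

One trade-off of the prefix match: it also drops legitimate pages such as `/sitemaps-explained` or `/sitemap-guide`. If that matters, a stricter pattern such as `/^\/sitemap(?:-index|-\d+)?(?:\.xml)?\/?$/i` limits the guard to the generated sitemap files (with an optional trailing slash).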