Add markdown alternate links for LLM training data discovery #665
- Add <link rel="alternate" type="text/markdown"> to page headers pointing to .md version
- Improve MDX-to-markdown compilation to produce clean markdown output
- Preserve code blocks and frontmatter while stripping JSX components

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
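For readers following along, here is a minimal sketch of what "preserve code blocks while stripping JSX" could look like. It is not the code from this PR: `stripJsxOutsideCodeBlocks` and the three regex constants are hypothetical names, and the tag pattern assumes component names start with a capital letter and that attributes contain no `>` characters.

```typescript
// Hypothetical helper, not the PR's actual implementation.
const FENCE_LINE = /^\s*`{3}/; // opening or closing code fence
const JSX_TAG = /<\/?[A-Z][A-Za-z0-9.]*(\s[^<>]*)?\/?>/g; // capitalized component tags
const MODULE_LINE = /^(import|export)\s/; // MDX import/export statements

function stripJsxOutsideCodeBlocks(mdx: string): string {
  const out: string[] = [];
  let inFence = false;
  for (const line of mdx.split("\n")) {
    if (FENCE_LINE.test(line)) {
      inFence = !inFence; // toggle on every fence marker
      out.push(line);
      continue;
    }
    if (inFence) {
      out.push(line); // code is copied through untouched
      continue;
    }
    if (MODULE_LINE.test(line)) continue; // drop MDX module statements
    const stripped = line.replace(JSX_TAG, "");
    // Drop lines that held nothing but a component tag
    if (line.trim() !== "" && stripped.trim() === "") continue;
    out.push(stripped);
  }
  return out.join("\n");
}
```

A line-based pass like this keeps JSX that appears inside fenced examples, but it would mishandle attributes containing arrow functions or comparison operators; a real MDX parser is the safer choice for complex pages.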
Pages that only contain React components (like the landing page) now return a helpful markdown response with the title, description, and a link to the full interactive page.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
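A rough sketch of the fallback described in this commit, under assumptions: the `PageMeta` shape, the `fallbackMarkdown` name, and the link text are all hypothetical, since the thread does not show the actual output format.

```typescript
// Hypothetical shape for the metadata available on a component-only page.
interface PageMeta {
  title: string;
  description?: string;
  url: string; // canonical URL of the interactive HTML page
}

// Build a short markdown stand-in for pages that have no prose to convert.
function fallbackMarkdown(meta: PageMeta): string {
  const lines: string[] = [`# ${meta.title}`, ""];
  if (meta.description) {
    lines.push(meta.description, "");
  }
  lines.push(`[View the full interactive page](${meta.url})`, "");
  return lines.join("\n");
}
```

For a page titled "Home" with description "Docs", this yields a heading, the description, and one link, each separated by a blank line.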
- Add dedent function to normalize indentation when extracting content from JSX components
- Add normalizeIndentation function to clean up stray whitespace while preserving meaningful markdown indentation (nested lists, blockquotes)
- Move list detection regex patterns to module top level for performance
- Ensures code block markers (```) start at column 0

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
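The dedent step can be sketched as follows. This is an illustration of the general technique (remove the smallest common leading indent), not the function from this commit, whose body is not shown in the thread.

```typescript
// Remove the indentation shared by all non-blank lines, so that content
// lifted out of a nested JSX component starts at column 0.
function dedent(text: string): string {
  const lines = text.split("\n");
  const indents = lines
    .filter((l) => l.trim() !== "")
    .map((l) => l.match(/^[ \t]*/)![0].length); // the pattern always matches
  const min = indents.length > 0 ? Math.min(...indents) : 0;
  // Blank lines are emptied; all others lose exactly `min` leading characters,
  // which keeps relative indentation (nested lists) intact.
  return lines.map((l) => (l.trim() === "" ? "" : l.slice(min))).join("\n");
}
```

Because only the common prefix is removed, a nested list item indented two spaces deeper than its parent stays two spaces deeper, while fence markers at the base indent land at column 0.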
The previous regex patterns `["']?([^"'\n]+)["']?` would truncate text at the first apostrophe (e.g., "Arcade's" became "Arcade"). This fix:
- Uses separate patterns for double-quoted, single-quoted, and unquoted values
- Requires closing quotes to be at end of line to prevent apostrophes from being misinterpreted as closing delimiters
- Adds stripSurroundingQuotes helper for fallback cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
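To make the truncation concrete, here is a small sketch. The value pattern in `OLD_TITLE` is the one quoted in the commit message; the `title:` key prefix, the three replacement patterns, and `extractTitle` are assumptions for illustration, and only the `stripSurroundingQuotes` name comes from the commit.

```typescript
// Old behaviour: the character class excludes both quote types, so the
// capture stops at the first apostrophe.
const OLD_TITLE = /^title:[ \t]*["']?([^"'\n]+)["']?/m;

// Assumed replacements: one pattern per quoting style, with the closing
// quote anchored to the end of the line.
const DOUBLE_QUOTED = /^title:[ \t]*"(.*)"[ \t]*$/m;
const SINGLE_QUOTED = /^title:[ \t]*'(.*)'[ \t]*$/m;
const UNQUOTED = /^title:[ \t]*(.+?)[ \t]*$/m;

// Strip one matching pair of surrounding quotes, if present.
function stripSurroundingQuotes(value: string): string {
  const first = value.charAt(0);
  const last = value.charAt(value.length - 1);
  if (value.length >= 2 && first === last && (first === '"' || first === "'")) {
    return value.slice(1, -1);
  }
  return value;
}

function extractTitle(frontmatter: string): string | undefined {
  const quoted = frontmatter.match(DOUBLE_QUOTED) || frontmatter.match(SINGLE_QUOTED);
  if (quoted) return quoted[1];
  const bare = frontmatter.match(UNQUOTED);
  return bare ? stripSurroundingQuotes(bare[1]) : undefined;
}

const line = 'title: "Arcade\'s docs"';
const oldMatch = line.match(OLD_TITLE);
console.log(oldMatch ? oldMatch[1] : null); // Arcade
console.log(extractTitle(line)); // Arcade's docs
```

Anchoring the closing quote to the end of the line is what lets an apostrophe inside the value pass through as ordinary text.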
When x-pathname header is not set, pathname defaults to "/" which would produce an invalid alternate link "https://docs.arcade.dev/.md". Only render the alternate link when we have a real page path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
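The guard described here might look roughly like this. `markdownAlternateHref` and `SITE_ORIGIN` are hypothetical names; the thread only establishes that the root path must not produce a link.

```typescript
const SITE_ORIGIN = "https://docs.arcade.dev";

// Return the href for the markdown alternate link, or null when there is
// no real page path (missing header, empty string, or the root "/").
function markdownAlternateHref(pathname: string | null | undefined): string | null {
  if (!pathname) return null;
  const trimmed = pathname.replace(/\/+$/, ""); // drop trailing slashes
  if (trimmed === "") return null; // "/" would become ".../.md"
  return `${SITE_ORIGIN}${trimmed}.md`;
}

console.log(markdownAlternateHref("/")); // null
console.log(markdownAlternateHref("/en/home")); // https://docs.arcade.dev/en/home.md
```

The layout would then render the `<link rel="alternate" type="text/markdown">` element only when this returns a non-null value.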
@evantahler Got a minute to have a second look?
I like the If the goal is to keep HTML fragments, but zero out React, I'd suggest an alternative approach:
@evantahler Dammit! When agents parse markdown, HTML and MDX mixed in there make it hard on them. IIRC from the friend who did this, the entire thing needs rendering down to markdown. What approach do you recommend in light of this?
Actually, let me just try parsing the HTML back into Markdown. We have some complex MDX.
- Add scripts/generate-markdown.ts to pre-render MDX to markdown
- Update proxy.ts to serve static .md files from public/
- Delete API route in favor of static file serving
- Add link rewriting to add /en/ prefix and .md extension
- Add markdown-friendly component implementations
- Fix localhost URL in gmail integration page

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
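As an illustration of the link-rewriting step, a sketch under assumptions: `rewriteInternalLinks` is a hypothetical name, only root-relative markdown links are touched, and paths that already end in a file extension are left alone. The actual rules in scripts/generate-markdown.ts are not shown in this thread.

```typescript
// Matches "](/path#fragment)" or "](/path?query)"; external URLs do not
// start with "/" directly after the parenthesis, so they never match.
const INTERNAL_LINK = /\]\((\/[^)\s#?]*)([#?][^)\s]*)?\)/g;

function rewriteInternalLinks(markdown: string, locale: string = "en"): string {
  const prefix = `/${locale}`;
  return markdown.replace(INTERNAL_LINK, (match: string, path: string, suffix?: string) => {
    // Leave assets and already-rewritten links (anything with an extension) alone
    if (/\.[a-z0-9]+$/i.test(path)) return match;
    const clean = path.replace(/\/+$/, "");
    if (clean === "") return match; // bare "/" has no page to point at
    const hasLocale = clean === prefix || clean.indexOf(`${prefix}/`) === 0;
    const target = hasLocale ? clean : `${prefix}${clean}`;
    return `](${target}.md${suffix || ""})`;
  });
}

console.log(rewriteInternalLinks("[Quickstart](/get-started/quickstarts/call-tool-agent#setup)"));
// [Quickstart](/en/get-started/quickstarts/call-tool-agent.md#setup)
```

Note that this simple version would also rewrite links that appear inside fenced code blocks, so a fuller implementation would skip those regions.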
This reverts commit d7b7c71.
Cursor Bugbot has reviewed your changes and found 1 potential issue.

Summary
- Add <link rel="alternate" type="text/markdown"> to all page headers, pointing to the .md version of each page
- .md URLs return clean, readable markdown instead of raw MDX

This enables LLM crawlers and training pipelines to discover and consume the markdown versions of our documentation, similar to what Vercel does with their docs.
Test plan
- <link rel="alternate" type="text/markdown" href="...">
- https://docs.arcade.dev/en/get-started/quickstarts/call-tool-agent.md - should return clean markdown with code blocks preserved
- https://docs.arcade.dev/en/home.md - should return fallback content with title/description and link to full page

🤖 Generated with Claude Code
Note
Introduces a clean markdown surface for each docs page and links to it from HTML.
- <link rel="alternate" type="text/markdown"> in app/layout.tsx for all non-root pages, pointing to .../<path>.md
- app/api/markdown/[[...slug]]/route.ts: Content-Type: text/markdown and returns 404 for missing pages

Written by Cursor Bugbot for commit b0682e8. This will update automatically on new commits.