fix(prebuild): escape-angles idempotency for HTML tags with attributes #343
Merged
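Every prebuild run of escape-angles rewrites `<img src="..." />` / `<a href="..." />` inside the leetcode markdown comments into the escaped form `&lt;img src="..." /&gt;`, but the files committed on main contain the raw `<img>` again, so the working tree is permanently dirty. The backfill workflow SSHes into the server and requires a clean git status, so it gets stuck on these two files every time.

Root cause: the negative lookahead /<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/ only accepts **attribute-less** tags such as `<div>` / `</div>` / `<br />`. Valid HTML with attributes (`<img src="..." />`, `<a href="...">`) is misclassified as a "suspicious angle bracket" and falls into the escape branch.

Fix: add an attribute segment `([ \t][^<>]*)?` to the lookahead so that `<tagname attr="val">` / `<tagname attr="val" />` are both recognized as normal tags and left unescaped. `[^<>]` avoids ReDoS from nested quantifiers.

Verification: run escape-angles twice and the working tree does not change. Previously 142.环形链表 II and similar files went dirty on every build; now they are stable.

Real "suspicious angle brackets" that still need escaping are unaffected: `<8>` / `<1,2,3>` / `<x, y>` are still recognized as invalid tags and escaped.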
Pull request overview
This PR fixes scripts/escape-angles.mjs so that running the prebuild step is idempotent by no longer escaping valid HTML/JSX tags that include attributes (e.g. <img src="..."/>, <a href="...">), which previously dirtied the git working tree and blocked workflows that require a clean git status.
Changes:
- Expanded the “valid tag” negative-lookahead pattern to recognize tags with attributes (and documented the intent/strategy in-file).
- Refactored the “valid tag” matcher into a named VALID_TAG_LOOKAHEAD constant and used it to build the escaping regex.
Comment on lines +39 to +42
 * Rejected examples (lookahead miss, go to the escape branch):
 *   <8>  <1,2,3>  <x, y>  <not a tag>
 */
const files = await fg(["content/docs/**/*.md"], { dot: false });
const VALID_TAG_LOOKAHEAD = /\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/;
Comment on lines 63 to 68
  src = src
    .replace(/<\d[^>]*>/g, (m) =>
      m.replaceAll("<", "&lt;").replaceAll(">", "&gt;"),
    )
-   .replace(/<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/g, (m) =>
+   .replace(new RegExp(`<(?!${VALID_TAG_LOOKAHEAD.source})[^>]*>`, "g"), (m) =>
      m.replaceAll("<", "&lt;").replaceAll(">", "&gt;"),
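To make the effect of the changed line concrete, here is a minimal sketch that runs the removed pattern and the new composed pattern against a few sample strings. The helper `wouldEscape` and the sample list are hypothetical and exist only for illustration; the two patterns are copied from the diff above.

```javascript
// Old pattern from the removed line: only attribute-less tags pass the lookahead.
const OLD_ESCAPE = /<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/g;

// New pattern, composed from VALID_TAG_LOOKAHEAD.source as in the added line.
const VALID_TAG_LOOKAHEAD = /\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/;
const NEW_ESCAPE = new RegExp(`<(?!${VALID_TAG_LOOKAHEAD.source})[^>]*>`, "g");

// Hypothetical helper: true when `text` would be sent to the escape branch.
function wouldEscape(regex, text) {
  regex.lastIndex = 0; // global regexes remember their position between test() calls
  return regex.test(text);
}

// Illustrative samples, not taken from the repository's content files.
const samples = [
  "<div>",
  "<br />",
  '<img src="x.png" />',
  '<a href="#">',
  "<8>",
  "<x, y>",
  "<not a tag>",
];

for (const s of samples) {
  console.log(s, {
    old: wouldEscape(OLD_ESCAPE, s),
    new: wouldEscape(NEW_ESCAPE, s),
  });
}
```

In this sketch `<8>` and `<x, y>` reach the escape branch under both patterns, and the attribute-bearing tags stop being escaped under the new one. A letter-initial name followed by a space, such as `<not a tag>`, also satisfies the new lookahead and is therefore left unescaped.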
longsizhuo pushed a commit that referenced this pull request on May 12, 2026
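After review, the user pointed out the risk of being penny-wise and pound-foolish. Revised after going back over the Vercel 30-day dashboard:

## Real root cause (not /_not-found)

The 30-day dashboard curve shows a CPU peak of 80-90min on 5/11 (baseline 5-15min/day), which lines up exactly with when the SEO PRs landed:

- 5/11 15:01-15:39 UTC: PR #341 (253 MDX descriptions + 32 new EN translations)
- 5/11 16:02-18:41 UTC: PR #342 (remark heading shift + leetcode dedup)
- 5/11 19:01-19:27 UTC: PR #343 + #340; 4 deploys in 4 hours cleared ISR

On top of that, the IndexNow step in deploy.yml proactively told Bing to recrawl → a crawler storm from 5/10 to 5/12.

**This is the cost of the SEO work succeeding, not a bug.** It is real traffic.

Making /_not-found static plus the bot blocklist is genuine waste cleanup (kept), but on its own it cannot explain the 4× spike.

## Two hacks reverted

1. Sentry tracesSampleRate 0.1 → 0.02: reverted, staying at 10%. Observability should not be traded away for this little CPU; 10% is the industry standard, and the client/server/edge values must be identical for traces to be linked across runtimes.
2. fetchEvents returning empty on every failure: changed to return empty only when NEXT_PHASE === phase-production-build; at runtime it still throws so Sentry catches real failures. Otherwise a prod backend outage would be shown as "no activity yet", masking the failure.

## Fixes kept (best practice, not hacks)

- /_not-found ƒ → ○: the root 404 never needed i18n
- proxy.ts bot blocklist: scanners should not burn Fluid
- /[locale]/docs /events /login were missing setRequestLocale → added: SSG/ISR should have worked here all along
- /editor /share cascade ●: pure client components, safe

## Build verification

pnpm build rerun:

- /[locale]/events is still ● ISR 5m 1y
- [events] fetch failed at build, rendering empty shell (NEXT_PHASE guard works)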
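The build-only fallback described for fetchEvents might look roughly like the sketch below. The function signature, the injected `load` callback, and the `env` parameter are all assumptions made so the example is self-contained; only the NEXT_PHASE condition and the log message come from the commit message.

```javascript
// Hypothetical sketch, not the project's implementation.
// `load` is an assumed callback that performs the real request;
// `env` defaults to process.env and is injectable for testing.
async function fetchEvents(load, env = process.env) {
  try {
    return await load();
  } catch (err) {
    // Only swallow the failure while the production build is running.
    if (env.NEXT_PHASE === "phase-production-build") {
      console.warn("[events] fetch failed at build, rendering empty shell");
      return [];
    }
    // At runtime the error propagates so monitoring can record the outage.
    throw err;
  }
}
```

The point of the guard is that an empty list is an acceptable placeholder for a prerendered shell, but at request time it would be indistinguishable from a legitimately empty result.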
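Summary
The prebuild script scripts/escape-angles.mjs rewrites <img src="..." /> in the leetcode markdown comments to &lt;img src="..." /&gt; on every run, so pnpm build leaves the git working tree dirty every time.

Root cause
The negative lookahead only accepts valid tags that have no attributes:
/<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/
- Accepted: <div> / </div> / <br />
- Not accepted: <img src="..." /> / <a href="..."> / <Component prop="val" />

Fix
Add an attribute segment to the lookahead:
/\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/
- [ \t][^<>]*: a space or tab followed by any non-angle-bracket characters (the attribute segment)
- [^<>] prevents ReDoS from nested quantifiers (CodeQL already flagged the same class of issue in another file in 2026-05)

Verification
After running escape-angles twice the working tree is unchanged (previously every run dirtied it).
Still escaped:
- <8> <1,2,3> (start with a digit)
- <x, y> (comma-containing "math notation" style)
No longer wrongly escaped:
- <img src="..." /> <a href="..." title="..."> <Component prop="val" />

Test plan
- git status is clean after pnpm build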
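The "run it twice" verification can also be approximated in memory. The following is a minimal sketch under two assumptions: that the script escapes with the `&lt;` / `&gt;` entities, and that its two replace passes can be wrapped in a single function. `escapeAngles` and the sample input are hypothetical names and data, not the script's actual exports.

```javascript
// Lookahead copied from the PR; everything else here is illustrative.
const VALID_TAG_LOOKAHEAD = /\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/;

// Hypothetical wrapper around the two replace passes shown in the diff.
// Assumption: brackets are escaped as the entities &lt; and &gt;.
function escapeAngles(src) {
  const escapeMatch = (m) => m.replaceAll("<", "&lt;").replaceAll(">", "&gt;");
  return src
    .replace(/<\d[^>]*>/g, escapeMatch) // digit-initial brackets such as <1,2,3>
    .replace(
      new RegExp(`<(?!${VALID_TAG_LOOKAHEAD.source})[^>]*>`, "g"),
      escapeMatch, // anything the valid-tag lookahead rejects
    );
}

// Made-up sample mixing a real tag with "math notation" brackets.
const input = 'See <img src="cycle.png" /> and list <1,2,3> or pair <x, y>.';
const once = escapeAngles(input);
const twice = escapeAngles(once);

console.log(once);
console.log("idempotent:", once === twice);
```

A second pass finds nothing left to escape because the only remaining `<` belongs to the attribute-bearing tag, which the lookahead now accepts. That is the property the clean git status check relies on.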