fix(prebuild): escape-angles idempotency for HTML tags with attributes#343

Merged
longsizhuo merged 1 commit into main from fix/escape-angles-idempotent
May 11, 2026
Conversation

@longsizhuo
Member

Summary

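On every run, the prebuild script scripts/escape-angles.mjs rewrote <img src="..." /> inside the comments of the leetcode markdown files to &lt;img src="..." /&gt;, which caused:

  • pnpm build left the git working tree dirty every time
  • the server-side Backfill workflow (which requires a clean git status) failed repeatedly
  • every earlier PR had to git restore these generated files and run again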

Root cause

The negative lookahead only accepts valid tags that have no attributes:

/<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/
  • Matched (not escaped): <div> / </div> / <br />
  • Misclassified (escaped): <img src="..." /> / <a href="..."> / <Component prop="val" />
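
To see the misclassification concretely, the pre-fix pattern can be run against the samples listed above. This is a minimal sketch: the regex is copied from this description, while `escapeWithOldPattern` and the concrete attribute values are hypothetical, chosen only for illustration.

```javascript
// Pre-fix pattern, copied from the PR description (with the global flag added).
const OLD_PATTERN = /<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/g;

// Hypothetical helper: escapes every span that the old pattern flags.
function escapeWithOldPattern(src) {
  return src.replace(OLD_PATTERN, (m) =>
    m.replaceAll("<", "&lt;").replaceAll(">", "&gt;"),
  );
}

// Attribute-free tags pass the lookahead and are left alone;
// tags with attributes fail it and get escaped.
const samples = ["<div>", "</div>", "<br />", '<img src="a.png" />', '<a href="#">'];
for (const s of samples) {
  const out = escapeWithOldPattern(s);
  console.log(s, "=>", out === s ? "(kept)" : out);
}
```

The lookahead expects `>` right after the tag name (plus optional whitespace and `/`), so the first attribute character makes it fail and the whole tag is treated as a suspicious bracket.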

Fix

Add an attribute segment to the lookahead:

/\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/
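  • [ \t][^<>]*: a space or tab followed by any run of non-angle-bracket characters (the attribute segment)
  • [^<>] keeps nested quantifiers from opening a ReDoS path (CodeQL flagged the same class of issue in another file in 2026-05)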
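
The new lookahead can be exercised the same way. The sketch below builds the escaping regex from `VALID_TAG_LOOKAHEAD.source`, as the diff in this PR does; `escapeSuspicious` and the sample inputs are hypothetical names introduced here.

```javascript
// Lookahead from the fix, copied from the PR description.
const VALID_TAG_LOOKAHEAD = /\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/;

// Anything that opens with "<" but does not look like a valid tag is suspicious.
const SUSPICIOUS = new RegExp(`<(?!${VALID_TAG_LOOKAHEAD.source})[^>]*>`, "g");

// Hypothetical helper: escapes only the suspicious spans.
function escapeSuspicious(src) {
  return src.replace(SUSPICIOUS, (m) =>
    m.replaceAll("<", "&lt;").replaceAll(">", "&gt;"),
  );
}

const samples = [
  '<img src="a.png" />',
  '<a href="#" title="t">',
  '<Component prop="val" />',
  "<x, y>",
];
for (const s of samples) {
  const out = escapeSuspicious(s);
  console.log(s, "=>", out === s ? "(kept)" : out);
}
```

One consequence of the attribute segment is worth keeping in mind: any letter-led name followed by a space and non-bracket text passes the lookahead, so an input such as `<not a tag>` is treated as a valid tag by this pattern and is left unescaped.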

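Verification

After running escape-angles twice, the working tree is unchanged (previously every run dirtied it). Still escaped:

  • <8> <1,2,3> (start with a digit)
  • <x, y> (math-notation style, contains a comma)

No longer escaped by mistake: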

  • <img src="..." />
  • <a href="..." title="...">
  • <Component prop="val" />
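
The two properties checked here can be expressed as small predicates. This is a sketch under assumptions: `escapeAngles` mirrors the two `.replace` passes visible in the diff, while `isUntouched` and `isStable` are hypothetical helpers, not functions from the script.

```javascript
const VALID_TAG_LOOKAHEAD = /\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/;

// Mirrors the two passes in the diff: digit-led brackets first,
// then anything that fails the valid-tag lookahead.
function escapeAngles(src) {
  const esc = (m) => m.replaceAll("<", "&lt;").replaceAll(">", "&gt;");
  return src
    .replace(/<\d[^>]*>/g, esc)
    .replace(new RegExp(`<(?!${VALID_TAG_LOOKAHEAD.source})[^>]*>`, "g"), esc);
}

// Hypothetical check: valid markup must come back byte-for-byte identical,
// otherwise the build dirties the git working tree.
function isUntouched(src) {
  return escapeAngles(src) === src;
}

// Hypothetical check: a second run must not change the output of the first.
function isStable(src) {
  const once = escapeAngles(src);
  return escapeAngles(once) === once;
}

const line = 'Array<8> next to <img src="a.png" />';
console.log(escapeAngles(line));
console.log(isUntouched('<a href="x" title="y">'), isStable(line));
```

In a repository, the equivalent end-to-end check is to run the script twice and confirm that `git status` reports no changes.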

Test plan

  • CI: build / content-check pass
  • The Backfill workflow no longer gets stuck on a dirty tree after pushes to main
  • git status is clean after a user runs pnpm build locally

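Every prebuild run of escape-angles rewrote
`<img src="..." />` / `<a href="..." />` in the leetcode markdown comments to `&lt;img src="..." /&gt;`,
but the files committed on main still contain `<img>`, so the tree was permanently dirty. The Backfill workflow
SSHes into the server and requires a clean git status, so it got stuck on these two files every time.

Root cause: the negative lookahead

  /<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/

only accepts **attribute-free** tags such as `<div>` / `</div>` / `<br />`. Valid
HTML with attributes (`<img src="..." />`, `<a href="...">`) is misjudged as a "suspicious angle bracket"
and falls into the escape branch.

Fix: add an attribute segment `([ \t][^<>]*)?` to the lookahead so that `<tagname attr="val">` /
`<tagname attr="val" />` are both recognized as normal tags and left unescaped. `[^<>]` avoids
ReDoS from nested quantifiers.

Verification: after running escape-angles twice, the working tree is unchanged. 142.环形链表 II (Linked List Cycle II)
and similar files used to become dirty on every build and are now stable.

Genuinely suspicious angle brackets that still need escaping are unaffected: `<8>` / `<1,2,3>` / `<x, y>`
are still recognized as invalid tags and escaped.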
Copilot AI review requested due to automatic review settings May 11, 2026 19:27
@vercel

vercel Bot commented May 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
involutionhell-github-io Building Building Preview, Comment May 11, 2026 7:27pm
website-preview Building Building Preview, Comment May 11, 2026 7:27pm

@longsizhuo longsizhuo merged commit 1dcd8ce into main May 11, 2026
7 of 9 checks passed
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes scripts/escape-angles.mjs so that running the prebuild step is idempotent by no longer escaping valid HTML/JSX tags that include attributes (e.g. <img src="..."/>, <a href="...">), which previously dirtied the git working tree and blocked workflows that require a clean git status.

Changes:

  • Expanded the “valid tag” negative-lookahead pattern to recognize tags with attributes (and documented the intent/strategy in-file).
  • Refactored the “valid tag” matcher into a named VALID_TAG_LOOKAHEAD constant and used it to build the escaping regex.


Comment thread scripts/escape-angles.mjs
Comment on lines +39 to +42
* Rejected samples (lookahead miss, go to the escape branch):
* <8> <1,2,3> <x, y> <not a tag>
*/
const files = await fg(["content/docs/**/*.md"], { dot: false });
const VALID_TAG_LOOKAHEAD = /\/?[A-Za-z][A-Za-z0-9:_-]*([ \t][^<>]*)?\s*\/?>/;
Comment thread scripts/escape-angles.mjs
Comment on lines 63 to 68
  src = src
    .replace(/<\d[^>]*>/g, (m) =>
      m.replaceAll("<", "&lt;").replaceAll(">", "&gt;"),
    )
-   .replace(/<(?![A-Za-z/][A-Za-z0-9:_-]*\s*\/?>)[^>]*>/g, (m) =>
+   .replace(new RegExp(`<(?!${VALID_TAG_LOOKAHEAD.source})[^>]*>`, "g"), (m) =>
      m.replaceAll("<", "&lt;").replaceAll(">", "&gt;"),
longsizhuo pushed a commit that referenced this pull request May 12, 2026
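After review, the user pointed out a penny-wise, pound-foolish risk; revised after going back over the 30-day Vercel dashboard:

## Actual root cause (not /_not-found)

The 30-day dashboard curve shows CPU peaking at 80-90 min on 5/11 (baseline 5-15 min/day), which lines up
exactly with when the SEO PRs landed:
- 5/11 15:01-15:39 UTC: PR #341 (253 MDX descriptions + 32 new EN translations)
- 5/11 16:02-18:41 UTC: PR #342 (remark heading shift + leetcode dedup)
- 5/11 19:01-19:27 UTC: PR #343 + #340; 4 deploys in 4 hours flushed ISR

On top of that, IndexNow in deploy.yml proactively told Bing to recrawl, producing a crawler storm on 5/10-5/12.
**This is the cost of the SEO work succeeding, not a bug.** The traffic is real.

Making /_not-found static and adding the bot blocklist clean up real waste (kept), but on their own they cannot
explain the 4× surge.

## Two hacks reverted

1. Sentry tracesSampleRate 0.1 → 0.02: reverted, stays at 10%.
   Observability should not be traded away for this little CPU; 10% is the industry standard, and the client/server/edge
   settings must match for traces to link across runtimes.

2. fetchEvents returning empty on every failure: changed to return empty only when NEXT_PHASE === phase-production-build;
   at runtime it still throws so that Sentry captures real failures. Otherwise a prod backend outage would be
   shown as "no events yet", masking the failure.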
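
The build-only fallback described in item 2 could look roughly like the sketch below. It is not the project's code: `fetchEventsSafe`, the injected `loadEvents` callback, and the `env` parameter are hypothetical, and only the `NEXT_PHASE` comparison value and the log line come from this commit message.

```javascript
// Phase value quoted in the commit message.
const BUILD_PHASE = "phase-production-build";

// Hypothetical wrapper: `loadEvents` stands in for the real backend call,
// and `env` is injectable so the guard can be exercised without a real build.
async function fetchEventsSafe(loadEvents, env = process.env) {
  try {
    return await loadEvents();
  } catch (err) {
    if (env.NEXT_PHASE === BUILD_PHASE) {
      // Build time only: render an empty shell instead of failing the build.
      console.warn("[events] fetch failed at build, rendering empty shell");
      return [];
    }
    // Runtime: rethrow so error monitoring sees the real outage.
    throw err;
  }
}
```

Keeping the fallback behind the phase check means a failing backend still surfaces as an error in production instead of being rendered as an empty events page.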

## Fixes kept (best practice, not hacks)

- /_not-found ƒ → ○: the root 404 never needed i18n
- proxy.ts bot blocklist: scanners should not burn Fluid
- /[locale]/docs /events /login were missing setRequestLocale → added: SSG/ISR should have worked all along
- /editor /share cascade ●: pure client components, safe

## Build verification

Re-ran pnpm build:
- /[locale]/events is still ● ISR 5m 1y
- [events] fetch failed at build, rendering empty shell (the NEXT_PHASE guard works)