fix(agent-skills): write skill files atomically to prevent partial reads#1144
Conversation
In-place writeFile truncates SKILL.md before rewriting it. AI coding agents (Claude Code, OpenCode) discover skills by globbing the skills tree for SKILL.md and parsing frontmatter; a scan landing inside that write window reads empty frontmatter and silently drops the skill. Write to a uniquely-named temp file (.SKILL.md.<pid>.<rand>.tmp, which never matches the SKILL.md glob) in the same directory, then rename() it into place. Same-dir guarantees same filesystem, so the rename is atomic on POSIX and a concurrent reader always sees the complete old or complete new file. The temp file is cleaned up best-effort on failure.
|
Codecov Results 📊✅ Patch coverage is 100.00%. Project has 5113 uncovered lines. Coverage diff@@ Coverage Diff @@
## main #PR +/-##
==========================================
+ Coverage 81.52% 81.53% +0.01%
==========================================
Files 397 397 —
Lines 27671 27679 +8
Branches 17966 17966 —
==========================================
+ Hits 22556 22566 +10
- Misses 5115 5113 -2
- Partials 1867 1868 +1Generated by Codecov Action |
The sole prior atomic-write test ("leaves no temp files behind") passed even
when atomicWriteFile was reverted to a plain in-place writeFile — proven via
mutation — so it guarded nothing.
- Wrap node:fs/promises to observe the temp-file + rename mechanism and to
inject a rename failure.
- Add a guard asserting every writeFile targets a `.tmp` staging path and the
destination is published via rename. Fails under a non-atomic mutation.
- Add a rollback test: when rename fails, install returns null, the temp file
is cleaned up, and no partial SKILL.md is left — covering the previously
untested catch/cleanup branch.
- Clarify atomicWriteFile JSDoc: correct temp-file name pattern, note the
POSIX-only atomicity guarantee, and document the dir-exists precondition.
CI note: E2E
|
Add a test where both rename and the best-effort temp rm fail, so the inner cleanup catch in atomicWriteFile (captureException, not rethrow) runs and the original rename error still propagates to null. Brings agent-skills.ts to 100% line/function coverage; the new function is now fully exercised.
Problem
installAgentSkillswrites each skill file with an in-placewriteFile,which truncates the destination before rewriting it. AI coding agents
(Claude Code, OpenCode) discover skills by globbing the skills tree for
SKILL.mdand parsing the YAML frontmatter. If an agent's skill scan landsinside that truncate→rewrite window, it reads empty/partial frontmatter and
silently drops the skill.
Because the generated
SKILL.mdcarries a build-timestampversion, itscontent changes on every dev build, so the file is rewritten frequently —
making the race realistically reachable during normal install churn.
Fix
Add
atomicWriteFile(destPath, content)and use it inwriteSkillFiles:.<name>.<pid>.<rand>.tmpin thesame directory as the destination, then
rename()into place.concurrent reader always sees the complete old or complete new file.
.tmp-suffixed so it never matches theSKILL.mdglob agents look for, even while it briefly exists.debuglevel, no silent catch) and the original error is re-thrown.Tests
.tmpleftovers remain in the skill dir orreferences/after install (guards the rename/cleanup path).agent-skills.test.ts: 13/13 pass ·cli/setup.test.ts: 30/30 pass ·typecheck + biome clean.