feat: add skill command drift check (CI gate)#1244
Conversation
Skill docs reference lark-cli commands that can drift from the real CLI (renamed commands, removed flags, typo'd methods), so AI-generated commands silently fail. This adds a CI gate that catches such drift. The catalog of valid commands is built in-process from cmd.Build() — the same command tree the binary uses — so it can never drift from reality. The checker (internal/skilllint) extracts command references from skill markdown (bash blocks, inline code, bare prose, frontmatter cliHelp), resolves each by longest-prefix match, and reports unknown commands/subcommands/flags. Placeholder syntax (<resource>, [flags]) and prose are filtered to avoid false positives. TestSkillCommandDrift wires the catalog to a full skills/ scan and fails on any reference absent from the catalog, except those recorded in a baseline of known pre-existing drift (30 findings across 5 domains in cmd/testdata/skill_command_baseline.tsv). This blocks newly introduced errors while existing drift is cleaned up incrementally; regenerate with UPDATE_SKILL_BASELINE=1 go test ./cmd/ -run TestSkillCommandDrift. skill-command-check.yml runs the gate on changes to skills/, cmd/, or shortcuts/, so a command rename that breaks an unchanged doc is caught too; the existing ci.yml test job also covers it.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces ChangesSkill command-lint validation system
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🚀 PR Preview Install Guide🧰 CLI updatenpm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@c76a244bd1a2f091eddb6c762d91228651792c38🧩 Skill updatenpx skills add larksuite/cli#feat/skill-command-lint -y -g |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1244 +/- ##
==========================================
+ Coverage 69.21% 70.00% +0.78%
==========================================
Files 639 649 +10
Lines 59794 60482 +688
==========================================
+ Hits 41389 42340 +951
+ Misses 15059 14834 -225
+ Partials 3346 3308 -38 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/skill-command-check.yml:
- Line 36: Update the checkout step that uses "uses:
actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd" to disable persisted
credentials by adding a with block containing "persist-credentials: false";
specifically modify the checkout job step where actions/checkout is referenced
so it includes the with: persist-credentials: false setting to avoid leaving
authentication credentials available to subsequent steps.
In `@cmd/skilllint_test.go`:
- Around line 63-68: The filepath.WalkDir callback in collectSkillFindings
currently swallows callback errors by always returning nil; change the callback
to return the incoming err when it's non-nil (i.e. if err != nil { return err })
and only append files when err == nil and !d.IsDir() and strings.HasSuffix(p,
".md"), so unreadable/broken entries propagate up from WalkDir instead of being
silently ignored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7af38bd-f640-4595-9682-c1ca14f0d160
⛔ Files ignored due to path filters (1)
cmd/testdata/skill_command_baseline.tsvis excluded by!**/*.tsv
📒 Files selected for processing (7)
.github/workflows/skill-command-check.ymlcmd/skilllint_test.gointernal/skilllint/baseline.gointernal/skilllint/catalog.gointernal/skilllint/check.gointernal/skilllint/parse.gointernal/skilllint/skilllint_test.go
An independent re-verification found the command catalog could be built from a stale ~/.lark-cli/cache/remote_meta.json, which misses service commands that the repo's embedded metadata actually defines. That produced ~19 false positives (slides xml_presentation.slide get/replace, mail user_mailbox.templates, drive permission.public patch) on machines with an old cache — non-reproducible and wrong. Add a cmd-package TestMain pinning LARKSUITE_CLI_REMOTE_META=off so the catalog is always built from embedded metadata (the repo's source of truth), identical across machines and CI. No cmd/ test depends on remote meta. Baseline corrected from 30 to 11 real findings (mail batch_modify/modify, mail +triage --filter, drive upload_prepare/finish, base +field-create/+field-update join, vc meeting get with_participants-in-params).
- skill-command-check.yml: set persist-credentials: false on checkout (job only reads the repo and runs tests; satisfies zizmor artipacked) - cmd/skilllint_test.go: return WalkDir callback errors instead of silently skipping entries (fail closed, keeps the drift gate strict)
Address code-review findings on the command-reference parser: - Blocker: join backslash line-continuations so flags on the continuation lines of a multi-line `lark-cli ... \` example are actually checked - Major: allow underscores in flag names (--input_format was truncated to --input and misreported); strip $(...) command substitutions so inner-command flags (e.g. jq -n --arg) aren't taken as lark-cli flags; strip trailing sentence/CJK punctuation from command words; cut commands at glued separators (get; / foo|) - Minor: LoadBaseline handles a nil reader and its error is checked at the call site; cache the sorted path list used for suggestions - Tests: add parser robustness cases Baseline unchanged: still 11 references / 10 deduped fingerprints. (Deferred for discussion: leading global flags, config-dir isolation, TestMain scope, CI paths for meta sources.)
Per design decision, the CI gate now only checks the command references a PR adds/changes in skill docs, instead of scanning all skills against a baseline. - TestSkillCommandDrift diffs against the PR base (SKILL_LINT_BASE, default origin/main), parses each changed skill file in full so backslash continuations join, and checks only refs whose start line changed - new internal/skilllint/diff.go: extract changed line numbers from git diff, filter refs by line - remove the baseline (internal/skilllint/baseline.go + the .tsv) — pre-existing drift is never scanned now, so no baseline is needed - workflow: fetch full history and pass the PR base; local runs without a base skip instead of fail Pre-existing drift stays tracked in the inventory doc; reverse drift (a command rename breaking an unchanged doc) is out of scope for this gate.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/skilllint/diff.go (1)
74-82: Confirm intended behavior: refs are filtered by start line only, so changes on continuation lines are missed.
FilterRefsByLineskeeps aRefonly whenr.Line(the line where the command starts) is in the changed set. SinceParseRefsjoins backslash-continuations into a singleRefanchored at the first line, a PR that adds/edits only a continuation line (e.g. appends--bad-flagon the next line) of a command whose first line is unchanged will not be re-checked — the gate silently passes that drift.If catching continuation-line edits is in scope,
Refwould need to expose its full line span (start..end) and the filter should match any line in that span. If this narrow gap is an accepted tradeoff, no change needed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/skilllint/diff.go` around lines 74 - 82, FilterRefsByLines currently only checks Ref.Line (start line) so continuation-line edits are missed; update the Ref type (e.g., add EndLine or LinesSpan fields populated by ParseRefs when joining backslash continuations) and change FilterRefsByLines to keep a Ref if any line in its span intersects the changed-lines map (iterate from Start to End or check intersection) instead of only r.Line; alternatively, if you intentionally accept the tradeoff, add a clarifying comment on FilterRefsByLines referencing ParseRefs behavior and leave as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/skilllint_test.go`:
- Around line 77-92: The test TestSkillCommandDrift currently unconditionaly
calls t.Skipf when the git diff (gitCmd.Run()) fails; change this so that if
SKILL_LINT_BASE was explicitly provided (the variable base is not the
empty/default "origin/main") the test fails instead of skipping: detect failure
from gitCmd.Run(), and if base != "origin/main" (or equivalently SKILL_LINT_BASE
was set) call t.Fatalf (or t.Errorf+return) with a clear message including err
and strings.TrimSpace(stderr.String()); only call t.Skipf when base was the
default (origin/main) to preserve local dev behavior. Ensure references to base,
SKILL_LINT_BASE, gitCmd and the current t.Skipf call are updated accordingly.
---
Nitpick comments:
In `@internal/skilllint/diff.go`:
- Around line 74-82: FilterRefsByLines currently only checks Ref.Line (start
line) so continuation-line edits are missed; update the Ref type (e.g., add
EndLine or LinesSpan fields populated by ParseRefs when joining backslash
continuations) and change FilterRefsByLines to keep a Ref if any line in its
span intersects the changed-lines map (iterate from Start to End or check
intersection) instead of only r.Line; alternatively, if you intentionally accept
the tradeoff, add a clarifying comment on FilterRefsByLines referencing
ParseRefs behavior and leave as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4866fbf-6187-41f4-ad0c-348c7e9e12d6
📒 Files selected for processing (4)
.github/workflows/skill-command-check.ymlcmd/skilllint_test.gointernal/skilllint/diff.gointernal/skilllint/skilllint_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/skill-command-check.yml
| func TestSkillCommandDrift(t *testing.T) { | ||
| base := os.Getenv("SKILL_LINT_BASE") | ||
| if base == "" { | ||
| base = "origin/main" | ||
| } | ||
|
|
||
| // PR-style diff (merge-base..HEAD), zero context, limited to skill docs. | ||
| // Run from the repo root (the parent of cmd/). | ||
| gitCmd := exec.Command("git", "-C", "..", "diff", "-U0", base+"...HEAD", "--", "skills") | ||
| var stdout, stderr bytes.Buffer | ||
| gitCmd.Stdout, gitCmd.Stderr = &stdout, &stderr | ||
| if err := gitCmd.Run(); err != nil { | ||
| t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+ | ||
| "CI supplies the base via SKILL_LINT_BASE and fetched history.", | ||
| base, err, strings.TrimSpace(stderr.String())) | ||
| } |
There was a problem hiding this comment.
Fail (don't skip) when the diff base was explicitly provided.
When SKILL_LINT_BASE is set, this is a CI run where the base history is expected to be fetched. If git diff fails there (e.g. shallow clone, unfetched base), the gate t.Skipfs and silently passes — a misconfigured CI step disables the drift check without any signal. Keep the skip only for the local/default-base case where origin/main may genuinely be absent.
Suggested fix
base := os.Getenv("SKILL_LINT_BASE")
+ explicitBase := base != ""
if base == "" {
base = "origin/main"
}
@@
if err := gitCmd.Run(); err != nil {
+ if explicitBase {
+ t.Fatalf("skill check: cannot diff against explicitly provided base %q (%v: %s). "+
+ "CI must fetch the base history before running this gate.",
+ base, err, strings.TrimSpace(stderr.String()))
+ }
t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+
"CI supplies the base via SKILL_LINT_BASE and fetched history.",
base, err, strings.TrimSpace(stderr.String()))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestSkillCommandDrift(t *testing.T) { | |
| base := os.Getenv("SKILL_LINT_BASE") | |
| if base == "" { | |
| base = "origin/main" | |
| } | |
| // PR-style diff (merge-base..HEAD), zero context, limited to skill docs. | |
| // Run from the repo root (the parent of cmd/). | |
| gitCmd := exec.Command("git", "-C", "..", "diff", "-U0", base+"...HEAD", "--", "skills") | |
| var stdout, stderr bytes.Buffer | |
| gitCmd.Stdout, gitCmd.Stderr = &stdout, &stderr | |
| if err := gitCmd.Run(); err != nil { | |
| t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+ | |
| "CI supplies the base via SKILL_LINT_BASE and fetched history.", | |
| base, err, strings.TrimSpace(stderr.String())) | |
| } | |
| func TestSkillCommandDrift(t *testing.T) { | |
| base := os.Getenv("SKILL_LINT_BASE") | |
| explicitBase := base != "" | |
| if base == "" { | |
| base = "origin/main" | |
| } | |
| // PR-style diff (merge-base..HEAD), zero context, limited to skill docs. | |
| // Run from the repo root (the parent of cmd/). | |
| gitCmd := exec.Command("git", "-C", "..", "diff", "-U0", base+"...HEAD", "--", "skills") | |
| var stdout, stderr bytes.Buffer | |
| gitCmd.Stdout, gitCmd.Stderr = &stdout, &stderr | |
| if err := gitCmd.Run(); err != nil { | |
| if explicitBase { | |
| t.Fatalf("skill check: cannot diff against explicitly provided base %q (%v: %s). "+ | |
| "CI must fetch the base history before running this gate.", | |
| base, err, strings.TrimSpace(stderr.String())) | |
| } | |
| t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+ | |
| "CI supplies the base via SKILL_LINT_BASE and fetched history.", | |
| base, err, strings.TrimSpace(stderr.String())) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/skilllint_test.go` around lines 77 - 92, The test TestSkillCommandDrift
currently unconditionaly calls t.Skipf when the git diff (gitCmd.Run()) fails;
change this so that if SKILL_LINT_BASE was explicitly provided (the variable
base is not the empty/default "origin/main") the test fails instead of skipping:
detect failure from gitCmd.Run(), and if base != "origin/main" (or equivalently
SKILL_LINT_BASE was set) call t.Fatalf (or t.Errorf+return) with a clear message
including err and strings.TrimSpace(stderr.String()); only call t.Skipf when
base was the default (origin/main) to preserve local dev behavior. Ensure
references to base, SKILL_LINT_BASE, gitCmd and the current t.Skipf call are
updated accordingly.
Redirect the check at the real target: the example commands embedded in each
shortcut's Tips ("Example: lark-cli ..."), validated against the live command
tree. A shortcut without an example is skipped; a wrong example is a bug fixed
at the source. Full self-consistency check (examples and command definitions
are in the same Go code), so it needs no baseline, no diff, no skill-doc scan.
- TestShortcutExampleCommands iterates shortcuts.AllShortcuts(), parses example
commands from Tips, and checks them against cmd.Build()'s tree
- drop skill-doc diff scanning and internal/skilllint/diff.go
- workflow: full run on shortcut/cmd/registry/checker changes (no diff base)
meta.json has no command examples today (only field values) so it is not
checked; the same logic covers it if command examples are ever added.
Problem
Skill docs (
skills/**) teach AI agents how to invoke lark-cli, but the command examples drift from the real CLI when commands are renamed, flags removed, or method names typo'd — so AI-generated commands silently fail. Nothing currently validates command / flag /--helpcorrectness in skill docs.How it works
cmd.Build()— the exact command tree the binary uses (native commands + all+shortcuts). It cannot drift from reality and needs no hand-maintained list.internal/skilllint): extracts command references from skill markdown (bash blocks, inline code, bare prose, frontmattercliHelp), resolves each by longest-prefix match (separating real subcommands from positionals), and reports unknown commands / subcommands / flags. Placeholder syntax (<resource>,[flags],+<verb>) and prose are filtered to avoid false positives.TestSkillCommandDrift): scans allskills/**/*.mdand fails on any reference absent from the catalog, except those recorded in a baseline of known pre-existing drift.Existing drift (baseline)
The full scan surfaced 30 references across 5 domains that don't match the current CLI. They are recorded in
cmd/testdata/skill_command_baseline.tsvso they don't block PRs, while the gate hard-fails any newly introduced error:slides xml_presentation.slide get/replace(no such command)batch_modify_message→batch_modify;user_mailbox.templates(no such resource)files upload_prepare/upload_finish;permission.public patch+field-create/+field-update(two commands joined by/)vc meeting get --with-participants(no such flag)These are left for follow-up. After fixing, regenerate with
UPDATE_SKILL_BASELINE=1 go test ./cmd/ -run TestSkillCommandDrift.Triggers
skill-command-check.ymlruns on changes toskills/,cmd/, orshortcuts/— so both a bad doc edit and a command rename that breaks an unchanged doc are caught. The existingci.ymlunit-test job also runs the gate.Self-test
go build,go vet,gofmt,golangci-lintv2.1.6: clean.Summary by CodeRabbit
New Features
Tests