Skip to content

feat: add skill command drift check (CI gate)#1244

Open
sang-neo03 wants to merge 6 commits into
mainfrom
feat/skill-command-lint
Open

feat: add skill command drift check (CI gate)#1244
sang-neo03 wants to merge 6 commits into
mainfrom
feat/skill-command-lint

Conversation

@sang-neo03
Copy link
Copy Markdown
Collaborator

@sang-neo03 sang-neo03 commented Jun 3, 2026

Problem

Skill docs (skills/**) teach AI agents how to invoke lark-cli, but the command examples drift from the real CLI when commands are renamed, flags removed, or method names typo'd — so AI-generated commands silently fail. Nothing currently validates command / flag / --help correctness in skill docs.

How it works

  • Truth source: the catalog of valid command paths and their flags is built in-process from cmd.Build() — the exact command tree the binary uses (native commands + all +shortcuts). It cannot drift from reality and needs no hand-maintained list.
  • Checker (internal/skilllint): extracts command references from skill markdown (bash blocks, inline code, bare prose, frontmatter cliHelp), resolves each by longest-prefix match (separating real subcommands from positionals), and reports unknown commands / subcommands / flags. Placeholder syntax (<resource>, [flags], +<verb>) and prose are filtered to avoid false positives.
  • Gate (TestSkillCommandDrift): scans all skills/**/*.md and fails on any reference absent from the catalog, except those recorded in a baseline of known pre-existing drift.

Existing drift (baseline)

The full scan surfaced 30 references across 5 domains that don't match the current CLI. They are recorded in cmd/testdata/skill_command_baseline.tsv so they don't block PRs, while the gate hard-fails any newly introduced error:

domain count example
lark-slides 13 slides xml_presentation.slide get/replace (no such command)
lark-mail 10 batch_modify_messagebatch_modify; user_mailbox.templates (no such resource)
lark-drive 3 files upload_prepare/upload_finish; permission.public patch
lark-base 2 +field-create/+field-update (two commands joined by /)
lark-vc-agent 2 vc meeting get --with-participants (no such flag)

These are left for follow-up. After fixing, regenerate with UPDATE_SKILL_BASELINE=1 go test ./cmd/ -run TestSkillCommandDrift.

Triggers

skill-command-check.yml runs on changes to skills/, cmd/, or shortcuts/ — so both a bad doc edit and a command rename that breaks an unchanged doc are caught. The existing ci.yml unit-test job also runs the gate.

Self-test

  • Negative: injecting a fake command into a skill doc fails the gate with file:line + suggestion.
  • Positive: removing it passes; all 30 baselined findings are ignored.
  • go build, go vet, gofmt, golangci-lint v2.1.6: clean.

Summary by CodeRabbit

  • New Features

    • Added automated skill command validation workflow for pull requests, ensuring documentation remains accurate.
    • Implemented documentation drift detection that verifies documented commands and flags match the actual CLI interface.
    • Added intelligent suggestion system to recommend corrections when discrepancies are found.
  • Tests

    • Added comprehensive test suite to validate skill command and flag accuracy.

Skill docs reference lark-cli commands that can drift from the real CLI
(renamed commands, removed flags, typo'd methods), so AI-generated
commands silently fail. This adds a CI gate that catches such drift.

The catalog of valid commands is built in-process from cmd.Build() — the
same command tree the binary uses — so it can never drift from reality.
The checker (internal/skilllint) extracts command references from skill
markdown (bash blocks, inline code, bare prose, frontmatter cliHelp),
resolves each by longest-prefix match, and reports unknown
commands/subcommands/flags. Placeholder syntax (<resource>, [flags]) and
prose are filtered to avoid false positives.

TestSkillCommandDrift wires the catalog to a full skills/ scan and fails
on any reference absent from the catalog, except those recorded in a
baseline of known pre-existing drift (30 findings across 5 domains in
cmd/testdata/skill_command_baseline.tsv). This blocks newly introduced
errors while existing drift is cleaned up incrementally; regenerate with
UPDATE_SKILL_BASELINE=1 go test ./cmd/ -run TestSkillCommandDrift.

skill-command-check.yml runs the gate on changes to skills/, cmd/, or
shortcuts/, so a command rename that breaks an unchanged doc is caught
too; the existing ci.yml test job also covers it.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces internal/skilllint, a new validation package that extracts lark-cli command references from markdown skill documentation, validates them against a live Cobra CLI catalog, and reports unknown commands and flags. A GitHub Actions workflow runs a diff-scoped integration test that validates only changed documentation lines.

Changes

Skill command-lint validation system

Layer / File(s) Summary
Markdown command reference parsing
internal/skilllint/parse.go, internal/skilllint/skilllint_test.go
ParseRefs extracts lark-cli command references from markdown into Ref records, handling line continuations, command substitutions, placeholder filtering, and trailing punctuation. Tests validate extraction, prose filtering, edge cases, and robustness across shell features.
Live CLI command catalog and lookup
internal/skilllint/catalog.go, internal/skilllint/skilllint_test.go
Catalog stores known command paths, accepted flags, and group membership from the live Cobra command tree. Provides HasCommand, HasFlag, and LongestPrefix queries to resolve command matches and compute nearest-match suggestions using Levenshtein distance.
Validation of refs against catalog
internal/skilllint/check.go, internal/skilllint/skilllint_test.go
Check compares parsed refs to the catalog, producing Finding records for unknown commands and unknown flags with optional suggestions. Tests cover valid refs, unmatched commands, missing flags, and suggestion accuracy.
Git diff parsing and filtering
internal/skilllint/diff.go, internal/skilllint/skilllint_test.go
ParseDiffAddedLines extracts line numbers from unified diffs; FilterRefsByLines narrows ref validation to changed content only. Tests verify diff parsing and filtering behavior.
Integration test and CI workflow
cmd/skilllint_test.go, .github/workflows/skill-command-check.yml
TestSkillCommandDrift orchestrates catalog building, markdown parsing, diff filtering, and validation, reporting mismatches with suggestions. The GitHub Actions workflow runs the test on push/pull_request, gated by path filters and SKILL_LINT_BASE environment variable.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

feature

Suggested reviewers

  • liangshuo-1
  • evandance

Poem

A rabbit hops through markdown lines, 🐰
Parsing lark-cli references fine,
Checks them 'gainst the catalog true,
Reports the drift when commands are new,
In GitHub's workflow, lint runs through!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description is comprehensive and well-structured, covering problem statement, implementation details, existing drift baseline, triggers, and testing. However, it does not follow the provided template structure (Summary, Changes, Test Plan, Related Issues sections). Restructure the description to follow the template: add a concise Summary section, reorganize Changes as a bulleted list, formalize the Test Plan section with checkboxes, and add a Related Issues section.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a skill command drift check CI gate to validate command examples in skill documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/skill-command-lint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the size/L Large or sensitive change across domains or core paths label Jun 3, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@c76a244bd1a2f091eddb6c762d91228651792c38

🧩 Skill update

npx skills add larksuite/cli#feat/skill-command-lint -y -g

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 93.11741% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.00%. Comparing base (2bbab4d) to head (c76a244).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
internal/skilllint/parse.go 91.08% 6 Missing and 3 partials ⚠️
internal/skilllint/diff.go 86.48% 3 Missing and 2 partials ⚠️
internal/skilllint/catalog.go 96.38% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1244      +/-   ##
==========================================
+ Coverage   69.21%   70.00%   +0.78%     
==========================================
  Files         639      649      +10     
  Lines       59794    60482     +688     
==========================================
+ Hits        41389    42340     +951     
+ Misses      15059    14834     -225     
+ Partials     3346     3308      -38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/skill-command-check.yml:
- Line 36: Update the checkout step that uses "uses:
actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd" to disable persisted
credentials by adding a with block containing "persist-credentials: false";
specifically modify the checkout job step where actions/checkout is referenced
so it includes the with: persist-credentials: false setting to avoid leaving
authentication credentials available to subsequent steps.

In `@cmd/skilllint_test.go`:
- Around line 63-68: The filepath.WalkDir callback in collectSkillFindings
currently swallows callback errors by always returning nil; change the callback
to return the incoming err when it's non-nil (i.e. if err != nil { return err })
and only append files when err == nil and !d.IsDir() and strings.HasSuffix(p,
".md"), so unreadable/broken entries propagate up from WalkDir instead of being
silently ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e7af38bd-f640-4595-9682-c1ca14f0d160

📥 Commits

Reviewing files that changed from the base of the PR and between 85c7280 and e8150fb.

⛔ Files ignored due to path filters (1)
  • cmd/testdata/skill_command_baseline.tsv is excluded by !**/*.tsv
📒 Files selected for processing (7)
  • .github/workflows/skill-command-check.yml
  • cmd/skilllint_test.go
  • internal/skilllint/baseline.go
  • internal/skilllint/catalog.go
  • internal/skilllint/check.go
  • internal/skilllint/parse.go
  • internal/skilllint/skilllint_test.go

Comment thread .github/workflows/skill-command-check.yml
Comment thread cmd/skilllint_test.go Outdated
An independent re-verification found the command catalog could be built from
a stale ~/.lark-cli/cache/remote_meta.json, which misses service commands that
the repo's embedded metadata actually defines. That produced ~19 false
positives (slides xml_presentation.slide get/replace, mail
user_mailbox.templates, drive permission.public patch) on machines with an old
cache — non-reproducible and wrong.

Add a cmd-package TestMain pinning LARKSUITE_CLI_REMOTE_META=off so the catalog
is always built from embedded metadata (the repo's source of truth), identical
across machines and CI. No cmd/ test depends on remote meta.

Baseline corrected from 30 to 11 real findings (mail batch_modify/modify, mail
+triage --filter, drive upload_prepare/finish, base +field-create/+field-update
join, vc meeting get with_participants-in-params).
- skill-command-check.yml: set persist-credentials: false on checkout
  (job only reads the repo and runs tests; satisfies zizmor artipacked)
- cmd/skilllint_test.go: return WalkDir callback errors instead of silently
  skipping entries (fail closed, keeps the drift gate strict)
Address code-review findings on the command-reference parser:
- Blocker: join backslash line-continuations so flags on the continuation
  lines of a multi-line `lark-cli ... \` example are actually checked
- Major: allow underscores in flag names (--input_format was truncated to
  --input and misreported); strip $(...) command substitutions so inner-command
  flags (e.g. jq -n --arg) aren't taken as lark-cli flags; strip trailing
  sentence/CJK punctuation from command words; cut commands at glued separators
  (get; / foo|)
- Minor: LoadBaseline handles a nil reader and its error is checked at the
  call site; cache the sorted path list used for suggestions
- Tests: add parser robustness cases

Baseline unchanged: still 11 references / 10 deduped fingerprints.
(Deferred for discussion: leading global flags, config-dir isolation,
TestMain scope, CI paths for meta sources.)
Per design decision, the CI gate now only checks the command references a PR
adds/changes in skill docs, instead of scanning all skills against a baseline.

- TestSkillCommandDrift diffs against the PR base (SKILL_LINT_BASE, default
  origin/main), parses each changed skill file in full so backslash
  continuations join, and checks only refs whose start line changed
- new internal/skilllint/diff.go: extract changed line numbers from git diff,
  filter refs by line
- remove the baseline (internal/skilllint/baseline.go + the .tsv) — pre-existing
  drift is never scanned now, so no baseline is needed
- workflow: fetch full history and pass the PR base; local runs without a base
  skip instead of fail

Pre-existing drift stays tracked in the inventory doc; reverse drift (a command
rename breaking an unchanged doc) is out of scope for this gate.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/skilllint/diff.go (1)

74-82: Confirm intended behavior: refs are filtered by start line only, so changes on continuation lines are missed.

FilterRefsByLines keeps a Ref only when r.Line (the line where the command starts) is in the changed set. Since ParseRefs joins backslash-continuations into a single Ref anchored at the first line, a PR that adds/edits only a continuation line (e.g. appends --bad-flag on the next line) of a command whose first line is unchanged will not be re-checked — the gate silently passes that drift.

If catching continuation-line edits is in scope, Ref would need to expose its full line span (start..end) and the filter should match any line in that span. If this narrow gap is an accepted tradeoff, no change needed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/skilllint/diff.go` around lines 74 - 82, FilterRefsByLines currently
only checks Ref.Line (start line) so continuation-line edits are missed; update
the Ref type (e.g., add EndLine or LinesSpan fields populated by ParseRefs when
joining backslash continuations) and change FilterRefsByLines to keep a Ref if
any line in its span intersects the changed-lines map (iterate from Start to End
or check intersection) instead of only r.Line; alternatively, if you
intentionally accept the tradeoff, add a clarifying comment on FilterRefsByLines
referencing ParseRefs behavior and leave as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/skilllint_test.go`:
- Around line 77-92: The test TestSkillCommandDrift currently unconditionaly
calls t.Skipf when the git diff (gitCmd.Run()) fails; change this so that if
SKILL_LINT_BASE was explicitly provided (the variable base is not the
empty/default "origin/main") the test fails instead of skipping: detect failure
from gitCmd.Run(), and if base != "origin/main" (or equivalently SKILL_LINT_BASE
was set) call t.Fatalf (or t.Errorf+return) with a clear message including err
and strings.TrimSpace(stderr.String()); only call t.Skipf when base was the
default (origin/main) to preserve local dev behavior. Ensure references to base,
SKILL_LINT_BASE, gitCmd and the current t.Skipf call are updated accordingly.

---

Nitpick comments:
In `@internal/skilllint/diff.go`:
- Around line 74-82: FilterRefsByLines currently only checks Ref.Line (start
line) so continuation-line edits are missed; update the Ref type (e.g., add
EndLine or LinesSpan fields populated by ParseRefs when joining backslash
continuations) and change FilterRefsByLines to keep a Ref if any line in its
span intersects the changed-lines map (iterate from Start to End or check
intersection) instead of only r.Line; alternatively, if you intentionally accept
the tradeoff, add a clarifying comment on FilterRefsByLines referencing
ParseRefs behavior and leave as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4866fbf-6187-41f4-ad0c-348c7e9e12d6

📥 Commits

Reviewing files that changed from the base of the PR and between 4186d50 and c76a244.

📒 Files selected for processing (4)
  • .github/workflows/skill-command-check.yml
  • cmd/skilllint_test.go
  • internal/skilllint/diff.go
  • internal/skilllint/skilllint_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/skill-command-check.yml

Comment thread cmd/skilllint_test.go Outdated
Comment on lines +77 to +92
func TestSkillCommandDrift(t *testing.T) {
base := os.Getenv("SKILL_LINT_BASE")
if base == "" {
base = "origin/main"
}

// PR-style diff (merge-base..HEAD), zero context, limited to skill docs.
// Run from the repo root (the parent of cmd/).
gitCmd := exec.Command("git", "-C", "..", "diff", "-U0", base+"...HEAD", "--", "skills")
var stdout, stderr bytes.Buffer
gitCmd.Stdout, gitCmd.Stderr = &stdout, &stderr
if err := gitCmd.Run(); err != nil {
t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+
"CI supplies the base via SKILL_LINT_BASE and fetched history.",
base, err, strings.TrimSpace(stderr.String()))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail (don't skip) when the diff base was explicitly provided.

When SKILL_LINT_BASE is set, this is a CI run where the base history is expected to be fetched. If git diff fails there (e.g. shallow clone, unfetched base), the gate t.Skipfs and silently passes — a misconfigured CI step disables the drift check without any signal. Keep the skip only for the local/default-base case where origin/main may genuinely be absent.

Suggested fix
 	base := os.Getenv("SKILL_LINT_BASE")
+	explicitBase := base != ""
 	if base == "" {
 		base = "origin/main"
 	}
@@
 	if err := gitCmd.Run(); err != nil {
+		if explicitBase {
+			t.Fatalf("skill check: cannot diff against explicitly provided base %q (%v: %s). "+
+				"CI must fetch the base history before running this gate.",
+				base, err, strings.TrimSpace(stderr.String()))
+		}
 		t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+
 			"CI supplies the base via SKILL_LINT_BASE and fetched history.",
 			base, err, strings.TrimSpace(stderr.String()))
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestSkillCommandDrift(t *testing.T) {
base := os.Getenv("SKILL_LINT_BASE")
if base == "" {
base = "origin/main"
}
// PR-style diff (merge-base..HEAD), zero context, limited to skill docs.
// Run from the repo root (the parent of cmd/).
gitCmd := exec.Command("git", "-C", "..", "diff", "-U0", base+"...HEAD", "--", "skills")
var stdout, stderr bytes.Buffer
gitCmd.Stdout, gitCmd.Stderr = &stdout, &stderr
if err := gitCmd.Run(); err != nil {
t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+
"CI supplies the base via SKILL_LINT_BASE and fetched history.",
base, err, strings.TrimSpace(stderr.String()))
}
func TestSkillCommandDrift(t *testing.T) {
base := os.Getenv("SKILL_LINT_BASE")
explicitBase := base != ""
if base == "" {
base = "origin/main"
}
// PR-style diff (merge-base..HEAD), zero context, limited to skill docs.
// Run from the repo root (the parent of cmd/).
gitCmd := exec.Command("git", "-C", "..", "diff", "-U0", base+"...HEAD", "--", "skills")
var stdout, stderr bytes.Buffer
gitCmd.Stdout, gitCmd.Stderr = &stdout, &stderr
if err := gitCmd.Run(); err != nil {
if explicitBase {
t.Fatalf("skill check: cannot diff against explicitly provided base %q (%v: %s). "+
"CI must fetch the base history before running this gate.",
base, err, strings.TrimSpace(stderr.String()))
}
t.Skipf("diff-mode skill check skipped: cannot diff against %q (%v: %s). "+
"CI supplies the base via SKILL_LINT_BASE and fetched history.",
base, err, strings.TrimSpace(stderr.String()))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/skilllint_test.go` around lines 77 - 92, The test TestSkillCommandDrift
currently unconditionaly calls t.Skipf when the git diff (gitCmd.Run()) fails;
change this so that if SKILL_LINT_BASE was explicitly provided (the variable
base is not the empty/default "origin/main") the test fails instead of skipping:
detect failure from gitCmd.Run(), and if base != "origin/main" (or equivalently
SKILL_LINT_BASE was set) call t.Fatalf (or t.Errorf+return) with a clear message
including err and strings.TrimSpace(stderr.String()); only call t.Skipf when
base was the default (origin/main) to preserve local dev behavior. Ensure
references to base, SKILL_LINT_BASE, gitCmd and the current t.Skipf call are
updated accordingly.

Redirect the check at the real target: the example commands embedded in each
shortcut's Tips ("Example: lark-cli ..."), validated against the live command
tree. A shortcut without an example is skipped; a wrong example is a bug fixed
at the source. Full self-consistency check (examples and command definitions
are in the same Go code), so it needs no baseline, no diff, no skill-doc scan.

- TestShortcutExampleCommands iterates shortcuts.AllShortcuts(), parses example
  commands from Tips, and checks them against cmd.Build()'s tree
- drop skill-doc diff scanning and internal/skilllint/diff.go
- workflow: full run on shortcut/cmd/registry/checker changes (no diff base)

meta.json has no command examples today (only field values) so it is not
checked; the same logic covers it if command examples are ever added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Large or sensitive change across domains or core paths

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant