Skip to content

feat: agent install telemetry (skill side, spec §8.1)#31

Merged
lxcong merged 12 commits into
mainfrom
feat/skill-side-telemetry
May 14, 2026
Merged

feat: agent install telemetry (skill side, spec §8.1)#31
lxcong merged 12 commits into
mainfrom
feat/skill-side-telemetry

Conversation

@lxcong
Copy link
Copy Markdown
Contributor

@lxcong lxcong commented May 11, 2026

Summary

Skill-side half of the agent-install telemetry rollout (spec §8.1). check-update.sh emits TELEMETRY ... lines on stdout, SKILL.md parses them and dispatches via MCP. Adds 3-layer opt-out (file / env / installer flag — file path checked here), 24h client-side dedup, and a bats test harness.

  • skills/agentkey/scripts/check-update.shemit_telemetry() + auto_upgrade_flag() helpers, opt-out checks, 7 emit calls at 5 exit points, 24h heartbeat dedup keyed by LOCAL_VERSION. Includes Linux stat order fix (-c %Y first, -f %m second — -f on Linux means filesystem mountpoint, not mtime)
  • skills/agentkey/SKILL.md — Step 0 instructions for parsing TELEMETRY lines and dispatching via execute_tool(\"agentkey_internal\", {path:\"telemetry/event\", ...}); upgrade-flow each branch dispatches upgrade_decision / upgrade_result with mapped choice values
  • tests/check-update.bats + tests/helpers.bash — 10 contract tests with isolated \$HOME / \$TMPDIR / mocked curl
  • .github/workflows/scripts-test.yml — bats CI on ubuntu + macos
  • README.md + docs/README_zh.md — FAQ replacement for the old "nothing to collect" sentence, new FAQ entry "How do I opt out of telemetry?" / "我如何关闭遥测?"
  • scripts/uninstall.sh — adds Step 7b to clean ~/.config/agentkey/ (telemetry-disabled, update-disabled, snooze state)

Blocked on

AgentKey-Server PR that registers the agentkey_internal/telemetry/event MCP tool (must be filtered out of list_tools / find_tools so the LLM doesn't accidentally call it). SKILL.md falls back silently when the tool doesn't exist, so this PR is safe to merge first — but the telemetry signal isn't recorded until the server side lands.

Spec §11 mandates server-first merge order to keep the agent-side debug log clean.

Test plan

  • bats tests/check-update.bats — 10/10 passing locally
  • E2E smoke (skill-side plan Task 8 steps 1-3, 5):
    • Default emit on up_to_date
    • ~/.config/agentkey/telemetry-disabled early-return (no TELEMETRY line)
    • AGENTKEY_TELEMETRY=0 env override
    • 24h heartbeat dedup (2nd invocation within window does not re-emit)
    • uninstall.sh Step 7b cleans ~/.config/agentkey/
  • Manual — open Claude Code, ask an AgentKey-routed query. Verify SKILL.md silently swallows the missing agentkey_internal tool error (server not shipped yet) and the actual query still completes
  • Verify bats CI runs green on ubuntu + macos matrix

Pre-existing bug surfaced during testing (not in this PR)

check-update.sh line 130-ish cache fast-path uses the same wrong stat -f %m ... || stat -c %Y order. Same Linux bug as the one fixed here in emit_telemetry. Worth a separate one-line fix PR.

🤖 Generated with Claude Code

lxcong and others added 9 commits May 12, 2026 02:56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Linux, `stat -f %m` returns the filesystem mountpoint (a string), not
the file mtime. The previous order (`-f` first, `-c` second) meant the
`||` fallback never fired on Linux and the heartbeat dedup math was based
on garbage. Try GNU `-c %Y` first, then BSD `-f %m`.

The same pattern exists in the pre-existing cache-age branch at line ~130
and likely has the same bug; left for a separate fix to keep this PR
scoped to the telemetry feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lxcong
Copy link
Copy Markdown
Contributor Author

lxcong commented May 11, 2026

@claude review

@github-actions
Copy link
Copy Markdown
Contributor

🤖 Claude security review requested by @lxcong. Running against HEAD f7f266fcf1856c3000409b2b9b9e7d7d69c0dcde...

@claude
Copy link
Copy Markdown

claude Bot commented May 11, 2026

🤖 Claude security review — HEAD: f7f266f

Scope: Telemetry emit in check-update.sh, SKILL.md dispatch instructions, uninstall cleanup, docs opt-out FAQ, and bats test suite (8 files).

✅ No credential leaks, shell injection, or convention violations found.

💡 Suggestion (nice-to-have)

  • .github/workflows/scripts-test.yml:L20actions/checkout@v4 is a mutable tag. Pin to a full SHA (e.g. actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683) for supply-chain hardening.

  • tests/helpers.bash:L39mock_curl_release writes $tag into an unquoted heredoc (<<EOF). All current callers pass literal version strings so there is no real risk, but switching to <<'EOF' and building the tag line separately (or using printf) would eliminate any future injection surface in the test harness.


Review triggered by @lxcong
via @claude review.

lxcong and others added 2 commits May 12, 2026 04:00
CI on ubuntu-latest blew up in test 7 with "line 136: File: unbound
variable" because GNU stat on Ubuntu 24.04 prints filesystem info to
stdout (not stderr) when `stat -f %m FILE` is invoked invalidly:

    $ stat -f %m /tmp/cache 2>/dev/null
      File: "/tmp/cache"
      ID: ...  Namelen: 255  Type: overlayfs
      Block size: 4096 ...

That output flowed into MTIME=$(stat -f %m ... || stat -c %Y ...), and
$((NOW - MTIME)) then tried to evaluate `NOW - File:` which under set -u
threw an unbound-variable error on the word "File".

Two-pronged fix in both the cache fast-path and emit_telemetry's
heartbeat dedup:

1. Swap order — try GNU `-c %Y` first (Linux); `-f %m` only as macOS
   fallback. Avoids the polluted-stdout path entirely on Linux.

2. Numeric guard — `case "$MTIME" in ''|*[!0-9]*) MTIME=0 ;; esac`
   strips any non-numeric leftovers if both forms ever produce garbage
   (belt-and-suspenders). MTIME=0 makes AGE huge → fast-path falls
   through to slow-path, which is the safe behavior.

Verified locally and inside ubuntu:24.04 docker: bats 10/10 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…etry

# Conflicts:
#	skills/agentkey/SKILL.md
@lxcong
Copy link
Copy Markdown
Contributor Author

lxcong commented May 14, 2026

@claude review

@github-actions
Copy link
Copy Markdown
Contributor

🤖 Claude security review requested by @lxcong. Running against HEAD 6a1a50518251c13a800f9faa21259e87a6ca5820...

@claude
Copy link
Copy Markdown

claude Bot commented May 14, 2026

🤖 Claude security review — HEAD: 6a1a505

Scope: 8-file PR adding anonymous skill-side telemetry (skill_loaded, upgrade_decision, upgrade_result events via stdout → MCP dispatch), opt-out docs, uninstall cleanup for ~/.config/agentkey/, and bats test coverage.

💡 Suggestion (nice-to-have)

  • skills/agentkey/scripts/check-update.sh:95emit_telemetry on the update-disabled branch is called before the LOCAL_VERSION sanity-check at line 101. The heartbeat file path embeds $LOCAL_VERSION; a malformed version (e.g. if release-please ever fails to sync the line) would create an oddly-named temp file before the guard fires. Consider moving the sanity-check above the disabled-file block, or adding the same [0-9]*.[0-9]*.[0-9]* guard at the top of emit_telemetry.

  • tests/helpers.bash:52-54$v is interpolated unquoted into the sed / expression in set_local_version. A version string containing / or & would silently corrupt the substitution. All current test values are safe, but switching to a non-conflicting delimiter (sed -i "s|^LOCAL_VERSION=.*|LOCAL_VERSION=\"$v\" …|") makes the helper robust against future additions.

  • .github/workflows/scripts-test.yml:20actions/checkout@v4 is a mutable tag reference. SHA-pinning (e.g. actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683) is the recommended supply-chain hardening for CI workflows.


Review triggered by @lxcong
via @claude review.

Previously the update-disabled branch could call emit_telemetry with
a malformed LOCAL_VERSION (if release-please ever fails to sync the
embedded version line), creating an oddly-named heartbeat file at
$TMPDIR/agentkey-heartbeat-<garbage>. The semver guard now runs
first, so a malformed version exits silently before any filesystem
write or telemetry emit.

Surfaced by Claude security review on 6a1a505.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lxcong lxcong marked this pull request as ready for review May 14, 2026 08:40
@lxcong lxcong merged commit f830f29 into main May 14, 2026
6 checks passed
@lxcong lxcong deleted the feat/skill-side-telemetry branch May 14, 2026 08:41
@lxcong lxcong mentioned this pull request May 14, 2026
lxcong added a commit that referenced this pull request May 14, 2026
🤖 I have created a release *beep* *boop*
---


##
[1.6.0](v1.5.0...v1.6.0)
(2026-05-14)


### Features

* agent install telemetry (skill side, spec §8.1)
([#31](#31))
([f830f29](f830f29))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
lxcong added a commit that referenced this pull request May 14, 2026
## Summary

PR #31 wired skill-side telemetry through two paths that don't actually
connect, so zero usable events reach PostHog from a real LLM session.

- **Bug 1 (dispatch path)**: SKILL.md told the LLM
  `execute_tool(name="agentkey_internal", params={...})`, but
  `agentkey_internal` is registered as a separate top-level MCP tool in
  `@agentkey/mcp` ≥0.3.10. `execute_tool`'s router only knows
  `agentkey_search / scrape / social / crypto`, so the call returned
  `tool not found` and SKILL.md's silent-fallback swallowed the error.

- **Bug 2 (field name)**: Body wrapped properties under `props: {...}`
  but `internal/handler/telemetry_handler.go` decodes
  `Properties map[string]interface{} json:"properties"`. Mismatched tag
  → nil map → server substituted empty map → every business field
  stripped before reaching PostHog. Only posthog-go's auto-injected
  `$lib` / `$os` / `$ip` plus `agent: "unknown"` (from
  `NormalizeAgent("")`) survived.

## Evidence

Direct call vs SKILL.md-shaped call, same distinct_id, minutes apart in
PostHog (project `agentkey`):

| body field name | PostHog record for `upgrade_decision` |
|---|---|
| `properties: {...}` ✅ | `from_version=1.5.0`, `to_version=1.6.0`,
`choice=accept_once`, `agent=claude-code` — all fields |
| `props: {...}` ❌ | All `None` |

Sink / auth / posthog-go batching / 24h dedupe all confirmed fine. Bug
surface is exactly the two items above.

## Changes

SKILL.md only:
- 6 dispatch sites: `execute_tool(name="agentkey_internal",
params={...})`
  → `agentkey_internal({...})`
- Body field: `props` → `properties`

No change to `check-update.sh`, MCP forwarder, or server. Skill-only is
the cheaper side to align: 1.6.0 just shipped and has no install base.

Agent attribution (server falling back to `agent: "unknown"`) is left as
a follow-up — needs cross-client identification logic.

## Test plan

- [x] `bats tests/check-update.bats` — 10/10 still passing
  (script-level TELEMETRY emission unchanged)
- [x] Manual end-to-end: directly invoke
`agentkey_internal({path: "telemetry/event", params: {event:
"upgrade_decision", properties: {...}}})`
  → confirmed all fields land in PostHog
- [ ] Smoke after release: bump version, install fresh, trigger a real
  upgrade flow, verify PostHog receives populated `skill_loaded` /
  `upgrade_decision` / `upgrade_result`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant