feat: agent install telemetry (skill side, spec §8.1)#31
Conversation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Linux, `stat -f %m` returns the filesystem mountpoint (a string), not the file mtime. The previous order (`-f` first, `-c` second) meant the `||` fallback never fired on Linux and the heartbeat dedup math was based on garbage. Try GNU `-c %Y` first, then BSD `-f %m`. The same pattern exists in the pre-existing cache-age branch at line ~130 and likely has the same bug; left for a separate fix to keep this PR scoped to the telemetry feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
|
🤖 Claude security review requested by @lxcong. Running against HEAD |
|
🤖 Claude security review — HEAD: f7f266f Scope: Telemetry emit in check-update.sh, SKILL.md dispatch instructions, uninstall cleanup, docs opt-out FAQ, and bats test suite (8 files). ✅ No credential leaks, shell injection, or convention violations found. 💡 Suggestion (nice-to-have)
Review triggered by @lxcong |
CI on ubuntu-latest blew up in test 7 with "line 136: File: unbound
variable" because GNU stat on Ubuntu 24.04 prints filesystem info to
stdout (not stderr) when `stat -f %m FILE` is invoked invalidly:
$ stat -f %m /tmp/cache 2>/dev/null
File: "/tmp/cache"
ID: ... Namelen: 255 Type: overlayfs
Block size: 4096 ...
That output flowed into MTIME=$(stat -f %m ... || stat -c %Y ...), and
$((NOW - MTIME)) then tried to evaluate `NOW - File:` which under set -u
threw an unbound-variable error on the word "File".
Two-pronged fix in both the cache fast-path and emit_telemetry's
heartbeat dedup:
1. Swap order — try GNU `-c %Y` first (Linux); `-f %m` only as macOS
fallback. Avoids the polluted-stdout path entirely on Linux.
2. Numeric guard — `case "$MTIME" in ''|*[!0-9]*) MTIME=0 ;; esac`
strips any non-numeric leftovers if both forms ever produce garbage
(belt-and-suspenders). MTIME=0 makes AGE huge → fast-path falls
through to slow-path, which is the safe behavior.
Verified locally and inside ubuntu:24.04 docker: bats 10/10 passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…etry # Conflicts: # skills/agentkey/SKILL.md
|
@claude review |
|
🤖 Claude security review requested by @lxcong. Running against HEAD |
|
🤖 Claude security review — HEAD: 6a1a505 Scope: 8-file PR adding anonymous skill-side telemetry ( 💡 Suggestion (nice-to-have)
Review triggered by @lxcong |
Previously the update-disabled branch could call emit_telemetry with a malformed LOCAL_VERSION (if release-please ever fails to sync the embedded version line), creating an oddly-named heartbeat file at $TMPDIR/agentkey-heartbeat-<garbage>. The semver guard now runs first, so a malformed version exits silently before any filesystem write or telemetry emit. Surfaced by Claude security review on 6a1a505. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🤖 I have created a release *beep* *boop* --- ## [1.6.0](v1.5.0...v1.6.0) (2026-05-14) ### Features * agent install telemetry (skill side, spec §8.1) ([#31](#31)) ([f830f29](f830f29)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
## Summary PR #31 wired skill-side telemetry through two paths that don't actually connect, so zero usable events reach PostHog from a real LLM session. - **Bug 1 (dispatch path)**: SKILL.md told the LLM `execute_tool(name="agentkey_internal", params={...})`, but `agentkey_internal` is registered as a separate top-level MCP tool in `@agentkey/mcp` ≥0.3.10. `execute_tool`'s router only knows `agentkey_search / scrape / social / crypto`, so the call returned `tool not found` and SKILL.md's silent-fallback swallowed the error. - **Bug 2 (field name)**: Body wrapped properties under `props: {...}` but `internal/handler/telemetry_handler.go` decodes `Properties map[string]interface{} json:"properties"`. Mismatched tag → nil map → server substituted empty map → every business field stripped before reaching PostHog. Only posthog-go's auto-injected `$lib` / `$os` / `$ip` plus `agent: "unknown"` (from `NormalizeAgent("")`) survived. ## Evidence Direct call vs SKILL.md-shaped call, same distinct_id, minutes apart in PostHog (project `agentkey`): | body field name | PostHog record for `upgrade_decision` | |---|---| | `properties: {...}` ✅ | `from_version=1.5.0`, `to_version=1.6.0`, `choice=accept_once`, `agent=claude-code` — all fields | | `props: {...}` ❌ | All `None` | Sink / auth / posthog-go batching / 24h dedupe all confirmed fine. Bug surface is exactly the two items above. ## Changes SKILL.md only: - 6 dispatch sites: `execute_tool(name="agentkey_internal", params={...})` → `agentkey_internal({...})` - Body field: `props` → `properties` No change to `check-update.sh`, MCP forwarder, or server. Skill-only is the cheaper side to align: 1.6.0 just shipped and has no install base. Agent attribution (server falling back to `agent: "unknown"`) is left as a follow-up — needs cross-client identification logic. ## Test plan - [x] `bats tests/check-update.bats` — 10/10 still passing (script-level TELEMETRY emission unchanged) - [x] Manual end-to-end: directly invoke `agentkey_internal({path: "telemetry/event", params: {event: "upgrade_decision", properties: {...}}})` → confirmed all fields land in PostHog - [ ] Smoke after release: bump version, install fresh, trigger a real upgrade flow, verify PostHog receives populated `skill_loaded` / `upgrade_decision` / `upgrade_result` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Skill-side half of the agent-install telemetry rollout (spec §8.1).
check-update.shemitsTELEMETRY ...lines on stdout, SKILL.md parses them and dispatches via MCP. Adds 3-layer opt-out (file / env / installer flag — file path checked here), 24h client-side dedup, and a bats test harness.skills/agentkey/scripts/check-update.sh—emit_telemetry()+auto_upgrade_flag()helpers, opt-out checks, 7 emit calls at 5 exit points, 24h heartbeat dedup keyed byLOCAL_VERSION. Includes Linuxstatorder fix (-c %Yfirst,-f %msecond —-fon Linux means filesystem mountpoint, not mtime)skills/agentkey/SKILL.md— Step 0 instructions for parsingTELEMETRYlines and dispatching viaexecute_tool(\"agentkey_internal\", {path:\"telemetry/event\", ...}); upgrade-flow each branch dispatchesupgrade_decision/upgrade_resultwith mapped choice valuestests/check-update.bats+tests/helpers.bash— 10 contract tests with isolated\$HOME/\$TMPDIR/ mocked curl.github/workflows/scripts-test.yml— bats CI on ubuntu + macosREADME.md+docs/README_zh.md— FAQ replacement for the old "nothing to collect" sentence, new FAQ entry "How do I opt out of telemetry?" / "我如何关闭遥测?"scripts/uninstall.sh— adds Step 7b to clean~/.config/agentkey/(telemetry-disabled, update-disabled, snooze state)Blocked on
AgentKey-Server PR that registers the
agentkey_internal/telemetry/eventMCP tool (must be filtered out oflist_tools/find_toolsso the LLM doesn't accidentally call it). SKILL.md falls back silently when the tool doesn't exist, so this PR is safe to merge first — but the telemetry signal isn't recorded until the server side lands.Spec §11 mandates server-first merge order to keep the agent-side debug log clean.
Test plan
bats tests/check-update.bats— 10/10 passing locallyup_to_date~/.config/agentkey/telemetry-disabledearly-return (no TELEMETRY line)AGENTKEY_TELEMETRY=0env overrideuninstall.shStep 7b cleans~/.config/agentkey/agentkey_internaltool error (server not shipped yet) and the actual query still completesPre-existing bug surfaced during testing (not in this PR)
check-update.shline 130-ish cache fast-path uses the same wrongstat -f %m ... || stat -c %Yorder. Same Linux bug as the one fixed here inemit_telemetry. Worth a separate one-line fix PR.🤖 Generated with Claude Code