[STG-2280] feat(cli): emit skill_id telemetry property by shrey150 · Pull Request #2251 · browserbase/stagehand

shrey150 · 2026-06-12T19:39:33Z

Summary

Adds two properties to the existing cli.command_completed telemetry event:

skill_id — the validated, catalog-public skill id (e.g. google.com/search-flights-ts4g1f, or bundled/browse for browse skills install), attached for skills add/install on success and every downstream failure path (skill_not_found, skill_install_failed, npx_missing, ...). null on all other commands.

Impact if merged

Implementation notes

Privacy rule: skill_id is set only after parseSkillId validation succeeds (ids are catalog-public and regex-constrained). The raw, unparsed argument — which can contain arbitrary user input or local paths — is never attached. A test asserts the raw argument does not appear anywhere in any payload.
skill_id flows through the existing RunTelemetryState (same channel as result_code/http_status), so it is captured by the one cli.command_completed emit point regardless of how the command ends.

E2E Test Matrix

Run against the locally built browse (node <local build>/bin/run.js) with a local HTTP capture server (BROWSERBASE_TELEMETRY_HOST=http://127.0.0.1:<port>) logging full POST bodies, and a fresh BROWSERBASE_TELEMETRY_INSTALL_ID_FILE dir. All payloads below are real captured cli.command_completed events.

Command / flow	Observed output	Confidence / sufficiency
`browse skills add google.com/search-flights-ts4g1f` (real catalog + real `npx skills add`, exit 0)	Captured event: `{"command_path":"skills.add", "success":true, "result_code":"ok", "skill_id":"google.com/search-flights-ts4g1f"}`	Proves the validated id is attached end-to-end on a real successful install against the live catalog.
`browse skills find flights` (real catalog)	Exit 0, returned `google.com/search-flights-ts4g1f`; captured event has `skill_id:null`	Supporting: `skill_id` stays null for non-install skills commands.
`pnpm --filter browse lint` (prettier + eslint + tsc)	clean	Supporting: format/lint/typecheck pass.

Changeset

🤖 Generated with Claude Code

Summary by cubic

New Features
- skill_id: attach the validated, catalog-public id for skills add/install on success and all failure paths; null for other commands. Never includes the raw argument.
Bug Fixes

^{Written for commit c87ee42. Summary will update on new commits.}

Why no first_success

The original revision also stamped a first_success property via a per-install marker file. Dropped in 2c526dac5 per maintainer decision: PostHog's first_time_for_user math on (cli.command_completed, success=true, driver commands) derives the same activation signal at query time — verified live (118→456 weekly first-activations over the last 4 weeks; ~2%→37% normalized per new install). Details: #2251 (comment). skill_id remains — that data exists in no event today.

skill_id: attach the validated, catalog-public skill id to cli.command_completed for browse skills add/install (success and all downstream failure paths). Never the raw argument — only the parsed, regex-validated id. first_success: emitted exactly once per anonymous install, the first time a browser-driver command completes successfully. Tracked via an 'activated' marker file alongside the existing anonymous install id. Best-effort: telemetry never affects command behavior. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

changeset-bot · 2026-06-12T19:39:42Z

🦋 Changeset detected

Latest commit: 2c526da

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 0 packages

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

cubic-dev-ai

1 issue found across 5 files

Confidence score: 4/5

In packages/cli/src/lib/telemetry.ts, the non-atomic first_success marker write can race across concurrent CLI runs, causing duplicate first_success telemetry and skewed analytics after merge; switch to an atomic create (flag: "wx") and handle EEXIST as "not first" before merging.

Architecture diagram

sequenceDiagram
    participant CLI as CLI Command
    participant Install as installSkill()
    participant Parse as parseSkillId()
    participant Telemetry as Telemetry Module
    participant Marker as Activated Marker
    participant PostHog as PostHog API

    Note over CLI,PostHog: SKILL_ID FLOW (skills add / install commands)

    CLI->>Install: rawSkillId (e.g. "google.com/search-flights-ts4g1f")
    Install->>Parse: parseSkillId(rawSkillId)
    Parse-->>Install: validated skill id or throws
    alt Validated successfully
        Install->>Telemetry: setRunTelemetryCompletion({ skillId: id.id })
    else Invalid (e.g. "../secret-local-path/oops")
        Install-->>CLI: fail()
        CLI->>Telemetry: captureCommandCompleted()
        Telemetry->>Telemetry: skill_id = null, result_code = "invalid_skill_id"
        Note over Telemetry,PostHog: Raw argument NEVER attached to payload
    end
    Install->>Install: npx skills add <skillId>
    alt Failure (e.g. skill_not_found)
        Install-->>CLI: fail()
        CLI->>Telemetry: captureCommandCompleted()
        Telemetry->>Telemetry: skill_id still set from earlier call
        Telemetry->>PostHog: { command_path: "skills.add", skill_id: "...", result_code: "skill_not_found" }
    else Success
        Install-->>CLI: exit 0
        CLI->>Telemetry: captureCommandCompleted()
        Telemetry->>PostHog: { command_path: "skills.add", skill_id: "...", result_code: "ok" }
    end

    Note over CLI,PostHog: FIRST_SUCCESS FLOW (browser-driver commands)

    CLI->>CLI: Command completes (success or failure)
    CLI->>Telemetry: captureCommandCompleted(command, success)
    Telemetry->>Telemetry: detectFirstSuccess(command, success)

    alt Non-driver command (cloud, skills, daemon, status, ...)
        Telemetry->>Telemetry: return false — no first_success property
    else Driver command + failure
        Telemetry->>Telemetry: return false — no first_success property
    else Driver command + success
        Telemetry->>Telemetry: Check isTelemetryDisabled()
        opt Telemetry disabled (DO_NOT_TRACK, CI, etc.)
            Telemetry->>Telemetry: return false — no marker check, no property
        end
        Telemetry->>Marker: access(<install-id-dir>/activated)
        alt Marker exists
            Marker-->>Telemetry: File found
            Telemetry->>Telemetry: return false — already activated
        else No marker yet
            Marker-->>Telemetry: File not found
            Telemetry->>Marker: mkdir + writeFile(<install-id-dir>/activated, ISO timestamp)
            Telemetry->>Telemetry: return true — emit first_success
        end
    end

    Telemetry->>PostHog: Package properties (includes first_success only when true)
    PostHog-->>Telemetry: (async, non-blocking)

_{Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic}

Concurrent CLI runs could both pass the access() check and double-report first_success. Use an exclusive create so exactly one run can win; treat EEXIST (or any persistence failure) as not-first. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

shrey150 · 2026-06-12T22:24:56Z

Re: the first_success design question — what it is, whether the backend can derive it, and how much it matters. I tested the backend path live before answering.

What it is: a boolean stamped on exactly one cli.command_completed event per install — the first time a browser-driver command (open/get/click/snapshot/…) succeeds. It's the event-level marker for "this install reached a working browser and did something" — the activation proxy from the June 11 health report (28.5% of real users). Implementation here: ~40 LOC + a marker file beside the install id.

Can the backend derive it? Yes — verified live just now. PostHog's native first_time_for_user math on (cli.command_completed, success=true, driver-command list) works on our anonymous events (person-on-events mode). Real numbers from project 370621, weekly first-activations: 118 → 158 → 280 → 456 over the last 4 weeks — non-zero, and the rising curve matches the known June activation recovery. So the headline dashboard tile ("installs activating per week", breakdowns by version/platform/skill_present) needs zero CLI code.

What backend derivation costs: first_time_for_user is a math mode, not a filter — you can't use "was this the install's first success?" as a condition in other insights. Time-to-activate distributions, activated-cohort behavior, and the #2247 readout (skill-present installs whose first commands are cloud → do they later reach first driver success) all become multi-CTE HogQL instead of property filters. All doable — the health report did exactly this — just slower per question, and the definition lives in each query instead of being pinned in code. Counterpoint in backend's favor: a query-time definition can be changed retroactively; a client-stamped property is frozen at emission.

How important is it? The metric is the north star — #2247 is literally parked on it. The client-side property is a convenience/robustness layer, not the only way to get the metric.

Recommendation, given the live proof: consistent with the de-scope philosophy on #2200 — drop first_success from this PR and keep skill_id (which is not backend-derivable; the data doesn't exist in any event today). I'll build the activation views as saved PostHog insights (the first_time_for_user tile + a saved HogQL funnel for the cross-cutting questions) so nothing is hand-rebuilt per readout. That cuts this PR roughly in half (~35 LOC of telemetry plumbing, no marker file). Say the word and I'll amend — or if you'd rather keep the composable event property, that's defensible too; it's the same trade as ever: 40 LOC in the CLI vs. HogQL at query time.

… first_time_for_user PostHog's first_time_for_user math on (cli.command_completed, success=true, driver commands) reproduces the activation signal at query time (verified live: 118->456 weekly first-activations over the last 4 weeks), so the client-side marker file is unnecessary. skill_id stays — that data exists in no event today. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

shrey150 · 2026-06-12T22:39:00Z

Amended in 2c526dac5 per the decision above — first_success and its marker file are gone; the PR is now skill_id-only (+81 lines total incl. tests, down from ~160). Title, body, changeset, and Linear updated. 215/215 tests green. The activation signal lives backend-side via first_time_for_user (recipe + caveats saved to the team's notes).

cubic-dev-ai

1 issue found across 3 files (changes from recent commits).

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Fix all with cubic | Re-trigger cubic}

shrey150 · 2026-06-12T22:48:57Z

Design note for the record — why skill_id stays client-side even though the backend sees skill requests (raised in review: the Browse.sh API behind browse skills list/add already logs which skills are fetched).

Verified against the code: skills find/list hits GET api/skills and skills add hits the skill-files API with the parsed id — so the server does see demand. What it cannot see is outcome:

The dominant install failures happen after the file-list API call succeeds — the external npx skills chain (45% of adds fail overall; 3.2% success on Windows; timeout tails). Server-side, a failed and a successful install are indistinguishable: both fetched files. skill_id on cli.command_completed exists for the join to success/result_code — which skills fail to install, and why.
No join key: the CLI's catalog/file fetches are bare fetch(url) with no install id or client header, so API logs can't be connected to the PostHog funnel (find→add conversion per skill, retries, engagement by installed skill).
Coverage holes: skills install (bundled — no API call), the GitHub-clone fallback (bypasses the file API), and invalid_skill_id failures (die before any request).

Cost asymmetry vs the dropped first_success: that was ~40 LOC + filesystem state and fully backend-derivable (first_time_for_user) → cut. skill_id is 8 lines, zero state, and only its demand half exists server-side.

Complementary uses of the API logs: authoritative skill popularity (including non-CLI catalog clients) and cross-checking CLI-reported totals. If true server-side joinability is ever wanted, the path is adding User-Agent: browse-cli/<ver> + install-id headers to those fetches — a separate privacy/design decision tracked from the June 8 telemetry investigation.

cubic-dev-ai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread packages/cli/src/lib/telemetry.ts Outdated

shrey150 mentioned this pull request Jun 12, 2026

[STG-2276] feat(cli): once-per-install open nudge after cloud search/fetch #2247

Draft

shrey150 commented Jun 12, 2026

View reviewed changes

Comment thread .changeset/telemetry-skill-id-first-success.md Outdated

shrey150 changed the title ~~[STG-2280] feat(cli): emit skill_id and first_success telemetry properties~~ [STG-2280] feat(cli): emit skill_id telemetry property Jun 12, 2026

cubic-dev-ai Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread packages/cli/src/lib/telemetry.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[STG-2280] feat(cli): emit skill_id telemetry property#2251

[STG-2280] feat(cli): emit skill_id telemetry property#2251
shrey150 wants to merge 3 commits into
mainfrom
shrey/telemetry-skill-id-first-success

shrey150 commented Jun 12, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

cubic-dev-ai Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

shrey150 commented Jun 12, 2026

Uh oh!

shrey150 commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

shrey150 commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shrey150 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Impact if merged

Implementation notes

E2E Test Matrix

Changeset

Summary by cubic

Why no first_success

Uh oh!

changeset-bot Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

shrey150 commented Jun 12, 2026

Uh oh!

shrey150 commented Jun 12, 2026

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

shrey150 commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shrey150 commented Jun 12, 2026 •

edited

Loading

changeset-bot Bot commented Jun 12, 2026 •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading