Commit 9aa1d39

suryaiyer95 and claude authored
feat: add core_failure telemetry with PII-safe input signatures (#245)
* feat: add `core_failure` telemetry with PII-safe masking

  Add a new `core_failure` event emitted on both soft failures (`metadata.success === false`) and uncaught tool exceptions, with privacy-preserving context for debugging:

  - `classifyError()`: keyword-based error classification (parse, connection, timeout, validation, permission, internal, unknown)
  - `computeInputSignature()`: records key names + value types/lengths, never actual values; truncates by dropping keys to preserve valid JSON
  - `maskArgs()`: PII masking aligned to Rust SDK: 19 sensitive keys redacted, string literals in SQL replaced with `?`, recursive object traversal

  Telemetry is fully isolated from tool execution: all tracking calls are wrapped in `try/catch` so telemetry failures never break tools. `Truncate.output()` runs outside the telemetry error boundary so I/O errors aren't misattributed as tool failures.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add `skill_used` telemetry event

  Tracks which skill is loaded and where it came from (`builtin`, `global`, or `project`) with duration. Wrapped in try/catch, so it cannot break skill loading. Docs table updated.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add `sql_execute_failure` telemetry for SQL execution errors

  `core_failure` is for internal tool failures. SQL execution via the dispatcher is a separate concern: soft errors are returned as results (not thrown), so `core_failure` never fires for them.

  New `sql_execute_failure` event captures: warehouse type, query type, error message (truncated to 500 chars), and PII-masked SQL. Fires from the `sql.execute` handler catch path.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add persistent machine ID from `~/.altimate/machine-id`

  Generated once as a random UUID and stored at `~/.altimate/machine-id` (alongside `altimate.json`, `connections.json`, etc.). Sent as `machine_id` in `customDimensions` on every App Insights event. No PII: pure random UUID, never tied to user identity.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct `masked_sql` field and `ERROR_PATTERNS` ordering in telemetry

  - `sql_execute_failure`: use `Telemetry.maskString(params.sql)` instead of `Telemetry.maskArgs({ sql: params.sql })`; the latter serializes a JSON object string `{"sql":"..."}` rather than the raw masked SQL
  - `ERROR_PATTERNS`: move `permission` before `validation` so errors like "Invalid permission denied" are not misclassified as `validation_error`

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf: skip success `tool_call` telemetry for file tools

  Read/write/edit/glob/grep/bash succeed constantly in normal operation, so tracking every success is high-volume noise with no actionable signal. Failures (hard throws and soft failures) are still fully captured via `tool_call` (status=error) and `core_failure`.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: clarify `core_failure` event description in telemetry docs

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: simplify `core_failure` description

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: mask error messages before sending to telemetry

  Error messages from SQL engines can embed data values (e.g. "Value 'john@email.com' does not match type INTEGER"). Apply maskString() to all error_message fields before transmission, consistent with how args are already masked.

  Affects: core_failure (tool.ts), sql_execute_failure (register.ts)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: security hardening for telemetry PII safety

  - Mask error messages in `native_call` (dispatcher.ts) and `warehouse_connect` (registry.ts); these were sending raw error strings that could embed credentials or query fragments
  - Fix soft-failure `error_message` fallback: drop `result.output` as a source (raw tool output could contain file contents or secrets); fall back to `"unknown error"` instead
  - Strip `_retried` internal flag from App Insights payload; it was leaking into `properties` on retried events
  - Add camelCase variants to `SENSITIVE_KEYS` (`authToken`, `bearerToken`, `jwtSecret`, etc.); underscore prefix/suffix matching missed these
  - Document `machine_id` in telemetry privacy docs

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address major review findings in telemetry PII masking

  - Extend `maskString` to also mask double-quoted strings (`"John"`, `$$secret$$`-adjacent); single-quoted-only regex was flagged as PII leak
  - Keep `connection` in `ERROR_PATTERNS` keywords (broad but intentional)
  - Truncate `masked_sql` to 2000 chars before sending; it was unbounded unlike `error_message` (500) and `masked_args` (2000)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: update `core_failure` event description in telemetry reference

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: add altimate_change markers to upstream-shared tool files

  Wrap all telemetry additions in `packages/opencode/src/tool/tool.ts` and `packages/opencode/src/tool/skill.ts` with `// altimate_change start/end` markers so the upstream marker-guard CI passes.

  - `tool.ts`: markers around `import { Telemetry }` and the full telemetry instrumentation block (startTime through soft-failure core_failure emission)
  - `skill.ts`: markers around `classifySkillSource` helper, `startTime` declaration, and the `Telemetry.track` try-catch for `skill_used`

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 93eefed commit 9aa1d39

9 files changed

Lines changed: 639 additions & 20 deletions

docs/docs/reference/telemetry.md

Lines changed: 10 additions & 7 deletions
@@ -11,9 +11,9 @@ We collect the following categories of events:
 | `session_start` | A new CLI session begins |
 | `session_end` | A CLI session ends (includes duration) |
 | `session_forked` | A session is forked from an existing one |
-| `generation` | An AI model generation completes (model ID, token counts, duration, but no prompt content) |
-| `tool_call` | A tool is invoked (tool name and category, but no arguments or output) |
-| `bridge_call` | A native tool call completes (method name and duration, but no arguments) |
+| `generation` | An AI model generation completes (model ID, token counts, duration no prompt content) |
+| `tool_call` | A tool is invoked (tool name and category no arguments or output) |
+| `native_call` | A native engine call completes (method name and duration no arguments) |
 | `command` | A CLI command is executed (command name only) |
 | `error` | An unhandled error occurs (error type and truncated message, but no stack traces) |
 | `auth_login` | Authentication succeeds or fails (provider and method, but no credentials) |
@@ -33,8 +33,11 @@ We collect the following categories of events:
 | `error_recovered` | Successful recovery from a transient error (error type, strategy, attempt count) |
 | `mcp_server_census` | MCP server capabilities after connect (tool and resource counts, but no tool names) |
 | `context_overflow_recovered` | Context overflow is handled (strategy) |
+| `skill_used` | A skill is loaded (skill name and source — `builtin`, `global`, or `project` — no skill content) |
+| `sql_execute_failure` | A SQL execution fails (warehouse type, query type, error message, PII-masked SQL — no raw values) |
+| `core_failure` | An internal tool error occurs (tool name, category, error class, truncated error message, PII-safe input signature, and optionally masked arguments — no raw values or credentials) |

-Each event includes a timestamp, anonymous session ID, and the CLI version.
+Each event includes a timestamp, anonymous session ID, CLI version, and an anonymous machine ID (a random UUID stored in `~/.altimate/machine-id`, generated once and never tied to any personal information).

 ## Delivery & Reliability

@@ -113,9 +116,9 @@ Event type names use **snake_case** with a `domain_action` pattern:

 ### Adding a New Event

-1. **Define the type.** Add a new variant to the `Telemetry.Event` union in `packages/altimate-code/src/telemetry/index.ts`
-2. **Emit the event.** Call `Telemetry.track()` at the appropriate location
-3. **Update docs.** Add a row to the event table above
+1. **Define the type** Add a new variant to the `Telemetry.Event` union in `packages/opencode/src/altimate/telemetry/index.ts`
+2. **Emit the event** Call `Telemetry.track()` at the appropriate location
+3. **Update docs** Add a row to the event table above

 ### Privacy Checklist

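The "Adding a New Event" steps in the docs hunk above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `TelemetryEvent` union and `Telemetry.track` here are trimmed stand-ins, and the `cache_evicted` event with its `entry_count` field is a hypothetical name chosen only to follow the documented snake_case `domain_action` pattern.

```typescript
// Trimmed stand-in for the real `Telemetry.Event` union (assumption, for illustration only).
type TelemetryEvent =
  | { type: "session_start"; timestamp: number; session_id: string }
  // Step 1: define the type. `cache_evicted` and `entry_count` are hypothetical.
  | { type: "cache_evicted"; timestamp: number; session_id: string; entry_count: number }

// Stand-in tracker that only buffers events in memory.
const buffer: TelemetryEvent[] = []
const Telemetry = {
  track(event: TelemetryEvent): void {
    buffer.push(event)
  },
}

// Step 2: emit the event where the action happens.
// The try/catch mirrors the commit's rule that telemetry must never break the caller.
function evictCache(sessionId: string, entryCount: number): number {
  try {
    Telemetry.track({
      type: "cache_evicted",
      timestamp: Date.now(),
      session_id: sessionId,
      entry_count: entryCount, // a count only, no entry contents
    })
  } catch {
    // Swallow telemetry failures
  }
  return entryCount
}

evictCache("session-1", 3)
console.log(buffer.length) // 1
```

Step 3 (adding a row to the event table) is a documentation change and has no code counterpart.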
packages/drivers/src/sqlserver.ts

Lines changed: 0 additions & 1 deletion
@@ -7,7 +7,6 @@ import type { ConnectionConfig, Connector, ConnectorResult, SchemaColumn } from
 export async function connect(config: ConnectionConfig): Promise<Connector> {
   let mssql: any
   try {
-    // @ts-expect-error — optional dependency, loaded at runtime
     mssql = await import("mssql")
     mssql = mssql.default || mssql
   } catch {

packages/opencode/src/altimate/native/connections/register.ts

Lines changed: 14 additions & 2 deletions
@@ -228,6 +228,8 @@ register("sql.execute", async (params: SqlExecuteParams): Promise<SqlExecuteResu
     } catch {}
     return result
   } catch (e) {
+    const errorMsg = String(e)
+    const maskedErrorMsg = Telemetry.maskString(errorMsg).slice(0, 500)
     try {
       Telemetry.track({
         type: "warehouse_query",
@@ -239,11 +241,21 @@ register("sql.execute", async (params: SqlExecuteParams): Promise<SqlExecuteResu
         duration_ms: Date.now() - startTime,
         row_count: 0,
         truncated: false,
-        error: String(e).slice(0, 500),
+        error: maskedErrorMsg,
         error_category: categorizeQueryError(e),
       })
+      Telemetry.track({
+        type: "sql_execute_failure",
+        timestamp: Date.now(),
+        session_id: Telemetry.getContext().sessionId,
+        warehouse_type: warehouseType,
+        query_type: detectQueryType(params.sql),
+        error_message: maskedErrorMsg,
+        masked_sql: Telemetry.maskString(params.sql).slice(0, 2000),
+        duration_ms: Date.now() - startTime,
+      })
     } catch {}
-    return { columns: [], rows: [], row_count: 0, truncated: false, error: String(e) } as SqlExecuteResult & { error: string }
+    return { columns: [], rows: [], row_count: 0, truncated: false, error: errorMsg } as SqlExecuteResult & { error: string }
   }
 })

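To see what the `sql.execute` catch path above actually sends, the sketch below reproduces `maskString` as it appears in this commit's `telemetry/index.ts` diff and runs it on an illustrative query and on the error text quoted in the commit message. The sample SQL is an assumption made up for the demonstration.

```typescript
// Copied from the `maskString` hunk in this commit: quoted literals become `?`,
// then runs of whitespace collapse to a single space.
function maskString(s: string): string {
  return s
    .replace(/'(?:[^'\\]|\\.)*'/g, "?")
    .replace(/"(?:[^"\\]|\\.)*"/g, "?")
    .replace(/\s+/g, " ")
    .trim()
}

// Illustrative inputs (not taken from the repository).
const sql = "SELECT *\n  FROM users\n WHERE email = 'john@email.com'"
const err = "Value 'john@email.com' does not match type INTEGER"

console.log(maskString(sql)) // SELECT * FROM users WHERE email = ?
console.log(maskString(err)) // Value ? does not match type INTEGER

// The handler then bounds each field before sending.
const maskedErrorMsg = maskString(err).slice(0, 500)
const maskedSql = maskString(sql).slice(0, 2000)
console.log(maskedErrorMsg.length <= 500, maskedSql.length <= 2000) // true true
```

Two properties follow directly from the regexes: double-quoted identifiers such as `"id"` are masked along with string literals, and unquoted numeric literals pass through unchanged.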
packages/opencode/src/altimate/native/connections/registry.ts

Lines changed: 1 addition & 1 deletion
@@ -291,7 +291,7 @@ export async function get(name: string): Promise<Connector> {
         auth_method: detectAuthMethod(config),
         success: false,
         duration_ms: Date.now() - startTime,
-        error: String(e).slice(0, 500),
+        error: Telemetry.maskString(String(e)).slice(0, 500),
         error_category: categorizeConnectionError(e),
       })
     } catch {}

packages/opencode/src/altimate/native/dispatcher.ts

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ export async function call<M extends BridgeMethod>(
         method: method as string,
         status: "error",
         duration_ms: Date.now() - startTime,
-        error: Telemetry.maskString(String(e)).slice(0, 500),
+        error: Telemetry.maskString(String(e)).slice(0, 500),
       })
     } catch {
       // Telemetry must never prevent error propagation

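The isolation rule stated in the dispatcher comment above ("Telemetry must never prevent error propagation") can be sketched as a small wrapper. This is an assumption-laden illustration: `safeTrack`, `callWithTelemetry`, and `TrackFn` are hypothetical names, not helpers from the commit, and the real dispatcher additionally masks the error text with `Telemetry.maskString` before truncating it.

```typescript
type TrackFn = (event: Record<string, unknown>) => void

// Hypothetical helper: a tracker failure is swallowed here and never reaches the caller.
function safeTrack(track: TrackFn, event: Record<string, unknown>): void {
  try {
    track(event)
  } catch {
    // Telemetry must never prevent error propagation
  }
}

// Hypothetical wrapper around a native call: the caller always sees the call's own
// result or the call's own error, regardless of what the tracker does.
async function callWithTelemetry<T>(method: string, fn: () => Promise<T>, track: TrackFn): Promise<T> {
  const startTime = Date.now()
  try {
    const result = await fn()
    safeTrack(track, { type: "native_call", method, status: "success", duration_ms: Date.now() - startTime })
    return result
  } catch (e) {
    safeTrack(track, {
      type: "native_call",
      method,
      status: "error",
      duration_ms: Date.now() - startTime,
      error: String(e).slice(0, 500), // the commit masks this string first
    })
    throw e // rethrow the original error unchanged
  }
}
```

With a tracker that always throws, a successful call still resolves with its value and a failing call still rejects with its original error.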
packages/opencode/src/altimate/telemetry/index.ts

Lines changed: 181 additions & 2 deletions
@@ -2,7 +2,10 @@ import { Account } from "@/account"
 import { Config } from "@/config/config"
 import { Installation } from "@/installation"
 import { Log } from "@/util/log"
-import { createHash } from "crypto"
+import { createHash, randomUUID } from "crypto"
+import fs from "fs"
+import path from "path"
+import os from "os"

 const log = Log.create({ service: "telemetry" })

@@ -63,6 +66,7 @@ export namespace Telemetry {
         duration_ms: number
         sequence_index: number
         previous_tool: string | null
+        input_signature?: string
         error?: string
       }
     | {
@@ -331,6 +335,166 @@ export namespace Telemetry {
         has_ssh_tunnel: boolean
         has_keychain: boolean
       }
+    | {
+        type: "skill_used"
+        timestamp: number
+        session_id: string
+        message_id: string
+        skill_name: string
+        skill_source: "builtin" | "global" | "project"
+        duration_ms: number
+      }
+    | {
+        type: "sql_execute_failure"
+        timestamp: number
+        session_id: string
+        warehouse_type: string
+        query_type: string
+        error_message: string
+        masked_sql: string
+        duration_ms: number
+      }
+    | {
+        type: "core_failure"
+        timestamp: number
+        session_id: string
+        tool_name: string
+        tool_category: string
+        error_class:
+          | "parse_error"
+          | "connection"
+          | "timeout"
+          | "validation"
+          | "internal"
+          | "permission"
+          | "unknown"
+        error_message: string
+        input_signature: string
+        masked_args?: string
+        duration_ms: number
+      }
+
+  const ERROR_PATTERNS: Array<{
+    class: Telemetry.Event & { type: "core_failure" } extends { error_class: infer C } ? C : never
+    keywords: string[]
+  }> = [
+    { class: "parse_error", keywords: ["parse", "syntax", "binder", "unexpected token", "sqlglot"] },
+    {
+      class: "connection",
+      keywords: ["econnrefused", "connection", "socket", "enotfound", "econnreset"],
+    },
+    { class: "timeout", keywords: ["timeout", "etimedout", "bridge timeout", "timed out"] },
+    { class: "permission", keywords: ["permission", "denied", "unauthorized", "forbidden"] },
+    { class: "validation", keywords: ["invalid params", "invalid", "missing", "required"] },
+    { class: "internal", keywords: ["internal", "assertion"] },
+  ]
+
+  export function classifyError(
+    message: string,
+  ): Telemetry.Event & { type: "core_failure" } extends { error_class: infer C } ? C : never {
+    const lower = message.toLowerCase()
+    for (const { class: cls, keywords } of ERROR_PATTERNS) {
+      if (keywords.some((kw) => lower.includes(kw))) return cls
+    }
+    return "unknown"
+  }
+
+  export function computeInputSignature(args: Record<string, unknown>): string {
+    const sig: Record<string, string> = {}
+    for (const [k, v] of Object.entries(args)) {
+      if (v === null || v === undefined) {
+        sig[k] = "null"
+      } else if (typeof v === "string") {
+        sig[k] = `string:${v.length}`
+      } else if (typeof v === "number") {
+        sig[k] = "number"
+      } else if (typeof v === "boolean") {
+        sig[k] = "boolean"
+      } else if (Array.isArray(v)) {
+        sig[k] = `array:${v.length}`
+      } else if (typeof v === "object") {
+        sig[k] = `object:${Object.keys(v).length}`
+      } else {
+        sig[k] = typeof v
+      }
+    }
+    const result = JSON.stringify(sig)
+    if (result.length <= 1000) return result
+    // Drop keys from the end until the JSON fits, preserving valid JSON structure
+    const keys = Object.keys(sig)
+    while (keys.length > 0) {
+      keys.pop()
+      const truncated: Record<string, string> = {}
+      for (const k of keys) truncated[k] = sig[k]
+      truncated["..."] = `${Object.keys(sig).length - keys.length} more`
+      const out = JSON.stringify(truncated)
+      if (out.length <= 1000) return out
+    }
+    return JSON.stringify({ "...": `${Object.keys(sig).length} keys` })
+  }
+
+  // Mirrors altimate-sdk (Rust) SENSITIVE_KEYS — keep in sync.
+  const SENSITIVE_KEYS: string[] = [
+    "key", "api_key", "apikey", "apiKey", "token", "access_token", "refresh_token",
+    "secret", "secret_key", "password", "passwd", "pwd",
+    "credential", "credentials", "authorization", "auth",
+    "signature", "sig", "private_key", "connection_string",
+    // camelCase variants not caught by prefix/suffix matching
+    "authtoken", "accesstoken", "refreshtoken", "bearertoken", "jwttoken",
+    "jwtsecret", "clientsecret", "appsecret",
+  ]
+
+  function isSensitiveKey(key: string): boolean {
+    const lower = key.toLowerCase()
+    return SENSITIVE_KEYS.some(
+      (k) => lower === k || lower.endsWith(`_${k}`) || lower.startsWith(`${k}_`),
+    )
+  }
+
+  export function maskString(s: string): string {
+    return s
+      .replace(/'(?:[^'\\]|\\.)*'/g, "?")
+      .replace(/"(?:[^"\\]|\\.)*"/g, "?")
+      .replace(/\s+/g, " ")
+      .trim()
+  }
+
+  function maskValue(value: unknown, key?: string): unknown {
+    if (key && isSensitiveKey(key)) return "****"
+    if (typeof value === "string") return maskString(value)
+    if (Array.isArray(value)) return value.map((v) => maskValue(v, key))
+    if (value !== null && typeof value === "object") {
+      const masked: Record<string, unknown> = {}
+      for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
+        masked[k] = maskValue(v, k)
+      }
+      return masked
+    }
+    return value
+  }
+
+  /** PII-mask tool arguments for failure telemetry.
+   * Mirrors altimate-sdk mask_value: sensitive keys → "****",
+   * string literals in SQL → ?, whitespace collapsed. Truncates to 2000 chars. */
+  export function maskArgs(args: Record<string, unknown>): string {
+    const masked: Record<string, unknown> = {}
+    for (const [k, v] of Object.entries(args)) {
+      masked[k] = maskValue(v, k)
+    }
+    const result = JSON.stringify(masked)
+    if (result.length <= 2000) return result
+    // Drop keys from the end until valid JSON fits, same approach as computeInputSignature
+    const keys = Object.keys(masked)
+    while (keys.length > 0) {
+      keys.pop()
+      const truncated: Record<string, unknown> = {}
+      for (const k of keys) truncated[k] = masked[k]
+      truncated["..."] = `${Object.keys(masked).length - keys.length} more`
+      const out = JSON.stringify(truncated)
+      if (out.length <= 2000) return out
+    }
+    return JSON.stringify({ "...": `${Object.keys(masked).length} keys` })
+  }

   const FILE_TOOLS = new Set(["read", "write", "edit", "glob", "grep", "bash"])

@@ -373,6 +537,7 @@ export namespace Telemetry {
   let buffer: Event[] = []
   let flushTimer: ReturnType<typeof setInterval> | undefined
   let userEmail = ""
+  let machineId = ""
   let sessionId = ""
   let projectId = ""
   let appInsights: AppInsightsConfig | undefined
@@ -402,12 +567,13 @@ export namespace Telemetry {
       const properties: Record<string, string> = {
         cli_version: Installation.VERSION,
         project_id: fields.project_id ?? projectId,
+        ...(machineId && { machine_id: machineId }),
       }
       const measurements: Record<string, number> = {}

       // Flatten all fields — nested `tokens` object gets prefixed keys
       for (const [k, v] of Object.entries(fields)) {
-        if (k === "session_id" || k === "project_id") continue
+        if (k === "session_id" || k === "project_id" || k === "_retried") continue
         if (k === "tokens" && typeof v === "object" && v !== null) {
           for (const [tk, tv] of Object.entries(v as Record<string, unknown>)) {
             if (typeof tv === "number") measurements[`tokens_${tk}`] = tv
@@ -490,6 +656,18 @@ export namespace Telemetry {
     } catch {
       // Account unavailable — proceed without user ID
     }
+    try {
+      const machineIdPath = path.join(os.homedir(), ".altimate", "machine-id")
+      try {
+        machineId = fs.readFileSync(machineIdPath, "utf8").trim()
+      } catch {
+        machineId = randomUUID()
+        fs.mkdirSync(path.dirname(machineIdPath), { recursive: true })
+        fs.writeFileSync(machineIdPath, machineId, "utf8")
+      }
+    } catch {
+      // Machine ID unavailable — proceed without it
+    }
     enabled = true
     log.info("telemetry initialized", { mode: "appinsights" })
     const timer = setInterval(flush, FLUSH_INTERVAL_MS)
@@ -591,6 +769,7 @@ export namespace Telemetry {
     droppedEvents = 0
     sessionId = ""
     projectId = ""
+    machineId = ""
     initPromise = undefined
     initDone = false
   }

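Two behaviours in the `telemetry/index.ts` diff above are easier to see with concrete inputs: `classifyError` returns the first pattern that matches, so the order of `ERROR_PATTERNS` matters, and the input signature records key names with value types and lengths only. The sketch below copies the pattern table from the diff; `ErrorClass` and `inputSignature` are simplified stand-ins (the real `computeInputSignature` also truncates to 1000 characters), and the sample messages and arguments are made up.

```typescript
type ErrorClass = "parse_error" | "connection" | "timeout" | "permission" | "validation" | "internal" | "unknown"

// Pattern table copied from the diff; array order is the match priority.
const ERROR_PATTERNS: Array<{ class: ErrorClass; keywords: string[] }> = [
  { class: "parse_error", keywords: ["parse", "syntax", "binder", "unexpected token", "sqlglot"] },
  { class: "connection", keywords: ["econnrefused", "connection", "socket", "enotfound", "econnreset"] },
  { class: "timeout", keywords: ["timeout", "etimedout", "bridge timeout", "timed out"] },
  { class: "permission", keywords: ["permission", "denied", "unauthorized", "forbidden"] },
  { class: "validation", keywords: ["invalid params", "invalid", "missing", "required"] },
  { class: "internal", keywords: ["internal", "assertion"] },
]

function classifyError(message: string): ErrorClass {
  const lower = message.toLowerCase()
  for (const { class: cls, keywords } of ERROR_PATTERNS) {
    if (keywords.some((kw) => lower.includes(kw))) return cls
  }
  return "unknown"
}

// Simplified signature: same per-key shapes as the diff, without the truncation loop.
function inputSignature(args: Record<string, unknown>): string {
  const sig: Record<string, string> = {}
  for (const [k, v] of Object.entries(args)) {
    if (v === null || v === undefined) sig[k] = "null"
    else if (typeof v === "string") sig[k] = `string:${v.length}`
    else if (Array.isArray(v)) sig[k] = `array:${v.length}`
    else if (typeof v === "object") sig[k] = `object:${Object.keys(v as object).length}`
    else sig[k] = typeof v
  }
  return JSON.stringify(sig)
}

console.log(classifyError("Invalid permission denied")) // permission (listed before validation)
console.log(classifyError("Connection timed out")) // connection (listed before timeout)
console.log(classifyError("something odd")) // unknown

console.log(inputSignature({ sql: "SELECT 1", limit: 10, dry_run: false, tags: ["a", "b"], password: "hunter2" }))
// {"sql":"string:8","limit":"number","dry_run":"boolean","tags":"array:2","password":"string:7"}
```

The second classification shows the breadth of the `connection` keyword that the commit message describes as intentional. The signature keeps key names such as `password` but never the value behind them.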
packages/opencode/src/tool/skill.ts

Lines changed: 29 additions & 0 deletions
@@ -9,8 +9,18 @@ import { iife } from "@/util/iife"
 import { Fingerprint } from "../altimate/fingerprint"
 import { Config } from "../config/config"
 import { selectSkillsWithLLM } from "../altimate/skill-selector"
+import { Telemetry } from "../altimate/telemetry"
+import os from "os"

 const MAX_DISPLAY_SKILLS = 50
+
+// altimate_change start — classifySkillSource helper for skill telemetry
+function classifySkillSource(location: string): "builtin" | "global" | "project" {
+  if (location.includes("node_modules") || location.includes(".altimate/builtin")) return "builtin"
+  if (location.startsWith(os.homedir())) return "global"
+  return "project"
+}
+// altimate_change end
 // altimate_change end

 export const SkillTool = Tool.define("skill", async (ctx) => {
@@ -83,6 +93,9 @@ export const SkillTool = Tool.define("skill", async (ctx) => {
     description,
     parameters,
     async execute(params: z.infer<typeof parameters>, ctx) {
+      // altimate_change start — telemetry: startTime for skill_used duration
+      const startTime = Date.now()
+      // altimate_change end
       // altimate_change start - use upstream Skill.get() for exact name lookup
       const skill = await Skill.get(params.name)

@@ -122,6 +135,22 @@ export const SkillTool = Tool.define("skill", async (ctx) => {
         return arr
       }).then((f) => f.map((file) => `<file>${file}</file>`).join("\n"))

+      // altimate_change start — telemetry instrumentation for skill loading
+      try {
+        Telemetry.track({
+          type: "skill_used",
+          timestamp: Date.now(),
+          session_id: ctx.sessionID,
+          message_id: ctx.messageID,
+          skill_name: skill.name,
+          skill_source: classifySkillSource(skill.location),
+          duration_ms: Date.now() - startTime,
+        })
+      } catch {
+        // Telemetry must never break skill loading
+      }
+      // altimate_change end
+
       return {
         title: `Loaded skill: ${skill.name}`,
         output: [

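The check order in `classifySkillSource` decides how each skill location is labelled. The sketch below repeats the same three checks but takes the home directory as a parameter instead of calling `os.homedir()`, purely so it runs without Node imports; that signature change and all of the sample paths are assumptions for illustration.

```typescript
type SkillSource = "builtin" | "global" | "project"

// Same checks, same order as the diff; `homeDir` replaces the `os.homedir()` call (assumption).
function classifySkillSource(location: string, homeDir: string): SkillSource {
  if (location.includes("node_modules") || location.includes(".altimate/builtin")) return "builtin"
  if (location.startsWith(homeDir)) return "global"
  return "project"
}

const home = "/home/ana" // hypothetical home directory

console.log(classifySkillSource("/home/ana/.altimate/builtin/dbt/SKILL.md", home)) // builtin
console.log(classifySkillSource("/home/ana/.altimate/skills/review/SKILL.md", home)) // global
console.log(classifySkillSource("/srv/repo/.altimate/skills/review/SKILL.md", home)) // project

// A project checked out under the home directory also satisfies the prefix check.
console.log(classifySkillSource("/home/ana/code/repo/.altimate/skills/review/SKILL.md", home)) // global
```

As the last call shows, the home-directory prefix check by itself cannot distinguish a project skill stored under the home directory from a global one. The `.altimate/builtin` substring also assumes forward-slash path separators.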