feat: env-based skill selection — run LLM once, cache for session #192

Merged
kulvirgit merged 1 commit into main from feat/env-based-skill-selection on Mar 17, 2026

@kulvirgit
Collaborator

Summary

  • Run the LLM skill selector once per session using only the environment fingerprint (not the per-turn user message), then cache the result; subsequent turns reuse the cached selection with no additional LLM latency or API cost
  • Trim fingerprint detections to data-engineering only (dbt, sql, profiles.yml adapters, airflow, databricks); remove generic detections (node, python, docker, ci-cd, etc.)
  • Add env_fingerprint_skill_selection config toggle under experimental (default: true; set false to skip LLM selection entirely)
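The behaviour summarised above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Skill`, `Selector`, and `createSkillSelector` are hypothetical names, the `enabled` flag stands in for the `env_fingerprint_skill_selection` toggle, and falling back to the full skill list when the selector fails is an assumption.

```typescript
// Hypothetical sketch: Skill, Selector and createSkillSelector are illustrative names.
type Skill = { name: string }
type Selector = (tags: string[], skills: Skill[]) => Promise<Skill[]>

function createSkillSelector(select: Selector, enabled: boolean = true) {
  let cachedResult: Skill[] | undefined
  let cachedCwd: string | undefined

  return async function selectSkills(
    cwd: string,
    tags: string[],
    skills: Skill[],
  ): Promise<Skill[]> {
    // Toggle off: return every skill and never call the selector.
    if (!enabled) return skills
    // Cache hit: the stored selection was made for this working directory.
    if (cachedResult && cwd === cachedCwd) return cachedResult
    try {
      const selected = await select(tags, skills)
      cachedResult = selected
      cachedCwd = cwd
      return selected
    } catch {
      // Assumed fallback: on any selector failure, return the full list uncached.
      return skills
    }
  }
}
```

With this shape, the selector (the LLM call) runs on the first turn only; later turns in the same directory return the stored array, and a change of `cwd` triggers one new selection.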

Test plan

  • bun test test/altimate/skill-filtering.test.ts — 18 tests (selection, caching, fallbacks, prompt content)
  • bun test test/altimate/fingerprint.test.ts — 5 tests (detect, cache, refresh, dedup)
  • Manual: enable dynamic_skills, verify LLM called once on first turn, cached on second turn (check logs)
  • Manual: set env_fingerprint_skill_selection: false, verify all skills returned without LLM call

🤖 Generated with Claude Code

Comment on lines +58 to +63
if (cachedResult) {
  log.info("returning cached skill selection", {
    count: cachedResult.length,
  })
  return cachedResult
}

This comment was marked as outdated.

@kulvirgit force-pushed the feat/env-based-skill-selection branch from 2d217e6 to 71b8625 on March 16, 2026 20:38
}

async function defaultResolveModel(): Promise<LanguageModelV2 | undefined> {
  const { providerID, modelID } = await Provider.defaultModel()

This comment was marked as outdated.

@kulvirgit force-pushed the feat/env-based-skill-selection branch 4 times, most recently from 42ccc76 to 3e0314a, on March 16, 2026 21:46
@kulvirgit force-pushed the feat/env-based-skill-selection branch from 3e0314a to 7afaff9 on March 16, 2026 22:02
Comment on lines +63 to +68
if (cachedResult && cwd === cachedCwd) {
  log.info("returning cached skill selection", {
    count: cachedResult.length,
  })
  return cachedResult
}

This comment was marked as outdated.

@kulvirgit force-pushed the feat/env-based-skill-selection branch from 5ee9c30 to c1ef28c on March 16, 2026 23:37
Comment on lines +133 to +138
const result = await Promise.race([
  generate(params),
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("skill selection timeout")), TIMEOUT_MS),
  ),
])

This comment was marked as outdated.
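The text of the outdated comment is not preserved here, so the following is only a general note on the pattern in the snippet: with a bare `Promise.race`, the timer keeps running after `generate` settles. A minimal sketch of a timeout wrapper that clears its timer is shown below; `withTimeout` is a hypothetical helper, not a function from this PR.

```typescript
// Hypothetical helper: races `work` against a timer and always clears the timer.
async function withTimeout<T>(work: Promise<T>, ms: number, message: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms)
  })
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout])
  } finally {
    // Runs on both success and failure, so no timer is left pending.
    clearTimeout(timer)
  }
}
```

Under these assumptions, the call in the snippet would read `await withTimeout(generate(params), TIMEOUT_MS, "skill selection timeout")`. Note that the wrapper only stops waiting; it does not cancel the underlying request.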

@kulvirgit force-pushed the feat/env-based-skill-selection branch from c1ef28c to 26caeb1 on March 16, 2026 23:44
  return detect(previousCwd)
}

export async function detect(cwd: string, root?: string): Promise<Result> {

Bug: The fingerprint cache is not invalidated between sessions for the same directory, leading to stale environment data being used for skill selection.
Severity: MEDIUM

Suggested Fix

The cache should be made session-aware. One approach is to call Fingerprint.refresh() at the start of each new session to force re-detection. Alternatively, the cache could be invalidated between sessions or the cache key could be modified to include a session-specific identifier.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/opencode/src/altimate/fingerprint/index.ts#L28

Potential issue: A module-level cache for environment fingerprints is keyed only by the
current working directory (`cwd`). When a user starts a new session in the same
directory, `Fingerprint.detect()` is called with the same `cwd` and returns the cached
result from the previous session. If the project's dependencies or configuration (e.g.,
adding a `databricks.yml` file) changed between sessions, the stale cache is used. This
leads to incorrect environment detection and subsequent inaccurate skill selection, as
the system will not be aware of the updated project state.
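The suggested fix can be illustrated with a small sketch of a cwd-keyed cache that exposes a forced re-detection. This is not the module under review: `createFingerprint`, `Scanner`, and the explicit `cwd` argument to `refresh` are assumptions made for the example (the real `refresh` appears to reuse the previous cwd).

```typescript
// Hypothetical sketch of a cwd-keyed fingerprint cache with an explicit refresh.
type Result = { cwd: string; tags: string[] }
type Scanner = (cwd: string) => Promise<string[]>

function createFingerprint(scan: Scanner) {
  const cache = new Map<string, Result>()

  async function detect(cwd: string): Promise<Result> {
    const hit = cache.get(cwd)
    if (hit) return hit // may be stale if the project changed since the last scan
    const tags = Array.from(new Set(await scan(cwd))) // dedupe detected tags
    const result: Result = { cwd, tags }
    cache.set(cwd, result)
    return result
  }

  // Called at the start of a new session: drop the entry and scan again.
  async function refresh(cwd: string): Promise<Result> {
    cache.delete(cwd)
    return detect(cwd)
  }

  return { detect, refresh }
}
```

Calling `refresh(cwd)` once when a session starts keeps the within-session cache hit while ensuring a newly added file such as `databricks.yml` is picked up by the next session.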

@kulvirgit force-pushed the feat/env-based-skill-selection branch from 26caeb1 to eaeac4a on March 16, 2026 23:48
Comment on lines +48 to +58
log.info("returning cached skill selection", {
  count: cachedResult.length,
})
Tracer.active?.logSpan({
  name: "skill-selection",
  startTime,
  endTime: Date.now(),
  input: { fingerprint: fingerprint?.tags, source: "cache" },
  output: { count: cachedResult.length, skills: cachedResult.map((s) => s.name) },
})
return cachedResult

This comment was marked as outdated.

@kulvirgit force-pushed the feat/env-based-skill-selection branch 2 times, most recently from dac4a4c to b330ecf, on March 16, 2026 23:59
@kulvirgit force-pushed the feat/env-based-skill-selection branch from b330ecf to 4661224 on March 17, 2026 01:19
sentry bot commented Mar 17, 2026

🚧 Skipped: PR exceeds review size limit.

Please split into smaller PRs and re-run.
Reference ID: 11858376

@kulvirgit force-pushed the feat/env-based-skill-selection branch from 4661224 to 99882ae on March 17, 2026 01:24
sentry bot commented Mar 17, 2026

🚧 Skipped: PR exceeds review size limit.

Please split into smaller PRs and re-run.
Reference ID: 11858525

@kulvirgit force-pushed the feat/env-based-skill-selection branch from 99882ae to d290e88 on March 17, 2026 01:27
sentry bot commented Mar 17, 2026

🚧 Skipped: PR exceeds review size limit.

Please split into smaller PRs and re-run.
Reference ID: 11858643

Run LLM skill selector once per session using environment fingerprint,
cache by working directory, and apply filtering to both system prompt
and tool description. Adds tracing spans for fingerprint, skill
selection, and system prompt.

- Use LLM.stream for skill selection (proper provider auth)
- Plain text response parsing (one skill name per line)
- Cache keyed by cwd — invalidates on project change
- Filter skills in both SystemPrompt.skills() and SkillTool
- Add env_fingerprint_skill_selection config (default: true)
- Trim fingerprint to data-engineering detections only
- Add tracing for fingerprint, skill-selection, and system-prompt spans

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
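
The "one skill name per line" parsing mentioned in the commit message could look roughly like the sketch below. It is an illustration only: `parseSkillNames` is a hypothetical name, and stripping list bullets, ignoring unknown names, and dropping duplicates are assumptions about how a model's plain-text reply might be made safe to use.

```typescript
// Hypothetical parser for a plain-text reply with one skill name per line.
function parseSkillNames(text: string, available: string[]): string[] {
  const known = new Set(available)
  const seen = new Set<string>()
  const picked: string[] = []
  for (const line of text.split(/\r?\n/)) {
    // Assumption: tolerate surrounding whitespace and a leading "-" or "*" bullet.
    const name = line.trim().replace(/^[-*]\s+/, "")
    // Keep only names that exist, and each name only once.
    if (known.has(name) && !seen.has(name)) {
      seen.add(name)
      picked.push(name)
    }
  }
  return picked
}
```

Validating against the list of available skills means a hallucinated or misspelled name is dropped rather than passed on to the system prompt.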
@kulvirgit force-pushed the feat/env-based-skill-selection branch from d290e88 to 1b04d6a on March 17, 2026 01:29
Comment on lines +114 to +119
try {
  const sqlFiles = await Glob.scan("*.sql", {
    cwd: dir,
    include: "file",
  })
  if (sqlFiles.length > 0) {

Bug: The glob pattern *.sql only scans the root directory for SQL files, failing to detect them in subdirectories, which is standard for dbt projects.
Severity: MEDIUM

Suggested Fix

To correctly identify SQL files in a typical dbt project structure, update the glob pattern in fingerprint/index.ts from *.sql to **/*.sql. This will enable a recursive search through all subdirectories.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/opencode/src/altimate/fingerprint/index.ts#L114-L119

Potential issue: The SQL detection logic uses `Glob.scan("*.sql", ...)` to identify SQL
files for project fingerprinting. This pattern does not recursively search
subdirectories. Since dbt projects typically organize SQL models within subdirectories
like `models/`, the detection will fail for these standard project structures. This
results in an incomplete project fingerprint, as the `sql` tag will be missing.
Consequently, the LLM skill selector receives inaccurate environment information,
leading to suboptimal skill selection. The failure is silent due to an exception
handler.
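
The difference between the two patterns can be shown with a small self-contained matcher. This sketch models common glob conventions only (`*` does not cross `/`, `**/` matches any number of directories); it does not call the project's `Glob.scan`, whose exact semantics are not shown in this thread, and `globToRegExp` and the file paths are hypothetical.

```typescript
// Hypothetical matcher for a tiny glob subset: "**/" and "*".
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters except "*", which is handled below.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&")
  const source = escaped
    .replace(/\*\*\//g, "\u0000") // placeholder so the next step leaves "**/" alone
    .replace(/\*/g, "[^/]*") // "*" matches within a single path segment
    .replace(/\u0000/g, "(?:.*/)?") // "**/" matches zero or more directories
  return new RegExp(`^${source}$`)
}

// Returns the relative paths that match the pattern.
function matchPaths(pattern: string, paths: string[]): string[] {
  const re = globToRegExp(pattern)
  return paths.filter((p) => re.test(p))
}
```

For a layout where the only SQL file is `models/staging/stg_orders.sql`, `matchPaths("*.sql", ...)` finds nothing while `matchPaths("**/*.sql", ...)` finds the model. A recursive scan of a large repository can be slow, so a real fix may also want to skip directories such as `node_modules` or stop at the first match.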

@kulvirgit merged commit 567812e into main on Mar 17, 2026
7 of 8 checks passed

2 participants