fix(engine): tighten input reference scanning to reduce false matches #61

Open
mvanhorn wants to merge 3 commits into danshapiro:main from mvanhorn:osc/48-input-materialization-overmatch

@mvanhorn

Fixes #48

Summary

Input materialization over-classified arbitrary tokens as path/glob references, producing ~1397 warnings per run and materializing irrelevant files. This PR fixes three root causes.

Changes

1. Reject tokens with spaces (input_reference_scan.go)

Natural language fragments like "you are wielding ([ch]) [weapon" contain spaces and are clearly not file paths. Added an early rejection for tokens containing whitespace in looksLikeReferenceToken().
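The whitespace check described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `containsWhitespaceSketch` is a hypothetical helper name, and the real check lives inside looksLikeReferenceToken(), whose other rules are not shown here.

```go
package main

import (
	"strings"
	"unicode"
)

// containsWhitespaceSketch is a hypothetical helper (not from the PR).
// It reports whether token contains any Unicode whitespace rune, which
// this sketch treats as proof that the token is prose, not a path.
func containsWhitespaceSketch(token string) bool {
	return strings.IndexFunc(token, unicode.IsSpace) >= 0
}
```

Under this sketch, the fragment "you are wielding ([ch]) [weapon" is rejected before any glob analysis runs, while src/[abc]/*.go passes through to the later checks.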

2. Validate bracket syntax (input_reference_scan.go)

Replaced the broad ContainsAny(token, "*?[") check with a new looksLikeGlobPattern() function that:

  • Accepts * and ? as glob metacharacters unconditionally
  • For [, validates that every [ has a matching ] (rejects DEFAULT_TOOL_LIMITS[tool_name)
  • Requires a path separator (/ or \) when brackets are the only metacharacter (rejects map[string]any style programming constructs)
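A minimal sketch of the three rules listed above, for readers who want to see how they compose. The function names carry a `Sketch` suffix because they are hypothetical and may differ from the actual looksLikeGlobPattern() in the PR. Two behaviors are assumptions of this sketch: it returns early as soon as * or ? is present, and it treats a second [ inside an open bracket class as a literal character.

```go
package main

import "strings"

// looksLikeGlobPatternSketch is a hypothetical approximation of the
// rules in the PR description, not the PR's implementation.
func looksLikeGlobPatternSketch(token string) bool {
	// Rule 1: * and ? are accepted unconditionally.
	if strings.ContainsAny(token, "*?") {
		return true
	}
	// Without any glob metacharacter the token is not a glob.
	if !strings.Contains(token, "[") {
		return false
	}
	// Rule 2: an unclosed [ rejects the token.
	if !bracketsClosedSketch(token) {
		return false
	}
	// Rule 3: brackets alone need a path separator (/ or \).
	return strings.ContainsAny(token, `/\`)
}

// bracketsClosedSketch reports whether every bracket class opened by [
// is closed by a later ]. Assumption: a [ seen while a class is already
// open is treated as a literal, as in common glob syntax.
func bracketsClosedSketch(token string) bool {
	open := false
	for _, r := range token {
		switch r {
		case '[':
			open = true
		case ']':
			open = false
		}
	}
	return !open
}
```

With these rules, src/[abc]/main.go is accepted, while map[string]any (no path separator) and DEFAULT_TOOL_LIMITS[tool_name (unclosed bracket) are rejected.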

3. Expand artifact exclusions (input_materialization.go)

Added .worktrees and .cargo-target to isLikelyArtifactInputPath(), preventing materialization of worktree and Cargo build cache artifacts.
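One plausible way to express this exclusion is a per-segment lookup, sketched below. This is an assumption about the approach, not the body of isLikelyArtifactInputPath(): the names `excludedSegmentsSketch` and `isExcludedArtifactPathSketch` are hypothetical, and the real function may hold further exclusions that are not shown in this PR description.

```go
package main

import "strings"

// excludedSegmentsSketch lists only the two directory names added in
// this PR; any pre-existing exclusions are deliberately omitted.
var excludedSegmentsSketch = map[string]bool{
	".worktrees":    true,
	".cargo-target": true,
}

// isExcludedArtifactPathSketch is a hypothetical helper that reports
// whether any path segment exactly matches an excluded directory name.
func isExcludedArtifactPathSketch(path string) bool {
	// Normalize Windows separators so both styles split the same way.
	normalized := strings.ReplaceAll(path, `\`, "/")
	for _, segment := range strings.Split(normalized, "/") {
		if excludedSegmentsSketch[segment] {
			return true
		}
	}
	return false
}
```

Matching whole segments rather than substrings keeps a file such as src/worktrees.go from being excluded by accident.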

Test plan

  • TestInputReferenceScan_RejectsNaturalLanguageWithBrackets - regression test for the bad-token examples from the issue
  • TestInputReferenceScan_AcceptsValidGlobBrackets - valid globs like src/[abc]/*.go still work
  • TestIsLikelyArtifactInputPath_ExcludesWorktrees - .worktrees and .cargo-target excluded
  • All 25 existing materialization/reference tests pass
  • go build ./... compiles cleanly

This contribution was developed with AI assistance (Claude Code).

Fixes three sources of false-positive reference classification:

1. Reject tokens containing spaces as non-path natural language
2. Replace broad ContainsAny("*?[") with looksLikeGlobPattern() that
   validates bracket syntax: unmatched [ is rejected, and matched [...]
   requires a path separator to disambiguate from programming constructs
3. Add .worktrees and .cargo-target to artifact path exclusions

Fixes danshapiro#48

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mvanhorn and others added 2 commits March 9, 2026 20:40
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…i/ paths

- Add gpt-5.3-codex-spark to cliOnlyModelIDs map (fixes TestIsCLIOnlyModel)
- Replace root .ai/verify_errors.log and .ai/test-evidence/latest/ with
  run-scoped .ai/runs/$KILROY_RUN_ID/ paths in demo spec (fixes
  TestReferenceSurfaces_NoLegacyRootAIScratchPaths)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

input materialization over-matches references, creating warning storms and irrelevant materialization