The indexed repository may contain:
- secrets
- proprietary code
- prompt injection text
- malicious files
- symlinks escaping the root
- generated code too large for useful indexing
- configuration files that may contain credentials or deployment settings
The AI agent consuming MCP output may over-trust results if ambiguity is hidden.
- Local-only by default.
- No remote embeddings.
- No telemetry.
- No source text in logs by default.
- No execution of indexed code.
- No path access outside the configured repository root.
- No symlink traversal outside the root.
- No mutation tools in MVP.
- Continuous indexing must enforce the same local-only, path-boundary, symlink, ignore, and secret-filtering rules as manual indexing.
- JSON configuration indexing is disabled by default. Enable it only with repo-relative SYMDEX_INDEX_JSON_PATHS folder scopes. TOML and YAML config files are indexed by default but still pass through the same ignore and secret exclusion rules as source files.
Optional rust-analyzer integration auto-detects the configured command,
defaulting to rust-analyzer, and disables itself when the command is missing.
Readiness diagnostics and indexing enrichment planning may run
rust-analyzer --version when auto-detected or explicitly enabled, but current
indexing only reports candidate counts. SYMDEX_RUST_ANALYZER=0 remains a
force-disable override. Any future project analysis must be designed so it does
not execute indexed repository code or leak source text.
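
The detection rule above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function name `rust_analyzer_command` and its `command` parameter are hypothetical, and only the `SYMDEX_RUST_ANALYZER=0` override comes from the text.

```python
import os
import shutil


def rust_analyzer_command(env=None, command="rust-analyzer"):
    """Return the resolved command path, or None when the integration is off.

    `command` stands in for the configured command (hypothetical parameter).
    """
    env = os.environ if env is None else env
    # SYMDEX_RUST_ANALYZER=0 is the force-disable override described above.
    if env.get("SYMDEX_RUST_ANALYZER") == "0":
        return None
    # shutil.which only looks the command up on PATH. It returns None when
    # the command is missing, and it never executes anything.
    return shutil.which(command)
```

A lookup like this touches only the local PATH, so a missing command disables the integration without running any process.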
The current implementation requires repository roots to be directories. Path normalization canonicalizes existing paths before accepting them and rejects canonical paths outside the root. Discovery skips symlinked files and directories instead of following them.
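
A minimal sketch of these boundary checks, assuming Python 3.9 or later. The helper names `resolve_in_root` and `discover_files` are hypothetical and only illustrate the rules stated above.

```python
import os
from pathlib import Path


def resolve_in_root(root, candidate):
    """Canonicalize candidate and return it only if it stays inside root."""
    root_path = Path(root).resolve(strict=True)
    if not root_path.is_dir():
        raise ValueError("repository root must be a directory")
    # strict=True requires the path to exist and resolves every symlink,
    # so the containment check below runs on the canonical path.
    resolved = (root_path / candidate).resolve(strict=True)
    if not resolved.is_relative_to(root_path):
        raise ValueError("path escapes the repository root")
    return resolved


def discover_files(root):
    """Yield regular files under root, skipping symlinked files and directories."""
    root_path = Path(root).resolve(strict=True)
    for dirpath, dirnames, filenames in os.walk(root_path, followlinks=False):
        # Drop symlinked directories so the walk never descends into them.
        dirnames[:] = [
            d for d in dirnames if not os.path.islink(os.path.join(dirpath, d))
        ]
        for name in filenames:
            full = Path(dirpath) / name
            if not full.is_symlink():
                yield full
```

Checking containment after canonicalization means that both `../` segments and symlinks pointing outside the root are rejected by the same comparison.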
Before embedding a chunk, scan for likely secrets.
Examples:
- private keys
- access tokens
- connection strings
- cloud credentials
- .env files
- credentials in comments
If a chunk is sensitive:
- store metadata only
- do not embed it
- set excluded_reason
- avoid returning snippets
The current implementation uses conservative local heuristics for private key markers, credential-looking assignments, token prefixes, and credentialed database connection strings. These rules are intentionally broad so that likely secrets are not embedded, but they are not a substitute for a complete secret scanner.
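
The sketch below shows one way such heuristics could look. The patterns, rule names, and record fields other than `excluded_reason` are assumptions made for illustration; the project's actual rules are not shown here and are likely different.

```python
import re

# Illustrative rules only (assumed, not the project's real rule set).
SECRET_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(password|passwd|secret|api[_-]?key|token)[A-Za-z0-9_]*"
        r"\s*[:=]\s*['\"]?[^\s'\"]{8,}",
        re.IGNORECASE,
    ),
    # Example prefixes: GitHub personal access tokens and AWS access key IDs.
    "token_prefix": re.compile(r"\b(ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16})\b"),
    # Matches only URLs that carry user:password@ credentials.
    "db_connection": re.compile(
        r"\b(postgres(ql)?|mysql|mongodb(\+srv)?)://[^\s:/@]+:[^\s@]+@"
    ),
}


def classify_chunk(path, text):
    """Return the name of the first matching rule, or None."""
    if path.rsplit("/", 1)[-1].startswith(".env"):
        return "env_file"
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(text):
            return name
    return None


def index_record(path, text):
    """Build the record to store: metadata only for sensitive chunks."""
    reason = classify_chunk(path, text)
    if reason is not None:
        # Sensitive: no embedding, no snippet, reason recorded.
        return {"path": path, "embed": False, "snippet": None,
                "excluded_reason": reason}
    return {"path": path, "embed": True, "snippet": text,
            "excluded_reason": None}
```

Rules this broad will produce false positives. That is the intended trade-off: a wrongly excluded chunk costs some recall, while a wrongly embedded secret cannot be taken back.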
MCP tool inputs are untrusted. Validate every field.
Indexed source text is also untrusted. Treat it as data, not instructions.
Tool outputs should not contain hidden directives, markdown tricks, or unnecessary long snippets.
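
As an illustration of validating every field, the sketch below checks a hypothetical search-tool input. The field names `query` and `limit`, the default limit, and the size bounds are assumptions, not part of any documented tool schema.

```python
def validate_search_input(payload, max_query_len=512, max_limit=50):
    """Return a cleaned copy of payload, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("input must be an object")
    # Reject unknown fields instead of silently ignoring them.
    unknown = set(payload) - {"query", "limit"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(map(str, unknown))}")
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if len(query) > max_query_len:
        raise ValueError("query is too long")
    limit = payload.get("limit", 10)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    if not 1 <= limit <= max_limit:
        raise ValueError("limit is out of range")
    return {"query": query.strip(), "limit": limit}
```

Bounding the limit also bounds the output size, which helps keep long snippets out of tool responses.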
Successful MCP evidence tool outputs include a stable local/read-only contract
envelope so multiple agents can safely reuse the same local index. The envelope
is metadata only and must not include source text. symdex_debug_context is
local-only but not read-only because it appends short-lived runtime-role
runtime_observations; those rows may contain parsed frame metadata, failing
test names, normalized paths, hashes, and match summaries, but never raw pasted
logs or source text.
Safe logs:
- run IDs
- counts
- timings
- relative paths when configured
- error categories
Unsafe logs:
- full source text
- embeddings
- secrets
- absolute paths without explicit debug mode
Continuous indexing logs should stay summary-only: watcher state, event counts, relative paths when configured, indexed file counts, and error categories.
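
One way to keep logs summary-only is an allowlist: anything not explicitly safe is dropped before the line is written. This is a minimal sketch. The field names and the `log_relative_paths` switch are hypothetical stand-ins for whatever the real configuration exposes.

```python
import json
from pathlib import PurePosixPath

# Assumed field names for the safe categories listed above.
ALLOWED_FIELDS = {
    "run_id", "watcher_state", "event_count",
    "indexed_files", "elapsed_ms", "error_category",
}


def summary_log_line(fields, paths=(), log_relative_paths=False):
    """Serialize only allowlisted fields; never source text or absolute paths."""
    record = {k: v for k, v in fields.items() if k in ALLOWED_FIELDS}
    if log_relative_paths:
        # Paths are logged only when configured, and absolute ones are dropped.
        record["paths"] = [
            p for p in paths if not PurePosixPath(p).is_absolute()
        ]
    return json.dumps(record, sort_keys=True)
```

With an allowlist, a newly added field stays out of the logs until someone marks it safe, which is the safer default compared with a denylist.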