Security: SamMRoberts/symdex

Security and Privacy

Threat model

The indexed repository may contain:

  • secrets
  • proprietary code
  • prompt injection text
  • malicious files
  • symlinks escaping the root
  • generated code too large for useful indexing
  • configuration files that may contain credentials or deployment settings

The AI agent consuming MCP output may over-trust results if ambiguity is hidden.

Hard rules

  • Local-only by default.
  • No remote embeddings.
  • No telemetry.
  • No source text in logs by default.
  • No execution of indexed code.
  • No path access outside the configured repository root.
  • No symlink traversal outside the root.
  • No mutation tools in MVP.
  • Continuous indexing must enforce the same local-only, path-boundary, symlink, ignore, and secret-filtering rules as manual indexing.
  • JSON configuration indexing is disabled by default. Enable it only with repo-relative SYMDEX_INDEX_JSON_PATHS folder scopes. TOML and YAML config files are indexed by default but still pass through the same ignore and secret exclusion rules as source files.
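The config-file rule above can be sketched roughly as follows. This is an illustration only, not symdex's code: the comma-separated format for SYMDEX_INDEX_JSON_PATHS, the `.yml` extension, and the helper names `parse_json_scopes` and `config_file_is_candidate` are all assumptions. A file that passes this check would still go through the ignore and secret exclusion rules.

```python
from pathlib import PurePosixPath


def parse_json_scopes(raw: str) -> list[PurePosixPath]:
    """Parse folder scopes. ASSUMPTION: the value is a comma-separated list."""
    scopes = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        scope = PurePosixPath(part)
        # Scopes must stay repo-relative: no absolute paths, no parent traversal.
        if scope.is_absolute() or ".." in scope.parts:
            raise ValueError(f"scope must be repo-relative: {part!r}")
        scopes.append(scope)
    return scopes


def config_file_is_candidate(rel_path: str, json_scopes: list[PurePosixPath]) -> bool:
    """Return True if a repo-relative config file may proceed to the later filters."""
    path = PurePosixPath(rel_path)
    suffix = path.suffix.lower()
    if suffix in {".toml", ".yaml", ".yml"}:
        return True  # indexed by default; ignore and secret rules still apply
    if suffix == ".json":
        # JSON is off unless the file sits under an explicitly enabled folder.
        return any(scope in path.parents for scope in json_scopes)
    return False
```

With no scopes configured, every JSON file is rejected, which matches the disabled-by-default rule.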

Optional rust-analyzer integration auto-detects the configured command, defaulting to rust-analyzer, and disables itself when the command is missing. Readiness diagnostics and indexing enrichment planning may run rust-analyzer --version when auto-detected or explicitly enabled, but current indexing only reports candidate counts. SYMDEX_RUST_ANALYZER=0 remains a force-disable override. Any future project analysis must be designed so it does not execute indexed repository code or leak source text.
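A minimal sketch of the detection and override logic described above, assuming the environment is passed in as a plain dictionary. The function name `rust_analyzer_enabled` is hypothetical. The sketch only checks whether the command can be found; it does not run `rust-analyzer --version`.

```python
import shutil


def rust_analyzer_enabled(env: dict[str, str], command: str = "rust-analyzer") -> bool:
    """Decide whether the optional integration should be active."""
    # SYMDEX_RUST_ANALYZER=0 always wins, even if the binary is installed.
    if env.get("SYMDEX_RUST_ANALYZER") == "0":
        return False
    # Auto-detect: a missing command disables the integration.
    # If env has no PATH entry, shutil.which falls back to the process PATH.
    return shutil.which(command, path=env.get("PATH")) is not None
```

Checking the override before the lookup keeps the force-disable path free of any filesystem access.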

The current implementation requires repository roots to be directories. Path normalization canonicalizes existing paths before accepting them and rejects canonical paths outside the root. Discovery skips symlinked files and directories instead of following them.
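The boundary checks described above could look roughly like this. It is a sketch under stated assumptions, not the project's implementation: `resolve_inside_root` and `discover_files` are hypothetical names, and Python's `Path.resolve` stands in for whatever canonicalization the real code uses.

```python
import os
from pathlib import Path


def resolve_inside_root(root: Path, candidate: str) -> Path:
    """Canonicalize an existing path and reject it if it leaves the root."""
    root_real = root.resolve(strict=True)
    if not root_real.is_dir():
        raise ValueError("repository root must be a directory")
    # strict=True requires the path to exist; symlinks are resolved first,
    # so a link pointing outside the root is caught by the check below.
    resolved = (root_real / candidate).resolve(strict=True)
    if resolved != root_real and root_real not in resolved.parents:
        raise ValueError("path escapes repository root")
    return resolved


def discover_files(root: Path):
    """Yield regular files under root, skipping symlinked files and directories."""
    root_real = root.resolve(strict=True)
    for dirpath, dirnames, filenames in os.walk(root_real, followlinks=False):
        # Prune symlinked directories so they are never entered.
        dirnames[:] = [d for d in dirnames
                       if not os.path.islink(os.path.join(dirpath, d))]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                yield Path(full)
```

Comparing canonical paths, rather than checking for `..` in the raw string, is what makes the check hold for symlinks as well as for relative traversal.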

Secret handling

Before embedding a chunk, scan for likely secrets.

Examples:

  • private keys
  • access tokens
  • connection strings
  • cloud credentials
  • .env files
  • credentials in comments

If a chunk is sensitive:

  • store metadata only
  • do not embed it
  • set excluded_reason
  • avoid returning snippets
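The handling rules above can be illustrated with a small record type. Only `excluded_reason` comes from this document; `ChunkRecord`, its other fields, and the two helper functions are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkRecord:
    # Metadata that is kept for every chunk, sensitive or not.
    path: str
    start_line: int
    end_line: int
    excluded_reason: Optional[str] = None
    # Text handed to the embedder; None means the chunk is never embedded.
    embeddable_text: Optional[str] = None


def build_chunk_record(path: str, start_line: int, end_line: int, text: str,
                       sensitive_reason: Optional[str] = None) -> ChunkRecord:
    """Keep metadata only when a chunk has been flagged as sensitive."""
    record = ChunkRecord(path, start_line, end_line)
    if sensitive_reason is not None:
        record.excluded_reason = sensitive_reason  # the text is dropped here
    else:
        record.embeddable_text = text
    return record


def snippet_for(record: ChunkRecord) -> Optional[str]:
    """Return a snippet only for chunks that were not excluded."""
    if record.excluded_reason is not None:
        return None
    return record.embeddable_text
```

Dropping the text at record-construction time means later stages cannot embed or return it by accident.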

The current implementation uses conservative local heuristics for private key markers, credential-looking assignments, token prefixes, and credentialed database connection strings. These rules are intentionally broad so that likely secrets are not embedded, but they are not a substitute for a complete secret scanner.
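To make the four heuristic categories concrete, here is a rough sketch. The patterns are illustrative assumptions, not the rules symdex ships: the rule names, the keyword list, and the two token prefixes (GitHub `ghp_` and AWS `AKIA`) were picked for the example.

```python
import re
from typing import Optional

# ASSUMPTION: illustrative patterns only; real rules would be broader.
_RULES = [
    ("private_key", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
    ("credential_assignment", re.compile(
        r"(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*['\"]?[^\s'\"]{8,}",
        re.IGNORECASE)),
    ("token_prefix", re.compile(r"\b(ghp_[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})\b")),
    # Matches only URLs that carry user:password@ credentials.
    ("connection_string", re.compile(
        r"\b(postgres(ql)?|mysql|mongodb(\+srv)?)://[^\s:/@]+:[^\s@]+@",
        re.IGNORECASE)),
]


def secret_reason(text: str) -> Optional[str]:
    """Return the name of the first matching rule, or None if nothing matches."""
    for name, pattern in _RULES:
        if pattern.search(text):
            return name
    return None
```

The returned rule name could serve as the `excluded_reason` value. Because the patterns err on the side of matching, some harmless chunks will be excluded as well.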

MCP-specific risks

MCP tool inputs are untrusted. Validate every field.
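As an illustration of validating every field, the sketch below checks a hypothetical search-style tool input. The field names (`query`, `limit`, `path_prefix`) and the limits are assumptions made for the example and do not describe symdex's actual tool schema.

```python
from pathlib import PurePosixPath

# ASSUMPTION: hypothetical schema and limits, for illustration only.
ALLOWED_FIELDS = {"query", "limit", "path_prefix"}
MAX_QUERY_CHARS = 512
MAX_LIMIT = 50


def validate_search_input(raw: object) -> dict:
    """Return a cleaned copy of the input or raise ValueError."""
    if not isinstance(raw, dict):
        raise ValueError("input must be an object")
    unknown = set(raw) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(map(str, unknown))}")

    query = raw.get("query")
    if not isinstance(query, str) or not query.strip() or len(query) > MAX_QUERY_CHARS:
        raise ValueError("query must be a non-empty string within the size limit")

    limit = raw.get("limit", 10)
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        raise ValueError("limit must be an integer between 1 and MAX_LIMIT")

    prefix = raw.get("path_prefix")
    if prefix is not None:
        if not isinstance(prefix, str):
            raise ValueError("path_prefix must be a string")
        parts = PurePosixPath(prefix)
        if parts.is_absolute() or ".." in parts.parts:
            raise ValueError("path_prefix must be repo-relative")

    return {"query": query.strip(), "limit": limit, "path_prefix": prefix}
```

Rejecting unknown fields outright, rather than ignoring them, keeps unexpected input from reaching later stages unnoticed.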

Indexed source text is also untrusted. Treat it as data, not instructions.

Tool outputs should not contain hidden directives, markdown tricks, or unnecessary long snippets.

Successful MCP evidence tool outputs include a stable local/read-only contract envelope so multiple agents can safely reuse the same local index. The envelope is metadata only and must not include source text. symdex_debug_context is local-only but not read-only, because it appends short-lived runtime-role runtime_observations rows. Those rows may contain parsed frame metadata, failing test names, normalized paths, hashes, and match summaries, but never raw pasted logs or source text.

Logging

Safe logs:

  • run IDs
  • counts
  • timings
  • relative paths when configured
  • error categories

Unsafe logs:

  • full source text
  • embeddings
  • secrets
  • absolute paths without explicit debug mode

Continuous indexing logs should stay summary-only: watcher state, event counts, relative paths when configured, indexed file counts, and error categories.
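A summary-only log entry along these lines might be built as follows. The function name, the JSON layout, and the `include_paths` switch are assumptions for illustration; the point is that the entry carries IDs, counts, timings, and categories, and paths only in relative form when enabled.

```python
import json
import os
from typing import Optional


def summary_log_line(run_id: str, root: str, files: list[str], elapsed_ms: int,
                     error_category: Optional[str] = None,
                     include_paths: bool = False) -> str:
    """Build one JSON log line that contains no source text or absolute paths."""
    entry = {
        "run_id": run_id,
        "indexed_files": len(files),
        "elapsed_ms": elapsed_ms,
    }
    if error_category is not None:
        entry["error_category"] = error_category  # a category, not a message body
    if include_paths:
        # Paths are logged relative to the repository root only.
        entry["paths"] = [os.path.relpath(path, root) for path in files]
    return json.dumps(entry, sort_keys=True)
```

Building the entry from an explicit set of fields, instead of formatting arbitrary objects, makes it harder for source text or embeddings to slip into a log line.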
