Prefilter-aware regex / OR / files-without-match search (bounded memory) #32

## Summary

Before parsing file-backed sources, agentgrep prunes them with a ripgrep
prefilter (`prefilter_sources_by_root`): one `rg`/`ag` pass per
discovered search root keeps only the sources whose root matched the
query terms. Three opt-in modes defeat or strain that prefilter and fall
back to a (near) full-corpus parse that materialises every matching
record in memory (~1.3 GB RSS observed on a full-scan run):

- **OR / any-term** — unions the per-term match sets, so the surviving
  source set balloons toward the whole corpus.
- **regex terms** — broad patterns match a large fraction of roots.
- **files-without-match (`-L`)** — by definition must consider every
  planned source, so nothing can be pruned.
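
To make the OR case concrete, here is a toy illustration of why a union prunes so little. The root names and hit sets are made up, and it assumes the default mode keeps only roots that matched every term (an intersection), which this issue does not state outright.

```python
# Hypothetical per-term prefilter hits: term -> roots whose files matched it.
hits = {
    "timeout": {"root-a", "root-b", "root-c"},
    "retry": {"root-b", "root-d"},
    "backoff": {"root-b", "root-e"},
}

# Assumed default: a root survives only if every term matched it.
all_terms = set.intersection(*hits.values())

# OR / any-term: a root survives if any term matched it.
any_term = set.union(*hits.values())

print(sorted(all_terms))
print(sorted(any_term))
```

With these inputs the intersection keeps one root while the union keeps all five, so every one of them would go on to be parsed.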

Because these can't yet be served within bounded memory, their public
toggles are being removed: the CLI `search --regex` / `--any` flags, the
`regex` / `any_term` parameters on the MCP `search` and `validate_query`
tools, and `grep -L`. `grep` and `find` stay regex-by-default — that
path runs through ripgrep itself and is unaffected. This issue tracks
reintroducing the three modes without the memory cliff.

## Where it lives (v0.1.0a8)

- prefilter: https://github.com/tony/agentgrep/blob/v0.1.0a8/src/agentgrep/__init__.py#L2659-L2694
- per-root grep + OR union: https://github.com/tony/agentgrep/blob/v0.1.0a8/src/agentgrep/__init__.py#L2697-L2737
- non-prefiltered source inclusion: https://github.com/tony/agentgrep/blob/v0.1.0a8/src/agentgrep/__init__.py#L2740-L2774
- record matcher (regex / any_term): https://github.com/tony/agentgrep/blob/v0.1.0a8/src/agentgrep/__init__.py#L4023-L4037

## Proposed approach

- **Regex:** derive literal atoms from each pattern to seed the
  prefilter; fall back to a bounded scan only when no atom can be
  extracted.
- **OR / any-term:** prefilter per term and stream the union, capping
  the number of sources parsed concurrently instead of materialising all
  records at once.
- **`-L` (files-without-match):** compute the no-match complement
  against the enumerated / prefiltered source set rather than parsing
  every source's contents.
- **Memory:** stream records through dedupe with a bounded working set
  (and/or an explicit `--max-sources` guard) instead of building one big
  in-memory dict.
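
A minimal sketch of the `-L`, OR and memory bullets above, not the agentgrep implementation. All three function names are hypothetical, sources are stood in for by plain strings, and the `-L` helper assumes a prefilter hit counts as a match for that source (a source whose raw text matches but whose parsed records do not would need a second check).

```python
from collections import OrderedDict
from collections.abc import Iterable, Iterator


def sources_without_match(planned: Iterable[str], matched: Iterable[str]) -> list[str]:
    """Return planned sources the prefilter did not report, in planned order.

    Assumption: a prefilter hit is treated as a match, so no source is parsed.
    """
    matched_set = set(matched)
    return [source for source in planned if source not in matched_set]


def union_sources(per_term: Iterable[Iterable[str]]) -> Iterator[str]:
    """Lazily yield each source once across the per-term match lists.

    Only source identifiers are remembered here, never parsed records.
    """
    seen: set[str] = set()
    for matches in per_term:
        for source in matches:
            if source not in seen:
                seen.add(source)
                yield source


def dedupe_bounded(records: Iterable[str], max_keys: int) -> Iterator[str]:
    """Yield records, dropping repeats found among the last max_keys distinct keys.

    The working set never exceeds max_keys entries. The trade-off is that a
    duplicate arriving after its key has been evicted is yielded again.
    """
    recent: OrderedDict[str, None] = OrderedDict()
    for record in records:
        if record in recent:
            recent.move_to_end(record)  # refresh recency, skip the duplicate
            continue
        recent[record] = None
        if len(recent) > max_keys:
            recent.popitem(last=False)  # evict the least recently seen key
        yield record
```

The bounded dedupe is deliberately approximate: exact dedupe over an unbounded stream needs memory proportional to the number of distinct keys, so a hard ceiling implies either a window like this or an explicit guard such as the `--max-sources` idea above.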

## Acceptance criteria

- The three modes return correct results with peak RSS bounded
  regardless of corpus size.
- Re-expose the toggles (CLI + MCP) and restore the docs and the MCP
  server-instruction lines only once the bounded-memory path lands.
- Regression coverage over a large synthetic corpus asserting a memory
  ceiling.
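
One possible shape for the memory-ceiling regression check, sketched with the standard-library `tracemalloc` module. Note that `tracemalloc` tracks Python-level allocations rather than process RSS, so it is only a proxy for the figure quoted in the summary. The corpus generator, the two stand-in search functions and the 1 MB ceiling are all hypothetical.

```python
import tracemalloc
from collections.abc import Callable, Iterator


def synthetic_records(n: int) -> Iterator[str]:
    """Generate n hypothetical records lazily, so the corpus itself is never held."""
    for i in range(n):
        yield f"record-{i}: " + "x" * 200


def peak_bytes(run: Callable[[], object]) -> int:
    """Return the peak traced Python allocation (in bytes) while run() executes."""
    tracemalloc.start()
    try:
        run()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_matches_streaming(n: int) -> int:
    """Stand-in for a bounded search path: consume records one at a time."""
    return sum(1 for record in synthetic_records(n) if record.endswith("x"))


def count_matches_materialised(n: int) -> int:
    """Stand-in for the full-scan path: build the whole list before filtering."""
    records = list(synthetic_records(n))
    return sum(1 for record in records if record.endswith("x"))


CEILING = 1_000_000  # hypothetical 1 MB ceiling for this toy corpus
streaming_peak = peak_bytes(lambda: count_matches_streaming(20_000))
materialised_peak = peak_bytes(lambda: count_matches_materialised(20_000))
```

A real test would swap the stand-ins for the actual search entry points and assert that the peak stays under the ceiling as the synthetic corpus grows.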
