fix: enforce 3000-file cap on _auto_setup fallback path (#34)#86
Merged
Conversation
Issue #34: _auto_setup's subprocess scan passes --max-files 3000 on the CLI, but commands/scan.add_args did not register --max-files, so argparse rejected it (exit 2) every time. The fallback cmd_scan(workspace, incremental=False) was therefore ALWAYS taken — with no cap and no timeout — while the result hint still claimed 'Auto-setup capped at 3000 files'. On huge repos this could hang indefinitely. Fix: - Add max_files: Optional[int] param to cmd_scan + _cap_discovered_files helper that truncates per-category file lists so total <= max_files. - Register --max-files in commands/scan.add_args so the subprocess path actually works (previously dead code). - Rework _auto_setup so the in-process fallback calls cmd_scan(..., max_files=_AUTO_SETUP_MAX_FILES) — same cap as the subprocess path. - Surface capped and fallback flags on result['_auto_setup'] so MCP clients/agents can tell which path produced the registry and whether the cap was hit (explicitly requested in issue #34). Tests (tests/test_cli.py::TestAutoSetupFallbackCap): - test_fallback_passes_max_files_cap: monkeypatch subprocess.run to raise, spy on cmd_scan, assert max_files=3000 is passed. - test_fallback_sets_capped_and_fallback_flags: 3001-file workspace + forced fallback, drive full codelens.main() flow, assert result['_auto_setup']['capped'] is True and ['fallback'] is True. - test_main_path_no_fallback_when_subprocess_succeeds: sanity guard that the main path still works (fallback=False, capped=False for a small workspace) and the flags are always present in the schema.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
# Conflicts: # tests/test_cli.py
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
Fixes #34.
_auto_setupinscripts/codelens.pyruns the scan via a subprocess with--max-files 3000as a timeout guard (documented in its own docstring). But when that subprocess failed / exited non-zero, the fallback calledcmd_scan(workspace, incremental=False)with no cap and no timeout — so on huge repos (tens of thousands of files) auto-setup could hang indefinitely, while the result hint still claimed "Auto-setup capped at 3000 files". That was a lie.Worse, while investigating I found that
commands/scan.add_argsdid not register--max-filesat all, so argparse rejected the subprocess command withunrecognized arguments: --max-files 3000(exit 2) every time. The "main" subprocess path was effectively dead code, and the uncapped fallback was the only path actually exercised.Fix
Cap 3000 files (or equivalent) now applies consistently to both the subprocess path and the in-process fallback — no silent unprotected path.
Files changed
scripts/commands/scan.py--max-filesargument toadd_argsso the subprocess path actually accepts it (previously rejected by argparse → dead code).max_files: Optional[int] = Noneparameter tocmd_scan._cap_discovered_files(files, max_files)helper that truncates per-category file lists so the total ≤max_files. Applied afterdiscover_files, before parsing —os.walkcost is unchanged but parsing/registry-build cost is bounded.execute()to passargs.max_filesthrough tocmd_scan.scripts/codelens.py(_auto_setup+main())cmd_scan(workspace, incremental=False, max_files=_AUTO_SETUP_MAX_FILES)— same 3000-file cap as the subprocess path.fallbackflag (True iff the in-process fallback was taken).capped = total_files >= _AUTO_SETUP_MAX_FILESconsistently across both paths.cappedandfallbackin the_auto_setup()result dict.main()propagatescappedandfallbackfrom_auto_setup()'s return value intoauto_setup_info, which becomesresult["_auto_setup"]— so MCP clients / agents can tell which path produced the registry and whether the cap was hit (explicitly requested in issue [BUG-06] Auto-setup fallback ignores --max-files cap — defeats timeout protection #34).__import__("subprocess")to a normalimport subprocess(KISS, no dead code).tests/test_cli.py— newTestAutoSetupFallbackCapclass with 3 regression tests (see below).Constraints satisfied
max_files=_AUTO_SETUP_MAX_FILEStocmd_scan, same as the subprocess main path.result["_auto_setup"]["capped"]andresult["_auto_setup"]["fallback"]flags exposed for MCP clients.cmd_scan.--max-filesis a registered arg), instead of silently failing every time.Definition of Done
1. Regression test: subprocess fails -> fallback calls
cmd_scanwithmax_files=3000tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_passes_max_files_capMonkeypatches
subprocess.runto raiseSubprocessError, spies oncmd_scanto capture themax_fileskwarg, callscodelens._auto_setup(ws), and assertscaptured["max_files"] == 3000(notNone).2. Test verifies
result["_auto_setup"]containscapped=Trueandfallback=Truewhen fallback is takentests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_sets_capped_and_fallback_flagsBuilds a workspace with 3001
.pyfiles, forcessubprocess.runto raise (-> fallback path), drives the fullcodelens.main()flow in-process withsys.argv = ["codelens.py", "list", ws, "--format", "json"], captures stdout viacapsys, parses the JSON, and asserts:result["_auto_setup"]["fallback"] is Trueresult["_auto_setup"]["capped"] is True(Plus a third sanity test,
test_main_path_no_fallback_when_subprocess_succeeds, that confirms the main path still works withfallback=False/capped=Falsefor a small workspace, and that the flags are always present in the schema.)3. Full test suite
PYTHONPATH=scripts python3 -m pytest tests/ -vThe new tests all pass and introduce zero regressions — the same set of pre-existing environmental failures (missing tree-sitter / SQLite graph_model dependencies in this sandbox) occurs identically before and after this PR. Diff of the FAILED lists is empty.
New tests (verbose)
Full
tests/test_cli.py(verbose)Full suite summary (excluding
tests/test_integration.pywhich requires a live network/grammar setup)The 37 failures are pre-existing and environmental (this sandbox lacks
tree-sittergrammars and the SQLitegraph_modelextension that thetest_graph_model/test_graph_incremental/test_architecture/test_hybrid_type_resolversuites require). Verified by stashing this PR's changes and re-running onorigin/main:diff <(baseline FAILED list) <(after-change FAILED list)is empty — zero regressions. The +3 in passed count (801 -> 804) is exactly the 3 newTestAutoSetupFallbackCaptests.Notes
skill.jsonversion (no new command/engine added — bug fix only, per CONTRIBUTING.md "Update skill.json version if adding new commands").CHANGELOG.md(top-levelCHANGELOG.mdis for release notes; the maintainers' release process bundles changes per release).