fix: enforce 3000-file cap on _auto_setup fallback path (#34) by Wolfvin · Pull Request #86 · Wolfvin/CodeLens

Wolfvin · 2026-06-29T15:48:20Z

Summary

Fixes #34.

_auto_setup in scripts/codelens.py runs the scan in a subprocess with --max-files 3000 as a timeout guard, as documented in its own docstring. But when that subprocess failed or exited non-zero, the fallback called cmd_scan(workspace, incremental=False) with no cap and no timeout. On huge repos (tens of thousands of files) auto-setup could therefore hang indefinitely, while the result hint still claimed "Auto-setup capped at 3000 files". That hint was simply false.

Worse, while investigating I found that commands/scan.add_args did not register --max-files at all, so argparse rejected the subprocess command every time with "unrecognized arguments: --max-files 3000" (exit 2). The "main" subprocess path was effectively dead code, and the uncapped fallback was the only path actually exercised.
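The argparse behavior behind the dead subprocess path is easy to reproduce in isolation. This is a minimal sketch with a hypothetical parser (not the real commands/scan.add_args): an undeclared --max-files makes `parse_args` exit with status 2, and declaring the option fixes the same command line.

```python
import argparse


def build_parser(register_max_files: bool) -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `scan` sub-command parser.

    With register_max_files=False this mirrors the pre-fix situation:
    --max-files is never declared, so argparse treats it as unknown.
    """
    parser = argparse.ArgumentParser(prog="codelens.py scan")
    parser.add_argument("workspace")
    if register_max_files:
        # Optional cap; None means "no cap", like the new cmd_scan default.
        parser.add_argument("--max-files", type=int, default=None)
    return parser


def exit_code_for(argv: list[str], register_max_files: bool) -> int:
    """Return the exit status argparse produces (0 if parsing succeeds)."""
    try:
        build_parser(register_max_files).parse_args(argv)
    except SystemExit as exc:
        # parse_args() calls parser.error() on unrecognized arguments,
        # which prints usage to stderr and exits with status 2.
        return exc.code
    return 0


argv = ["/path/to/ws", "--max-files", "3000"]
print(exit_code_for(argv, register_max_files=False))  # 2
print(exit_code_for(argv, register_max_files=True))   # 0
```

In a subprocess, that status 2 is exactly the non-zero exit that sent every run down the fallback path.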

Fix

The 3000-file cap now applies consistently to both the subprocess path and the in-process fallback, so there is no silent unprotected path.

Files changed

scripts/commands/scan.py
- Added --max-files argument to add_args so the subprocess path actually accepts it (previously rejected by argparse → dead code).
- Added max_files: Optional[int] = None parameter to cmd_scan.
- Added _cap_discovered_files(files, max_files) helper that truncates per-category file lists so the total ≤ max_files. Applied after discover_files, before parsing — os.walk cost is unchanged but parsing/registry-build cost is bounded.
- Wired execute() to pass args.max_files through to cmd_scan.
scripts/codelens.py (_auto_setup + main())
- The in-process fallback now calls cmd_scan(workspace, incremental=False, max_files=_AUTO_SETUP_MAX_FILES) — same 3000-file cap as the subprocess path.
- Tracks a fallback flag (True iff the in-process fallback was taken).
- Computes capped = total_files >= _AUTO_SETUP_MAX_FILES consistently across both paths.
- Returns both capped and fallback in the _auto_setup() result dict.
- main() propagates capped and fallback from _auto_setup()'s return value into auto_setup_info, which becomes result["_auto_setup"] — so MCP clients / agents can tell which path produced the registry and whether the cap was hit (explicitly requested in issue [BUG-06] Auto-setup fallback ignores --max-files cap — defeats timeout protection #34).
- Switched __import__("subprocess") to a normal import subprocess (KISS, no dead code).
- Updated docstring to reflect actual behavior.
tests/test_cli.py
- New TestAutoSetupFallbackCap class with 3 regression tests (see below).
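A per-category cap of the kind described for _cap_discovered_files might look like the sketch below. It is hypothetical: it assumes discovery returns a dict mapping each category to a list of paths, and it fills categories in their existing order until a shared budget runs out. The PR does not say which files the real helper keeps.

```python
from typing import Dict, List, Optional


def cap_discovered_files(
    files: Dict[str, List[str]], max_files: Optional[int]
) -> Dict[str, List[str]]:
    """Hypothetical per-category cap: total across categories <= max_files."""
    if max_files is None:
        return files  # no cap requested: behavior unchanged
    remaining = max(max_files, 0)
    capped: Dict[str, List[str]] = {}
    for category, paths in files.items():
        kept = paths[:remaining]  # empty once the budget is exhausted
        capped[category] = kept   # keep every category key, even if empty
        remaining -= len(kept)
    return capped


discovered = {
    "frontend": [f"ui/{i}.js" for i in range(2000)],
    "backend": [f"api/{i}.py" for i in range(2500)],
}
result = cap_discovered_files(discovered, 3000)
print({k: len(v) for k, v in result.items()})  # {'frontend': 2000, 'backend': 1000}
```

Filling in order means earlier categories can starve later ones on a huge repo; a proportional split would be fairer but needs more bookkeeping. Either way the truncation runs after discovery, so it bounds parsing cost, not os.walk cost.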

Constraints satisfied

Fallback passes max_files=_AUTO_SETUP_MAX_FILES to cmd_scan, same as the subprocess main path.
result["_auto_setup"]["capped"] and result["_auto_setup"]["fallback"] flags exposed for MCP clients.
KISS — single cap helper, single cap enforcement point in cmd_scan.
No dead code — the subprocess path now actually works (--max-files is a registered arg), instead of silently failing every time.

Definition of Done

1. Regression test: subprocess fails -> fallback calls `cmd_scan` with `max_files=3000`

tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_passes_max_files_cap

Monkeypatches subprocess.run to raise SubprocessError, spies on cmd_scan to capture the max_files kwarg, calls codelens._auto_setup(ws), and asserts captured["max_files"] == 3000 (not None).

2. Test verifies `result["_auto_setup"]` contains `capped=True` and `fallback=True` when fallback is taken

tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_sets_capped_and_fallback_flags

Builds a workspace with 3001 .py files, forces subprocess.run to raise (-> fallback path), drives the full codelens.main() flow in-process with sys.argv = ["codelens.py", "list", ws, "--format", "json"], captures stdout via capsys, parses the JSON, and asserts:

result["_auto_setup"]["fallback"] is True
result["_auto_setup"]["capped"] is True

(Plus a third sanity test, test_main_path_no_fallback_when_subprocess_succeeds, that confirms the main path still works with fallback=False / capped=False for a small workspace, and that the flags are always present in the schema.)
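The JSON checks in tests 2 and 3 amount to a small schema assertion on the captured stdout. A sketch with a hypothetical helper (`auto_setup_flags`), assuming stdout holds exactly one JSON object; the sample payload is made up and omits the real result's other fields:

```python
import json
from typing import Tuple


def auto_setup_flags(stdout: str) -> Tuple[bool, bool]:
    """Return (fallback, capped) from a `--format json` result on stdout."""
    result = json.loads(stdout)
    info = result["_auto_setup"]  # KeyError here means the schema regressed
    fallback, capped = info["fallback"], info["capped"]
    # Mirror the tests' `is True` checks: reject truthy non-booleans such as 1.
    if not isinstance(fallback, bool) or not isinstance(capped, bool):
        raise TypeError("_auto_setup.fallback/capped must be JSON booleans")
    return fallback, capped


sample = '{"_auto_setup": {"fallback": true, "capped": true}}'  # made-up payload
print(auto_setup_flags(sample))  # (True, True)
```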

3. Full test suite

PYTHONPATH=scripts python3 -m pytest tests/ -v

The new tests all pass and introduce zero regressions — the same set of pre-existing environmental failures (missing tree-sitter / SQLite graph_model dependencies in this sandbox) occurs identically before and after this PR. Diff of the FAILED lists is empty.

New tests (verbose)

$ PYTHONPATH=scripts python3 -m pytest tests/test_cli.py::TestAutoSetupFallbackCap -v -o "addopts="
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0 -- /home/z/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/z/my-project/repos/CodeLens
configfile: pytest.ini
plugins: Faker-40.1.2, metadata-3.1.1, asyncio-1.3.0, ddtrace-4.2.2, cov-7.0.0, json-report-1.5.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 3 items

tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_passes_max_files_cap PASSED [ 33%]
tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_sets_capped_and_fallback_flags PASSED [ 66%]
tests/test_cli.py::TestAutoSetupFallbackCap::test_main_path_no_fallback_when_subprocess_succeeds PASSED [100%]

============================== 3 passed in 1.62s ===============================

Full `tests/test_cli.py` (verbose)

$ PYTHONPATH=scripts python3 -m pytest tests/test_cli.py -v -o "addopts="
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0 -- /home/z/.venv/bin/python3
collecting ... collected 21 items

tests/test_cli.py::TestCmdInit::test_init_creates_codelens_dir PASSED    [  4%]
tests/test_cli.py::TestCmdInit::test_init_creates_config PASSED          [  9%]
tests/test_cli.py::TestCmdScan::test_scan_workspace PASSED               [ 14%]
tests/test_cli.py::TestCmdScan::test_scan_creates_registry PASSED        [ 19%]
tests/test_cli.py::TestCmdScan::test_scan_finds_classes PASSED           [ 23%]
tests/test_cli.py::TestCmdScan::test_scan_finds_ids PASSED               [ 28%]
tests/test_cli.py::TestCmdQuery::test_query_existing_id PASSED           [ 33%]
tests/test_cli.py::TestCmdQuery::test_query_existing_class PASSED        [ 38%]
tests/test_cli.py::TestCmdQuery::test_query_nonexistent PASSED           [ 42%]
tests/test_cli.py::TestCmdQuery::test_query_backend_function PASSED      [ 47%]
tests/test_cli.py::TestCmdQuery::test_query_auto_detect_domain PASSED    [ 52%]
tests/test_cli.py::TestCmdList::test_list_all PASSED                     [ 57%]
tests/test_cli.py::TestCmdList::test_list_dead PASSED                    [ 61%]
tests/test_cli.py::TestCmdList::test_list_frontend_only PASSED           [ 66%]
tests/test_cli.py::TestCmdList::test_list_backend_only PASSED            [ 71%]
tests/test_cli.py::TestCheckCommandArgs::test_check_accepts_positional_workspace PASSED [ 76%]
tests/test_cli.py::TestCheckCommandArgs::test_check_workspace_optional PASSED          [ 80%]
tests/test_cli.py::TestCheckCommandArgs::test_check_full_cli_invocation_with_positional PASSED [ 85%]
tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_passes_max_files_cap PASSED [ 90%]
tests/test_cli.py::TestAutoSetupFallbackCap::test_fallback_sets_capped_and_fallback_flags PASSED [ 95%]
tests/test_cli.py::TestAutoSetupFallbackCap::test_main_path_no_fallback_when_subprocess_succeeds PASSED [100%]

============================== 21 passed in 2.57s ==============================

Full suite summary (excluding `tests/test_integration.py` which requires a live network/grammar setup)

$ PYTHONPATH=scripts python3 -m pytest tests/ --ignore=tests/test_integration.py --tb=no -q
...
================= 37 failed, 804 passed, 14 skipped in 15.14s ==================

The 37 failures are pre-existing and environmental (this sandbox lacks tree-sitter grammars and the SQLite graph_model extension that the test_graph_model / test_graph_incremental / test_architecture / test_hybrid_type_resolver suites require). Verified by stashing this PR's changes and re-running on origin/main:

$ git stash && PYTHONPATH=scripts python3 -m pytest tests/ --ignore=tests/test_integration.py --tb=no -q
...
# same 37 failures, 801 passed, 14 skipped
$ git stash pop

diff <(baseline FAILED list) <(after-change FAILED list) is empty — zero regressions. The +3 in passed count (801 -> 804) is exactly the 3 new TestAutoSetupFallbackCap tests.
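The "diff of the FAILED lists" check can also be scripted instead of compared by eye. A sketch, assuming each run's stdout was saved as text and includes pytest's short test summary (pytest normally prints `FAILED` lines there; `-rf` requests them explicitly). The sample lines below are made up:

```python
import re
from typing import Dict, List, Set

# Matches short-summary lines such as
# "FAILED tests/test_graph_model.py::test_a - ImportError".
# Node IDs containing spaces (some parametrize IDs) are not handled.
_FAILED_LINE = re.compile(r"^FAILED (\S+)")


def failed_node_ids(pytest_output: str) -> Set[str]:
    ids: Set[str] = set()
    for line in pytest_output.splitlines():
        match = _FAILED_LINE.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def compare_runs(baseline: str, after: str) -> Dict[str, List[str]]:
    before, now = failed_node_ids(baseline), failed_node_ids(after)
    return {
        "new_failures": sorted(now - before),  # regressions introduced by the change
        "fixed": sorted(before - now),         # failures that went away
    }


baseline_out = "FAILED tests/test_graph_model.py::test_a - ImportError\n"
after_out = "FAILED tests/test_graph_model.py::test_a - ImportError\n"
print(compare_runs(baseline_out, after_out))  # {'new_failures': [], 'fixed': []}
```

An empty `new_failures` list is the scripted equivalent of the empty diff reported above.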

Notes

Did not update skill.json version (no new command/engine added — bug fix only, per CONTRIBUTING.md "Update skill.json version if adding new commands").
Did not update CHANGELOG.md (top-level CHANGELOG.md is for release notes; the maintainers' release process bundles changes per release).
PAT used to push was temporary and is not embedded in any committed file.

Issue #34: _auto_setup's subprocess scan passes --max-files 3000 on the CLI, but commands/scan.add_args did not register --max-files, so argparse rejected it (exit 2) every time. The fallback cmd_scan(workspace, incremental=False) was therefore ALWAYS taken, with no cap and no timeout, while the result hint still claimed 'Auto-setup capped at 3000 files'. On huge repos this could hang indefinitely.

Fix:
- Add max_files: Optional[int] param to cmd_scan + _cap_discovered_files helper that truncates per-category file lists so total <= max_files.
- Register --max-files in commands/scan.add_args so the subprocess path actually works (previously dead code).
- Rework _auto_setup so the in-process fallback calls cmd_scan(..., max_files=_AUTO_SETUP_MAX_FILES), the same cap as the subprocess path.
- Surface capped and fallback flags on result['_auto_setup'] so MCP clients/agents can tell which path produced the registry and whether the cap was hit (explicitly requested in issue #34).

Tests (tests/test_cli.py::TestAutoSetupFallbackCap):
- test_fallback_passes_max_files_cap: monkeypatch subprocess.run to raise, spy on cmd_scan, assert max_files=3000 is passed.
- test_fallback_sets_capped_and_fallback_flags: 3001-file workspace + forced fallback, drive full codelens.main() flow, assert result['_auto_setup']['capped'] is True and ['fallback'] is True.
- test_main_path_no_fallback_when_subprocess_succeeds: sanity guard that the main path still works (fallback=False, capped=False for a small workspace) and the flags are always present in the schema.

chatgpt-codex-connector · 2026-06-29T15:48:25Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.


sonarqubecloud · 2026-06-29T17:13:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Wolfvin merged commit d371f55 into main Jun 29, 2026
0 of 6 checks passed

Wolfvin deleted the fix/auto-setup-fallback-cap-34 branch June 29, 2026 17:13

Merge remote-tracking branch 'origin/main' into fix/86-merge2

56e396b

# Conflicts: # tests/test_cli.py

Wolfvin merged 2 commits into main from fix/auto-setup-fallback-cap-34
