fix(sieve): depth-limited file discovery via opt-in max_depth (closes #221) #259

Merged
mlieberman85 merged 1 commit into kusari-oss:main from mlieberman85:fix/221-nested-manifest-discovery
May 14, 2026

@mlieberman85
Contributor

Summary

Closes #221. Projects with nested manifests (microservices under `app-code/<service>/`, `frontend/` + `backend/` splits, monorepos with distinct components) used to fail dependency-detection controls like OSPS-BR-05.01 even when standard manifests were clearly present one or two directories deep, because the `file_exists` handler only checked the repo root for non-glob patterns.

What changed

Handler (`packages/darnit/src/darnit/sieve/builtin_handlers.py`)

New opt-in config field on `file_exists` handler:

```toml
[[controls."OSPS-BR-05.01".passes]]
handler = "file_exists"
max_depth = 2 # NEW — walks up to N levels deep for non-glob patterns
files = ["go.mod", "pyproject.toml", "package.json", ...]
```

  • Default `max_depth = 0` preserves backward compatibility for every other control that doesn't opt in.
  • Glob patterns are unchanged — they're still evaluated by `glob.glob` exactly as before. `max_depth` applies only to non-glob patterns.
  • Noise-directory pruning during the walk: `.git`, `node_modules`, `__pycache__`, `.venv`, `venv`, `.tox`, `.mypy_cache`, `.ruff_cache`, `.pytest_cache`, `target`, `build`, `dist`, `out`, `.idea`, `.vscode`, etc. A misconfigured deep walk through a 50k-file `node_modules` tree was the biggest "make this the default" risk; explicit opt-in + pruning sidesteps that.
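  • First match wins; a root match is naturally preferred over deeper matches because top-down `os.walk` yields the starting directory before any of its subdirectories. Note that `os.walk` traverses depth-first, not breadth-first, so the walk order by itself guarantees only the root-over-nested preference.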
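
The behavior described in the bullets above can be sketched roughly as follows. This is not the handler's actual code: `find_first`, `NOISE_DIRS`, and the exact depth convention (depth 0 is the root itself) are assumptions made for illustration, and the noise list is only the subset named in this PR.

```python
import os

# Assumption: a subset of the pruned directory names listed in this PR.
NOISE_DIRS = {
    ".git", "node_modules", "__pycache__", ".venv", "venv", ".tox",
    ".mypy_cache", ".ruff_cache", ".pytest_cache", "target", "build",
    "dist", "out", ".idea", ".vscode",
}


def find_first(root, names, max_depth=0):
    """Return the path of the first file whose name is in `names`, or None.

    Depth 0 is the root itself, so max_depth=0 checks only the root and
    max_depth=2 also checks children and grandchildren.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Depth limit reached: do not descend any further.
            dirnames[:] = []
        else:
            # Prune noise directories in place so os.walk skips them.
            dirnames[:] = sorted(d for d in dirnames if d not in NOISE_DIRS)
        for name in names:
            if name in filenames:
                return os.path.join(dirpath, name)
    return None
```

Under these assumptions, `find_first(repo, ["go.mod"])` only sees a root-level `go.mod`, while `find_first(repo, ["go.mod"], max_depth=2)` would also find `app-code/keys-v2/go.mod`. Editing `dirnames` in place is what lets a top-down `os.walk` skip whole subtrees.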

TOML (`packages/darnit-baseline/openssf-baseline.toml`)

OSPS-BR-05.01 (`StandardizedDependencyTools`) opts in with `max_depth = 2`. No other control's behavior changes.

Issue #226 still open

#226 ("Evaluate making depth-limited file discovery the default in sieve handlers") remains a separate discussion. This PR is deliberately the smallest change that fixes the user-reported bug: opt-in only, conservative default. Flipping the global default would change behavior for every control in every implementation and deserves its own evaluation.

Test plan

7 new regression tests in `TestFileExistsHandlerDepthLimited`:

  • `test_default_max_depth_root_only` — without `max_depth`, nested files are still missed (regression: this is the bug from #221, "Dependency detection (e.g., go.mod, pyproject.toml) fails if not at absolute repo root")
  • `test_max_depth_2_finds_nested_manifest` — `app-code/keys-v2/go.mod` is found with `max_depth = 2`
  • `test_max_depth_finds_root_first` — root match preferred over nested when both exist
  • `test_max_depth_skips_noise_directories` — `node_modules/some-pkg/package.json` is pruned; real `app/package.json` wins
  • `test_max_depth_respects_limit` — file 3 levels deep is NOT found with `max_depth = 2`
  • `test_max_depth_zero_explicit_is_root_only` — explicit `= 0` equivalent to default
  • `test_max_depth_with_glob_pattern_unchanged` — globs are not depth-walked
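
The last case rests on how `glob.glob` works: a glob pattern reaches only as deep as the pattern itself says. A small sketch of that behavior follows. `glob_matches`, `demo`, and the file names are hypothetical, and whether the handler passes any extra options to `glob.glob` is not shown in this PR.

```python
import glob
import os
import tempfile


def glob_matches(root, pattern):
    """Return sorted root-relative paths matched by a glob pattern."""
    # glob.escape protects any special characters in the root path itself.
    hits = glob.glob(os.path.join(glob.escape(root), pattern))
    return sorted(os.path.relpath(hit, root) for hit in hits)


def demo():
    """Build a tiny tree and compare a root-level glob with a deeper one."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "services", "api"))
        for rel in ("LICENSE.md", os.path.join("services", "api", "NOTES.md")):
            open(os.path.join(root, rel), "w").close()
        # "*.md" sees only the root; the nested file needs "*/*/*.md".
        return glob_matches(root, "*.md"), glob_matches(root, "*/*/*.md")
```

Here `*.md` matches only `LICENSE.md` at the root, and the nested file is matched only when the pattern spells out the extra path segments.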

Verification:

  • `uv run ruff check .` — clean
  • `uv run pytest tests/ --ignore=tests/integration/ -q` — 2115 passed (7 new) / 6 skipped / 0 failed
  • `uv run python scripts/validate_sync.py --verbose` — PASS

🤖 Generated with Claude Code

…usari-oss#221)

The file_exists handler used to check only the repo root for non-glob
patterns. Projects with nested manifests — microservices under
app-code/<service>/, frontend/ + backend/ splits, monorepos with
distinct components — failed dependency-detection controls like
OSPS-BR-05.01 even when standard manifests were clearly present, just
one or two directories deep.

Fix introduces an opt-in `max_depth: int = 0` config field on the
file_exists handler. When > 0, the handler walks up to that many
levels deep checking each non-glob pattern. Glob patterns are
unchanged. Default 0 preserves backward compatibility for every other
control that does not opt in.

The walk prunes well-known noise directories (.git, node_modules,
__pycache__, .venv, target, build, dist, .tox, .mypy_cache, etc.) so
performance stays bounded on monorepos. A misconfigured deep walk
through a 50k-file node_modules tree was the biggest "make this the
default" risk; explicit opt-in + pruning sidesteps that. Issue kusari-oss#226
(make depth-limited the default) remains a separate discussion.

TOML side: OSPS-BR-05.01 (StandardizedDependencyTools) opts in with
max_depth = 2, finding manifests in conventional nested layouts
without descending into unrelated subtrees.

Adds 7 regression tests in TestFileExistsHandlerDepthLimited covering:
default-still-root-only (bug regression), depth-2-finds-nested,
root-preferred-over-nested, noise-directory-pruning, depth-respects-
limit, explicit-zero-equivalent-to-default, globs-still-unaffected.

Verification:
- ruff check: clean
- full suite: 2115 passed (7 new) / 6 skipped / 0 failed
- validate_sync.py: PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mlieberman85 merged commit 2e2d55b into kusari-oss:main on May 14, 2026
9 checks passed
mlieberman85 deleted the fix/221-nested-manifest-discovery branch on May 14, 2026 11:39

Development

Successfully merging this pull request may close these issues.

Bug: Dependency detection (e.g., go.mod, pyproject.toml) fails if not at absolute repo root
