fix: use deep rglob search for module path discovery #119

Merged
double16 merged 14 commits into bvcyber:develop from lalithkaamala:feature/deep-module-path-search
Mar 5, 2026

@lalithkaamala
Contributor

Closes #100

Problem

Module path search used a shallow lookup (base / module_name), which prevented modules from being discovered when stored inside sub-directories. For example, after cloning a repo of multiple modules into external_plugins/:

external_plugins/
  my-collection/
    web/          ← this was invisible to the old search
      module.yaml
      tools/

Solution

All three path-resolution methods in ModulePromptLoader now perform a deep search using Path.rglob() for **/module_name/module.yaml or **/module_name/module.yml, exactly as proposed in the issue. A shallow fallback is retained for backwards compatibility with modules that lack a module.yaml manifest.

Affected methods:

  • _find_module_dir
  • _find_prompt_path
  • _find_tools_dir
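
For illustration, the deep search with shallow fallback described above might look roughly like the sketch below. This is not the code from the PR: the standalone function `find_module_dir` and its `base` parameter are assumptions made for the example, and only `Path.rglob()` and the two manifest file names come from the description.

```python
from pathlib import Path
from typing import Optional

# Manifest file names taken from the PR description.
MANIFEST_NAMES = ("module.yaml", "module.yml")


def find_module_dir(base: Path, module_name: str) -> Optional[Path]:
    """Hypothetical helper: locate a module directory anywhere under base."""
    for manifest_name in MANIFEST_NAMES:
        # rglob() behaves like glob("**/<pattern>"), so this matches
        # <module_name>/<manifest> at any depth, including directly under base.
        # Sorting makes the result deterministic when several copies exist.
        for manifest in sorted(base.rglob(f"{module_name}/{manifest_name}")):
            if manifest.is_file():
                return manifest.parent

    # Shallow fallback for a module directory that has no manifest.
    candidate = base / module_name
    return candidate if candidate.is_dir() else None
```

With the layout from the Problem section, `find_module_dir(Path("external_plugins"), "web")` would return `external_plugins/my-collection/web`, which the shallow lookup alone could not reach.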

Testing Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Shallow-path modules (without module.yaml) continue to work via fallback
  • Nested module structure (e.g. external_plugins/collection/web/) is now discovered correctly

double16 and others added 9 commits January 9, 2026 19:34
Tool ID fixes for Ollama and Gemini
Context size improvements
- Estimate tokens for system prompt and tools instead of using constants
- Rename 'general' module to 'web'
- swarm tool allows model selection using selected provider or ollama
- Allow modules to specify which built-in tools to use
- Refactor XBOW benchmark script to python
- add back erroneously removed `python_repl` and `sleep` tools
- fix incorrect model parameters (i.e., max output tokens) when swarm model == main model
- validate swarm agent model and fall back to primary model
- fix broken tool calling (ollama, gemini) in report, validation_specialist agents
- relax prompt optimizer validation for line count increase
- minor efficiency updates
Prompt budget consider output tokens (bvcyber#62)
… guidance, improved reporting.

- dependency updates
- add web_recon module for reconnaissance without exploitation
- make reporting work with only observations for non-exploitation use cases
- reporting uses all findings when MEMORY_ISOLATION=shared
- increase PROMPT_TELEMETRY_THRESHOLD to more reasonable value of 85% to allow for more input context
- fix sliding conversation manager to preserve first messages: initial user prompt was getting lost
- improve handling of failure cases
- patch OllamaModel usage reporting: input and output tokens are swapped
- apply CYBER_AGENT_OUTPUT_DIR everywhere instead of hardcoded “outputs” directory
- set context window message limit based on prompt token limit: 100 lines default, 200 lines for >= 128,000, 300 lines for >= 400,000 
- use full paths with LLM content, some models prepend hallucinated filesystem roots
- add operation_paths information to system prompt to control LLM filesystem scope
- add reflection_snapshot information to system prompt (was already referenced by execution prompts)
- run execution prompt optimizer before system prompt rebuilding to load the optimized prompt in the same step
- improve agent continuation message with budget, check point and actions
- update bedrock models to global.anthropic.claude-opus-4-5-20251101-v1:0 / us.anthropic.claude-sonnet-4-5-20250929-v1:0
Module features and misc. fixes
**NOTE:** Requires rebuilding the cyber-autoagent-tools image

- fix Dockerfile.tools build, tool check was not working, so several tools were not working
- Rewrite advanced_payload_coordinator.py using dalfox, sstimap and commix, optimize for model usage
- Refactor auth_chain_analyzer.py and specialized_recon_coordinator.py for correctness and optimize for model usage
- Improve tool guidance in system prompt
- Change tool_catalog to include all tool information and help text from shell commands
- Token usage estimation is closer to reality
- Apply reasoning loop workaround to all agents
Module path search previously used a shallow lookup (base/module_name),
which prevented modules from being discovered when nested inside
sub-directories (e.g. external_plugins/collection/web/).

The three key methods now use Python's Path.rglob() to search for
**/module_name/module.yaml or **/module_name/module.yml, exactly as
proposed in the issue. A shallow fallback is retained for modules that
do not have a module.yaml manifest.

Affected methods in ModulePromptLoader:
- _find_module_dir
- _find_prompt_path
- _find_tools_dir

Closes bvcyber#100
@double16
Contributor

Thanks! Can you update test_prompt_loader.py with test cases?

@double16 double16 changed the base branch from main to develop February 20, 2026 10:44
@lalithkaamala
Contributor Author

I have added the requested test cases in tests/test_prompt_loader.py verifying both deep path search and shallow path search for manifest-less modules. Tests have been pushed to the branch.

@double16
Contributor

Changes requested:

  1. The module manifest is required, so there are no "manifest-less" modules.
  2. I was looking for tests for the deep search, I don't see tests for _find_module_dir for this.
  3. The module dir search code is duplicated in all three functions. _find_prompt_path and _find_tools_dir should re-use _find_module_dir.
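
A minimal sketch of the requested shape, in which the two other lookups delegate to `_find_module_dir` and the manifest is required, is shown here. The class name `ModuleLocator`, its constructor, and the `prompt_file` parameter are hypothetical and stand in for whatever ModulePromptLoader actually uses; only the three method names and the `tools/` directory come from this PR.

```python
from pathlib import Path
from typing import Optional

MANIFEST_NAMES = ("module.yaml", "module.yml")


class ModuleLocator:
    """Hypothetical stand-in for the loader; only path resolution is shown."""

    def __init__(self, base: Path) -> None:
        self.base = base

    def _find_module_dir(self, module_name: str) -> Optional[Path]:
        # Single place that knows how to search: deep, manifest required.
        for manifest_name in MANIFEST_NAMES:
            for manifest in sorted(self.base.rglob(f"{module_name}/{manifest_name}")):
                if manifest.is_file():
                    return manifest.parent
        return None

    def _find_prompt_path(self, module_name: str, prompt_file: str) -> Optional[Path]:
        # Reuses _find_module_dir instead of repeating the search.
        module_dir = self._find_module_dir(module_name)
        if module_dir is None:
            return None
        candidate = module_dir / prompt_file
        return candidate if candidate.is_file() else None

    def _find_tools_dir(self, module_name: str) -> Optional[Path]:
        # Reuses _find_module_dir instead of repeating the search.
        module_dir = self._find_module_dir(module_name)
        if module_dir is None:
            return None
        candidate = module_dir / "tools"
        return candidate if candidate.is_dir() else None
```

Keeping the search in one method means a later change to the discovery rule, such as dropping a fallback, only has to be made once.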

@double16
Contributor

Thanks, those last changes look good!

The React UI /plugins command is not correctly listing modules nested in directories.

  1. Modify src/modules/interfaces/react/src/contexts/ModuleContext.tsx to apply the deep search algorithm. Remember that the manifest is required.

The user documentation needs to be updated to reflect the ability to nest modules in directories:

  2. docs/operation_plugins.md

@lalithkaamala
Contributor Author

I have implemented the deep search module path discovery algorithm in ModuleContext.tsx and explicitly removed the fallback, making module.yaml strictly required. The documentation in docs/operation_plugins.md has also been updated to reflect nested directory support.

@double16
Contributor

double16 commented Mar 2, 2026

test_prompt_loader.py tests are failing because the existing tests did not all have a module.yaml file. Add a placeholder file to the failing tests.

@lalithkaamala
Contributor Author

Added the missing module.yaml placeholder files to the failing tests in test_prompt_loader.py. The module discovery tests should now pass under pytest without the fallback.
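
As a rough illustration of what such tests could look like once the manifest is strictly required, consider the sketch below. It is not taken from test_prompt_loader.py: `find_module_dir` and `make_module` are hypothetical helpers written for this example, and the placeholder manifest is just a YAML comment because the manifest schema is not shown in this conversation.

```python
from pathlib import Path
from typing import Optional


def find_module_dir(base: Path, module_name: str) -> Optional[Path]:
    """Hypothetical deep search with no fallback: a manifest is required."""
    for manifest_name in ("module.yaml", "module.yml"):
        for manifest in sorted(base.rglob(f"{module_name}/{manifest_name}")):
            if manifest.is_file():
                return manifest.parent
    return None


def make_module(base: Path, *parts: str) -> Path:
    """Create a module directory containing a placeholder manifest."""
    module_dir = base.joinpath(*parts)
    module_dir.mkdir(parents=True, exist_ok=True)
    (module_dir / "module.yaml").write_text("# placeholder manifest\n")
    return module_dir


def test_nested_module_is_found(tmp_path: Path) -> None:
    plugins = tmp_path / "operation_plugins"
    expected = make_module(plugins, "collection", "web")
    assert find_module_dir(plugins, "web") == expected


def test_directory_without_manifest_is_ignored(tmp_path: Path) -> None:
    plugins = tmp_path / "operation_plugins"
    (plugins / "ctf").mkdir(parents=True)
    assert find_module_dir(plugins, "ctf") is None
```

The second test shows why the existing tests started failing: a bare directory with no module.yaml is no longer treated as a module.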


web_dir = tmp_path / "operation_plugins" / "web"
web_dir.mkdir(parents=True)
web_dir.mkdir(parents=True)
Contributor

Incorrect indent.


ctf_dir = tmp_path / "operation_plugins" / "ctf"
ctf_dir.mkdir(parents=True)
ctf_dir.mkdir(parents=True)
Contributor

Incorrect indent.


ctf_dir = tmp_path / "operation_plugins" / "ctf"
ctf_dir.mkdir(parents=True)
ctf_dir.mkdir(parents=True)
Contributor

Incorrect indent.

@double16 double16 merged commit 3d754a2 into bvcyber:develop Mar 5, 2026
0 of 4 checks passed

Development

Successfully merging this pull request may close these issues.

Module path search should do a deep search

3 participants