Skip to content

Agent Loop Implementation + Smart Retrieval + Attachment Uploading#4

Open
sweettastebuds wants to merge 39 commits intorelease/v2.0.0from
feature/agent-loop
Open

Agent Loop Implementation + Smart Retrieval + Attachment Uploading#4
sweettastebuds wants to merge 39 commits intorelease/v2.0.0from
feature/agent-loop

Conversation

@sweettastebuds
Copy link
Owner

This pull request updates the documentation and configuration to reflect a major refactor and feature expansion of the forge-bot project, transitioning from a simple webhook handler with sandboxed code execution to a sophisticated AI bot platform supporting tool-calling, per-event workspace containers, YAML-driven API clients, and advanced verification systems. The .env.example and CLAUDE.md files are updated for clarity and accuracy, and a new Dockerfile.workspace is added for workspace container support. The changes also introduce improved test structure documentation, environment variable descriptions, and update legacy references to match the new architecture.

Documentation and Architecture Updates:

  • Expanded CLAUDE.md to describe new features: tool-calling loop, per-event workspace containers, YAML-driven API client, tool system, status comment manager, verification system, and container lifecycle. Added details on new modules, critical implementation details, and command patterns. [1] [2]
  • Added a "Test Structure" section to CLAUDE.md, listing key test files and their purposes, and clarified the style guide for code and tests.

Configuration and Environment Variables:

  • Updated .env.example to include new variables for containerized workspace support (CONTAINER_*), clarified LLM and provider settings, and removed deprecated sandbox options.
  • Improved documentation of optional environment variables in CLAUDE.md with a table of defaults and descriptions for new and existing settings.

Containerization Support:

  • Added Dockerfile.workspace to define the workspace container image used for per-event isolation, supporting repo cloning and tool execution.
  • Updated build instructions in CLAUDE.md to include workspace image creation.

Deprecation and Refactoring:

  • Deprecated legacy sandbox subsystem in favor of new container manager, updating documentation and removing outdated references.

Minor Cleanup:

  • Removed unnecessary copy of the prompts/ directory in the main Dockerfile (likely moved to a new location per refactor).

sweettastebuds and others added 30 commits February 15, 2026 23:26
Add a generic API client that loads endpoint definitions from YAML files
instead of hardcoding one Python method per Gitea/Forgejo API call.

- api/schema.py: Pydantic models for YAML parsing (EndpointDef, EndpointParam)
- api/loader.py: YAML file loader with caching, provider-based file selection
- api/client.py: GenericForgeClient with call(), search(), list_endpoints()
- api/definitions/gitea.yaml: 20 Gitea API endpoint definitions
- api/definitions/forgejo.yaml: Separate Forgejo definitions (initially identical)
- config.py: Add forge_provider setting (gitea/forgejo)
- requirements.txt: Add pyyaml dependency

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
39 tests covering schema validation, YAML loader caching/provider
selection, and GenericForgeClient (call, search, error handling).

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Replace fire-and-forget sandbox with persistent containers that live for
the entire webhook event. The container clones the repo on creation and
supports arbitrary command execution via exec().

- container/manager.py: ContainerManager (async context manager)
  - create(): clone repo, wait for ready marker
  - exec(): run commands with timeout and output truncation
  - destroy(): force-remove container
- Dockerfile.workspace: python:3.12-slim + git/curl/bash/jq/pytest
- config.py: add container_workspace_image, container_network_enabled;
  remove sandbox_prepull_images, sandbox_images_file
- 16 tests covering lifecycle, exec, token injection, network modes

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Status comment posted immediately and edited as the bot works, showing
todo list with checkmarks and collapsible tool call history. Response
comment posted separately when done.

- status/formatter.py: TodoItem, ToolCallRecord, markdown rendering
- status/manager.py: StatusCommentManager (post, edit, finalize)
- 23 tests covering formatting and manager lifecycle

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- tools/base.py: BaseTool, ToolResult, ToolParameter (OpenAI + prompt modes)
- tools/registry.py: ToolRegistry for per-request tool management
- tools/search_api.py: Search API definitions by keyword
- tools/api_call.py: Execute any YAML-defined endpoint with auto owner/repo
- tools/exec_tool.py: Run commands in workspace container with safety blocks
- tools/todo.py: Manage visible todo list in status comment
- 31 tests covering all tools, registry, and edge cases

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Rewrite IssueCommentHandler with LLM-driven context gathering via tool
loop (search_api, api_call, exec, todo). Add verification module with
relevance checking, hallucination detection, progress/loop tracking,
and pre-post response validation with retry-then-warn strategy.

Update PullRequestHandler to use GenericForgeClient.call() instead of
old ForgeClient methods. Add chat_with_tools to LLMClient for native
OpenAI tool calling support. Create Jinja2 prompt templates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ForgeClient with GenericForgeClient in server.py startup
(bot identity via api.call("get_authenticated_user")), remove old
sandbox pre-pull logic. Simplify router.py by removing /run and
/index command dispatch (now handled by LLM tools). Update server
tests to mock the new client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prompts moved to forge_bot/prompts/ in the v2 rewrite and are already
included by COPY forge_bot/ — the old COPY prompts/ line caused build
failures since the top-level prompts/ directory no longer exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add FORGE_PROVIDER, CONTAINER_WORKSPACE_IMAGE, CONTAINER_NETWORK_ENABLED,
  LLM_CONTEXT_WINDOW, LLM_TIMEOUT environment variables
- Fix DOCKER_CERT_PATH to /certs/client/client (DinD nests certs)
- Add DOCKER_TLS_SAN=DNS:dind so DinD generates valid TLS certs
- Remove old SANDBOX_PREPULL_IMAGES and embedded ollama service
- Update .env.example to match v2 config vars

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a model returns tool calls as ```tool blocks in content (with
finish_reason=stop) instead of native tool_calls, the prompt-mode
parser used str(response) which serialized the Pydantic ChatCompletion
object repr instead of extracting message.content. The regex could
never match tool blocks in the repr string.

Fix: extract response.choices[0].message.content for prompt-mode parsing.

Also improves logging:
- Add logger.exception to previously silent except in tool loop fallback
- Add per-round logging for tool executions and text responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git clone --depth=50 implicitly uses --single-branch, which only
fetches the default branch. When the LLM tries to checkout other
branches (e.g. v2/rewrite) they don't exist in the clone.

Add --no-single-branch to ensure all remote branch tips are fetched
even in shallow mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove hardcoded git clone from container init. The container now
starts with an empty /workspace and the LLM decides how to clone
via the exec tool, choosing depth and flags based on the task:
- shallow clone for quick lookups
- --no-single-branch for branch comparisons
- full clone for history analysis

This eliminates clone-related bugs (like the --single-branch issue)
and lets the bot skip cloning entirely for API-only questions.

Changes:
- container/manager.py: simplified init, added clone_url/default_branch
  properties
- prompts/issue_respond.j2: workspace section with clone instructions
- handlers/issue_comment.py: pass clone info through to template
- tests: added property tests, no-clone-in-init test, updated signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- updated embedder to async def
- file fetching is parallelized in ingester
- added dynamic token budgeting to the retriever
- integrated rag into the PullRequestHandler
… adjust related tests. Fixed the container manager failing on initialization.
…rieval), Level 2 (parallel chunk relevance scanning), and Level 3 (multi-hop reasoning agent). Introduce SmartRetriever orchestrator for dynamic retrieval based on query complexity and input size. Add TokenBudget management for efficient token usage across retrieval operations. Create RetrievalTool for self-contained codebase search, integrating the retrieval hierarchy with workspace container.
- Introduced tests for BM25Index and tokenization in test_bm25.py.
- Added chunking tests covering basic functionality, overlaps, and empty inputs in test_chunking.py.
- Implemented Level 1 retriever tests to validate chunk retrieval and budget management in test_level1.py.
- Created Level 2 scanner tests to ensure relevance filtering and answer generation in test_level2.py.
- Developed Level 3 agent tests to verify multi-hop retrieval and budget handling in test_level3.py.
- Added pipeline tests for SmartRetriever to check search and scan functionalities in test_pipeline.py.
- Included token budget tests to validate budget calculations and token estimations in test_token_budget.py.
- Implemented tool tests for RetrievalTool to ensure proper file searching and keyword extraction in test_tool.py.
…he agent loop integration and error handling
- Consolidated multi-line strings in `conftest.py` for cleaner code.
- Adjusted argument formatting in `test_embedder.py` for consistency.
- Simplified list comprehensions in `test_ingester.py` for better clarity.
- Enhanced readability by breaking long lines into multiple lines in `test_pipeline.py`.
- Removed unnecessary blank lines in `test_router.py`.
- Improved context management in `test_server.py` for better structure.
- Streamlined assertions and formatting in `test_status.py` for clarity.
- Deleted redundant `test_tools.py` and `test_verification.py` files to reduce clutter.
Copilot AI review requested due to automatic review settings March 3, 2026 07:04
@sweettastebuds sweettastebuds changed the base branch from master to release/v2.0.0 March 3, 2026 07:05
@sweettastebuds sweettastebuds self-assigned this Mar 3, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades forge-bot from a simple webhook + sandbox approach to a more capable platform: a YAML-driven Forge API client, per-event workspace containers, tool-calling agent loop, and a new “status + response” comment system, with a large accompanying test suite refactor/additions.

Changes:

  • Replace the hardcoded Forge client with a YAML-defined GenericForgeClient and new API schema/loader.
  • Introduce per-event persistent ContainerManager and status comment observability (StatusCommentManager + formatter).
  • Add a multi-level “smart retrieval” subsystem (BM25 + parallel scanning + multi-hop) and extensive new tests; remove legacy sandbox implementation/tests.

Reviewed changes

Copilot reviewed 81 out of 83 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_status.py Adds unit tests for status comment formatting + manager behavior.
tests/test_server.py Updates server lifecycle mocking to patch GenericForgeClient.call() instead of legacy client.
tests/test_sandbox/test_parser.py Removes legacy sandbox parser tests.
tests/test_sandbox/test_orchestrator.py Removes legacy sandbox orchestrator tests.
tests/test_sandbox/test_images.py Removes legacy sandbox image registry tests.
tests/test_sandbox/conftest.py Removes sandbox-specific shared fixtures.
tests/test_router.py Minor cleanup in router tests file formatting.
tests/test_retrieval/test_tool.py Adds tests for the new workspace-backed retrieval tool behavior.
tests/test_retrieval/test_token_budget.py Adds tests for retrieval token budgeting helpers.
tests/test_retrieval/test_pipeline.py Adds tests for the SmartRetriever orchestration logic.
tests/test_retrieval/test_level3.py Adds tests for multi-hop Level 3 agent tool-calling flow.
tests/test_retrieval/test_level2.py Adds tests for Level 2 parallel scanning + answering behavior.
tests/test_retrieval/test_level1.py Adds tests for Level 1 BM25-based retrieval behavior.
tests/test_retrieval/test_chunking.py Adds tests for retrieval chunking utilities.
tests/test_retrieval/test_bm25.py Adds tests for the pure-Python BM25 implementation.
tests/test_retrieval/init.py Adds retrieval test package marker.
tests/test_rag/test_pipeline.py Updates RAG pipeline tests to use the generic API client mock.
tests/test_rag/test_ingester.py Updates ingester tests to assert against GenericForgeClient.call() dispatch.
tests/test_rag/test_embedder.py Small formatting update to embedder test expectations.
tests/test_rag/conftest.py Replaces mock_forge fixture with mock_api_client using .call() dispatch.
tests/test_pull_request_handler.py Rewrites PR handler tests for new agent loop + status + container lifecycle.
tests/test_forge_client.py Removes legacy hardcoded Forge client tests.
tests/test_container_manager.py Adds tests for the new per-event container manager.
tests/test_config.py Updates config tests for container settings replacing sandbox settings.
tests/test_api_schema.py Adds tests for Pydantic models backing YAML API definitions.
tests/test_api_loader.py Adds tests for YAML loader caching + provider resolution.
tests/test_api_client.py Adds tests for GenericForgeClient call/param handling + error propagation.
sandbox-images.json Removes legacy sandbox image override file.
requirements.txt Adds PyYAML dependency for API definition loading.
pyproject.toml Bumps project version to 0.2.0.
forge_bot/utils/token_budget.py Adds a (separate) token budgeting utility for LLM sessions.
forge_bot/utils/thinking.py Adds helper to strip/log <think>...</think> blocks.
forge_bot/tools/base.py Introduces base classes for tool-calling (schema + prompt text).
forge_bot/tools/init.py Adds tools package marker.
forge_bot/status/manager.py Implements status + response comment manager with attachment support.
forge_bot/status/formatter.py Adds markdown formatting helpers for todos and tool call history.
forge_bot/status/init.py Adds status package marker.
forge_bot/server.py Switches server lifecycle to generic API client; adds error-comment attempt on failures; bumps API wiring.
forge_bot/sandbox/parser.py Removes legacy sandbox command parsing.
forge_bot/sandbox/orchestrator.py Removes legacy sandbox orchestrator.
forge_bot/sandbox/images.py Removes legacy sandbox image registry.
forge_bot/sandbox/init.py Removes sandbox package marker.
forge_bot/router.py Updates dispatch plumbing to use GenericForgeClient; removes legacy /run and /index paths.
forge_bot/retrieval/token_budget.py Adds retrieval-specific token budgeting and truncation helpers.
forge_bot/retrieval/pipeline.py Adds SmartRetriever facade and diff review routing.
forge_bot/retrieval/level3.py Adds Level 3 multi-hop agent that calls Level 2 as a tool.
forge_bot/retrieval/level2.py Adds Level 2 parallel chunk relevance scanning + refinement + answer generation.
forge_bot/retrieval/level1.py Adds Level 1 BM25 retrieval with LLM keyword expansion.
forge_bot/retrieval/chunking.py Adds chunking utilities for text and per-file diffs.
forge_bot/retrieval/bm25.py Adds pure-Python BM25 implementation.
forge_bot/retrieval/init.py Adds retrieval package marker.
forge_bot/rag/store.py Formatting-only changes in vector store operations.
forge_bot/rag/retriever.py Adds optional max token truncation control for PR context retrieval.
forge_bot/rag/pipeline.py Updates RAG pipeline to use GenericForgeClient.
forge_bot/rag/ingester.py Refactors ingester to use .call() endpoints + adds concurrent file fetching.
forge_bot/rag/embedder.py Makes local embedding execution async via asyncio.to_thread.
forge_bot/rag/chunker.py Formatting updates / signature formatting for readability.
forge_bot/prompts/agent_system.j2 Adds new agent system prompt template for tool loop usage.
forge_bot/prompts/agent_pr_review.j2 Adds new PR review prompt template for the agent loop.
forge_bot/handlers/pull_request.py Replaces old diff-fetch + single LLM call with container + agent loop + status updates + artifacts.
forge_bot/handlers/base.py Switches handler base to generic API client and stable prompts directory resolution.
forge_bot/container/manager.py Adds the per-event persistent container manager (create/exec/destroy).
forge_bot/container/init.py Adds container package marker.
forge_bot/config.py Replaces sandbox settings with container settings + adds FORGE_PROVIDER.
forge_bot/clients/llm.py Adds chat_with_tools() for function-calling flows.
forge_bot/clients/forge.py Removes legacy Forge client implementation.
forge_bot/api/schema.py Adds Pydantic schema models for YAML endpoint definitions.
forge_bot/api/loader.py Adds YAML loader with caching and provider resolution.
forge_bot/api/client.py Adds GenericForgeClient implementation and multipart handling.
forge_bot/api/init.py Adds API package marker.
docker-compose.yml Updates Compose config for .env loading and DinD TLS wiring.
README.md Updates docs to reflect new architecture (agent loop, containers, YAML API client).
Dockerfile.workspace Adds workspace image Dockerfile for per-event containers.
Dockerfile Removes legacy prompts/ copy step (prompts are now under forge_bot/prompts).
.env.example Updates env var docs for new provider/container settings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +174 to +179
error_body = (
"⚠️ **Processing Error**\n\n"
f"Failed to process this event due to an internal error:\n"
f"```\n{type(e).__name__}: {str(e)}\n```\n\n"
f"Please check the bot logs or contact your administrator."
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error comment body includes the raw exception message (str(e)). This can leak internal details (and potentially sensitive info) into a public issue/PR thread. Consider logging the full exception server-side but posting a generic error message (or a redacted/hashed error ID) to the forge.

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +48
name: str = ""
description: str = ""
parameters: list[ToolParameter] = field(default_factory=list)

@abstractmethod
async def execute(self, **kwargs: object) -> ToolResult:
"""Execute the tool with the given arguments."""

def to_openai_schema(self) -> dict:
"""Convert to OpenAI function-calling tools format."""
properties: dict[str, dict[str, str]] = {}
required: list[str] = []
for p in self.parameters:
properties[p.name] = {
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseTool.parameters is initialized with dataclasses.field(...), but BaseTool is not a @DataClass. That means parameters becomes a dataclasses.Field object, and iterating over self.parameters in to_openai_schema()/to_prompt_text() will fail unless every subclass overrides it. Use a normal class attribute (e.g., [] or ClassVar[list[ToolParameter]]) or make BaseTool a dataclass if instance-level parameters are intended.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +44
"printf '%s\\n' '#!/bin/sh' "
"'METHOD=\"${1:-GET}\"' "
"'ENDPOINT=\"$2\"' "
"'shift 2 2>/dev/null' "
"'exec curl -sf "
'-H "Authorization: token $FORGE_TOKEN" '
'-H "Content-Type: application/json" '
'-X "$METHOD" "$FORGE_URL/api/v1$ENDPOINT" "$@"\' '
"> /usr/local/bin/forge-api && "
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_INIT_SCRIPT's construction of /usr/local/bin/forge-api via a single giant printf string has mismatched quoting (the final argument includes a stray "'"), which is likely to produce an invalid shell script and prevent the container from ever emitting FORGE_READY. Consider switching to a heredoc (cat <<'EOF' ... EOF) or writing the script as a small embedded multi-line string with clear quoting.

Suggested change
"printf '%s\\n' '#!/bin/sh' "
"'METHOD=\"${1:-GET}\"' "
"'ENDPOINT=\"$2\"' "
"'shift 2 2>/dev/null' "
"'exec curl -sf "
'-H "Authorization: token $FORGE_TOKEN" '
'-H "Content-Type: application/json" '
'-X "$METHOD" "$FORGE_URL/api/v1$ENDPOINT" "$@"\' '
"> /usr/local/bin/forge-api && "
"cat <<'EOF' > /usr/local/bin/forge-api\n"
"#!/bin/sh\n"
'METHOD="${1:-GET}"\n'
'ENDPOINT="$2"\n'
"shift 2 2>/dev/null\n"
'exec curl -sf \\\n'
' -H "Authorization: token $FORGE_TOKEN" \\\n'
' -H "Content-Type: application/json" \\\n'
' -X "$METHOD" "$FORGE_URL/api/v1$ENDPOINT" "$@"\n'
"EOF\n"

Copilot uses AI. Check for mistakes.
- SANDBOX_PREPULL_IMAGES=${SANDBOX_PREPULL_IMAGES:-python,node}
- RAG_ENABLED=${RAG_ENABLED:-false}
- LOG_LEVEL=${LOG_LEVEL:-INFO}
- DOCKER_CERT_PATH=/certs/client/client
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOCKER_CERT_PATH is set to /certs/client/client, but the volume is mounted at /certs/client. With Docker-in-Docker TLS this path typically needs to point at the directory containing ca.pem/cert.pem/key.pem (i.e., /certs/client). As-is, the bot container may fail to connect to the dind daemon.

Suggested change
- DOCKER_CERT_PATH=/certs/client/client
- DOCKER_CERT_PATH=/certs/client

Copilot uses AI. Check for mistakes.
Comment on lines +181 to +187
await api_client.call(
"create_issue_comment",
owner=owner,
repo=repo_name,
index=issue_number,
body=error_body,
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the error handler, the API endpoint name "create_issue_comment" isn't defined in the YAML API definitions (they use "post_issue_comment"). This will cause the attempt to notify users about failures to always error. Rename the endpoint to the correct defined name (and keep params aligned with the YAML schema).

Copilot uses AI. Check for mistakes.
Comment on lines +110 to +118
repo_data = payload.get("repository", {})
if not repo_data:
return None

owner = repo_data.get("owner", {}).get("login")
repo = repo_data.get("name")

if not owner or not repo:
return None
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_extract_comment_target() requires repository.owner.login and repository.name, but the webhook payloads used elsewhere (and in tests) only provide repository.full_name. As a result, this function will return None and the bot won't post an error comment when it crashes. Consider falling back to parsing repository.full_name ("owner/repo") when owner/name fields are absent.

Copilot uses AI. Check for mistakes.
Copilot AI added a commit that referenced this pull request Mar 3, 2026
- Fix path injection in collect_artifacts: use shlex.quote() for filenames
- Add bounds check for empty response.choices across LLM call sites
- Validate full_name format before split in both handlers
- Use configurable container_timeout instead of hardcoded 120s cap
- Add tests for all fixes

Co-authored-by: sweettastebuds <49539676+sweettastebuds@users.noreply.github.com>
Copilot AI and others added 7 commits March 4, 2026 19:45
…instructions

Co-authored-by: sweettastebuds <49539676+sweettastebuds@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…w-instructions

Add .github/copilot-instructions.md for repo-wide Copilot PR review context
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants