Agent Loop Implementation + Smart Retrieval + Attachment Uploading#4
Agent Loop Implementation + Smart Retrieval + Attachment Uploading#4sweettastebuds wants to merge 39 commits intorelease/v2.0.0from
Conversation
Add a generic API client that loads endpoint definitions from YAML files instead of hardcoding one Python method per Gitea/Forgejo API call. - api/schema.py: Pydantic models for YAML parsing (EndpointDef, EndpointParam) - api/loader.py: YAML file loader with caching, provider-based file selection - api/client.py: GenericForgeClient with call(), search(), list_endpoints() - api/definitions/gitea.yaml: 20 Gitea API endpoint definitions - api/definitions/forgejo.yaml: Separate Forgejo definitions (initially identical) - config.py: Add forge_provider setting (gitea/forgejo) - requirements.txt: Add pyyaml dependency Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
39 tests covering schema validation, YAML loader caching/provider selection, and GenericForgeClient (call, search, error handling). Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Replace fire-and-forget sandbox with persistent containers that live for the entire webhook event. The container clones the repo on creation and supports arbitrary command execution via exec(). - container/manager.py: ContainerManager (async context manager) - create(): clone repo, wait for ready marker - exec(): run commands with timeout and output truncation - destroy(): force-remove container - Dockerfile.workspace: python:3.12-slim + git/curl/bash/jq/pytest - config.py: add container_workspace_image, container_network_enabled; remove sandbox_prepull_images, sandbox_images_file - 16 tests covering lifecycle, exec, token injection, network modes Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Status comment posted immediately and edited as the bot works, showing todo list with checkmarks and collapsible tool call history. Response comment posted separately when done. - status/formatter.py: TodoItem, ToolCallRecord, markdown rendering - status/manager.py: StatusCommentManager (post, edit, finalize) - 23 tests covering formatting and manager lifecycle Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
- tools/base.py: BaseTool, ToolResult, ToolParameter (OpenAI + prompt modes) - tools/registry.py: ToolRegistry for per-request tool management - tools/search_api.py: Search API definitions by keyword - tools/api_call.py: Execute any YAML-defined endpoint with auto owner/repo - tools/exec_tool.py: Run commands in workspace container with safety blocks - tools/todo.py: Manage visible todo list in status comment - 31 tests covering all tools, registry, and edge cases Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
Rewrite IssueCommentHandler with LLM-driven context gathering via tool loop (search_api, api_call, exec, todo). Add verification module with relevance checking, hallucination detection, progress/loop tracking, and pre-post response validation with retry-then-warn strategy. Update PullRequestHandler to use GenericForgeClient.call() instead of old ForgeClient methods. Add chat_with_tools to LLMClient for native OpenAI tool calling support. Create Jinja2 prompt templates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ForgeClient with GenericForgeClient in server.py startup
(bot identity via api.call("get_authenticated_user")), remove old
sandbox pre-pull logic. Simplify router.py by removing /run and
/index command dispatch (now handled by LLM tools). Update server
tests to mock the new client.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prompts moved to forge_bot/prompts/ in the v2 rewrite and are already included by COPY forge_bot/ — the old COPY prompts/ line caused build failures since the top-level prompts/ directory no longer exists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add FORGE_PROVIDER, CONTAINER_WORKSPACE_IMAGE, CONTAINER_NETWORK_ENABLED, LLM_CONTEXT_WINDOW, LLM_TIMEOUT environment variables - Fix DOCKER_CERT_PATH to /certs/client/client (DinD nests certs) - Add DOCKER_TLS_SAN=DNS:dind so DinD generates valid TLS certs - Remove old SANDBOX_PREPULL_IMAGES and embedded ollama service - Update .env.example to match v2 config vars Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a model returns tool calls as ```tool blocks in content (with finish_reason=stop) instead of native tool_calls, the prompt-mode parser used str(response) which serialized the Pydantic ChatCompletion object repr instead of extracting message.content. The regex could never match tool blocks in the repr string. Fix: extract response.choices[0].message.content for prompt-mode parsing. Also improves logging: - Add logger.exception to previously silent except in tool loop fallback - Add per-round logging for tool executions and text responses Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git clone --depth=50 implicitly uses --single-branch, which only fetches the default branch. When the LLM tries to checkout other branches (e.g. v2/rewrite) they don't exist in the clone. Add --no-single-branch to ensure all remote branch tips are fetched even in shallow mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove hardcoded git clone from container init. The container now starts with an empty /workspace and the LLM decides how to clone via the exec tool, choosing depth and flags based on the task: - shallow clone for quick lookups - --no-single-branch for branch comparisons - full clone for history analysis This eliminates clone-related bugs (like the --single-branch issue) and lets the bot skip cloning entirely for API-only questions. Changes: - container/manager.py: simplified init, added clone_url/default_branch properties - prompts/issue_respond.j2: workspace section with clone instructions - handlers/issue_comment.py: pass clone info through to template - tests: added property tests, no-clone-in-init test, updated signatures Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… GenericForgeClient
…osting on failed bot identity resolution
- updated embedder to async def - file fetching is parallelized in ingester - added dynamic token budgeting to the retriever - integrated rag into the PullRequestHandler
… adjust related tests. Fixed the container manager failing on initialization.
…rieval), Level 2 (parallel chunk relevance scanning), and Level 3 (multi-hop reasoning agent). Introduce SmartRetriever orchestrator for dynamic retrieval based on query complexity and input size. Add TokenBudget management for efficient token usage across retrieval operations. Create RetrievalTool for self-contained codebase search, integrating the retrieval hierarchy with workspace container.
- Introduced tests for BM25Index and tokenization in test_bm25.py. - Added chunking tests covering basic functionality, overlaps, and empty inputs in test_chunking.py. - Implemented Level 1 retriever tests to validate chunk retrieval and budget management in test_level1.py. - Created Level 2 scanner tests to ensure relevance filtering and answer generation in test_level2.py. - Developed Level 3 agent tests to verify multi-hop retrieval and budget handling in test_level3.py. - Added pipeline tests for SmartRetriever to check search and scan functionalities in test_pipeline.py. - Included token budget tests to validate budget calculations and token estimations in test_token_budget.py. - Implemented tool tests for RetrievalTool to ensure proper file searching and keyword extraction in test_tool.py.
…he agent loop integration and error handling
…moving outdated ones
…ed readability and maintainability
- Consolidated multi-line strings in `conftest.py` for cleaner code. - Adjusted argument formatting in `test_embedder.py` for consistency. - Simplified list comprehensions in `test_ingester.py` for better clarity. - Enhanced readability by breaking long lines into multiple lines in `test_pipeline.py`. - Removed unnecessary blank lines in `test_router.py`. - Improved context management in `test_server.py` for better structure. - Streamlined assertions and formatting in `test_status.py` for clarity. - Deleted redundant `test_tools.py` and `test_verification.py` files to reduce clutter.
There was a problem hiding this comment.
Pull request overview
This PR upgrades forge-bot from a simple webhook + sandbox approach to a more capable platform: a YAML-driven Forge API client, per-event workspace containers, tool-calling agent loop, and a new “status + response” comment system, with a large accompanying test suite refactor/additions.
Changes:
- Replace the hardcoded Forge client with a YAML-defined
GenericForgeClientand new API schema/loader. - Introduce per-event persistent
ContainerManagerand status comment observability (StatusCommentManager+ formatter). - Add a multi-level “smart retrieval” subsystem (BM25 + parallel scanning + multi-hop) and extensive new tests; remove legacy sandbox implementation/tests.
Reviewed changes
Copilot reviewed 81 out of 83 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_status.py | Adds unit tests for status comment formatting + manager behavior. |
| tests/test_server.py | Updates server lifecycle mocking to patch GenericForgeClient.call() instead of legacy client. |
| tests/test_sandbox/test_parser.py | Removes legacy sandbox parser tests. |
| tests/test_sandbox/test_orchestrator.py | Removes legacy sandbox orchestrator tests. |
| tests/test_sandbox/test_images.py | Removes legacy sandbox image registry tests. |
| tests/test_sandbox/conftest.py | Removes sandbox-specific shared fixtures. |
| tests/test_router.py | Minor cleanup in router tests file formatting. |
| tests/test_retrieval/test_tool.py | Adds tests for the new workspace-backed retrieval tool behavior. |
| tests/test_retrieval/test_token_budget.py | Adds tests for retrieval token budgeting helpers. |
| tests/test_retrieval/test_pipeline.py | Adds tests for the SmartRetriever orchestration logic. |
| tests/test_retrieval/test_level3.py | Adds tests for multi-hop Level 3 agent tool-calling flow. |
| tests/test_retrieval/test_level2.py | Adds tests for Level 2 parallel scanning + answering behavior. |
| tests/test_retrieval/test_level1.py | Adds tests for Level 1 BM25-based retrieval behavior. |
| tests/test_retrieval/test_chunking.py | Adds tests for retrieval chunking utilities. |
| tests/test_retrieval/test_bm25.py | Adds tests for the pure-Python BM25 implementation. |
| tests/test_retrieval/init.py | Adds retrieval test package marker. |
| tests/test_rag/test_pipeline.py | Updates RAG pipeline tests to use the generic API client mock. |
| tests/test_rag/test_ingester.py | Updates ingester tests to assert against GenericForgeClient.call() dispatch. |
| tests/test_rag/test_embedder.py | Small formatting update to embedder test expectations. |
| tests/test_rag/conftest.py | Replaces mock_forge fixture with mock_api_client using .call() dispatch. |
| tests/test_pull_request_handler.py | Rewrites PR handler tests for new agent loop + status + container lifecycle. |
| tests/test_forge_client.py | Removes legacy hardcoded Forge client tests. |
| tests/test_container_manager.py | Adds tests for the new per-event container manager. |
| tests/test_config.py | Updates config tests for container settings replacing sandbox settings. |
| tests/test_api_schema.py | Adds tests for Pydantic models backing YAML API definitions. |
| tests/test_api_loader.py | Adds tests for YAML loader caching + provider resolution. |
| tests/test_api_client.py | Adds tests for GenericForgeClient call/param handling + error propagation. |
| sandbox-images.json | Removes legacy sandbox image override file. |
| requirements.txt | Adds PyYAML dependency for API definition loading. |
| pyproject.toml | Bumps project version to 0.2.0. |
| forge_bot/utils/token_budget.py | Adds a (separate) token budgeting utility for LLM sessions. |
| forge_bot/utils/thinking.py | Adds helper to strip/log <think>...</think> blocks. |
| forge_bot/tools/base.py | Introduces base classes for tool-calling (schema + prompt text). |
| forge_bot/tools/init.py | Adds tools package marker. |
| forge_bot/status/manager.py | Implements status + response comment manager with attachment support. |
| forge_bot/status/formatter.py | Adds markdown formatting helpers for todos and tool call history. |
| forge_bot/status/init.py | Adds status package marker. |
| forge_bot/server.py | Switches server lifecycle to generic API client; adds error-comment attempt on failures; bumps API wiring. |
| forge_bot/sandbox/parser.py | Removes legacy sandbox command parsing. |
| forge_bot/sandbox/orchestrator.py | Removes legacy sandbox orchestrator. |
| forge_bot/sandbox/images.py | Removes legacy sandbox image registry. |
| forge_bot/sandbox/init.py | Removes sandbox package marker. |
| forge_bot/router.py | Updates dispatch plumbing to use GenericForgeClient; removes legacy /run and /index paths. |
| forge_bot/retrieval/token_budget.py | Adds retrieval-specific token budgeting and truncation helpers. |
| forge_bot/retrieval/pipeline.py | Adds SmartRetriever facade and diff review routing. |
| forge_bot/retrieval/level3.py | Adds Level 3 multi-hop agent that calls Level 2 as a tool. |
| forge_bot/retrieval/level2.py | Adds Level 2 parallel chunk relevance scanning + refinement + answer generation. |
| forge_bot/retrieval/level1.py | Adds Level 1 BM25 retrieval with LLM keyword expansion. |
| forge_bot/retrieval/chunking.py | Adds chunking utilities for text and per-file diffs. |
| forge_bot/retrieval/bm25.py | Adds pure-Python BM25 implementation. |
| forge_bot/retrieval/init.py | Adds retrieval package marker. |
| forge_bot/rag/store.py | Formatting-only changes in vector store operations. |
| forge_bot/rag/retriever.py | Adds optional max token truncation control for PR context retrieval. |
| forge_bot/rag/pipeline.py | Updates RAG pipeline to use GenericForgeClient. |
| forge_bot/rag/ingester.py | Refactors ingester to use .call() endpoints + adds concurrent file fetching. |
| forge_bot/rag/embedder.py | Makes local embedding execution async via asyncio.to_thread. |
| forge_bot/rag/chunker.py | Formatting updates / signature formatting for readability. |
| forge_bot/prompts/agent_system.j2 | Adds new agent system prompt template for tool loop usage. |
| forge_bot/prompts/agent_pr_review.j2 | Adds new PR review prompt template for the agent loop. |
| forge_bot/handlers/pull_request.py | Replaces old diff-fetch + single LLM call with container + agent loop + status updates + artifacts. |
| forge_bot/handlers/base.py | Switches handler base to generic API client and stable prompts directory resolution. |
| forge_bot/container/manager.py | Adds the per-event persistent container manager (create/exec/destroy). |
| forge_bot/container/init.py | Adds container package marker. |
| forge_bot/config.py | Replaces sandbox settings with container settings + adds FORGE_PROVIDER. |
| forge_bot/clients/llm.py | Adds chat_with_tools() for function-calling flows. |
| forge_bot/clients/forge.py | Removes legacy Forge client implementation. |
| forge_bot/api/schema.py | Adds Pydantic schema models for YAML endpoint definitions. |
| forge_bot/api/loader.py | Adds YAML loader with caching and provider resolution. |
| forge_bot/api/client.py | Adds GenericForgeClient implementation and multipart handling. |
| forge_bot/api/init.py | Adds API package marker. |
| docker-compose.yml | Updates Compose config for .env loading and DinD TLS wiring. |
| README.md | Updates docs to reflect new architecture (agent loop, containers, YAML API client). |
| Dockerfile.workspace | Adds workspace image Dockerfile for per-event containers. |
| Dockerfile | Removes legacy prompts/ copy step (prompts are now under forge_bot/prompts). |
| .env.example | Updates env var docs for new provider/container settings. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| error_body = ( | ||
| "⚠️ **Processing Error**\n\n" | ||
| f"Failed to process this event due to an internal error:\n" | ||
| f"```\n{type(e).__name__}: {str(e)}\n```\n\n" | ||
| f"Please check the bot logs or contact your administrator." | ||
| ) |
There was a problem hiding this comment.
The error comment body includes the raw exception message (str(e)). This can leak internal details (and potentially sensitive info) into a public issue/PR thread. Consider logging the full exception server-side but posting a generic error message (or a redacted/hashed error ID) to the forge.
| name: str = "" | ||
| description: str = "" | ||
| parameters: list[ToolParameter] = field(default_factory=list) | ||
|
|
||
| @abstractmethod | ||
| async def execute(self, **kwargs: object) -> ToolResult: | ||
| """Execute the tool with the given arguments.""" | ||
|
|
||
| def to_openai_schema(self) -> dict: | ||
| """Convert to OpenAI function-calling tools format.""" | ||
| properties: dict[str, dict[str, str]] = {} | ||
| required: list[str] = [] | ||
| for p in self.parameters: | ||
| properties[p.name] = { |
There was a problem hiding this comment.
BaseTool.parameters is initialized with dataclasses.field(...), but BaseTool is not a @DataClass. That means parameters becomes a dataclasses.Field object, and iterating over self.parameters in to_openai_schema()/to_prompt_text() will fail unless every subclass overrides it. Use a normal class attribute (e.g., [] or ClassVar[list[ToolParameter]]) or make BaseTool a dataclass if instance-level parameters are intended.
| "printf '%s\\n' '#!/bin/sh' " | ||
| "'METHOD=\"${1:-GET}\"' " | ||
| "'ENDPOINT=\"$2\"' " | ||
| "'shift 2 2>/dev/null' " | ||
| "'exec curl -sf " | ||
| '-H "Authorization: token $FORGE_TOKEN" ' | ||
| '-H "Content-Type: application/json" ' | ||
| '-X "$METHOD" "$FORGE_URL/api/v1$ENDPOINT" "$@"\' ' | ||
| "> /usr/local/bin/forge-api && " |
There was a problem hiding this comment.
_INIT_SCRIPT's construction of /usr/local/bin/forge-api via a single giant printf string has mismatched quoting (the final argument includes a stray "'"), which is likely to produce an invalid shell script and prevent the container from ever emitting FORGE_READY. Consider switching to a heredoc (cat <<'EOF' ... EOF) or writing the script as a small embedded multi-line string with clear quoting.
| "printf '%s\\n' '#!/bin/sh' " | |
| "'METHOD=\"${1:-GET}\"' " | |
| "'ENDPOINT=\"$2\"' " | |
| "'shift 2 2>/dev/null' " | |
| "'exec curl -sf " | |
| '-H "Authorization: token $FORGE_TOKEN" ' | |
| '-H "Content-Type: application/json" ' | |
| '-X "$METHOD" "$FORGE_URL/api/v1$ENDPOINT" "$@"\' ' | |
| "> /usr/local/bin/forge-api && " | |
| "cat <<'EOF' > /usr/local/bin/forge-api\n" | |
| "#!/bin/sh\n" | |
| 'METHOD="${1:-GET}"\n' | |
| 'ENDPOINT="$2"\n' | |
| "shift 2 2>/dev/null\n" | |
| 'exec curl -sf \\\n' | |
| ' -H "Authorization: token $FORGE_TOKEN" \\\n' | |
| ' -H "Content-Type: application/json" \\\n' | |
| ' -X "$METHOD" "$FORGE_URL/api/v1$ENDPOINT" "$@"\n' | |
| "EOF\n" |
| - SANDBOX_PREPULL_IMAGES=${SANDBOX_PREPULL_IMAGES:-python,node} | ||
| - RAG_ENABLED=${RAG_ENABLED:-false} | ||
| - LOG_LEVEL=${LOG_LEVEL:-INFO} | ||
| - DOCKER_CERT_PATH=/certs/client/client |
There was a problem hiding this comment.
DOCKER_CERT_PATH is set to /certs/client/client, but the volume is mounted at /certs/client. With Docker-in-Docker TLS this path typically needs to point at the directory containing ca.pem/cert.pem/key.pem (i.e., /certs/client). As-is, the bot container may fail to connect to the dind daemon.
| - DOCKER_CERT_PATH=/certs/client/client | |
| - DOCKER_CERT_PATH=/certs/client |
| await api_client.call( | ||
| "create_issue_comment", | ||
| owner=owner, | ||
| repo=repo_name, | ||
| index=issue_number, | ||
| body=error_body, | ||
| ) |
There was a problem hiding this comment.
In the error handler, the API endpoint name "create_issue_comment" isn't defined in the YAML API definitions (they use "post_issue_comment"). This will cause the attempt to notify users about failures to always error. Rename the endpoint to the correct defined name (and keep params aligned with the YAML schema).
| repo_data = payload.get("repository", {}) | ||
| if not repo_data: | ||
| return None | ||
|
|
||
| owner = repo_data.get("owner", {}).get("login") | ||
| repo = repo_data.get("name") | ||
|
|
||
| if not owner or not repo: | ||
| return None |
There was a problem hiding this comment.
_extract_comment_target() requires repository.owner.login and repository.name, but the webhook payloads used elsewhere (and in tests) only provide repository.full_name. As a result, this function will return None and the bot won't post an error comment when it crashes. Consider falling back to parsing repository.full_name ("owner/repo") when owner/name fields are absent.
- Fix path injection in collect_artifacts: use shlex.quote() for filenames - Add bounds check for empty response.choices across LLM call sites - Validate full_name format before split in both handlers - Use configurable container_timeout instead of hardcoded 120s cap - Add tests for all fixes Co-authored-by: sweettastebuds <49539676+sweettastebuds@users.noreply.github.com>
…instructions Co-authored-by: sweettastebuds <49539676+sweettastebuds@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…w-instructions Add .github/copilot-instructions.md for repo-wide Copilot PR review context
This pull request updates the documentation and configuration to reflect a major refactor and feature expansion of the forge-bot project, transitioning from a simple webhook handler with sandboxed code execution to a sophisticated AI bot platform supporting tool-calling, per-event workspace containers, YAML-driven API clients, and advanced verification systems. The
.env.exampleandCLAUDE.mdfiles are updated for clarity and accuracy, and a newDockerfile.workspaceis added for workspace container support. The changes also introduce improved test structure documentation, environment variable descriptions, and update legacy references to match the new architecture.Documentation and Architecture Updates:
CLAUDE.mdto describe new features: tool-calling loop, per-event workspace containers, YAML-driven API client, tool system, status comment manager, verification system, and container lifecycle. Added details on new modules, critical implementation details, and command patterns. [1] [2]CLAUDE.md, listing key test files and their purposes, and clarified the style guide for code and tests.Configuration and Environment Variables:
.env.exampleto include new variables for containerized workspace support (CONTAINER_*), clarified LLM and provider settings, and removed deprecated sandbox options.CLAUDE.mdwith a table of defaults and descriptions for new and existing settings.Containerization Support:
Dockerfile.workspaceto define the workspace container image used for per-event isolation, supporting repo cloning and tool execution.CLAUDE.mdto include workspace image creation.Deprecation and Refactoring:
Minor Cleanup:
prompts/directory in the mainDockerfile(likely moved to a new location per refactor).