Open
39 commits
1b601a9
feat: add YAML-driven API definition layer
sweettastebuds Feb 16, 2026
a829fb3
test: add tests for API definition layer
sweettastebuds Feb 16, 2026
d13f575
feat: add per-event container manager with repo cloning
sweettastebuds Feb 16, 2026
6aeee1f
feat: add two-comment status system for real-time observability
sweettastebuds Feb 16, 2026
b017483
feat: add tool system with search_api, api_call, exec, and todo tools
sweettastebuds Feb 16, 2026
91e2eb5
feat: rewrite handlers with tool loop, verification, and LLM tools
sweettastebuds Feb 16, 2026
951fa86
feat: wire up GenericForgeClient in server and router
sweettastebuds Feb 16, 2026
45f1acc
feat: update README for improved setup instructions and configuration…
sweettastebuds Feb 16, 2026
ef14200
fix: remove stale COPY prompts/ from Dockerfile
sweettastebuds Feb 16, 2026
1a31477
fix: update docker-compose and env config for v2 architecture
sweettastebuds Feb 16, 2026
df9f922
fix: tool call parsing for models using prompt-mode fallback
sweettastebuds Feb 16, 2026
7cb8efd
fix: shallow clone missing remote branches in workspace container
sweettastebuds Feb 16, 2026
7eaaade
feat: let LLM choose its own git clone strategy
sweettastebuds Feb 16, 2026
7f15850
fix: update project documentation and architecture details for clarit…
sweettastebuds Feb 20, 2026
2384739
feat: add extract_thinking function to process <think> tags in text
sweettastebuds Feb 20, 2026
ef9325a
feat: implement TokenBudget class for tracking token consumption in L…
sweettastebuds Feb 20, 2026
027b200
refactor: remove deprecated ForgeClient and update RAGPipeline to use…
sweettastebuds Feb 20, 2026
7a3290c
feat: implement chunked review process for pull requests and enhance …
sweettastebuds Feb 20, 2026
da1ce4c
fix: update Dockerfile.workspace to allow installing packages
sweettastebuds Feb 22, 2026
486a81c
refactor: removed deprecated sandbox-related files and classes for cl…
sweettastebuds Feb 22, 2026
c1c72a0
feat: add template for code review feedback on pull request chunks
sweettastebuds Feb 22, 2026
872b90e
fix: improve bot identity resolution error handling and add comment p…
sweettastebuds Feb 22, 2026
8a53498
refactor: rag performance fixes _ pr review integration
sweettastebuds Mar 2, 2026
cf79770
refactor: update configuration from sandbox to container settings and…
sweettastebuds Mar 2, 2026
652b834
feat: implement core agent loop with LLM and shell execution capabili…
sweettastebuds Mar 3, 2026
c24324d
Implement multi-level retrieval system with Level 1 (BM25 keyword ret…
sweettastebuds Mar 3, 2026
2fbd597
Add comprehensive tests for retrieval components
sweettastebuds Mar 3, 2026
786486d
feat: add support for file attachments in issue comments and update A…
sweettastebuds Mar 3, 2026
2f45a80
refactor: streamline issue comment and pull request handlers to use t…
sweettastebuds Mar 3, 2026
c0f5704
feat: add new prompt templates for code review and issue response, re…
sweettastebuds Mar 3, 2026
15c50e1
refactor: clean up code formatting and remove unused tools for improv…
sweettastebuds Mar 3, 2026
b350f3a
Refactor test files for improved readability and maintainability
sweettastebuds Mar 3, 2026
fb8011e
Initial plan
Copilot Mar 4, 2026
042a998
add .github/copilot-instructions.md with repo-wide Copilot PR review …
Copilot Mar 4, 2026
f734124
Update .github/copilot-instructions.md
sweettastebuds Mar 4, 2026
099f134
Update .github/copilot-instructions.md
sweettastebuds Mar 4, 2026
cc9b991
Merge pull request #7 from sweettastebuds/copilot/set-custom-pr-revie…
sweettastebuds Mar 4, 2026
1b5da5a
feat: add 'review_requested' action to pull request event handling
sweettastebuds Mar 4, 2026
9930588
chore: removed old docs
sweettastebuds Mar 5, 2026
20 changes: 11 additions & 9 deletions .env.example
@@ -7,22 +7,25 @@ FORGE_API_TOKEN=your-bot-api-token
FORGE_WEBHOOK_SECRET=your-webhook-secret
LLM_API_KEY=sk-your-key

# === Forge provider ===
# FORGE_PROVIDER=gitea # "gitea" or "forgejo"

# === LLM ===
# LLM_BASE_URL=https://api.openai.com/v1 # Or http://ollama:11434/v1
# LLM_BASE_URL=https://api.openai.com/v1 # Or http://localhost:11434/v1 for Ollama
# LLM_MODEL=gpt-4o
# LLM_TEMPERATURE=0.2
# LLM_MAX_TOKENS=4096
# LLM_TIMEOUT=120
# LLM_MAX_CONCURRENT=3
# LLM_CONTEXT_WINDOW=8192 # Match your model's context window size

# === Sandbox ===
# SANDBOX_ENABLED=true
# SANDBOX_TIMEOUT=60
# SANDBOX_MEMORY=512m
# SANDBOX_CPUS=1.0
# SANDBOX_PREPULL_IMAGES=python,node # "all", "none", or comma-separated keys
# SANDBOX_IMAGES_FILE= # Path to custom sandbox-images.json override
# === Container / workspace ===
# CONTAINER_ENABLED=true
# CONTAINER_TIMEOUT=150
# CONTAINER_MEMORY=512m
# CONTAINER_CPUS=1.0
# CONTAINER_WORKSPACE_IMAGE=forge-bot-workspace:latest
# CONTAINER_NETWORK_ENABLED=true

# === RAG (optional) ===
# RAG_ENABLED=false
@@ -37,4 +40,3 @@ LLM_API_KEY=sk-your-key
# === General ===
# WEBHOOK_PORT=8080
# LOG_LEVEL=INFO
# BOT_COMMAND_PREFIX=/
184 changes: 184 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,184 @@
# GitHub Copilot Instructions for forge-bot

forge-bot is a Python 3.12 webhook service that connects Gitea/Forgejo repositories to an
OpenAI-compatible LLM. It receives webhook events, routes them to handlers, runs an agentic
tool-calling loop in a per-event Docker workspace container, and posts results back via the
Forge REST API. Every PR should be reviewed through this lens.

---

## Architecture

```
Webhook POST → HMAC-SHA256 verify → return 200 → background task
→ router.py (event type + action → handler)
→ PullRequestHandler — AgentLoop reviews the diff
→ IssueCommentHandler — AgentLoop answers @mentions
```

Key invariants:

- **Return 200 before processing.** Gitea/Forgejo has a 5 s webhook timeout. The handler
must be dispatched as a `BackgroundTask`; never `await` it on the request path.
- **Self-loop guard.** `router.py` must drop any event where `sender.login == bot_username`.
Removing or weakening this check causes infinite webhook loops.
- **HMAC verification.** `server.py` reads raw bytes *before* JSON parsing and uses
`hmac.compare_digest()`. Any shortcut (e.g. comparing strings directly, parsing first)
is a security regression.
- **Container destroyed in `finally`.** Every handler creates a `ContainerManager` and must
destroy it in a `finally` block regardless of success or failure.

---

## Stack & Style Rules

- **Python 3.12** — use modern syntax: `X | Y` unions, `match`, `f`-strings with `=`.
- **Async everywhere.** Every function that touches the network, filesystem, or Docker
must be `async def` / `await`. Never call sync-blocking code on the event loop directly;
use `asyncio.to_thread()` if unavoidable.
- **Type hints on all public signatures.** Return types included. Avoid `Any` unless
unavoidable (e.g. JSON blobs).
- **Pydantic v2 models** for all external data (webhook payloads, API responses, config).
Use `model_validate()`, `Field()`, and `field_validator(mode="before")` for null-coercion
(Gitea sends `null` for empty lists like `assignees` and `requested_reviewers`).
- **`pydantic-settings` `Settings` class** (`config.py`) for all environment variables.
Never read `os.environ` directly elsewhere.
- **Jinja2 `.j2` templates** in `forge_bot/prompts/` for all system / multi-line /
structured LLM prompts. Short, simple user messages (e.g. one-liners) may be built
inline in handlers (including with `f`-strings).
- **YAML-driven API client.** Endpoint definitions live in
`forge_bot/api/definitions/gitea.yaml` (and `forgejo.yaml`). Do not hard-code API URLs
or add new `httpx` calls outside `GenericForgeClient`. New endpoints belong in the YAML.
- **Ruff** is the linter/formatter (target Python 3.12, line length 100, rules E F I N W UP
B SIM). All new code must pass `ruff check` and `ruff format --check`.
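
As a rough illustration of the null-coercion rule above, the sketch below uses a hypothetical, trimmed-down model (`PullRequestSketch`, with reviewers simplified to plain strings). It is not one of the models in `forge_bot/models.py`; it only shows how a `mode="before"` validator turns `null` into an empty list.

```python
from pydantic import BaseModel, Field, field_validator


class PullRequestSketch(BaseModel):
    """Hypothetical payload model used only for illustration."""

    number: int
    assignees: list[str] = Field(default_factory=list)
    requested_reviewers: list[str] = Field(default_factory=list)

    @field_validator("assignees", "requested_reviewers", mode="before")
    @classmethod
    def _null_to_empty_list(cls, value: object) -> object:
        # Runs before type validation, so a JSON null never reaches list[str].
        return [] if value is None else value


pr = PullRequestSketch.model_validate(
    {"number": 7, "assignees": None, "requested_reviewers": ["alice"]}
)
```

Without the validator, `model_validate()` would reject `"assignees": None` because `None` is not a valid `list[str]`.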

---

## AgentLoop & Tool System

- The `AgentLoop` (`forge_bot/agent.py`) is the core agentic primitive. It drives an
LLM in a tool-calling loop capped at a maximum number of rounds, with budget-aware
context trimming, `/workspace/.notes` external memory, `FORGE_TODO` progress markers,
and a stuck-detection fallback.
- Tools are `BaseTool` subclasses (`forge_bot/tools/base.py`) with `name`, `description`,
`parameters: list[ToolParameter]`, and `async execute(**kwargs) -> ToolResult`.
`to_openai_schema()` must remain in sync with `parameters`.
- The `execute` built-in tool is defined inline in `agent.py` as an OpenAI schema dict;
it runs shell commands in the workspace container. Blocked patterns (`rm -rf /`, `mkfs`,
`dd if=`, `> /dev/`) must not be removed or weakened.
- Extra tools (e.g. `RetrievalTool`) are injected at construction time; they must implement
`BaseTool` and be passed as `extra_tools=[...]` to `AgentLoop`.
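
One way to keep `to_openai_schema()` in sync with `parameters` is to derive the schema from the parameter list on every call. The sketch below assumes simplified shapes for `ToolParameter`, `ToolResult`, and `BaseTool`; the real definitions in `forge_bot/tools/base.py` may differ, and `EchoTool` is a hypothetical tool that exists only to exercise the sketch.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolParameter:
    name: str
    type: str  # JSON Schema type name, e.g. "string"
    description: str
    required: bool = True


@dataclass
class ToolResult:
    success: bool
    output: str


class BaseTool(ABC):
    name: str
    description: str
    parameters: list[ToolParameter]

    @abstractmethod
    async def execute(self, **kwargs: Any) -> ToolResult: ...

    def to_openai_schema(self) -> dict[str, Any]:
        # Built from `parameters` each time, so the two cannot drift apart.
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": {
                    "type": "object",
                    "properties": {
                        p.name: {"type": p.type, "description": p.description}
                        for p in self.parameters
                    },
                    "required": [p.name for p in self.parameters if p.required],
                },
            },
        }


class EchoTool(BaseTool):
    """Hypothetical tool, present only to demonstrate the base class."""

    name = "echo"
    description = "Return the given text unchanged."
    parameters = [ToolParameter("text", "string", "Text to echo back.")]

    async def execute(self, **kwargs: Any) -> ToolResult:
        return ToolResult(success=True, output=str(kwargs["text"]))


schema = EchoTool().to_openai_schema()
result = asyncio.run(EchoTool().execute(text="hi"))
```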

---

## ContainerManager

- One container per webhook event, destroyed in `finally`.
- Resource limits (`--memory=512m`, `--cpus=1.0`, `--pids-limit=256`) must not be
increased without justification.
- The container's network can be disabled via `CONTAINER_NETWORK_ENABLED=false`; code
must not assume network is always available.
- Readiness is detected by polling for `FORGE_READY` in container logs; do not skip this
poll or replace it with a fixed `sleep`.
- The API token is embedded in the clone URL for authenticated access. Never log the
clone URL at INFO or above.
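
The readiness rule above amounts to polling with a deadline. A minimal sketch, assuming a hypothetical `read_logs` coroutine that returns the container's log output so far; the actual `ContainerManager` API is not shown in this document.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def wait_until_ready(
    read_logs: Callable[[], Awaitable[str]],
    marker: str = "FORGE_READY",
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll the logs until `marker` appears or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if marker in await read_logs():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```

Unlike a fixed `sleep`, the poll returns as soon as the container is ready and reports a clear failure when it never becomes ready.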

---

## StatusCommentManager

- Maintains **two** Gitea comments per interaction: a live-updating status comment and a
final response comment. Never collapse them into one.
- `post_initial_status()` → `update_phase(msg)` → `record_tool_call(record)` →
`post_response(text)` → `finalize_status(label)` is the canonical lifecycle. Do not
call these out of order.
- Status updates are best-effort: all calls must be wrapped in `try/except` and failures
logged at WARNING, never re-raised.
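
The best-effort rule can be captured in a small wrapper. This is a sketch under assumptions: `best_effort` and `_failing_update` are hypothetical names, and the real `StatusCommentManager` may wrap each call inline instead.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger("forge_bot.status")


async def best_effort(label: str, call: Callable[[], Awaitable[None]]) -> bool:
    """Run a status update; log failures at WARNING and never re-raise."""
    try:
        await call()
        return True
    except Exception:
        logger.warning("status update %r failed", label, exc_info=True)
        return False


async def _failing_update() -> None:
    # Simulates the forge API being unreachable during a status update.
    raise RuntimeError("forge API unavailable")


delivered = asyncio.run(best_effort("update_phase", _failing_update))
```

Catching `Exception` rather than `BaseException` lets task cancellation propagate, so a cancelled handler is not mistaken for a failed status update.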

---

## Testing Conventions

- **pytest-asyncio** with `asyncio_mode = "auto"` — async tests run without extra
configuration; `@pytest.mark.asyncio` is acceptable and commonly used in this repo.
- **pytest-httpx** for mocking outbound HTTP (`httpx.AsyncClient`).
- Use `unittest.mock.AsyncMock` for coroutine dependencies (`api.call`, `llm.chat`, etc.).
Use `MagicMock` for synchronous objects (settings, container, status).
- Handler tests patch `ContainerManager`, `StatusCommentManager`, and `AgentLoop` with
`patch(...)` context managers — see `tests/test_pull_request_handler.py` for the
canonical pattern.
- Settings in tests are `MagicMock()` objects with only the attributes under test set
explicitly. Do not instantiate the real `Settings` class in unit tests.
- Test files live in `tests/` and mirror the module they test
(e.g. `test_pull_request_handler.py` ↔ `forge_bot/handlers/pull_request.py`).
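
The mocking conventions above might look like the following. The unit under test (`summarize`) and the `llm.chat` keyword arguments are hypothetical. The sketch calls `asyncio.run()` so that it runs without plugins; with `asyncio_mode = "auto"` the test could instead be an `async def` that awaits directly.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def summarize(llm, settings, text: str) -> str:
    """Hypothetical unit under test: one LLM call using the configured model."""
    return await llm.chat(model=settings.llm_model, prompt=text)


def test_summarize_uses_configured_model() -> None:
    settings = MagicMock()
    settings.llm_model = "gpt-4o"  # only the attribute under test is set
    llm = MagicMock()
    llm.chat = AsyncMock(return_value="ok")  # coroutine dependency

    result = asyncio.run(summarize(llm, settings, "diff text"))

    assert result == "ok"
    llm.chat.assert_awaited_once_with(model="gpt-4o", prompt="diff text")
```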

---

## What to Look For in PRs

### Correctness & Reliability
- Awaiting handler work on the webhook request path (instead of scheduling it with
`BackgroundTasks.add_task`) delays the 200 response and risks Gitea timeouts. Flag as
critical.
- Missing `finally: await container.destroy()` in a handler leaks Docker containers.
- Swallowed exceptions in the agent loop (`except Exception: pass`) hide failures from
operators; require at least `logger.warning(..., exc_info=True)`.
- New LLM calls must go through `LLMClient` with its `asyncio.Semaphore` concurrency
guard — never construct `AsyncOpenAI` directly in handler code.
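
The container-leak item above comes down to the shape of the handler body. A minimal sketch with a stand-in class; `FakeContainer` and `handle_event` are hypothetical, and the real `ContainerManager` talks to Docker.

```python
import asyncio


class FakeContainer:
    """Stand-in for ContainerManager that only records whether destroy() ran."""

    def __init__(self) -> None:
        self.destroyed = False

    async def create(self) -> None:
        return None

    async def destroy(self) -> None:
        self.destroyed = True


async def handle_event(container: FakeContainer, fail: bool = False) -> str:
    try:
        await container.create()
        if fail:
            raise RuntimeError("agent loop failed")
        return "review posted"
    finally:
        # Runs on success, on exceptions, and on task cancellation.
        await container.destroy()


container = FakeContainer()
try:
    asyncio.run(handle_event(container, fail=True))
except RuntimeError:
    pass  # the error still propagates; cleanup has already happened
```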

### Security
- HMAC verification must remain on raw bytes before JSON parsing. Flag any PR that moves
the signature check after `json.loads()`.
- The `execute` tool's blocked-pattern list (`_BLOCKED_PATTERNS` in `agent.py`) must not
shrink.
- API tokens and secrets must never be logged. Check any new `logger.*` lines that
reference `settings`, `clone_url`, or `token`.
- New endpoints added to `gitea.yaml`/`forgejo.yaml` that expose write operations
(POST/PATCH/DELETE) warrant extra scrutiny.
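
A regression check for the blocked-pattern rule could look like the sketch below. It assumes plain substring matching on whitespace-normalized commands, which may not be how `agent.py` actually matches; `REQUIRED_PATTERNS` and `is_blocked` are hypothetical names.

```python
# The four patterns named in this document; the real list may contain more.
REQUIRED_PATTERNS: frozenset[str] = frozenset({"rm -rf /", "mkfs", "dd if=", "> /dev/"})

# Stand-in for the list defined in agent.py.
_BLOCKED_PATTERNS: tuple[str, ...] = ("rm -rf /", "mkfs", "dd if=", "> /dev/")


def is_blocked(command: str) -> bool:
    """Return True when the command contains any blocked pattern."""
    normalized = " ".join(command.split())  # collapse runs of whitespace
    return any(pattern in normalized for pattern in _BLOCKED_PATTERNS)


def blocklist_has_not_shrunk() -> bool:
    """Guard a test suite can assert on when the list is edited."""
    return REQUIRED_PATTERNS <= set(_BLOCKED_PATTERNS)
```

Substring matching is deliberately coarse: under this assumption `rm -rf /workspace/tmp` and `2> /dev/null` are also rejected, so a reviewer should treat false positives as the expected cost of the guard.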

### Context Window & LLM Budget
- Every new field added to a Jinja2 prompt template increases the system-prompt size.
Large fields (diffs, file contents) must be truncated before injection.
- Context trimming in `AgentLoop._prepare_messages()` assumes the system prompt is at
index 0 and the initial user message at index 1 — changes to message ordering break it.
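
To make the ordering assumption concrete, here is a simplified trimming sketch. It is not the body of `AgentLoop._prepare_messages()`: it uses character counts as a stand-in for tokens, and it ignores the pairing between tool calls and tool results that a real trimmer has to preserve.

```python
def _size(messages: list[dict[str, str]]) -> int:
    """Total content length, used here as a crude proxy for token count."""
    return sum(len(m["content"]) for m in messages)


def trim_messages(messages: list[dict[str, str]], max_chars: int) -> list[dict[str, str]]:
    """Drop the oldest middle messages until the total fits; keep indexes 0 and 1."""
    head = messages[:2]  # system prompt and initial user message
    tail = messages[2:]  # slicing copies, so the caller's list is untouched
    while tail and _size(head) + _size(tail) > max_chars:
        tail.pop(0)
    return head + tail
```

If a change inserts another message ahead of the initial user message, `head` no longer holds what the trimmer thinks it does, and the task description can be silently dropped.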

### API Client / YAML Definitions
- New API calls must be added as YAML endpoint definitions, not as raw `httpx` calls.
- Path parameters use `{name}` placeholders; they must have a matching `params` entry
with `location: path`.
- `response_type: text` endpoints return `str`; `response_type: json` endpoints return
`dict | list`. Callers must handle both types correctly.
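
The placeholder rule can be checked mechanically. The sketch below assumes each endpoint definition loads into a dict with a `path` string and a `params` list whose entries carry `name` and `location` keys; the actual YAML schema may be shaped differently, and `missing_path_params` is a hypothetical helper.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def missing_path_params(endpoint: dict) -> set[str]:
    """Return placeholders in `path` that lack a params entry with location: path."""
    placeholders = set(_PLACEHOLDER.findall(endpoint["path"]))
    declared = {
        param["name"]
        for param in endpoint.get("params", [])
        if param.get("location") == "path"
    }
    return placeholders - declared


endpoint = {
    "path": "/repos/{owner}/{repo}/pulls/{index}",
    "params": [
        {"name": "owner", "location": "path"},
        {"name": "repo", "location": "path"},
        {"name": "state", "location": "query"},
    ],
}
missing = missing_path_params(endpoint)
```

Here `missing` contains `index`, the one placeholder with no matching path parameter.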

### Async & Concurrency
- Docker SDK calls (`docker.from_env()`, `container.exec_run()`) are synchronous and must
be wrapped with `asyncio.to_thread()`.
- Shared mutable state accessed from multiple background tasks must be protected (e.g.
`DeliveryTracker` uses an `OrderedDict` — confirm thread-safety for any new shared state).
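
Wrapping a synchronous call with `asyncio.to_thread()` might look like this. `blocking_exec` is a stand-in that sleeps instead of calling the Docker SDK, so the sketch runs without Docker installed.

```python
import asyncio
import threading
import time


def blocking_exec(command: str) -> tuple[str, str]:
    """Stand-in for a synchronous SDK call such as `container.exec_run()`."""
    time.sleep(0.05)
    return command, threading.current_thread().name


async def run_exec(command: str) -> tuple[str, str]:
    # The blocking call runs in a worker thread; the event loop stays responsive.
    return await asyncio.to_thread(blocking_exec, command)


async def main() -> list[tuple[str, str]]:
    return list(await asyncio.gather(*(run_exec(f"echo {i}") for i in range(3))))


results = asyncio.run(main())
```

Each result records the name of the thread that ran it, which is a worker thread rather than the main thread that hosts the event loop.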

### Prompt Templates
- Templates must not concatenate untrusted user input without escaping. `autoescape` is
disabled in the Jinja2 environment (prompts are not HTML), so Markdown injection is
possible. Severity: 🟡 Suggestion unless the content can break tool-calling JSON.
- Low temperature (0.2) is intentional for determinism. PRs raising it above 0.5 need
justification.

---

## File Map (quick reference)

| Path | Purpose |
|------|---------|
| `forge_bot/server.py` | FastAPI app, HMAC verification, background dispatch |
| `forge_bot/router.py` | Event routing, self-loop guard, @mention detection |
| `forge_bot/agent.py` | AgentLoop: tool-calling loop, context trimming, stuck detection |
| `forge_bot/config.py` | All settings via pydantic-settings |
| `forge_bot/models.py` | Pydantic webhook payload models |
| `forge_bot/api/client.py` | YAML-driven HTTP client |
| `forge_bot/api/definitions/` | Gitea & Forgejo endpoint YAML definitions |
| `forge_bot/handlers/pull_request.py` | PR review handler |
| `forge_bot/handlers/issue_comment.py` | @mention reply handler |
| `forge_bot/container/manager.py` | Per-event Docker workspace |
| `forge_bot/status/manager.py` | Two-comment status system |
| `forge_bot/tools/base.py` | BaseTool ABC, ToolResult, ToolParameter |
| `forge_bot/prompts/` | Jinja2 prompt templates |
| `forge_bot/retrieval/` | SmartRetriever: BM25 + LLM-ranked context retrieval |
| `forge_bot/rag/` | Optional ChromaDB-based RAG pipeline (disabled by default) |