
Matas/fix/fix actions #43

Merged
dkargatzis merged 25 commits into warestack:main from MT-superdev:matas/fix/fix-actions
Jan 31, 2026

Conversation


@MT-superdev MT-superdev commented Jan 24, 2026

Fixed pre-commit configuration and GitHub Actions workflow for Matas/fix/fix actions. This PR updates Ruff, enforces conventional commit messages, and enhances repository analysis with rate limiting and unauthenticated access for public repos. This improves code quality, governance, and usability.

Architecture & Core Changes

  • Frontend now fetches data from the backend, replacing mock data.
  • Implemented rate limiting for the /v1/rules/recommend endpoint, limiting anonymous users to 5 requests per hour and authenticated users to 100 requests per hour.
  • Added explicit typing to **kwargs in agent execute methods for MyPy strict mode compatibility.

API & Data Changes

  • The repository analysis endpoint /v1/rules/recommend now supports unauthenticated access for public GitHub repositories.
  • Added new AnalysisState model to repository_analysis_agent for shared memory between agent nodes.
  • Introduced structured output models for CONTRIBUTING_GUIDELINES_ANALYSIS_PROMPT and REPOSITORY_ANALYSIS_PROMPT, enabling explicit output shape and validation.

Performance & Reliability

  • Changed default timeout for agent execution to 60s for showcase stability.
  • Added retry logic to GitHub GraphQL requests.
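The retry behavior described above can be approximated with exponential backoff; `with_retries` is a hypothetical helper sketch, not the shipped GraphQL client code:

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry an async GraphQL call on transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts.
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")
```

Only transient error types are retried here; permanent failures (auth errors, malformed queries) should surface immediately.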

Dependencies & Configuration

  • Replaced black, isort, and flake8 with ruff for linting and formatting.
  • Updated pre-commit hooks to use ruff for linting and formatting.
  • Updated pyproject.toml to use Ruff configuration.
  • Added linting_commands.txt with useful linting commands.
  • Added dependencies: giturlparse, structlog, gql[all].

Summary by CodeRabbit

  • New Features

    • Rules API (analyze + proceed-with-pr), token validation endpoint, and repository analysis that generates YAML rules, PR plans, and analysis reports.
  • Enhancements

    • Anonymous public-repo analysis with server-side rate limiting.
    • Rich hygiene metrics, per-rule explanations, PR-ready rule payloads, improved webhook reliability, background task queue, and enhanced GitHub check/reporting (including acknowledgments).
  • Documentation

    • Major README and docs rewrite with streamlined onboarding, rule reference, and quick-start.
  • Tests

    • Added extensive unit and integration tests for analysis, webhooks, and rules.



coderabbitai bot commented Jan 24, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds a conditions-based rule engine, a LangGraph-driven repository analysis pipeline, expanded GitHub REST/GraphQL integrations (including anonymous access), a new /rules API with auth and rate-limiting, typed Pydantic models and check-run/reporting tooling, a TaskQueue/scheduler redesign, structlog migration, and extensive documentation and tests.

Changes

Cohort / File(s) Summary
CI & Tooling
.github/workflows/docs.yaml, .github/workflows/tests.yaml, .pre-commit-config.yaml, pyproject.toml, linting_commands.txt, justfile
Tightened GH Pages deploy gating; bumped pre-commit hooks to Ruff-centric config; modernized pyproject dev deps and linting; added lint/run helper commands and CI env defaults.
Docs & Contribution
README.md, DEVELOPMENT.md, CONTRIBUTING.md, LOCAL_SETUP.md, docs/..., mkdocs.yml
Substantial documentation rewrite to a maintainer-first, YAML-rule focus; reorganized quick-start/config pages; removed extra JS asset.
API & Routing
src/api/...
src/api/recommendations.py, src/api/rules.py, src/api/auth.py, src/api/repos.py, src/api/dependencies.py, src/api/rate_limit.py, src/api/errors.py, src/api/scheduler.py
Added /rules endpoints and request/response models, proceed-with-pr flow, parse_repo helpers, auth DI, in-memory rate limiter, structured API errors, scheduler type hints.
Agents & Engine
src/agents/...
src/agents/base.py, src/agents/engine_agent/..., src/agents/acknowledgment_agent/..., src/agents/feasibility_agent/..., src/agents/repository_analysis_agent/..., src/agents/repository_analysis_agent/metrics.py
Unified AgentResult as Pydantic model; standardized execute(**kwargs) signatures; engine accepts EngineRequest; repository analysis migrated to LangGraph multi-step pipeline with new models, nodes, prompts, and metrics.
Rules & Conditions
src/rules/...
src/rules/models.py, src/rules/__init__.py, src/rules/registry.py, src/rules/acknowledgment.py, src/rules/conditions/*, src/rules/loaders/*, src/rules/utils/*
Introduced BaseCondition API, a Condition registry and AVAILABLE_CONDITIONS, many new condition modules (access_control, filesystem, pull_request, temporal, workflow), acknowledgment parsing utilities, and mapping from rule parameters → conditions.
Removed legacy validators
src/rules/validators.py
Removed legacy validator module and its registry; replaced by new condition/registry architecture.
GitHub integrations
src/integrations/github/*
src/integrations/github/api.py, graphql.py, graphql_client.py, models.py, schemas.py, service.py, rule_loader.py, rules_service.py, check_runs.py
Added GraphQL client/models and PR-context fetcher, anonymous/public file access flows, richer REST/GraphQL operations, GitHubService, repo-based rule loader/validation, and CheckRunManager.
Event processors & PR flow
src/event_processors/*
src/event_processors/pull_request/..., push.py, check_run.py, deployment*, violation_acknowledgment.py, rule_creation.py, base.py, factory.py
New PullRequestEnricher/Processor and processor refactors: typed Violation/Acknowledgment models, CheckRunManager integration, stronger installation_id validation, and consolidated processor flows (removals/additions).
Task queue & Scheduler
src/tasks/task_queue.py, src/tasks/scheduler/deployment_scheduler.py
New in-memory Task/TaskQueue with SHA256 deduplication, worker model, retry/backoff, UTC-aware scheduler, typed start/stop APIs, and status reporting.
Webhooks & Handlers
src/webhooks/*
src/webhooks/router.py, dispatcher.py, models.py, handlers/*, auth.py
Dispatcher now enqueues to TaskQueue; handlers return typed WebhookResponse; added Pydantic webhook models, enum-based event checks, header redaction, and structlog usage.
Presentation & Formatting
src/presentation/github_formatter.py
New GitHub-focused formatters for check-run payloads, PR comments, and acknowledgment summaries.
Core models, errors & utils
src/core/*
src/core/models.py, errors.py, constants.py, config/*, utils/*, patterns.py
Introduced Severity/Violation/Acknowledgment/User/HygieneMetrics/WebhookEvent models, GitHub-specific exceptions, DEFAULT_TEAM_MEMBERSHIPS, config fields for analysis/rate-limits, typed utilities and glob-pattern helpers.
Integrations providers & wrappers
src/integrations/providers/*
Typing fixes and compatibility wrappers for Bedrock/Anthropic/OpenAI providers and minor refactors.
Tests
tests/**, tests/conftest.py
Large additions/updates to unit and integration tests for agents, API, GitHub integrations, conditions, task queue, and webhooks; added fixtures and autouse mocks; removed some legacy tests.
Scripts
scripts/migrate_to_structlog.py
Added migration helper to convert logging usage to structlog.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as "FastAPI /rules/recommend"
  participant Rate as "RateLimiter (IP/User)"
  participant Agent as "RepositoryAnalysisAgent"
  participant GitHub as "GitHub (REST/GraphQL)"
  participant LLM as "LLM Service"

  Client->>API: POST /rules/recommend (repo or URL, optional auth)
  API->>Rate: rate_limiter(user/IP)
  Rate-->>API: allowed / 429
  API->>Agent: analyze(repo_full_name, user, options)
  Agent->>GitHub: list repo files / get file contents (any-auth)
  GitHub-->>Agent: files & contents
  Agent->>GitHub: fetch_recent_pull_requests / fetch_pr_hygiene_stats
  GitHub-->>Agent: PRs & hygiene data
  Agent->>LLM: prompts (analysis -> rule generation -> reasonings)
  LLM-->>Agent: structured recommendations (rules, reasonings)
  Agent-->>API: AnalysisResponse (rules_yaml, pr_plan, analysis_report)
  API-->>Client: 200 OK (AnalysisResponse)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Matas/fix/fix actions #43: Mirrors the change to .github/workflows/docs.yaml gating condition by adding github.repository == 'warestack/watchflow'.

Poem

"I hopped through code with a twitching nose,
LangGraph nodes and condition rows,
Rules in YAML, checks take flight,
Structlog hums into the night,
A rabbit cheers: the repo grows!" 🐇

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 75.82%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: the PR title 'Matas/fix/fix actions' is vague and uses a branch-naming convention rather than a descriptive commit-style title. Resolution: consider a more descriptive title such as 'Migrate linting to Ruff and implement rate limiting' or 'Fix pre-commit configuration and add repository analysis features'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89fc0c4 and 07e3918.

📒 Files selected for processing (2)
  • src/event_processors/pull_request/processor.py
  • src/webhooks/router.py





codecov-commenter commented Jan 24, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 71.84619% with 1252 lines in your changes missing coverage. Please review.

❌ Your patch status has failed because the patch coverage (71.8%) is below the target coverage (80.0%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (67.0%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@           Coverage Diff            @@
##            main     #43      +/-   ##
========================================
+ Coverage   32.8%   67.0%   +34.1%     
========================================
  Files         85     151      +66     
  Lines       5162    9384    +4222     
========================================
+ Hits        1698    6290    +4592     
+ Misses      3464    3094     -370     

Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5a3be1...07e3918. Read the comment docs.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, comments of Critical and Major severity were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/rules/utils/contributors.py (2)

55-55: Timezone-naive datetime will cause TypeError when compared.

cutoff_date is timezone-naive, but in _has_recent_activity (lines 196-197, 206-207), it's compared against timezone-aware datetimes parsed from GitHub's ISO timestamps. Python 3 raises TypeError: can't compare offset-naive and offset-aware datetimes.

🐛 Proposed fix
-            cutoff_date = datetime.now() - timedelta(days=days_back)
+            cutoff_date = datetime.now(tz=timezone.utc) - timedelta(days=days_back)

Also update the import at line 9:

-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone

247-252: Singleton ignores github_client parameter after first initialization.

Once _contributor_analyzer is created, subsequent calls with a different github_client will silently return the analyzer bound to the original client. This could cause authentication or tenant isolation issues.

♻️ Suggested fix: validate or key by client

Option 1 - Always use the provided client:

 def get_contributor_analyzer(github_client) -> ContributorAnalyzer:
     """Get or create the global contributor analyzer instance."""
     global _contributor_analyzer
-    if _contributor_analyzer is None:
-        _contributor_analyzer = ContributorAnalyzer(github_client)
+    if _contributor_analyzer is None or _contributor_analyzer.github_client is not github_client:
+        _contributor_analyzer = ContributorAnalyzer(github_client)
     return _contributor_analyzer

Option 2 - Remove global singleton and always create fresh instance (simpler, cache still works per-instance):

 def get_contributor_analyzer(github_client) -> ContributorAnalyzer:
-    """Get or create the global contributor analyzer instance."""
-    global _contributor_analyzer
-    if _contributor_analyzer is None:
-        _contributor_analyzer = ContributorAnalyzer(github_client)
-    return _contributor_analyzer
+    """Create a contributor analyzer instance."""
+    return ContributorAnalyzer(github_client)
src/webhooks/handlers/deployment_status.py (1)

36-42: enqueue call no longer matches TaskQueue API (missing handler + return type).

TaskQueue.enqueue now requires a handler function and returns bool, but this call passes no func and treats the return as task_id. This will raise at runtime and the response will be wrong even after fixing the call. Align the handler to the new API (or change TaskQueue to return a task_id).

src/main.py (2)

61-101: Fix handler registration key type (Enum vs str).

WebhookDispatcher.dispatch looks up handlers by string event type. Registering with EventType enums will not match and will drop all events as “handler_not_found”.

Proposed fix (normalize to string keys)
-    dispatcher.register_handler(EventType.PULL_REQUEST, pull_request_handler.handle)
-    dispatcher.register_handler(EventType.PUSH, push_handler.handle)
-    dispatcher.register_handler(EventType.CHECK_RUN, check_run_handler.handle)
-    dispatcher.register_handler(EventType.ISSUE_COMMENT, issue_comment_handler.handle)
-    dispatcher.register_handler(EventType.DEPLOYMENT, deployment_handler.handle)
-    dispatcher.register_handler(EventType.DEPLOYMENT_STATUS, deployment_status_handler.handle)
-    dispatcher.register_handler(EventType.DEPLOYMENT_REVIEW, deployment_review_handler.handle)
-    dispatcher.register_handler(EventType.DEPLOYMENT_PROTECTION_RULE, deployment_protection_rule_handler.handle)
+    dispatcher.register_handler(EventType.PULL_REQUEST.value, pull_request_handler.handle)
+    dispatcher.register_handler(EventType.PUSH.value, push_handler.handle)
+    dispatcher.register_handler(EventType.CHECK_RUN.value, check_run_handler.handle)
+    dispatcher.register_handler(EventType.ISSUE_COMMENT.value, issue_comment_handler.handle)
+    dispatcher.register_handler(EventType.DEPLOYMENT.value, deployment_handler.handle)
+    dispatcher.register_handler(EventType.DEPLOYMENT_STATUS.value, deployment_status_handler.handle)
+    dispatcher.register_handler(EventType.DEPLOYMENT_REVIEW.value, deployment_review_handler.handle)
+    dispatcher.register_handler(EventType.DEPLOYMENT_PROTECTION_RULE.value, deployment_protection_rule_handler.handle)

144-158: TaskQueue does not expose a tasks attribute; endpoint will crash with AttributeError.

The health_tasks() function accesses task_queue.tasks.values() (line 144), but TaskQueue only exposes queue, processed_hashes, and workers. Additionally, the Task model lacks a status field, so even if tasks existed, the filtering logic would fail.

🤖 Fix all issues with AI agents
In @.pre-commit-config.yaml:
- Around line 17-18: The pre-commit entry for the Ruff hook is pinned to an
outdated revision (repo: https://github.com/astral-sh/ruff-pre-commit with rev:
v0.3.0); update the rev to a current release (preferably v0.14.14, or at minimum
v0.14.13) in the .pre-commit-config.yaml and then run pre-commit autoupdate or
reinstall hooks to ensure the newer Ruff version is used.

In `@src/agents/repository_analysis_agent/nodes.py`:
- Around line 282-295: In the loop over pr_nodes in nodes.py (the block
computing codeowner_bypass_rate), avoid directly indexing
review["author"]["login"] since review["author"] may be None; instead retrieve
the reviewer login defensively (e.g., reviewer = review.get("author") and
reviewer_login = reviewer.get("login") if reviewer else None) and use that
reviewer_login in the approval check (compare against the existing author
variable which may also be None). Update the approved condition in the reviews
iteration to first handle a missing author object and only consider reviews with
state "APPROVED" and a non-null reviewer_login that differs from pr author.

In `@src/api/dependencies.py`:
- Around line 23-41: The get_current_user_optional function currently accepts
any Bearer token and returns a fake User, enabling auth bypass; replace this by
validating the token before constructing a User: call a token verification
helper (e.g., verify_token or integrate with your IdP client) from
get_current_user_optional to validate signature and claims, reject
invalid/malformed tokens by returning None and logging the failure, and only
create and return a User (or map claims to User) when verification succeeds;
ensure exceptions from verification are caught, logged via logger.warning, and
result in None instead of a fake authenticated user.

In `@src/integrations/github/api.py`:
- Around line 1020-1024: fetch_recent_pull_requests calls a non-existent helper
self._detect_issue_references(...) which will raise AttributeError; add a
private method _detect_issue_references(self, body, title) to the GitHubClient
class that returns a boolean by scanning the PR body and title for issue
reference patterns (e.g. plain `#123`, owner/repo#123, or keywords like
"fixes|closes|resolves" followed by an issue ref) using a safe regex and ensure
it is used by fetch_recent_pull_requests; keep the method name exactly
_detect_issue_references so the existing call resolves.

In `@src/integrations/github/graphql_client.py`:
- Around line 89-94: The list comprehension building Commit instances can crash
when GraphQL returns author: null because node["commit"].get("author", {}) will
be None; update the Commit construction (where Commit(...) is created) to first
coerce the author dict to a safe mapping (e.g., author_info =
node["commit"].get("author") or {}) and then use author_info.get("name",
"Unknown") for the author field so it never raises an AttributeError.

In `@src/main.py`:
- Around line 111-119: The CORS middleware is configured with
allow_origins=["*"] and allow_credentials=True which is invalid for wildcard
origins; update the app.add_middleware(CORSMiddleware, ...) configuration by
removing or setting allow_credentials to False when using allow_origins=["*"],
or alternatively replace allow_origins=["*"] with an explicit list of allowed
origins and keep allow_credentials=True; modify the parameters on the
CORSMiddleware invocation (app.add_middleware) accordingly.

In `@src/rules/validators.py`:
- Around line 238-243: The validate method in has_min_approals currently returns
True unconditionally; change it to read min_approvals =
parameters.get("min_approvals", 1), obtain the PR identifier from event (e.g.
event["pull_request"]["number"] or event["issue"]["number"] depending on
payload), fetch the PR reviews using the repository/GitHub client available in
the validator (or use event["reviews"] if the payload includes them), count
reviews with state "APPROVED" (ignoring duplicates by reviewer if desired) and
return True only if approved_count >= min_approvals; update the async validate
method to perform this logic and surface errors rather than always returning
True.

In `@src/tasks/task_queue.py`:
- Around line 43-61: The current enqueue method adds task_id to processed_hashes
immediately which both leaks memory and prevents retries on failure; change the
behavior so enqueue (in enqueue and using _generate_task_id, Task, queue.put)
only checks processed_hashes but does NOT add the id—move the addition to the
successful completion path in the worker/executor that runs the Task (i.e.,
after the task.func finishes without error) so failed tasks can be retried. Also
replace or augment processed_hashes with a TTL/evicting structure (e.g., an
async-safe TTL cache or an OrderedDict with timestamp eviction) to bound memory
growth and periodically prune expired entries; ensure all access to
processed_hashes is concurrency-safe in the async executor and enqueue.

In `@src/webhooks/auth.py`:
- Around line 28-30: The current call logger.info("request_headers_received",
headers=dict(request.headers)) exposes sensitive headers; either remove this
debug logging entirely or change it to logger.debug and sanitize headers first
by filtering out Authorization, Cookie, Set-Cookie, Proxy-Authorization, and any
custom API key headers from request.headers before logging; update the code
around the logger.info call (the request_headers_received logging) to build a
safe headers dict that masks or omits those sensitive keys and log that instead.
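A simple redaction helper along the suggested lines (the header list is illustrative; extend it with any custom API-key headers your deployment uses):

```python
SENSITIVE_HEADERS = {
    "authorization",
    "cookie",
    "set-cookie",
    "proxy-authorization",
    "x-api-key",
}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Mask sensitive header values before they reach a log line."""
    return {
        name: ("[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }
```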

In `@src/webhooks/handlers/deployment.py`:
- Around line 36-39: The current call to TaskQueue.enqueue incorrectly omits the
required handler argument and misinterprets its boolean return as task_id;
update the call to pass the appropriate handler function (e.g.,
deployment_handler or the actual handler used for deployment tasks) as func and
treat the return value as a success flag (bool) instead of an ID: call await
task_queue.enqueue(func=deployment_handler, event_type="deployment",
repo_full_name=repo_full_name, installation_id=installation_id,
payload=payload), then check the boolean result and if true proceed to
build/return the correct response (generate or fetch a real task_id separately
if your API requires one) and if false log/return an error (e.g., 503) —
reference TaskQueue.enqueue and the local variable task_id (replace its usage
with a success check or a real id lookup).

In `@src/webhooks/handlers/issue_comment.py`:
- Line 153: The logger call uses structlog-style keyword args which the standard
logging module doesn't support; update the logger.info invocation that
references logger and comment_body (the line currently
"logger.info('extracting_acknowledgment_reason',
body_length=len(comment_body))") to use standard logging formatting and
positional args (e.g. include body_length in the format string and pass
len(comment_body) as an argument) so the message logs correctly at runtime.
- Line 33: The log call using logger.info("comment_processed",
commenter=commenter, body_length=len(comment_body)) is invalid for the standard
logging.Logger; update the logger usage in the issue comment handler to either
(a) convert the module logger to a structlog logger (align with PR) and ensure
the logger at module scope is created via structlog.get_logger() before calling
info with kwargs, or (b) keep the existing logging.getLogger(__name__) and
change the call in the handler to a formatted message using positional/format
placeholders or keyword interpolation (e.g., include commenter and body_length
in the message string) so no kwargs are passed to logging.Logger.info; locate
the logger initialization and the info call in the issue comment handler to make
the consistent change.
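The two styles in question, side by side — the structlog-style call is shown only as a comment because it raises `TypeError` on a standard `logging.Logger` (once the logger is enabled for INFO):

```python
import logging

logger = logging.getLogger(__name__)

comment_body = "LGTM, acknowledged per policy"

# structlog style — kwargs become event fields, but stdlib logging rejects them:
#   logger.info("extracting_acknowledgment_reason", body_length=len(comment_body))
# stdlib-compatible form using %-style lazy formatting instead:
logger.info("extracting_acknowledgment_reason body_length=%d", len(comment_body))
```

Either migrate the module logger to `structlog.get_logger()` or keep stdlib logging and fold the fields into the format string; mixing the two silently works only while the log level suppresses the call.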
🟡 Minor comments (14)
docs/benchmarks.md-9-9 (1)

9-9: Fix the “70 + enterprise” typo for consistency.
The spacing reads like arithmetic and conflicts with “70+” later in the document.

✏️ Suggested fix
-Our analysis of 70 + enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
+Our analysis of 70+ enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
src/api/rules.py-11-11 (1)

11-11: event_data field is defined but unused.

The event_data field is declared with a comment indicating "Advanced: pass extra event data for edge cases" but it's never passed to agent.execute(). Either implement the feature or remove the unused field to avoid confusion.

🔧 If intended to be used, pass it to the agent
-    result = await agent.execute(rule_description=request.rule_text)
+    result = await agent.execute(rule_description=request.rule_text, event_data=request.event_data)
src/core/utils/caching.py-161-164 (1)

161-164: Use is not None to properly handle explicit ttl=0.

The truthiness check treats ttl=0 as falsy, causing it to default to 3600 instead of creating a cache with zero TTL. While no code currently passes ttl=0, this design is surprising and prevents that use case. Change the condition to if ttl is not None to preserve explicit zero values.

Proposed fix
-        cache = AsyncCache(maxsize=maxsize, ttl=ttl) if ttl else TTLCache(maxsize=maxsize, ttl=ttl or 3600)
+        cache = (
+            AsyncCache(maxsize=maxsize, ttl=ttl)
+            if ttl is not None
+            else TTLCache(maxsize=maxsize, ttl=3600)
+        )
pyproject.toml-111-119 (1)

111-119: Inconsistency: respx missing from [tool.uv] dev-dependencies.

respx>=0.20.0 is added to [project.optional-dependencies] dev (line 38) but not to [tool.uv] dev-dependencies. This may cause issues depending on which installation method is used.

Proposed fix
 [tool.uv]
 dev-dependencies = [
     "pytest>=7.4.0",
     "pytest-asyncio>=0.21.0",
     "pytest-cov>=4.1.0",
+    "respx>=0.20.0",
     "mypy>=1.7.0",
     "pre-commit>=3.5.0",
     "ruff>=0.1.0",
 ]
pyproject.toml-28-28 (1)

28-28: Update giturlparse version constraint to be more specific.

The constraint >=0.1.0 is very permissive and allows versions spanning from 0.1.0 to the current latest (0.14.0). Consider pinning to a more specific range (e.g., >=0.10.0,<0.15.0 or similar) to avoid unexpected behavior from incompatible future releases.

tests/unit/tasks/test_queue.py-91-129 (1)

91-129: Stop workers at test end to avoid leaked background tasks.

Tests that call start_workers() should stop_workers() in a finally block to prevent task leakage across the suite.

🛠️ Suggested fix
     async def test_worker_processes_tasks(self, queue: TaskQueue, sample_payload: dict[str, object]) -> None:
         """Test that worker processes enqueued tasks."""
         handler = AsyncMock()
 
         # Start the worker
         await queue.start_workers()
 
-        # Enqueue a task
-        await queue.enqueue(handler, "pull_request", sample_payload)
-
-        # Wait for worker to process
-        await asyncio.sleep(0.1)
-        await queue.queue.join()
-
-        # Verify handler was called
-        assert handler.called
+        try:
+            # Enqueue a task
+            await queue.enqueue(handler, "pull_request", sample_payload)
+
+            # Wait for worker to process
+            await asyncio.sleep(0.1)
+            await queue.queue.join()
+
+            # Verify handler was called
+            assert handler.called
+        finally:
+            await queue.stop_workers()
@@
     async def test_worker_handles_exceptions(self, queue: TaskQueue, sample_payload: dict[str, object]) -> None:
         """Test that worker continues after handler raises exception."""
         # Create a handler that raises an exception
         failing_handler = AsyncMock(side_effect=ValueError("Test error"))
         success_handler = AsyncMock()
 
         # Start the worker
         await queue.start_workers()
 
-        # Enqueue failing task
-        await queue.enqueue(failing_handler, "pull_request", sample_payload)
-
-        # Enqueue successful task with different payload
-        different_payload = {**sample_payload, "action": "closed"}
-        await queue.enqueue(success_handler, "pull_request", different_payload)
-
-        # Wait for worker to process both
-        await asyncio.sleep(0.2)
-        await queue.queue.join()
-
-        # Verify both handlers were called despite first one failing
-        assert failing_handler.called
-        assert success_handler.called
+        try:
+            # Enqueue failing task
+            await queue.enqueue(failing_handler, "pull_request", sample_payload)
+
+            # Enqueue successful task with different payload
+            different_payload = {**sample_payload, "action": "closed"}
+            await queue.enqueue(success_handler, "pull_request", different_payload)
+
+            # Wait for worker to process both
+            await asyncio.sleep(0.2)
+            await queue.queue.join()
+
+            # Verify both handlers were called despite first one failing
+            assert failing_handler.called
+            assert success_handler.called
+        finally:
+            await queue.stop_workers()
@@
     async def test_enqueue_with_args_and_kwargs(self, queue: TaskQueue, sample_payload: dict[str, object]) -> None:
         """Test enqueue passes args and kwargs to handler."""
         handler = AsyncMock()
 
         # Start worker
         await queue.start_workers()
 
-        # Enqueue with additional args and kwargs
-        event_mock = {"test": "data"}
-        await queue.enqueue(handler, "pull_request", sample_payload, event_mock, timeout=30)
-
-        # Wait for processing
-        await asyncio.sleep(0.1)
-        await queue.queue.join()
-
-        # Verify handler was called with correct args and kwargs
-        assert handler.called
-        call_args, call_kwargs = handler.call_args
-        assert call_args[0] == event_mock
-        assert call_kwargs["timeout"] == 30
+        try:
+            # Enqueue with additional args and kwargs
+            event_mock = {"test": "data"}
+            await queue.enqueue(handler, "pull_request", sample_payload, event_mock, timeout=30)
+
+            # Wait for processing
+            await asyncio.sleep(0.1)
+            await queue.queue.join()
+
+            # Verify handler was called with correct args and kwargs
+            assert handler.called
+            call_args, call_kwargs = handler.call_args
+            assert call_args[0] == event_mock
+            assert call_kwargs["timeout"] == 30
+        finally:
+            await queue.stop_workers()

Also applies to: 155-169

src/api/rate_limit.py-21-28 (1)

21-28: Guard against missing client info when building the rate‑limit key.

request.client can be None in proxy scenarios and certain test setups, which would raise an AttributeError when accessing .host. Add a safe fallback.

🛠️ Suggested fix
-    else:
-        key = f"ip:{request.client.host}"
+    else:
+        client_host = request.client.host if request.client else "unknown"
+        key = f"ip:{client_host}"
src/webhooks/handlers/push.py-30-42 (1)

30-42: Unreachable ImportError handler is dead code.

The except ImportError block (lines 36-39) is unreachable because the try block only contains log.info() and a return statement—neither of which can raise ImportError. This appears to be leftover code from a previous implementation that dynamically imported a processor.

Remove the dead code or, if dynamic processor loading is planned, add a TODO comment explaining the intent.

🧹 Proposed fix: Remove unreachable exception handler
         try:
             # Handler is thin—just logs and confirms readiness
             log.info("push_ready_for_processing")

             return WebhookResponse(status="success", detail="Push handler executed", event_type="push")

-        except ImportError:
-            # Deployment processor may not exist yet
-            log.warning("deployment_processor_not_found")
-            return WebhookResponse(status="success", detail="Push acknowledged (no processor)", event_type="push")
         except Exception as e:
             log.error("push_processing_failed", error=str(e), exc_info=True)
             return WebhookResponse(status="error", detail=f"Push processing failed: {str(e)}", event_type="push")
tests/integration/test_repo_analysis.py-60-66 (1)

60-66: Align ai_generated_rate expectation with the model default.
If HygieneMetrics.ai_generated_rate defaults to None (as in the core model), this assertion will fail; either set it to 0.0 during hygiene computation or update the test expectation.

💡 Possible test adjustment (if default is None)
-        assert metrics.ai_generated_rate == 0.0
+        assert metrics.ai_generated_rate is None
src/agents/acknowledgment_agent/agent.py-212-233 (1)

212-233: Avoid truthiness checks for required kwargs.
Empty but valid payloads (e.g., {} or []) will be rejected. Prefer is not None checks and let evaluate_acknowledgment handle content validation.

💡 Suggested fix
-        if acknowledgment_reason and violations and pr_data and commenter and rules:
+        if (
+            acknowledgment_reason is not None
+            and violations is not None
+            and pr_data is not None
+            and commenter is not None
+            and rules is not None
+        ):
tests/unit/webhooks/test_router.py-108-123 (1)

108-123: Fix inconsistent comment: states 202 but asserts 200.

The comment on line 120 says "Should return 202 for unsupported events per router logic" but the assertion on line 121 checks for status code 200. Update the comment to match the actual expected behavior.

📝 Suggested fix
-        # Should return 202 for unsupported events per router logic
-        assert response.status_code == 200
+        # Router returns 200 with "received" status for unsupported events
+        assert response.status_code == 200
src/agents/repository_analysis_agent/nodes.py-91-92 (1)

91-92: Potential KeyError if API response item lacks 'name' key.

If the GitHub API returns an item in files without a name field, this list comprehension will raise a KeyError.

Proposed defensive fix
-    file_names = [f["name"] for f in files] if files else []
+    file_names = [f.get("name") for f in files if f.get("name")] if files else []
src/agents/repository_analysis_agent/nodes.py-51-57 (1)

51-57: Inconsistent key access pattern may cause KeyError.

pr_data["number"] uses direct key access while other fields use .get() with defaults. If the API response is malformed or missing the number field, this will raise a KeyError.

Proposed fix
     return PRSignal(
-        pr_number=pr_data["number"],
+        pr_number=pr_data.get("number", 0),
         has_linked_issue=pr_data.get("has_issue_ref", False),
         author_association=pr_data.get("author_association", "NONE"),
         is_ai_generated_hint=is_ai_generated,
         lines_changed=pr_data.get("lines_changed", 0),
     )
src/integrations/github/api.py-154-161 (1)

154-161: Unreachable code after raise_for_status().

Line 161 (return []) is unreachable. When response.status != 200, raise_for_status() raises an exception, so the subsequent return [] never executes.

Proposed fix
         async with session.get(url, headers=headers) as response:
             if response.status == 200:
                 data = await response.json()
                 return data if isinstance(data, list) else [data]
 
             # Raise exception for error statuses to avoid silent failures
             response.raise_for_status()
-            return []
🧹 Nitpick comments (24)
src/rules/models.py (1)

20-28: LGTM - Clean enum definition following established patterns.

The RuleCategory enum correctly inherits from str, Enum (consistent with RuleSeverity), and the inline comments provide useful context for each category.

One observation: The Rule model has a severity: RuleSeverity field but no corresponding category: RuleCategory field. Since related agent models (RuleRecommendation) already use category, consider adding an optional field to the Rule model for consistency:

category: RuleCategory | None = None
src/rules/utils/contributors.py (1)

143-156: Minor duplication in date extraction logic.

The same date parsing pattern is repeated for commits, PRs, and issues. This is functional but could be extracted to a helper if this pattern expands.

src/rules/validators.py (1)

168-187: Use the shared glob matcher to support ** and caching.

Right now this path uses _glob_to_regex + re.match, which ignores the newer glob matcher and misses ** semantics. Reusing _matches_any keeps behavior consistent and avoids re‑implementing glob logic.

♻️ Proposed refactor
-        # Glob→regex—simple, not robust. TODO: improve for edge cases.
-        regex_pattern = FilePatternCondition._glob_to_regex(pattern)
-
-        # Pattern match—performance: optimize if file count high.
-        matching_files = [file for file in changed_files if re.match(regex_pattern, file)]
+        # Pattern match—reuse glob matcher to support ** and cached regexes.
+        has_match = any(_matches_any(file, [pattern]) for file in changed_files)
@@
-        if condition_type == "files_not_match_pattern":
-            return len(matching_files) == 0
-        else:
-            return len(matching_files) > 0
+        if condition_type == "files_not_match_pattern":
+            return not has_match
+        return has_match
src/webhooks/handlers/deployment_protection_rule.py (1)

18-27: Use structured key-value logging with structlog.

F-strings embed data into the message, which loses structured fields in JSON output. With the JSONRenderer configured in structlog, pass fields as explicit kwargs for better filtering, aggregation, and observability.

♻️ Suggested refactor
-        logger.info(f"🔄 Enqueuing deployment protection rule event for {event.repo_full_name}")
+        logger.info(
+            "🔄 Enqueuing deployment protection rule event",
+            repo_full_name=event.repo_full_name,
+        )
...
-        logger.info(f"✅ Deployment protection rule event enqueued with task ID: {task_id}")
+        logger.info(
+            "✅ Deployment protection rule event enqueued",
+            task_id=task_id,
+            repo_full_name=event.repo_full_name,
+        )
src/webhooks/handlers/check_run.py (2)

18-18: Use structured logging parameters instead of f-strings.

When using structlog, prefer keyword arguments over f-strings to get proper structured log output that can be parsed and queried by log aggregation systems.

♻️ Proposed fix
-        logger.info(f"🔄 Enqueuing check run event for {event.repo_full_name}")
+        logger.info("Enqueuing check run event", repo=event.repo_full_name)

27-27: Same structured logging issue.

Apply the same refactor here for consistent structured logging.

♻️ Proposed fix
-        logger.info(f"✅ Check run event enqueued with task ID: {task_id}")
+        logger.info("Check run event enqueued", task_id=task_id)
README.md (1)

239-244: Consider clarifying rate limit scope and section placement.

Two minor suggestions:

  1. The section is under "Contributing & Development" but describes API usage behavior. Consider moving it closer to the "Usage" or "Configuration" sections for better discoverability.

  2. Line 242: "Authenticated users are limited to 100 requests per hour" - consider clarifying whether this is per-user, per-token, or per-IP for consistency with the anonymous limit description.

src/webhooks/auth.py (1)

33-33: Use structured logging parameters for consistency.

For consistency with structlog best practices across the codebase:

♻️ Proposed fix
-        logger.warning("Received a request without the X-Hub-Signature-256 header.")
+        logger.warning("Missing webhook signature header", header="X-Hub-Signature-256")
src/api/rules.py (1)

20-26: Consider checking result.success before accessing data.

The comment at line 19 mentions the agent may throw, but per the agent's implementation, it catches exceptions and returns AgentResult with success=False. The current code doesn't check result.success, which could return misleading values (e.g., supported: False when the actual issue was an error, not infeasibility).

♻️ Proposed fix
     result = await agent.execute(rule_description=request.rule_text)

-    # Output: keep format stable for frontend. Brittle if agent changes keys.
+    if not result.success:
+        return {
+            "supported": False,
+            "snippet": "",
+            "feedback": result.message,
+            "error": True,
+        }
+
     return {
         "supported": result.data.get("is_feasible", False),
         "snippet": result.data.get("yaml_content", ""),
         "feedback": result.message,
     }
src/integrations/github/schemas.py (1)

11-26: Prefer datetime types for timestamps.

GitHub API timestamps are ISO8601 strings (e.g., "2024-01-15T10:30:45Z"). Using datetime lets Pydantic automatically parse them and avoids the manual string handling pattern currently seen in downstream code like src/rules/utils/contributors.py, which manually calls datetime.fromisoformat(date_str.replace("Z", "+00:00")).

♻️ Proposed update
-from typing import Any
+from datetime import datetime
+from typing import Any
@@
-    created_at: str | None = None
-    updated_at: str | None = None
-    pushed_at: str | None = None
+    created_at: datetime | None = None
+    updated_at: datetime | None = None
+    pushed_at: datetime | None = None

Pydantic v2.5+ automatically parses ISO8601 timestamps and serializes them back to strings for JSON compatibility, so this change is backward compatible.

src/core/config/settings.py (1)

117-119: Missing error handling for malformed environment variables.

If ANONYMOUS_RATE_LIMIT, AUTHENTICATED_RATE_LIMIT, or ANALYSIS_TIMEOUT contain non-numeric values, int() will raise an unhandled ValueError at startup. Consider wrapping these in try/except with fallback to defaults, similar to how cors config handles JSONDecodeError.

Proposed defensive parsing
-        self.anonymous_rate_limit = int(os.getenv("ANONYMOUS_RATE_LIMIT", "5"))  # Per hour
-        self.authenticated_rate_limit = int(os.getenv("AUTHENTICATED_RATE_LIMIT", "100"))  # Per hour
-        self.analysis_timeout = int(os.getenv("ANALYSIS_TIMEOUT", "60"))  # Seconds
+        self.anonymous_rate_limit = self._safe_int("ANONYMOUS_RATE_LIMIT", 5)  # Per hour
+        self.authenticated_rate_limit = self._safe_int("AUTHENTICATED_RATE_LIMIT", 100)  # Per hour
+        self.analysis_timeout = self._safe_int("ANALYSIS_TIMEOUT", 60)  # Seconds

Add a helper method:

def _safe_int(self, env_var: str, default: int) -> int:
    """Parse integer from env var with fallback to default."""
    try:
        return int(os.getenv(env_var, str(default)))
    except ValueError:
        return default
src/integrations/github/service.py (1)

50-60: Consider adding GitHub API recommended headers.

GitHub recommends setting Accept: application/vnd.github+json and optionally X-GitHub-Api-Version for API stability. This helps ensure consistent response formats.

Proposed improvement
         try:
-            async with httpx.AsyncClient(timeout=60.0) as client:
-                response = await client.get(api_url)
+            headers = {"Accept": "application/vnd.github+json"}
+            async with httpx.AsyncClient(timeout=60.0, headers=headers) as client:
+                response = await client.get(api_url)
src/api/rate_limit.py (1)

30-41: Consider evicting inactive keys to avoid unbounded in‑memory growth.

Keys never expire, so long‑running processes can grow memory with high‑cardinality IP/user traffic. Consider periodic cleanup or a bounded/TTL store.

src/webhooks/models.py (1)

1-36: Well-structured webhook models with good documentation.

The models are clean, well-documented with docstrings and field descriptions, and follow Pydantic best practices.

One consideration: the status field on WebhookResponse (line 34) is a free-form str. If the valid statuses are known and fixed (e.g., "success", "received", "error", "queued" as seen in tests and handlers), consider using Literal to provide type safety and self-documenting constraints.

♻️ Optional: Constrain status values with Literal
-from pydantic import BaseModel, Field
+from typing import Literal
+from pydantic import BaseModel, Field
+
+WebhookStatus = Literal["success", "received", "error", "queued"]


 class WebhookResponse(BaseModel):
     """Standardized response model for all webhook handlers."""

-    status: str = Field(..., description="Processing status: success, received, error")
+    status: WebhookStatus = Field(..., description="Processing status: success, received, error, queued")
     detail: str | None = Field(None, description="Additional context or error message")
     event_type: str | None = Field(None, description="Normalized GitHub event type")
tests/integration/webhooks/test_webhook_flow.py (1)

15-35: Ensure TaskQueue workers are stopped after tests.

Workers are started but never stopped, which can leave background tasks running and cause warnings or cross-test interference.

One option: async fixture teardown
-import pytest
+import pytest
+import pytest_asyncio
@@
-@pytest.fixture
-def fresh_queue() -> TaskQueue:
-    """Create a fresh task queue for testing."""
-    return TaskQueue()
+@pytest_asyncio.fixture
+async def fresh_queue() -> TaskQueue:
+    """Create a fresh task queue for testing."""
+    queue = TaskQueue()
+    yield queue
+    await queue.stop_workers()

You can also rely on queue.join() and drop fixed sleeps to reduce flakiness.
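A minimal stdlib sketch of the `join()`-based pattern, independent of the project's TaskQueue, showing why fixed sleeps become unnecessary:

```python
import asyncio
import contextlib


async def demo() -> list[str]:
    """Process one item with a worker and wait via join() instead of sleep()."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    processed: list[str] = []

    async def worker() -> None:
        while True:
            item = await queue.get()
            processed.append(item)
            queue.task_done()  # required so join() can observe completion

    worker_task = asyncio.create_task(worker())
    await queue.put("pull_request")
    await queue.join()  # deterministic: returns once every enqueued item is done
    worker_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker_task
    return processed


result = asyncio.run(demo())
```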

tests/unit/webhooks/test_router.py (2)

10-12: Remove unused function mock_verify_signature.

This function is defined but never used. The app fixture on line 23 uses lambda: True instead. Consider removing this dead code.

🧹 Suggested removal
-async def mock_verify_signature() -> bool:
-    """Mock verification dependency that always returns True."""
-    return True
-
-

157-177: Consider more specific assertion for structured logging validation.

The current assertion len(calls) > 0 only checks that at least one call contains "webhook_validated". A more specific assertion would verify the exact log event name.

💡 Suggested improvement
             # Verify structured logging was called
             assert mock_logger.info.called
-            # Check that webhook_validated was logged
-            calls = [call for call in mock_logger.info.call_args_list if "webhook_validated" in str(call)]
-            assert len(calls) > 0
+            # Verify webhook_validated event was logged
+            call_args = [str(call) for call in mock_logger.info.call_args_list]
+            assert any("webhook_validated" in arg for arg in call_args), f"Expected 'webhook_validated' log event, got: {call_args}"
src/api/recommendations.py (2)

156-161: Move yaml import to module level.

Importing inside the function adds overhead on each request. Move to the top of the file with other imports.

♻️ Suggested fix

At the top of the file (around line 6):

 from pydantic import BaseModel, Field, HttpUrl
+import yaml

Then remove line 157:

     # Generate rules_yaml from recommendations
-    import yaml
-
     rules_output = {"rules": [rec.model_dump(exclude_none=True) for rec in final_state.get("recommendations", [])]}

188-208: Consider using a Pydantic model for payload instead of raw dict.

Using payload: dict requires manual validation (lines 200-208). A Pydantic model would provide automatic validation, type hints, and better API documentation.

♻️ Suggested model
class ProceedWithPRRequest(BaseModel):
    """Payload for creating a PR with recommended rules."""
    repository_full_name: str = Field(..., description="Target repository in 'owner/repo' format")
    rules_yaml: str = Field(..., description="YAML content for the rules file")
    installation_id: int = Field(..., description="GitHub App installation ID")
    branch_name: str = Field("watchflow/rules", description="Branch name for the PR")


@router.post("/recommend/proceed-with-pr", ...)
async def proceed_with_pr(
    payload: ProceedWithPRRequest, 
    user: User | None = Depends(get_current_user_optional)
):
    # Manual validation no longer needed
    ...
src/agents/repository_analysis_agent/models.py (1)

9-24: Broad exception handling may hide errors.

The except Exception on line 22 catches all exceptions including programming errors. Consider catching more specific exceptions or logging the error.

💡 Suggested improvement
     try:
         p = parse(identifier)
         if p.valid and p.owner and p.repo:
             return f"{p.owner}/{p.repo}"
         # Fallback for simple "owner/repo" strings that might fail strict URL parsing
         if "/" in identifier and not identifier.startswith(("http", "git@")):
             return identifier.strip().strip("/")
         return identifier
-    except Exception:
+    except (ValueError, AttributeError, TypeError) as e:
         # If parsing fails completely, return original to let validator catch it later
+        import structlog
+        structlog.get_logger().debug("parse_github_repo_identifier_fallback", identifier=identifier, error=str(e))
         return identifier
src/integrations/github/api.py (4)

19-41: Remove unused module-level GraphQL query.

This _PR_HYGIENE_QUERY definition is never used. The fetch_pr_hygiene_stats method at line 1157 defines its own local version with additional fields (mergedAt, additions, deletions, author, detailed reviews, files). The module-level definition is dead code.

Proposed fix - remove dead code
-_PR_HYGIENE_QUERY = """
-query PRHygiene($owner: String!, $repo: String!) {
-  repository(owner: $owner, name: $repo) {
-    pullRequests(last: 20, states: [MERGED, CLOSED]) {
-      nodes {
-        number
-        title
-        body
-        changedFiles
-        comments {
-          totalCount
-        }
-        closingIssuesReferences(first: 1) {
-          totalCount
-        }
-        reviews(first: 1) {
-          totalCount
-        }
-      }
-    }
-  }
-}
-"""
-

276-285: Accessing internal _loop attribute is fragile.

self._session._loop is an internal/private attribute of aiohttp.ClientSession. While the comment acknowledges avoiding the deprecated .loop property, accessing _loop directly may break in future aiohttp versions without notice.

Consider using a safer approach that doesn't rely on internal attributes:

Alternative approach
-                try:
-                    current_loop = asyncio.get_running_loop()
-                    # Try to access session's internal loop to check if it's the same
-                    # If the session's loop is closed, this will fail
-                    if self._session._loop != current_loop or self._session._loop.is_closed():
-                        await self._session.close()
-                        self._session = aiohttp.ClientSession()
-                except RuntimeError:
-                    # No running loop or loop is closed, recreate session
-                    self._session = aiohttp.ClientSession()
+                # Safest approach: recreate session if event loop changed
+                # This is common in test environments
+                try:
+                    # Attempt a trivial operation to verify the session is usable
+                    _ = asyncio.get_running_loop()
+                except RuntimeError:
+                    # No running loop, recreate session
+                    self._session = aiohttp.ClientSession()

1090-1117: GraphQL API typically requires authentication.

The GitHub GraphQL API requires authentication for most queries. Using allow_anonymous=True at line 1110 may cause unexpected failures. Additionally, line 1117 raises a generic Exception rather than a more specific type.

Consider:

  1. Requiring authentication for GraphQL queries (most queries need it)
  2. Using a custom exception type for clarity
Proposed improvements
-        headers = await self._get_auth_headers(allow_anonymous=True)
+        # GraphQL API requires authentication for most queries
+        headers = await self._get_auth_headers(allow_anonymous=False)
         if not headers:
-            # Fallback or error? GraphQL usually demands auth.
-            # If we have no headers, we likely can't query GraphQL successfully for many fields.
-            # We'll try with empty headers if that's what _get_auth_headers returns (it returns None on failure).
-            # If None, we can't proceed.
             logger.error("GraphQL execution failed: No authentication headers available.")
-            raise Exception("Authentication required for GraphQL query.")
+            raise GitHubGraphQLError([{"message": "Authentication required for GraphQL query."}])

10-17: Inconsistent logging: mixing logging and structlog.

The module uses standard logging (line 17) but also imports and uses structlog (line 10, line 983). Creating a new structlog logger inside fetch_recent_pull_requests leads to inconsistent log formatting across the module.

Proposed fix - use structlog consistently
 import structlog
 from cachetools import TTLCache
 from tenacity import retry, stop_after_attempt, wait_exponential

 from src.core.config import config
 from src.core.errors import GitHubGraphQLError

-logger = logging.getLogger(__name__)
+logger = structlog.get_logger(__name__)

And remove the local logger creation in fetch_recent_pull_requests:

     async def fetch_recent_pull_requests(
         ...
     ) -> list[dict[str, Any]]:
         ...
-
-        logger = structlog.get_logger()
-
         try:

Also applies to: 983-984

Comment on lines 17 to +18
  - repo: https://github.com/astral-sh/ruff-pre-commit
-   rev: v0.9.9
+   rev: v0.3.0

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

latest version of ruff linter 2025 2026

💡 Result:

As of January 24, 2026:

  • Latest Ruff release (GitHub): 0.14.14, released January 22, 2026. [1]
  • Latest Ruff version on PyPI: 0.14.13, uploaded January 15, 2026. [2]

Sources: [1] [2]


Update Ruff to a current version—v0.3.0 is outdated.

The pinned version v0.3.0 is approximately two years old (from early 2024). The current latest version is v0.14.14 (released January 22, 2026). This appears to be a downgrade or typo. Update to at least v0.14.13 or v0.14.14 to benefit from bug fixes, security updates, and improvements across 11+ minor versions.

🤖 Prompt for AI Agents
In @.pre-commit-config.yaml around lines 17 - 18, The pre-commit entry for the
Ruff hook is pinned to an outdated revision (repo:
https://github.com/astral-sh/ruff-pre-commit with rev: v0.3.0); update the rev
to a current release (preferably v0.14.14, or at minimum v0.14.13) in the
.pre-commit-config.yaml and then run pre-commit autoupdate or reinstall hooks to
ensure the newer Ruff version is used.

Comment on lines 23 to 41
async def get_current_user_optional(request: Request) -> User | None:
    """
    Auth check—don't fail if missing. Critical for public repo support (Phase 1).
    """
    auth_header = request.headers.get("Authorization")

    if not auth_header:
        return None

    try:
        # Token extraction—fragile if header format changes.
        scheme, token = auth_header.split()
        if scheme.lower() != "bearer":
            return None

        # TODO: Wire to real IdP (Supabase/Auth0). For now, fake user if token present. WARNING: Must verify signature in prod.
        return User(id=123, username="authenticated_user", email="user@example.com", github_token=token)
    except Exception as e:
        logger.warning(f"Failed to parse auth header: {e}")

⚠️ Potential issue | 🔴 Critical

Bearer token is accepted without verification (auth bypass).

Any request with a Bearer header becomes authenticated, which allows unauthorized access to protected endpoints. This must validate signatures/claims (or reject tokens) before creating a User.

🤖 Prompt for AI Agents
In `@src/api/dependencies.py` around lines 23 - 41, The get_current_user_optional
function currently accepts any Bearer token and returns a fake User, enabling
auth bypass; replace this by validating the token before constructing a User:
call a token verification helper (e.g., verify_token or integrate with your IdP
client) from get_current_user_optional to validate signature and claims, reject
invalid/malformed tokens by returning None and logging the failure, and only
create and return a User (or map claims to User) when verification succeeds;
ensure exceptions from verification are caught, logged via logger.warning, and
result in None instead of a fake authenticated user.

Comment on lines +89 to +94
commits = [
    Commit(
        oid=node["commit"]["oid"],
        message=node["commit"]["message"],
        author=node["commit"].get("author", {}).get("name", "Unknown"),
    )

⚠️ Potential issue | 🟠 Major

Guard against null commit author to avoid AttributeError.

If GraphQL returns author: null, .get("author", {}) returns None and .get("name") crashes.

Proposed fix
-                        author=node["commit"].get("author", {}).get("name", "Unknown"),
+                        author=(node["commit"].get("author") or {}).get("name", "Unknown"),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  commits = [
      Commit(
          oid=node["commit"]["oid"],
          message=node["commit"]["message"],
-         author=node["commit"].get("author", {}).get("name", "Unknown"),
+         author=(node["commit"].get("author") or {}).get("name", "Unknown"),
      )
🤖 Prompt for AI Agents
In `@src/integrations/github/graphql_client.py` around lines 89 - 94, The list
comprehension building Commit instances can crash when GraphQL returns author:
null because node["commit"].get("author", {}) will be None; update the Commit
construction (where Commit(...) is created) to first coerce the author dict to a
safe mapping (e.g., author_info = node["commit"].get("author") or {}) and then
use author_info.get("name", "Unknown") for the author field so it never raises
an AttributeError.

Comment on lines 43 to 61
async def enqueue(
    self,
    func: Callable[..., Coroutine[Any, Any, Any]],
    event_type: str,
    payload: dict[str, Any],
    *args: Any,
    **kwargs: Any,
) -> bool:
    """Adds a task to the queue if it is not a duplicate."""
    task_id = self._generate_task_id(event_type, payload)

    if task_id in self.processed_hashes:
        logger.info("task_skipped_duplicate", task_id=task_id, event_type=event_type)
        return False

    task = Task(task_id=task_id, event_type=event_type, payload=payload, func=func, args=args, kwargs=kwargs)
    await self.queue.put(task)
    self.processed_hashes.add(task_id)


⚠️ Potential issue | 🟠 Major

Dedup entries never expire and are marked before execution.

This can (a) grow memory unbounded with unique events and (b) permanently drop retries if a task fails because the hash is already marked processed. Consider deferring “processed” marking until success and adding TTL/eviction for old hashes.

🤖 Prompt for AI Agents
In `@src/tasks/task_queue.py` around lines 43 - 61, The current enqueue method
adds task_id to processed_hashes immediately which both leaks memory and
prevents retries on failure; change the behavior so enqueue (in enqueue and
using _generate_task_id, Task, queue.put) only checks processed_hashes but does
NOT add the id—move the addition to the successful completion path in the
worker/executor that runs the Task (i.e., after the task.func finishes without
error) so failed tasks can be retried. Also replace or augment processed_hashes
with a TTL/evicting structure (e.g., an async-safe TTL cache or an OrderedDict
with timestamp eviction) to bound memory growth and periodically prune expired
entries; ensure all access to processed_hashes is concurrency-safe in the async
executor and enqueue.

Comment on lines 36 to 39
# Enqueue: async, may fail if queue overloaded.
task_id = await task_queue.enqueue(
    event_type="deployment", repo_full_name=repo_full_name, installation_id=installation_id, payload=payload
)

⚠️ Potential issue | 🔴 Critical

enqueue call no longer matches TaskQueue API (missing handler + return type).

TaskQueue.enqueue now requires a handler function and returns bool, but this call provides no func and treats the return as task_id. This will fail at runtime and returns the wrong response shape.

🤖 Prompt for AI Agents
In `@src/webhooks/handlers/deployment.py` around lines 36 - 39, The current call
to TaskQueue.enqueue incorrectly omits the required handler argument and
misinterprets its boolean return as task_id; update the call to pass the
appropriate handler function (e.g., deployment_handler or the actual handler
used for deployment tasks) as func and treat the return value as a success flag
(bool) instead of an ID: call await task_queue.enqueue(func=deployment_handler,
event_type="deployment", repo_full_name=repo_full_name,
installation_id=installation_id, payload=payload), then check the boolean result
and if true proceed to build/return the correct response (generate or fetch a
real task_id separately if your API requires one) and if false log/return an
error (e.g., 503) — reference TaskQueue.enqueue and the local variable task_id
(replace its usage with a success check or a real id lookup).

…improvements

- Codeowner bypass rate calculation
- Issue-diff mismatch detection
- require_linked_issue validator
- Rate limit handling with warnings
- New API endpoints: auth/validate-token, repos/installation
- Standardize responses (rule_yaml)
- Refactor prompts (concise, schema-driven)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Fix all issues with AI agents
In `@src/agents/repository_analysis_agent/nodes.py`:
- Around line 356-383: The regex in variable file_pattern and the handling of
matches is returning only the captured extension (due to the capturing group) so
mentioned_files contains extensions instead of full paths; update file_pattern
to use a non-capturing group (e.g., replace "(py|ts|...)" with "(?:py|ts|...)"
or otherwise ensure the full match is returned), call re.findall to get
full-path matches, and simplify the population of mentioned_files by extending
with matches directly (remove the m[0] tuple handling); keep the subsequent
comparison logic that uses mentioned_files and changed_file_names and leave
issue_diff_mismatch_count increments unchanged.
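The capturing-group pitfall is easy to demonstrate in isolation. The pattern below is a simplified stand-in for the one in nodes.py, not the actual file_pattern:

```python
import re

text = "Updated src/api/auth.py and docs/setup.md in this issue."

# Capturing group: re.findall returns only the group, i.e. bare extensions.
capturing = r"[\w/.-]+\.(py|ts|md)"
print(re.findall(capturing, text))  # ['py', 'md']

# Non-capturing group: re.findall returns the full match, i.e. full paths.
non_capturing = r"[\w/.-]+\.(?:py|ts|md)"
print(re.findall(non_capturing, text))  # ['src/api/auth.py', 'docs/setup.md']
```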

In `@src/api/auth.py`:
- Around line 91-96: The except block currently returns an HTTPException with
the raw exception string (detail=f"Failed to validate token: {str(e)}"), which
can leak sensitive info; change the HTTPException detail to a generic message
(e.g., "Failed to validate token") while keeping the full exception logged via
logger.exception (reference symbols: logger.exception, HTTPException,
status.HTTP_500_INTERNAL_SERVER_ERROR) so internal logs preserve details but
clients receive no sensitive information.
- Around line 64-67: The HTTPException currently includes raw error_text from
GitHub in the response; change it to return a generic client-facing message
(e.g., "Failed to validate token") and move the raw error_text to server-side
logs instead: call the existing logger (e.g., logger.error or app_logger.error)
to record error_text and any context before raising the HTTPException without
the sensitive details, keeping the
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR and referencing the same raise
HTTPException(...) site and the error_text variable for logging.
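The pattern behind both fixes—log the raw upstream error server-side, return only a generic message to clients—can be sketched without FastAPI. `HTTPError` below is a stand-in for fastapi.HTTPException, and the status/error values are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("api.auth")


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)


def validate_token_response(status: int, error_text: str) -> None:
    if status != 200:
        # Full upstream error goes to server-side logs only.
        logger.error("token_validation_failed status=%s body=%s", status, error_text)
        # Clients receive a generic message with no sensitive details.
        raise HTTPError(status_code=500, detail="Failed to validate token")


try:
    validate_token_response(401, "Bad credentials for token ghp_xxxx")
except HTTPError as exc:
    print(exc.detail)  # Failed to validate token
```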

In `@src/api/rate_limit.py`:
- Around line 27-32: The current rate-limit key logic can raise AttributeError
when request.client is None and incorrectly uses "user:None" when user.email is
None; update the branch in the rate limiting code to only treat requests as
authenticated when user and user.email are truthy (use user.email explicitly),
and compute client_host defensively (e.g., client_host = request.client.host if
request.client and getattr(request.client, "host", None) else
request.headers.get("x-forwarded-for", "unknown")), then set key =
f"user:{user.email}" and limit = AUTH_LIMIT only when user.email is present,
otherwise set key = f"ip:{client_host}" and limit = ANON_LIMIT; reference the
variables/user checks and AUTH_LIMIT/ANON_LIMIT in the change.
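The defensive key selection can be sketched as below. The `SimpleNamespace` objects stand in for the real request/user types, and the limit values mirror the 5/100 figures from this PR:

```python
from types import SimpleNamespace

AUTH_LIMIT = 100
ANON_LIMIT = 5


def rate_limit_key(request, user) -> tuple[str, int]:
    """Pick the rate-limit bucket without assuming request.client or user.email exist."""
    # Authenticated only when both user and user.email are truthy.
    if user is not None and getattr(user, "email", None):
        return f"user:{user.email}", AUTH_LIMIT
    # request.client can be None (e.g. behind some test clients or proxies).
    client = getattr(request, "client", None)
    host = getattr(client, "host", None) if client else None
    if not host:
        host = request.headers.get("x-forwarded-for", "unknown")
    return f"ip:{host}", ANON_LIMIT


# No client object, no email: falls back to the forwarded-for header.
req = SimpleNamespace(client=None, headers={"x-forwarded-for": "203.0.113.7"})
anon = SimpleNamespace(email=None)
print(rate_limit_key(req, anon))  # ('ip:203.0.113.7', 5)
```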

In `@src/api/recommendations.py`:
- Around line 448-451: The log currently records raw client IP via
request.client.host into client_ip and passes it to
logger.info("analysis_requested", ...), which may be PII; change this by either
omitting the raw IP or replacing it with a non-reversible identifier (e.g., a
salted hash/HMAC using an application secret) before logging. Locate the
client_ip assignment and the logger.info call in recommendations.py (references:
client_ip, request.client.host, user.email, logger.info("analysis_requested"))
and update the code to compute and log a hashed_ip or simply exclude the IP
field based on a config flag (e.g., LOG_RAW_IPS=false), ensuring the hashing
uses a stable secret so the identifier can be correlated without exposing the
plain IP.
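A salted HMAC gives a stable, non-reversible identifier, as suggested above. This is a sketch: the secret would come from application config, not a literal, and the 16-char truncation is an arbitrary choice:

```python
import hashlib
import hmac

# Assumption: in real code this secret comes from application config.
APP_SECRET = b"replace-with-app-secret"


def hash_ip(ip: str) -> str:
    """Non-reversible but stable: same IP + secret always yields the same digest."""
    return hmac.new(APP_SECRET, ip.encode(), hashlib.sha256).hexdigest()[:16]


a = hash_ip("203.0.113.7")
b = hash_ip("203.0.113.7")
print(a == b)  # True: correlatable across log lines
print("203.0.113.7" in a)  # False: the plain IP never appears in logs
```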

In `@src/api/repos.py`:
- Around line 40-45: The handler currently exposes internal exception text in
the API response; keep detailed info in logs but return a generic message to
clients—remove str(e) from the HTTPException.detail and replace with a
non-sensitive string like "Failed to check installation" (or similar), while
retaining logger.exception("installation_check_failed", repo=repo_full_name,
error=str(e)) or better: keep logger.exception("installation_check_failed",
repo=repo_full_name) so the full exception is recorded server-side; update the
raise HTTPException call (status.HTTP_500_INTERNAL_SERVER_ERROR) to use the
generic detail and do not include the exception string.

In `@src/integrations/github/api.py`:
- Around line 1098-1109: The except block is catching httpx.HTTPStatusError in
aiohttp-based code; change the exception type to aiohttp.ClientResponseError and
log the correct status attribute (use e.status instead of
e.response.status_code) so the logger call in this handler (where
logger.error("pr_fetch_http_error", repo=repo_full_name, status_code=...,
error_type="network_error", error=str(e))) reports the proper aiohttp error
details; keep the existing fallback except Exception block unchanged.
- Around line 153-160: The code after response.raise_for_status() is
unreachable; update the async GET block (the "async with session.get(...)" scope
that calls response.raise_for_status()) to either remove the unreachable "return
[]" or catch the HTTP error and return [] for non-200 responses to match other
methods in this class — e.g., wrap response.raise_for_status() in a try/except
for aiohttp.ClientResponseError (or a broad Exception), logging or swallowing
the error and returning [] from the function instead of leaving a dead return
statement.

In `@src/rules/validators.py`:
- Around line 988-1049: In validate (async def validate), guard and normalize
the commits variable before iterating: when obtaining commits from
event.get("commits", []) or pull_request.get("commits", []), check
isinstance(commits, list) (or iterable of dicts) and if not, set commits = []
(or skip commit scanning) to avoid iterating an integer commit count from
pull_request; ensure the commit_message loop only runs when commits is a list of
dicts with "message" keys.
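The guard can be sketched like this. `extract_commit_messages` is a hypothetical helper, not the actual validate method; the payload shapes mirror GitHub's push (list of commits) vs. pull_request (integer commit count) events:

```python
from typing import Any


def extract_commit_messages(event: dict[str, Any]) -> list[str]:
    """Return commit messages, tolerating a PR payload where 'commits' is a count."""
    commits = event.get("commits") or event.get("pull_request", {}).get("commits", [])
    # pull_request.commits is an integer commit count, not a list—guard before iterating.
    if not isinstance(commits, list):
        commits = []
    return [c.get("message", "") for c in commits if isinstance(c, dict)]


push_event = {"commits": [{"message": "feat: add rules"}, {"message": "fix: typo"}]}
pr_event = {"pull_request": {"commits": 7}}  # would raise TypeError without the guard
print(extract_commit_messages(push_event))  # ['feat: add rules', 'fix: typo']
print(extract_commit_messages(pr_event))    # []
```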

In `@src/tasks/task_queue.py`:
- Around line 68-79: The worker currently logs failures and drops tasks; modify
_worker to support bounded retries and a dead-letter queue: add a retry counter
on the Task (e.g., task.retries and task.max_retries) or maintain a separate
retry map, and on exception increment task.retries and if retries < max_retries
re-enqueue the task (optionally with a small backoff) otherwise push the task
into a dead_letter_queue (e.g., self.dead_letter_queue) for inspection; ensure
queue.task_done() is still called once per dequeue and include task identifiers
(task.task_id) when moving to dead-letter so failed work can be recovered or
inspected later.
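The retry-plus-dead-letter shape can be sketched as below. This is a simplified model, not the real worker: `Task`, `RetryQueue`, and `max_retries=2` are illustrative assumptions, and backoff is omitted for brevity:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    retries: int = 0
    max_retries: int = 2


class RetryQueue:
    """Sketch: bounded retries with a dead-letter queue for failed tasks."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue[Task] = asyncio.Queue()
        self.dead_letter_queue: list[Task] = []

    async def process_one(self, task: Task, handler) -> None:
        try:
            await handler(task)
        except Exception:
            task.retries += 1
            if task.retries <= task.max_retries:
                await self.queue.put(task)  # re-enqueue for another attempt
            else:
                self.dead_letter_queue.append(task)  # keep for inspection
        finally:
            self.queue.task_done()  # exactly once per dequeue


async def main() -> list[str]:
    q = RetryQueue()
    await q.queue.put(Task(task_id="t1"))

    async def always_fails(task: Task) -> None:
        raise RuntimeError("boom")

    # Drain until the task exhausts its retries and lands in the dead-letter queue.
    while not q.queue.empty():
        task = await q.queue.get()
        await q.process_one(task, always_fails)
    return [t.task_id for t in q.dead_letter_queue]


dead = asyncio.run(main())
print(dead)  # ['t1']
```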
♻️ Duplicate comments (5)
src/tasks/task_queue.py (1)

46-66: Deduplication concerns already flagged.

The issues with marking processed_hashes before task execution (blocking retries on failure) and unbounded memory growth have been raised in a previous review. Please address the earlier feedback.

src/api/dependencies.py (1)

38-40: Auth bypass: Any Bearer token creates an authenticated user without validation.

This returns a hardcoded User(id=123, username="authenticated_user") for any Bearer token, which allows unauthorized access. While the comment notes this is intentional for the "open-source version," the fabricated user identity could cause issues in audit logs and any user-specific logic downstream.

Consider at minimum validating the token against GitHub's /user endpoint (similar to validate_token in src/api/auth.py) to populate real user data, or clearly documenting this limitation in API documentation.

src/integrations/github/api.py (1)

1057-1060: Critical: _detect_issue_references method is called but not defined.

Line 1060 calls self._detect_issue_references(body, title), but this method does not exist in the GitHubClient class. This will raise an AttributeError at runtime when fetch_recent_pull_requests processes merged PRs.

🐛 Proposed fix - add the missing method
+    def _detect_issue_references(self, body: str, title: str) -> bool:
+        """
+        Detect if PR body or title contains issue references.
+        
+        Looks for common patterns like:
+        - `#123`, fixes `#123`, closes `#123`
+        - owner/repo#123
+        """
+        import re
+        combined_text = f"{title} {body}".lower()
+        # Pattern matches: `#number` or owner/repo#number, optionally preceded by keywords
+        pattern = r'(?:(?:fix(?:es|ed)?|clos(?:es|ed)?|resolv(?:es|ed)?)\s+)?(?:[\w-]+/[\w-]+)?#\d+'
+        return bool(re.search(pattern, combined_text, re.IGNORECASE))
+
     async def fetch_recent_pull_requests(
src/main.py (1)

116-122: CORS configuration with wildcard origins and credentials is invalid.

This was flagged in a previous review. Per the Fetch specification, Access-Control-Allow-Origin: * cannot be used with Access-Control-Allow-Credentials: true. Browsers will block credentialed requests with this configuration.

src/rules/validators.py (1)

238-243: has_min_approvals always passes.
This validator is registered and currently returns True unconditionally, so approval gating is bypassed. Implement the approval count (or delegate to MinApprovalsCondition).

🐛 Proposed fix
-        # Unused: min_approvals—left for future logic.
-        # min_approvals = parameters.get("min_approvals", 1)
-
-        # TODO: Check actual PR reviews. Always returns True—demo only.
-        return True
+        min_approvals = parameters.get("min_approvals", 1)
+        reviews = event.get("reviews", [])
+        approved_count = sum(1 for review in reviews if review.get("state") == "APPROVED")
+        logger.debug(
+            "ApprovalCountCondition: PR has %d approvals, requires %d",
+            approved_count,
+            min_approvals,
+        )
+        return approved_count >= min_approvals
🧹 Nitpick comments (18)
src/event_processors/pull_request.py (1)

319-327: URL‑encode query params when building the landing URL.

Encoding avoids malformed links and keeps the generated URLs robust as inputs evolve.

♻️ Proposed fix
@@
-import time
+import time
+import urllib.parse
@@
-                landing_url = "https://watchflow.dev"
-                if repo_full_name and installation_id:
-                    landing_url = (
-                        f"https://watchflow.dev/analyze?installation_id={installation_id}&repo={repo_full_name}"
-                    )
-                elif repo_full_name:
-                    landing_url = f"https://watchflow.dev/analyze?repo={repo_full_name}"
+                landing_url = "https://watchflow.dev"
+                if repo_full_name and installation_id:
+                    query = urllib.parse.urlencode(
+                        {"installation_id": installation_id, "repo": repo_full_name}
+                    )
+                    landing_url = f"https://watchflow.dev/analyze?{query}"
+                elif repo_full_name:
+                    query = urllib.parse.urlencode({"repo": repo_full_name})
+                    landing_url = f"https://watchflow.dev/analyze?{query}"
src/tasks/task_queue.py (3)

19-19: The | Any union defeats type safety.

The type Callable[..., Coroutine[Any, Any, Any]] | Any accepts any value, making the annotation meaningless. If the intent is to accept async callables, remove | Any.

Suggested fix
-    func: Callable[..., Coroutine[Any, Any, Any]] | Any = Field(..., description="Handler function to execute")
+    func: Callable[..., Coroutine[Any, Any, Any]] = Field(..., description="Handler function to execute")

40-44: json.dumps may fail on non-serializable payload values.

If payload contains non-JSON-serializable types (e.g., datetime, bytes, custom objects), json.dumps will raise TypeError. Consider wrapping in a try/except with a fallback (e.g., repr(payload)) or use a custom encoder.

Suggested defensive approach
 def _generate_task_id(self, event_type: str, payload: dict[str, Any]) -> str:
     """Creates a unique hash for deduplication."""
-    payload_str = json.dumps(payload, sort_keys=True)
+    try:
+        payload_str = json.dumps(payload, sort_keys=True, default=str)
+    except (TypeError, ValueError):
+        payload_str = repr(payload)
     raw_string = f"{event_type}:{payload_str}"
     return hashlib.sha256(raw_string.encode()).hexdigest()

89-96: Pending tasks are orphaned when workers stop.

stop_workers cancels workers immediately without draining the queue. Any tasks still in self.queue are lost. For graceful shutdown, consider awaiting self.queue.join() with a timeout before cancelling workers.

Suggested graceful shutdown
 async def stop_workers(self) -> None:
-    """Stops the background workers."""
+    """Stops the background workers after draining the queue."""
     if self.workers:
+        # Wait for queue to drain (with timeout)
+        try:
+            await asyncio.wait_for(self.queue.join(), timeout=30.0)
+        except asyncio.TimeoutError:
+            logger.warning("task_queue_drain_timeout", pending=self.queue.qsize())
         for task in self.workers:
             task.cancel()
         await asyncio.gather(*self.workers, return_exceptions=True)
         self.workers.clear()
         logger.info("task_queue_workers_stopped")
src/api/rate_limit.py (1)

34-45: Memory leak: Empty rate limit entries are never removed.

After filtering expired timestamps, if the list is empty, the key remains in _RATE_LIMIT_STORE. For high-traffic anonymous endpoints with many unique IPs, this can lead to unbounded memory growth until restart.

♻️ Proposed fix
     timestamps = _RATE_LIMIT_STORE.get(key, [])
     # Remove timestamps outside the window
     timestamps = [ts for ts in timestamps if now - ts < WINDOW]
+    
+    # Clean up empty entries to prevent memory leak
+    if not timestamps:
+        _RATE_LIMIT_STORE.pop(key, None)
+    
     if len(timestamps) >= limit:
         retry_after = int(WINDOW - (now - min(timestamps)))
         raise HTTPException(
             status_code=status.HTTP_429_TOO_MANY_REQUESTS,
             detail=f"Rate limit exceeded. Try again in {retry_after} seconds.",
             headers={"Retry-After": str(retry_after)},
         )
     timestamps.append(now)
     _RATE_LIMIT_STORE[key] = timestamps
src/api/auth.py (1)

51-51: Accessing private method _get_session() creates fragile coupling.

Using github_client._get_session() directly bypasses the public API and couples this code to internal implementation details. If GitHubClient refactors session management, this code will break.

♻️ Suggested alternatives

Consider either:

  1. Exposing a public method on GitHubClient for token validation
  2. Using httpx.AsyncClient directly here since this is a one-off validation call
# Option 2: Direct httpx usage
async with httpx.AsyncClient(timeout=30.0) as client:
    response = await client.get(url, headers=headers)
    ...
src/integrations/github/api.py (2)

274-284: Accessing private _loop attribute is fragile.

Directly accessing self._session._loop relies on aiohttp's internal implementation. While the try/except provides a fallback, this could break silently with aiohttp updates.

Consider a simpler approach that doesn't rely on internals:

# Simpler: always recreate session if we can't verify it's usable
try:
    if self._session is None or self._session.closed:
        self._session = aiohttp.ClientSession()
except Exception:
    self._session = aiohttp.ClientSession()

1199-1289: Significant code duplication between _PR_HYGIENE_QUERY_20 and _PR_HYGIENE_QUERY_10.

These two GraphQL queries are nearly identical, differing only in last: 20 vs last: 10. Consider parameterizing the query or using a template.

♻️ Proposed refactor
def _build_pr_hygiene_query(limit: int) -> str:
    return f"""
    query PRHygiene($owner: String!, $repo: String!) {{
      repository(owner: $owner, name: $repo) {{
        pullRequests(last: {limit}, states: [MERGED, CLOSED]) {{
          nodes {{
            number
            title
            body
            changedFiles
            mergedAt
            additions
            deletions
            author {{ login }}
            authorAssociation
            comments {{ totalCount }}
            closingIssuesReferences(first: 1) {{
              totalCount
              nodes {{ title }}
            }}
            reviews(first: 10) {{
              nodes {{ state, author {{ login }} }}
            }}
            files(first: 10) {{
              edges {{ node {{ path }} }}
            }}
          }}
        }}
      }}
    }}
    """

Then call _build_pr_hygiene_query(20) or _build_pr_hygiene_query(10).

src/api/repos.py (1)

34-36: Stub implementation always returns installed: False.

This endpoint currently returns a hardcoded response without actually checking installation status. The TODO indicates this needs implementation via the GitHub App API.

Would you like me to help implement the actual installation check using the /app/installations endpoint, or should I open an issue to track this work?

src/agents/repository_analysis_agent/agent.py (1)

27-57: Docstring describes 3 steps but graph registers 5 nodes.

The docstring mentions a 3-step flow (fetch_metadata, fetch_pr_signals, generate_rules) but the actual graph has 5 nodes including generate_report and generate_reasonings. Consider updating the docstring to reflect the actual flow.

Suggested docstring update
     def _build_graph(self) -> Any:
         """
         Constructs the state graph for the analysis workflow.

         Flow:
-        1. `fetch_metadata`: Gathers static repo facts (languages, file structure).
-        2. `fetch_pr_signals`: Analyzes dynamic history (PR hygiene, AI usage).
-        3. `generate_rules`: Synthesizes data into governance recommendations.
+        1. `fetch_metadata`: Gathers static repo facts (languages, file structure).
+        2. `fetch_pr_signals`: Analyzes dynamic history (PR hygiene, AI usage).
+        3. `generate_report`: Diagnoses problems based on hygiene metrics.
+        4. `generate_rules`: Prescribes governance rules to address problems.
+        5. `generate_reasonings`: Explains why each rule was recommended.

         Returns:
             Compiled StateGraph ready for execution.
         """
src/main.py (1)

148-162: Multiple iterations over task collection can be optimized.

The current implementation iterates over tasks four times (once per status count). For large task queues, a single pass would be more efficient.

Suggested optimization
 @app.get("/health/tasks", tags=["Health Check"])
 async def health_tasks():
     """Check the status of background tasks."""
-    tasks = task_queue.tasks.values()
-    pending_count = sum(1 for t in tasks if t.status.value == "pending")
-    running_count = sum(1 for t in tasks if t.status.value == "running")
-    completed_count = sum(1 for t in tasks if t.status.value == "completed")
-    failed_count = sum(1 for t in tasks if t.status.value == "failed")
+    tasks = list(task_queue.tasks.values())
+    counts = {"pending": 0, "running": 0, "completed": 0, "failed": 0}
+    for t in tasks:
+        status = t.status.value
+        if status in counts:
+            counts[status] += 1

     return {
         "task_queue_status": "running",
         "workers": len(task_queue.workers),
         "tasks": {
-            "pending": pending_count,
-            "running": running_count,
-            "completed": completed_count,
-            "failed": failed_count,
+            **counts,
             "total": len(tasks),
         },
     }
src/agents/repository_analysis_agent/models.py (3)

11-26: Silent exception handling may mask parsing issues.

The except Exception at Line 24 silently catches all exceptions and returns the original identifier. While the comment explains this is intentional, consider logging a warning to aid debugging when parsing fails unexpectedly.

Suggested improvement
+import structlog
+
+logger = structlog.get_logger()
+
 def parse_github_repo_identifier(identifier: str) -> str:
     """
     Normalizes various GitHub identifiers into 'owner/repo' format using giturlparse.
     Used by tests to verify repository strings.
     """
     try:
         p = parse(identifier)
         if p.valid and p.owner and p.repo:
             return f"{p.owner}/{p.repo}"
         # Fallback for simple "owner/repo" strings that might fail strict URL parsing
         if "/" in identifier and not identifier.startswith(("http", "git@")):
             return identifier.strip().strip("/")
         return identifier
     except Exception:
-        # If parsing fails completely, return original to let validator catch it later
+        # If parsing fails completely, return original to let validator catch it later
+        logger.debug("github_identifier_parse_failed", identifier=identifier)
         return identifier

90-100: Consider adding validation for severity field.

The severity field accepts any string but should be constrained to valid values: "low", "medium", "high", or "critical".

Suggested validation
+from typing import Literal
+
 class RuleRecommendation(BaseModel):
     """
     Represents a single rule suggested by the AI.
     Contains only fields that go into rules.yaml file.
     """

     description: str = Field(..., description="What the rule does")
     enabled: bool = Field(True, description="Whether the rule is enabled")
-    severity: str = Field("medium", description="low, medium, high, or critical")
+    severity: Literal["low", "medium", "high", "critical"] = Field(
+        "medium", description="Rule severity level"
+    )
     event_types: list[str] = Field(..., description="Event types this rule applies to (e.g., ['pull_request'])")
     parameters: dict[str, Any] = Field(default_factory=dict, description="Rule parameters for validators")

136-138: Consider excluding user_token from serialization to prevent accidental exposure.

The user_token field contains a sensitive GitHub Personal Access Token. While it's needed during execution, it could be accidentally logged or serialized. Consider using Pydantic's SecretStr or exclude=True.

Suggested improvement
 class AnalysisState(BaseModel):
     # --- Inputs ---
     repo_full_name: str
     is_public: bool = False
-    user_token: str | None = Field(
-        None, description="Optional GitHub Personal Access Token for authenticated API requests"
+    user_token: str | None = Field(
+        None, 
+        description="Optional GitHub Personal Access Token for authenticated API requests",
+        exclude=True,  # Prevent accidental serialization/logging
     )
src/api/recommendations.py (2)

275-289: Potential issue with AI Generated Rate metric handling.

The explanation lambda at Line 286 formats v in several branches, with the None guard appearing only at the end (Line 287). The guard does protect the format calls, but trailing it after the expressions it protects makes the structure error-prone and hard to read. The current logic works but is confusing.

Suggested clarification
         {
             "name": "AI Generated Rate",
             "key": "ai_generated_rate",
             "category": "Quality",
             "thresholds": {"high": 0.2, "medium": 0.1},
-            "explanation": lambda v: (
+            "explanation": lambda v: (
+                "AI detection not available."
+                if v is None
+                else (
                 f"High AI-generated content ({v:.0%}) may indicate low-effort contributions."
                 if v >= 0.2
                 else f"Some AI-generated content detected ({v:.0%}). Consider review processes."
                 if v >= 0.1
                 else "Low AI-generated content rate. Contributions appear human-authored."
+                )
             )
-            if v is not None
-            else "AI detection not available.",
         },

510-510: Move yaml import to module level.

Importing yaml inside the function on every request is inefficient. Move it to the module-level imports.

Suggested fix
 # At module level (around line 1-7)
+import yaml
 from typing import Any

 import structlog
 ...

 # Remove from inside recommend_rules function
-    # Step 6: Success—map agent state to the API response model.
-    final_state = result.data  # The agent's execute method returns the final state
-
-    # Generate rules_yaml from recommendations
-    # RuleRecommendation now includes all required fields directly
-    import yaml
+    # Step 6: Success—map agent state to the API response model.
+    final_state = result.data  # The agent's execute method returns the final state
src/agents/repository_analysis_agent/nodes.py (1)

164-172: Remove or reduce debug logging statements.

Lines 165-167 and 172 contain debug logging that may be too verbose for production. Consider removing or changing to logger.debug().

Suggested cleanup
         owner, repo_name = repo.split("/")
-        logger.info("debug_split_success", owner=owner, repo_name=repo_name)
-        logger.info("debug_client_type", client_type=str(type(github_client)))

         # Use user_token if provided for authenticated requests (higher rate limits).
         pr_nodes, pr_warning = await github_client.fetch_pr_hygiene_stats(
             owner=owner, repo=repo_name, user_token=state.user_token, installation_id=None
         )
-        logger.info("debug_pr_nodes_fetched", count=len(pr_nodes))
+        logger.debug("pr_nodes_fetched", count=len(pr_nodes))
src/rules/validators.py (1)

168-206: Prefer _matches_any to avoid over‑matching across directories.
_glob_to_regex maps * to .*, which matches / and can make patterns like src/*.py match nested paths. You already have _matches_any with proper * vs ** semantics and caching—using it here improves correctness and consistency.

♻️ Proposed refactor
-        regex_pattern = FilePatternCondition._glob_to_regex(pattern)
-
-        # Pattern match—performance: optimize if file count high.
-        matching_files = [file for file in changed_files if re.match(regex_pattern, file)]
+        # Pattern match—reuse shared glob matcher for correct ** semantics.
+        matching_files = [file for file in changed_files if _matches_any(file, [pattern])]
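The over-match is easy to reproduce. `naive_glob_to_regex` below is a hypothetical reconstruction of what `_glob_to_regex` is described as doing, not the actual implementation; `PurePath.match` shows the path-aware behavior that `_matches_any` is said to provide:

```python
import re
from pathlib import PurePath


def naive_glob_to_regex(pattern: str) -> str:
    # Mimics the flawed translation: '*' becomes '.*', which also crosses '/'.
    return "^" + re.escape(pattern).replace(r"\*", ".*") + "$"


pattern = "src/*.py"
nested = "src/sub/deep.py"
direct = "src/app.py"

regex = naive_glob_to_regex(pattern)
print(bool(re.match(regex, nested)))   # True: over-matches across directories
print(bool(re.match(regex, direct)))   # True

# A path-aware matcher keeps '*' within a single path segment:
print(PurePath(nested).match(pattern))  # False
print(PurePath(direct).match(pattern))  # True
```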
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9f74b4 and 6a426e2.

📒 Files selected for processing (16)
  • src/agents/repository_analysis_agent/agent.py
  • src/agents/repository_analysis_agent/models.py
  • src/agents/repository_analysis_agent/nodes.py
  • src/agents/repository_analysis_agent/prompts.py
  • src/api/auth.py
  • src/api/dependencies.py
  • src/api/errors.py
  • src/api/rate_limit.py
  • src/api/recommendations.py
  • src/api/repos.py
  • src/api/rules.py
  • src/event_processors/pull_request.py
  • src/integrations/github/api.py
  • src/main.py
  • src/rules/validators.py
  • src/tasks/task_queue.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/api/rules.py
🧰 Additional context used
🧬 Code graph analysis (10)
src/agents/repository_analysis_agent/models.py (1)
src/core/models.py (2)
  • HygieneMetrics (66-123)
  • repo_full_name (56-58)
src/api/repos.py (3)
src/core/utils/caching.py (1)
  • get (44-66)
src/core/models.py (1)
  • repo_full_name (56-58)
src/integrations/github/api.py (1)
  • get_repository (126-140)
src/main.py (4)
src/webhooks/handlers/deployment_protection_rule.py (2)
  • DeploymentProtectionRuleEventHandler (10-33)
  • handle (16-33)
src/webhooks/dispatcher.py (1)
  • register_handler (24-27)
src/core/models.py (1)
  • EventType (24-38)
src/tasks/scheduler/deployment_scheduler.py (1)
  • get_deployment_scheduler (365-370)
src/agents/repository_analysis_agent/agent.py (3)
src/agents/base.py (5)
  • AgentResult (16-29)
  • BaseAgent (32-119)
  • _build_graph (55-60)
  • execute (114-119)
  • _execute_with_timeout (93-111)
src/agents/repository_analysis_agent/models.py (1)
  • AnalysisState (128-164)
src/agents/repository_analysis_agent/nodes.py (1)
  • fetch_repository_metadata (28-150)
src/api/auth.py (2)
src/integrations/github/api.py (1)
  • _get_session (262-288)
src/core/utils/caching.py (1)
  • get (44-66)
src/tasks/task_queue.py (2)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (19-20)
tests/unit/tasks/test_queue.py (1)
  • queue (13-15)
src/api/dependencies.py (2)
src/core/models.py (1)
  • User (7-18)
src/integrations/github/service.py (1)
  • GitHubService (15-185)
src/event_processors/pull_request.py (2)
src/event_processors/push.py (1)
  • _format_check_run_output (184-228)
src/core/models.py (1)
  • repo_full_name (56-58)
src/api/recommendations.py (4)
src/api/dependencies.py (1)
  • get_current_user_optional (23-43)
src/api/rate_limit.py (1)
  • rate_limiter (25-45)
src/core/models.py (2)
  • User (7-18)
  • repo_full_name (56-58)
src/integrations/github/api.py (5)
  • get_repository (126-140)
  • get_git_ref_sha (831-845)
  • create_git_ref (847-883)
  • create_or_update_file (913-961)
  • create_pull_request (963-995)
src/integrations/github/api.py (2)
src/core/errors.py (1)
  • GitHubGraphQLError (6-11)
src/core/models.py (1)
  • repo_full_name (56-58)
🪛 Gitleaks (8.30.0)
src/integrations/github/api.py

[high] 131-131: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 147-147: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (30)
src/event_processors/pull_request.py (1)

288-288: Context-aware check output is a good enhancement.

Passing repo and installation context into the formatter makes the check output more actionable and consistent with the new guidance flow.

src/tasks/task_queue.py (2)

1-10: LGTM!

Imports and logger initialization are appropriate for an async task queue with Pydantic models and structured logging.


99-100: LGTM!

Module-level singleton is appropriate for application-wide task queue access.

src/api/errors.py (1)

1-22: LGTM!

Clean implementation of a standardized error response model. The retry_after field aligns well with the rate limiting functionality being added in this PR, and the optional details dict provides flexibility for additional error context.

src/api/dependencies.py (2)

13-17: LGTM!

Simple dependency injection pattern that enables easy mocking in tests. Creating a new instance per request is acceptable given GitHubService has minimal initialization overhead.


46-54: LGTM!

Correct implementation of strict authentication enforcement with appropriate 401 status code and clear error message.

src/integrations/github/api.py (1)

1136-1146: GitHub GraphQL API requires authentication for private/protected data; anonymous access to public data is allowed with rate limits.

GitHub's GraphQL API supports unauthenticated requests for public repositories and data at a reduced rate limit of 60 requests per hour per IP. The code correctly implements this by returning appropriate headers for anonymous access without an Authorization header. Anonymous requests will not receive a 401 error for public data; they will only fail if accessing private or protected resources.

Likely an incorrect or invalid review comment.

src/agents/repository_analysis_agent/agent.py (3)

1-11: LGTM!

Clean imports with structured logging via structlog. The module organization follows the established patterns for LangGraph-based agents.


104-110: LGTM!

Good exception handling with structured logging. The comment on Line 108 appropriately documents the intentional broad exception catch for top-level orchestration safety.


87-88: This concern does not apply to the current codebase. AnalysisState uses Pydantic BaseModel with no explicit ConfigDict or extra field configuration, so it defaults to Pydantic v2's extra='ignore'. Any extra keys from LangGraph will be silently ignored, not raise a ValidationError.

Likely an incorrect or invalid review comment.

src/agents/repository_analysis_agent/prompts.py (1)

3-45: LGTM!

Well-structured prompts with clear issue-to-validator mapping. The system prompt provides actionable guidance for the LLM, and the user prompt template includes all necessary context fields for rule generation.

src/main.py (2)

32-56: LGTM!

Good structured logging configuration with JSON rendering for production observability. Silencing noisy HTTP libraries at WARNING level is appropriate.


59-103: LGTM!

Well-structured lifespan manager with proper startup (workers, scheduler, handlers) and shutdown (scheduler stop, worker stop) sequences.

src/agents/repository_analysis_agent/models.py (2)

29-46: LGTM!

Good use of model_validator for automatic URL normalization. The flexible input handling (either repository_full_name or repository_url) is user-friendly.


103-125: LGTM!

Excellent documentation in the PRSignal docstring explaining the AI Immune System feature and how signals feed into HygieneMetrics.

src/api/recommendations.py (4)

22-63: LGTM!

Well-documented Pydantic models with clear attribute descriptions. The AnalysisResponse model provides a clean API contract for the frontend.


394-410: LGTM!

Good URL validation with host verification. The explicit check for github.com and www.github.com prevents accidental use with other Git hosting providers.


592-640: LGTM!

Well-documented authentication priority and comprehensive error handling. The auth validation at Lines 631-640 correctly requires at least one form of authentication with a clear error message.


665-772: LGTM!

Robust PR creation flow with proper error handling at each step (repo fetch, SHA lookup, branch creation, file commit, PR creation). The branch existence check at Lines 704-718 handles race conditions gracefully.

src/agents/repository_analysis_agent/nodes.py (6)

17-25: LGTM!

Good list of AI detection keywords for heuristic-based detection. The keywords cover common AI tool signatures.


28-150: LGTM!

Comprehensive metadata gathering with good error handling around individual file fetches. The language detection map and workflow pattern heuristics are appropriate for this use case.


304-319: LGTM!

Good defensive coding for review author extraction at Line 309. The use of .get("author", {}).get("login", "") safely handles cases where the author may be None (e.g., deleted GitHub users). This addresses the concern from the previous review.


442-552: LGTM!

Well-structured LLM invocation with proper error handling. The fallback rule generation on failure (Lines 543-552) ensures graceful degradation. Good separation of concerns between system and user prompts.


555-609: LGTM!

Good implementation of per-rule reasoning generation. The sequential LLM calls for each recommendation provide detailed explanations while the error handling ensures the workflow continues even if reasoning generation fails.


612-677: LGTM!

Well-structured analysis report generation with appropriate LLM temperature (0.2) for consistent formatting. The fallback message on failure ensures the state always has a report.

src/rules/validators.py (5)

85-90: Comment-only change—no action needed.


132-147: Comment-only change—no action needed.


223-226: Comment-only change—no action needed.


246-260: Comment-only change—no action needed.


1078-1081: Registry update looks good.
New validator is wired into the registry as expected.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +356 to +383
                    mentioned_files = []
                    text = body + " " + title
                    # Look for file patterns (e.g., "src/file.py", "file.ts", etc.)
                    file_pattern = r"\b[\w/]+\.(py|ts|js|go|rs|java|rb|php|cpp|c|h|swift|kt|scala)\b"
                    matches = re.findall(file_pattern, text)
                    mentioned_files.extend([m[0] if isinstance(m, tuple) else m for m in matches])

                    # Heuristic 2: Check if PR has linked issue but description is generic/unrelated
                    closing_issues = pr_node.get("closingIssuesReferences", {})
                    has_linked_issue = closing_issues.get("totalCount", 0) > 0

                    # If PR has linked issue but description is very short/generic, potential mismatch
                    if has_linked_issue and len(body.strip()) < 50:
                        issue_diff_mismatch_count += 1
                        continue

                    # If specific files mentioned but don't match changed files, potential mismatch
                    if mentioned_files and changed_file_names:
                        # Check if any mentioned file matches changed files
                        matches_changed = any(
                            any(
                                mf in cf or cf in mf or mf.split("/")[-1] == cf.split("/")[-1]
                                for cf in changed_file_names
                            )
                            for mf in mentioned_files
                        )
                        if not matches_changed:
                            issue_diff_mismatch_count += 1

⚠️ Potential issue | 🟡 Minor

Regex pattern captures only file extension, not full file path.

The regex at Line 359 uses a capturing group (py|ts|...) which causes re.findall to return only the captured extensions, not the full file paths. The code at Line 361 then treats these extensions as file paths, making the comparison at Lines 375-381 ineffective.

Suggested fix
                     # Extract file mentions from description
                     mentioned_files = []
                     text = body + " " + title
                     # Look for file patterns (e.g., "src/file.py", "file.ts", etc.)
-                    file_pattern = r"\b[\w/]+\.(py|ts|js|go|rs|java|rb|php|cpp|c|h|swift|kt|scala)\b"
+                    file_pattern = r"\b([\w/]+\.(?:py|ts|js|go|rs|java|rb|php|cpp|c|h|swift|kt|scala))\b"
                     matches = re.findall(file_pattern, text)
-                    mentioned_files.extend([m[0] if isinstance(m, tuple) else m for m in matches])
+                    mentioned_files.extend(matches)
🤖 Prompt for AI Agents
In `@src/agents/repository_analysis_agent/nodes.py` around lines 356 - 383, The
regex in variable file_pattern and the handling of matches is returning only the
captured extension (due to the capturing group) so mentioned_files contains
extensions instead of full paths; update file_pattern to use a non-capturing
group (e.g., replace "(py|ts|...)" with "(?:py|ts|...)" or otherwise ensure the
full match is returned), call re.findall to get full-path matches, and simplify
the population of mentioned_files by extending with matches directly (remove the
m[0] tuple handling); keep the subsequent comparison logic that uses
mentioned_files and changed_file_names and leave issue_diff_mismatch_count
increments unchanged.

Comment on lines +448 to +451
client_ip = request.client.host if request.client else "unknown"
user_id = user.email if user else "Anonymous"

logger.info("analysis_requested", repo_url=repo_url_str, user_id=user_id, ip=client_ip)

⚠️ Potential issue | 🟡 Minor

Consider whether logging client IP constitutes PII.

Logging client_ip at Line 451 may be subject to privacy regulations (GDPR, CCPA). If this is a compliance concern, consider hashing or omitting the IP from logs.

🤖 Prompt for AI Agents
In `@src/api/recommendations.py` around lines 448 - 451, The log currently records
raw client IP via request.client.host into client_ip and passes it to
logger.info("analysis_requested", ...), which may be PII; change this by either
omitting the raw IP or replacing it with a non-reversible identifier (e.g., a
salted hash/HMAC using an application secret) before logging. Locate the
client_ip assignment and the logger.info call in recommendations.py (references:
client_ip, request.client.host, user.email, logger.info("analysis_requested"))
and update the code to compute and log a hashed_ip or simply exclude the IP
field based on a config flag (e.g., LOG_RAW_IPS=false), ensuring the hashing
uses a stable secret so the identifier can be correlated without exposing the
plain IP.
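A minimal sketch of the hashing approach (the helper name and 16-character truncation are illustrative, not part of the codebase):

```python
import hashlib
import hmac

def pseudonymize_ip(ip: str, secret: bytes) -> str:
    """Return a stable, non-reversible identifier for an IP address.

    HMAC with an application secret lets log lines from the same client be
    correlated without storing the raw IP.
    """
    return hmac.new(secret, ip.encode(), hashlib.sha256).hexdigest()[:16]
```

The secret must stay stable across processes for correlation to work, so it should come from application config rather than being generated at startup.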

Comment on lines +40 to +45
except Exception as e:
logger.exception("installation_check_failed", repo=repo_full_name, error=str(e))
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to check installation: {str(e)}",
) from e

⚠️ Potential issue | 🟡 Minor

Exception details exposed in error response.

Including str(e) in the error detail may leak sensitive internal information to API clients.

🐛 Proposed fix
     except Exception as e:
         logger.exception("installation_check_failed", repo=repo_full_name, error=str(e))
         raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail=f"Failed to check installation: {str(e)}",
+            detail="Failed to check installation status.",
         ) from e
🤖 Prompt for AI Agents
In `@src/api/repos.py` around lines 40 - 45, The handler currently exposes
internal exception text in the API response; keep detailed info in logs but
return a generic message to clients—remove str(e) from the HTTPException.detail
and replace with a non-sensitive string like "Failed to check installation" (or
similar), while retaining logger.exception("installation_check_failed",
repo=repo_full_name, error=str(e)) or better: keep
logger.exception("installation_check_failed", repo=repo_full_name) so the full
exception is recorded server-side; update the raise HTTPException call
(status.HTTP_500_INTERNAL_SERVER_ERROR) to use the generic detail and do not
include the exception string.

Comment on lines 153 to 160
async with session.get(url, headers=headers) as response:
if response.status == 200:
data = await response.json()
return data if isinstance(data, list) else [data]

# Raise exception for error statuses to avoid silent failures
response.raise_for_status()
return []

⚠️ Potential issue | 🟡 Minor

Unreachable code: return [] after raise_for_status().

When response.status != 200, raise_for_status() will raise an exception, making the subsequent return [] unreachable. This appears to be a logic error—either catch the exception and return [], or remove the unreachable return.

🐛 Proposed fix - consistent with other methods in this class
         async with session.get(url, headers=headers) as response:
             if response.status == 200:
                 data = await response.json()
                 return data if isinstance(data, list) else [data]
-
-            # Raise exception for error statuses to avoid silent failures
-            response.raise_for_status()
-            return []
+            elif response.status == 404:
+                return []
+            else:
+                error_text = await response.text()
+                logger.error(
+                    "list_directory_failed",
+                    repo=repo_full_name,
+                    path=path,
+                    status=response.status,
+                    error=error_text[:200],
+                )
+                return []
🤖 Prompt for AI Agents
In `@src/integrations/github/api.py` around lines 153 - 160, The code after
response.raise_for_status() is unreachable; update the async GET block (the
"async with session.get(...)" scope that calls response.raise_for_status()) to
either remove the unreachable "return []" or catch the HTTP error and return []
for non-200 responses to match other methods in this class — e.g., wrap
response.raise_for_status() in a try/except for aiohttp.ClientResponseError (or
a broad Exception), logging or swallowing the error and returning [] from the
function instead of leaving a dead return statement.

Comment on lines +1098 to +1109
except httpx.HTTPStatusError as e:
logger.error(
"pr_fetch_http_error",
repo=repo_full_name,
status_code=e.response.status_code,
error_type="network_error",
error=str(e),
)
return []
except Exception as e:
logger.error("pr_fetch_unexpected_error", repo=repo_full_name, error_type="unknown_error", error=str(e))
return []

⚠️ Potential issue | 🟡 Minor

Exception type mismatch: catching httpx.HTTPStatusError in aiohttp code.

This method uses aiohttp for HTTP requests but catches httpx.HTTPStatusError, which will never be raised. The aiohttp equivalent is aiohttp.ClientResponseError.

🐛 Proposed fix
-        except httpx.HTTPStatusError as e:
+        except aiohttp.ClientResponseError as e:
             logger.error(
                 "pr_fetch_http_error",
                 repo=repo_full_name,
-                status_code=e.response.status_code,
+                status_code=e.status,
                 error_type="network_error",
                 error=str(e),
             )
             return []
🤖 Prompt for AI Agents
In `@src/integrations/github/api.py` around lines 1098 - 1109, The except block is
catching httpx.HTTPStatusError in aiohttp-based code; change the exception type
to aiohttp.ClientResponseError and log the correct status attribute (use
e.status instead of e.response.status_code) so the logger call in this handler
(where logger.error("pr_fetch_http_error", repo=repo_full_name, status_code=...,
error_type="network_error", error=str(e))) reports the proper aiohttp error
details; keep the existing fallback except Exception block unchanged.

Comment on lines 988 to 1049
async def validate(self, parameters: dict[str, Any], event: dict[str, Any]) -> bool:
    check_commits = parameters.get("check_commits", True)

    # Get PR data
    pull_request = event.get("pull_request_details", {})
    if not pull_request:
        return True  # No violation if we can't check

    # Check PR description for linked issues
    body = pull_request.get("body", "") or ""
    title = pull_request.get("title", "") or ""

    # Check for closing keywords (closes, fixes, resolves, refs, relates) followed by issue reference
    closing_keywords = ["closes", "fixes", "resolves", "refs", "relates", "addresses"]
    issue_pattern = r"#\d+|(?:https?://)?(?:github\.com/[\w-]+/[\w-]+/)?(?:issues|pull)/\d+"

    # Check description and title
    text_to_check = (body + " " + title).lower()
    has_linked_issue = False

    # Check for closing keywords with issue references
    for keyword in closing_keywords:
        pattern = rf"\b{re.escape(keyword)}\s+{issue_pattern}"
        if re.search(pattern, text_to_check, re.IGNORECASE):
            has_linked_issue = True
            break

    # Also check for standalone issue references (e.g., #123)
    if not has_linked_issue and re.search(issue_pattern, text_to_check):
        has_linked_issue = True

    # Check commit messages if requested
    if not has_linked_issue and check_commits:
        commits = event.get("commits", [])
        if not commits:
            # Try to get from pull_request_details
            commits = pull_request.get("commits", [])

        for commit in commits:
            commit_message = commit.get("message", "") or ""
            if not commit_message:
                continue

            commit_text = commit_message.lower()
            for keyword in closing_keywords:
                pattern = rf"\b{re.escape(keyword)}\s+{issue_pattern}"
                if re.search(pattern, commit_text, re.IGNORECASE):
                    has_linked_issue = True
                    break

            # Check for standalone issue references in commit
            if not has_linked_issue and re.search(issue_pattern, commit_text):
                has_linked_issue = True
                break

            if has_linked_issue:
                break

    logger.debug(
        f"RequireLinkedIssueCondition: PR has linked issue: {has_linked_issue}, checked commits: {check_commits}"
    )
    return has_linked_issue

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub pull_request webhook payload structure "commits" field type

💡 Result:

In the pull_request webhook event payload, the pull_request object is a “Pull Request” resource, where:

  • pull_request.commits is an integer (the count of commits in the PR) [1]
  • pull_request.commits_url is a string (URL to list the PR’s commits) [1]

If you’re thinking of a commits array, that’s part of the push event payload (not pull_request), where commits is an array of commit objects [2].

Sources:
[1] PyGithub PullRequest object docs (fields: commits = integer, commits_url = string) (pygithub.readthedocs.io)
[2] GitHub Docs “GitHub event types” (PushEvent payload: commits = array) (docs.github.com)

Citations:


🌐 Web query:

GitHub pull_request webhook event payload commit messages full payload structure

💡 Result:

pull_request webhook event: where are commit messages?

The pull_request webhook payload does not include the PR’s commit list or commit messages. It includes a pull_request object (plus metadata like action, number, sender, repository, etc.). To get commit messages, you must make an API call to list the PR’s commits. [1] [2]

High-level payload shape (common fields)

Depending on the action (e.g., opened, edited, synchronize, closed, assigned, labeled, etc.), the payload includes some extra fields (like assignee, label, changes). The common shape is: [1]

{
  "action": "opened|edited|synchronize|closed|...",
  "number": 123,
  "pull_request": {
    "id": 1,
    "number": 123,
    "state": "open|closed",
    "title": "",
    "body": "",
    "user": { "login": "" },

    "head": { "ref": "branch", "sha": "HEAD_SHA", "repo": { "full_name": "owner/repo" } },
    "base": { "ref": "main",   "sha": "BASE_SHA", "repo": { "full_name": "owner/repo" } }

    // also includes many other PR fields (URLs, timestamps, flags, counts, etc.)
  },

  "repository": { "full_name": "owner/repo" },
  "sender": { "login": "" }

  // optionally: "organization", "installation", "enterprise", and action-specific fields
}

How to get commit messages for the PR

Call “List commits on a pull request”:

  • GET /repos/{owner}/{repo}/pulls/{pull_number}/commits
    Each item includes commit.message. [2]

If you specifically want commit messages in the webhook payload

Use the push webhook event instead; it includes commits[] and each commit includes message. [3]

Sources: GitHub webhook pull_request payload docs [1], GitHub REST “List commits on a pull request” [2], GitHub push event payload (commits[][message]) [3].


Guard commits type before iterating.
event["commits"] does not exist in pull_request webhook events, and pull_request["commits"] is an integer (count of commits), not a list. Attempting to iterate over it will raise TypeError. Add type guards to normalize to a list or skip commit scanning.

🐛 Proposed fix
-            commits = event.get("commits", [])
+            commits = event.get("commits", [])
+            if not isinstance(commits, list):
+                commits = []
             if not commits:
                 # Try to get from pull_request_details
-                commits = pull_request.get("commits", [])
+                commits = pull_request.get("commits", [])
+                if not isinstance(commits, list):
+                    commits = []
🤖 Prompt for AI Agents
In `@src/rules/validators.py` around lines 988 - 1049, In validate (async def
validate), guard and normalize the commits variable before iterating: when
obtaining commits from event.get("commits", []) or pull_request.get("commits",
[]), check isinstance(commits, list) (or iterable of dicts) and if not, set
commits = [] (or skip commit scanning) to avoid iterating an integer commit
count from pull_request; ensure the commit_message loop only runs when commits
is a list of dicts with "message" keys.

Comment on lines 68 to 79
async def _worker(self) -> None:
"""Background worker loop."""
while True:
task = await self.queue.get()
try:
# Periodic cleanup
if (datetime.now() - last_cleanup).total_seconds() > cleanup_interval:
self.cleanup_old_tasks()
last_cleanup = datetime.now()

# Find pending tasks
pending_tasks = [task for task in self.tasks.values() if task.status == TaskStatus.PENDING]

if pending_tasks:
# Process the oldest task
task = min(pending_tasks, key=lambda t: t.created_at)
await self._process_task(task, worker_name)
else:
# No tasks, wait a bit
await asyncio.sleep(1)

logger.info("task_started", task_id=task.task_id, event_type=task.event_type)
await task.func(*task.args, **task.kwargs)
logger.info("task_completed", task_id=task.task_id)
except Exception as e:
logger.error(f"Worker {worker_name} error: {e}")
await asyncio.sleep(5)

logger.info(f"Worker {worker_name} stopped")

async def _process_task(self, task: Task, worker_name: str):
"""Process a single task."""
try:
task.status = TaskStatus.RUNNING
task.started_at = datetime.now()

logger.info(f"Worker {worker_name} processing task {task.id}")

# Get the appropriate processor
processor = self._get_processor(task.event_type)
result = await processor.process(task)

task.status = TaskStatus.COMPLETED
task.completed_at = datetime.now()
task.result = result.__dict__ if hasattr(result, "__dict__") else result

logger.info(f"Task {task.id} completed successfully")

except Exception as e:
task.status = TaskStatus.FAILED
task.completed_at = datetime.now()
task.error = str(e)
logger.error(f"Task {task.id} failed: {e}")

def cleanup_old_tasks(self, max_age_hours: int = 24):
"""Clean up old completed tasks and their event hashes to prevent memory leaks."""
cutoff_time = datetime.now().timestamp() - (max_age_hours * 3600)

# Find old completed tasks
old_task_ids = [
task_id
for task_id, task in self.tasks.items()
if task.status in [TaskStatus.COMPLETED, TaskStatus.FAILED] and task.created_at.timestamp() < cutoff_time
]

# Remove old tasks and their event hashes
for task_id in old_task_ids:
task = self.tasks[task_id]
if task.event_hash and task.event_hash in self.event_hashes:
del self.event_hashes[task.event_hash]
del self.tasks[task_id]

if old_task_ids:
logger.info(f"Cleaned up {len(old_task_ids)} old tasks")

def _get_processor(self, event_type: str):
"""Get the appropriate processor for the event type."""
from src.event_processors.factory import EventProcessorFactory

return EventProcessorFactory.create_processor(event_type)


# Global task queue instance
logger.error("task_failed", task_id=task.task_id, error=str(e), exc_info=True)
finally:
self.queue.task_done()

⚠️ Potential issue | 🟠 Major

Failed tasks are silently discarded.

When task.func raises an exception, the error is logged but the task is permanently lost with no retry or dead-letter mechanism. Combined with the pre-execution deduplication marking, failed tasks cannot be recovered. Consider implementing a bounded retry count or dead-letter queue for failed tasks.

🤖 Prompt for AI Agents
In `@src/tasks/task_queue.py` around lines 68 - 79, The worker currently logs
failures and drops tasks; modify _worker to support bounded retries and a
dead-letter queue: add a retry counter on the Task (e.g., task.retries and
task.max_retries) or maintain a separate retry map, and on exception increment
task.retries and if retries < max_retries re-enqueue the task (optionally with a
small backoff) otherwise push the task into a dead_letter_queue (e.g.,
self.dead_letter_queue) for inspection; ensure queue.task_done() is still called
once per dequeue and include task identifiers (task.task_id) when moving to
dead-letter so failed work can be recovered or inspected later.
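A minimal sketch of the bounded-retry plus dead-letter idea (hypothetical names; the real Task model and worker loop in src/tasks/task_queue.py differ in detail):

```python
import asyncio

# Assumption: tasks are plain dicts with a "func" coroutine; the real code
# uses a Task dataclass with task_id, args, and kwargs.
MAX_RETRIES = 3

async def run_with_retries(task: dict, queue: asyncio.Queue, dead_letter: list) -> None:
    try:
        await task["func"]()
    except Exception:
        task["retries"] = task.get("retries", 0) + 1
        if task["retries"] < MAX_RETRIES:
            await queue.put(task)      # bounded retry: re-enqueue for another attempt
        else:
            dead_letter.append(task)   # dead-letter: keep for inspection, don't drop
```

A small backoff before re-enqueueing (e.g., `await asyncio.sleep(2 ** task["retries"])`) would avoid hot-looping on a persistently failing task.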

@MT-superdev
Contributor Author

PR Summary: AI Immune System & Backend Hardening (4-Phase Execution)

Scope: Backend Refactor & Standardization
Files Changed: 74
Test Coverage: 150 Tests (100% Pass)

Overview

This PR executes the agreed 4-Phase Implementation Plan to finalize the "AI Immune System" logic and harden the backend for enterprise deployment. It involves a complete architectural audit, resulting in significant refactoring to ensure type safety, security, and scalability.

Key Changes by Phase

Phase 1: Code Recovery & GraphQL Integration

  • Restored GraphQL Client: Fixed the missing files in src/integrations/github to enable efficient data fetching.
  • Optimized Fetching: Replaced N+1 REST calls with batched GraphQL queries for PR signals.

Phase 2: Data Contract Standardization

  • Strict YAML Schema: Refactored rule generation to strictly allow only valid fields (description, severity, etc.).
  • Clean Separation: Removed reasoning and metadata from the generated YAML files, moving them to a dedicated rule_reasonings field in the API response.
  • API Response: Standardized AnalysisResponse models for seamless frontend integration.

Phase 3: Agentic Prompt Refactoring

  • Schema-Driven Prompts: Rewrote prompts for RepositoryAnalysisAgent, FeasibilityAgent, and EngineAgent.
  • Hallucination Control: Removed verbose examples in favor of concise, schema-based instructions to reduce token usage and improve determinism.

Phase 4: Immune System Metrics Verification

  • Implemented 9 Core Metrics:
    • ghost_contributor_rate, issue_diff_mismatch_rate, unlinked_issue_rate
    • codeowner_bypass_rate, ai_generated_rate
    • average_pr_size (LOC-based), ci_skip_rate, etc.
  • Logic Verification: Added pure-function logic for all metric calculations.

Audit & Hardening

  • Security & Performance: Conducted a full codebase audit, resolving identified security vulnerabilities and memory leaks.
  • Testing: Expanded the test suite from 71 to 150 tests, fully mocking network calls via respx.
  • Observability: Finalized migration to structlog for structured JSON logging.
  • Reliability: Enforced giturlparse for URL handling and strictly applied Pydantic V2 validation.

Verification

  • All 150 unit/integration tests passed.
  • mypy strict compliance achieved (Zero errors).
  • Linter (ruff) checks passed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@scripts/migrate_to_structlog.py`:
- Around line 10-27: The migration currently removes the logging import but only
converts getLogger calls, which leaves other logging.* references broken and
file I/O unprotected; update migrate_file to (1) scan the file for other logging
usages (regex for
logging\.(debug|info|warning|error|critical|exception|log)\(|logging\.(basicConfig|setLevel|Formatter)\b|logging\.(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b),
(2) if those method/constant usages exist, do NOT drop the original import
logging — instead insert or preserve both imports (keep "import logging" and add
"import structlog") and convert call sites of logging.debug/info/... to
structlog.get_logger().debug/info/... while leaving logging.<CONSTANT> and
functions like basicConfig/setLevel untouched, and (3) add try/except around
file_path.read_text(...) and file_path.write_text(...) in migrate_file to catch
IO/encoding/permission errors and return False (or surface the error) so the
script won’t crash.
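Step (1) of that prompt can be sketched as a single scan (the exact pattern set is an assumption; extend it to match the migration script's needs):

```python
import re

# Matches logging.* method calls, config functions, and level constants that
# would break if `import logging` were removed.
LOGGING_USAGE = re.compile(
    r"logging\.(?:debug|info|warning|error|critical|exception|log|basicConfig|"
    r"setLevel|Formatter|DEBUG|INFO|WARNING|ERROR|CRITICAL)\b"
)

def keep_logging_import(source: str) -> bool:
    """Return True when `import logging` must be preserved because the file
    still uses logging.* APIs beyond getLogger."""
    return bool(LOGGING_USAGE.search(source))
```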

In `@src/integrations/github/graphql.py`:
- Around line 33-40: The request code using httpx.AsyncClient lacks a timeout
and doesn't inspect GraphQL-level errors; update the call that posts to
self.endpoint (the block using httpx.AsyncClient, response, query, variables,
headers) to use a 60-second timeout (e.g., pass timeout=60 to AsyncClient or the
post call) and after obtaining response.json() check for an "errors" key and
raise or return a meaningful error containing those details (include context
like the query/variables or endpoint) instead of silently returning the payload
when errors are present; ensure response.raise_for_status() remains to handle
HTTP errors and add a clear exception path for GraphQL errors.

In `@src/integrations/github/models.py`:
- Around line 62-81: The default factories are instantiating Pydantic models
with alias names (e.g., CommentConnection(totalCount=0),
FileConnection(edges=[])) which fails under Pydantic v2; either add model_config
= ConfigDict(populate_by_name=True) to the affected models (e.g.,
CommentConnection, FileConnection, IssueConnection, ReviewConnection,
CommitConnection) so they accept alias keys on instantiation, or change the
default factories to use the Python field names (e.g., total_count=0, edges=[])
when creating instances; update the model definitions (CommentConnection,
FileConnection, etc.) or the default_factory calls accordingly to resolve the
validation error.
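A minimal reproduction of the first option, `populate_by_name=True` (field names are illustrative):

```python
from pydantic import BaseModel, ConfigDict, Field

class CommentConnection(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    total_count: int = Field(default=0, alias="totalCount")

# With populate_by_name=True, both the alias and the Python name are accepted
# at instantiation, so default factories like CommentConnection(totalCount=0)
# no longer fail validation under Pydantic v2.
a = CommentConnection(totalCount=2)
b = CommentConnection(total_count=2)
```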

In `@src/integrations/github/rule_loader.py`:
- Around line 52-60: The loader iterates rules_data["rules"] but assumes each
rule_data is a dict; when it's not, calling rule_data.get in the except block
raises and aborts loading. Add a type guard before parsing inside the loop: skip
non-dict entries (e.g., if not isinstance(rule_data, dict): log a warning with
the raw value/context and continue), only call GitHubRuleLoader._parse_rule for
dicts, and in the except block reference safe fields (like
rule_data.get("description") only after checking type) or include the
repr(rule_data) in logger.error to avoid further errors.

In `@src/integrations/github/rules_service.py`:
- Around line 16-22: The logger.info call logs success unconditionally even
though github_client.create_pull_request_comment may return {} on failure;
update the code that calls create_pull_request_comment (the call to
github_client.create_pull_request_comment and the subsequent logger.info) to
inspect the response (e.g., check truthiness or presence of an id/key) and only
log "Posted validation result..." when the response indicates a successful
creation, otherwise log an error or warning including the failed response and
the original validation_result["message"] for diagnostics.
- Around line 28-41: The message incorrectly assumes a missing file when
github_client.get_file_content(repo, ".watchflow/rules.yaml", installation_id)
returns None for both 404 and auth/permission failures; update the handling
around the file_content check in rules_service.py (the block that inspects
file_content) to report "missing or inaccessible" and add a short prompt to
verify the app installation/permissions/installation token (mentioning
installation_id and repository) so users know to check access as well as absence
of the file.

In `@tests/unit/agents/test_rule_schema_compliance.py`:
- Around line 43-53: The standalone multiline YAML string in
test_rule_schema_compliance.py is dead code (unused expression); remove the
literal or convert it to a comment or assign it to a clearly named constant
(e.g., expected_schema_yaml) and use it in assertions; locate the standalone
triple-quoted string (the YAML block) and either delete it or turn it into a
comment/used variable so the test contains no unused expressions.

In `@tests/unit/event_processors/test_violation_acknowledgment.py`:
- Around line 35-61: The mock in the mock_agents fixture constructs
AgentResult/EvaluationResult shapes with ad-hoc MagicMocks (result.data =
{"evaluation_result": MagicMock(violations=[violation])}) which may diverge from
production types; replace the hand-built structure by creating real
AgentResult/EvaluationResult instances (or use the shared test factory used in
production tests) and populate fields (violations, rule_description, severity,
message, how_to_fix) and the acknowledgment response using the actual
AckResult/AgentAcknowledgment model, then set engine.execute.return_value and
ack_agent.evaluate_acknowledgment.return_value to those concrete instances so
tests reflect the real schema (refer to mock_agents, engine, result, violation,
ack_agent, ack_result, evaluate_acknowledgment).

In `@tests/unit/integrations/github/test_api.py`:
- Around line 36-48: The test test_get_installation_access_token_success is
mocking an httpx-style response (status_code and synchronous .json()) but the
implementation likely uses aiohttp (which uses .status and async .json()/
.text()); update the test fixture used by
github_client.get_installation_access_token to provide an aiohttp-compatible
AsyncMock response (set .status = 201, make .json and .text AsyncMock
coroutines, and ensure session.post/get return an async context-manager
response), or alternatively adjust the mock_httpx_client to match the actual
HTTP client used by GitHubClient so that
github_client.get_installation_access_token receives an object with .status and
awaitable .json()/.text().
- Around line 8-24: The fixture mock_httpx_client is patching httpx.AsyncClient
while the code under test (GitHubClient) uses aiohttp.ClientSession and checks
response.status; change the fixture to patch "aiohttp.ClientSession" (rename
fixture if desired), return an AsyncMock session that implements
__aenter__/__aexit__, and ensure its request methods (get/post/patch/put/delete)
are AsyncMocks that return mock response objects exposing a .status attribute
(not .status_code) and awaitable behavior so the tests exercise the real aiohttp
code paths.
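Both test_api.py items come down to mocking aiohttp's async-context-manager shape. A minimal factory sketch follows; the helper name is illustrative, and the real fixture would patch `aiohttp.ClientSession` wherever GitHubClient imports it:

```python
from unittest.mock import AsyncMock, MagicMock

def make_aiohttp_session_mock(status=200, json_body=None):
    """Build a session mock matching aiohttp's interface, per the review:
    the response exposes .status (not .status_code) plus awaitable
    .json()/.text(), and each HTTP verb returns an async context manager."""
    response = MagicMock()
    response.status = status
    response.json = AsyncMock(return_value=json_body or {})
    response.text = AsyncMock(return_value="")

    # session.get(...)/post(...) must return an object usable in `async with`.
    ctx = MagicMock()
    ctx.__aenter__ = AsyncMock(return_value=response)
    ctx.__aexit__ = AsyncMock(return_value=False)

    session = MagicMock()
    for verb in ("get", "post", "patch", "put", "delete"):
        setattr(session, verb, MagicMock(return_value=ctx))
    return session, response
```

A test would then set `.status = 201` and `response.json`'s return value to the expected token payload before calling `get_installation_access_token`.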

In `@tests/unit/rules/test_validators.py`:
- Around line 199-222: The test_allowed_hours_condition has redundant/incorrect
mocking: remove the outer with patch("src.rules.validators.datetime") block and
the line setting mock_datetime.side_effect = datetime; instead, patch the
module-level datetime reference once (patch("src.rules.validators.datetime",
mock_dt)) where mock_dt is a MagicMock whose now.return_value.hour is set (e.g.,
mock_dt.now.return_value.hour = 10), then call AllowedHoursCondition.validate
with the two assertions as shown; this ensures the module's datetime used by
AllowedHoursCondition is correctly mocked and eliminates the dead/overwritten
mock setup.
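The single-patch setup described above reads, in sketch form (the real test would install this object with `patch("src.rules.validators.datetime", mock_dt)`; the hour value is illustrative):

```python
from unittest.mock import MagicMock

# Build the replacement datetime once; inside the patch context,
# AllowedHoursCondition's module-level datetime.now().hour resolves to 10.
mock_dt = MagicMock()
mock_dt.now.return_value.hour = 10
```

Because the module-level reference is replaced once, no `side_effect = datetime` line is needed, and there is no second, overwritten patch block.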

In `@tests/unit/test_rule_schema_compliance.py`:
- Around line 56-67: The test instantiates AnalysisResponse with non-existent
fields ("rules" and "immune_system_metrics"); change the test to construct
AnalysisResponse using the model's actual fields (use rules_yaml as a YAML
string representing your rules, provide analysis_summary as a dict, set
analysis_report/pr_plan as appropriate, and pass rule_reasonings and warnings)
and remove immune_system_metrics; then update assertions to check
rule_reasonings and that the serialized rules_yaml string does not contain
"reasoning" (or otherwise inspect the correct field) instead of accessing a
nonexistent response.rules attribute — refer to AnalysisResponse, rules_yaml,
analysis_summary, analysis_report, rule_reasonings, and warnings to locate and
fix the test.
- Around line 24-53: Delete the failing test function
test_rule_recommendation_schema from tests/unit/test_rule_schema_compliance.py
(it references non-existent RuleConfig and RuleParameters and uses fields not
present on RuleRecommendation); alternatively, if you need to preserve behavior,
replace references to RuleConfig and RuleParameters with the actual models and
fields defined in src/agents/repository_analysis_agent/models.py and instantiate
RuleRecommendation only with the supported fields (description, enabled,
severity, event_types, parameters) to match the real model shape.
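The corrected AnalysisResponse construction from the first item can be sketched with a dataclass standing in for the real Pydantic model (which lives in src/api/recommendations.py); only the field names come from the review, while types and values here are assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-in for AnalysisResponse; illustrative only.
@dataclass
class AnalysisResponseStub:
    rules_yaml: str
    analysis_summary: dict
    analysis_report: str = ""
    pr_plan: Optional[str] = None
    rule_reasonings: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

response = AnalysisResponseStub(
    rules_yaml="rules:\n  - description: Require linked issue\n    enabled: true\n",
    analysis_summary={"total_rules": 1},
    rule_reasonings=["Linked issues keep changes traceable"],
)
```

The assertions then inspect `rule_reasonings` and check that the serialized `rules_yaml` string contains no `reasoning` key, instead of reaching for a nonexistent `response.rules`.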
🧹 Nitpick comments (9)
src/api/auth.py (2)

1-6: Consider moving structlog import to module level.

The structlog import is currently performed inside both endpoint functions (lines 32 and 88). Moving it to module level avoids repeated import lookups on each request and is more idiomatic.

♻️ Suggested refactor
 from fastapi import APIRouter, HTTPException
 from pydantic import BaseModel
+from structlog import get_logger
 
 from src.integrations.github.api import GitHubClient
 
 router = APIRouter()
+logger = get_logger()

Then remove the in-function imports and logger instantiation at lines 32-34 and 88-90.


51-54: Consider adding exc_info=True for consistent error logging.

Line 91 includes exc_info=True for stack traces, but this exception handler doesn't. For easier debugging of token validation failures, consider including the stack trace.

♻️ Suggested change
     except Exception as e:
         # Security: Do not leak exception details to client
-        logger.error("token_validation_failed", error=str(e))
+        logger.error("token_validation_failed", error=str(e), exc_info=True)
         return TokenValidationResponse(valid=False, message="Token validation failed. Please check your credentials.")
tests/integration/test_github_graphql.py (1)

8-21: Tests look good; consider adding a GraphQL-level error test.

The success and unauthorized tests are well-structured. However, GitHub can return HTTP 200 with errors in the response body. If the client is updated to handle GraphQL errors, a corresponding test would be valuable.

📝 Example test for GraphQL errors
@pytest.mark.asyncio
async def test_execute_query_graphql_error():
    token = "test_token"
    client = GitHubGraphQLClient(token)
    query = "query { invalid }"
    variables = {}

    mock_response = {"data": None, "errors": [{"message": "Field 'invalid' doesn't exist"}]}

    async with respx.mock:
        respx.post("https://api.github.com/graphql").mock(
            return_value=httpx.Response(200, json=mock_response)
        )

        result = await client.execute_query(query, variables)
        assert "errors" in result
src/integrations/github/rule_loader.py (1)

97-101: Default unknown severity values instead of skipping the rule.

RuleSeverity(...) raises on unexpected values, which drops the entire rule. Consider warning + fallback to MEDIUM to keep rules usable.

♻️ Suggested change
-        rule = Rule(
+        try:
+            severity = RuleSeverity(rule_data.get("severity", "medium"))
+        except ValueError:
+            logger.warning(
+                f"Unknown severity: {rule_data.get('severity')}; defaulting to 'medium'"
+            )
+            severity = RuleSeverity.MEDIUM
+
+        rule = Rule(
             description=rule_data["description"],
             enabled=rule_data.get("enabled", True),
-            severity=RuleSeverity(rule_data.get("severity", "medium")),
+            severity=severity,
             event_types=event_types,
src/agents/repository_analysis_agent/metrics.py (3)

1-6: Unused logger import.

The structlog logger is imported and instantiated but never used in this module.

Remove unused import if logging isn't needed
-import structlog
-
 from src.core.models import HygieneMetrics
 from src.integrations.github.models import PullRequest
-
-logger = structlog.get_logger()

8-17: Potential false positives in AI detection keywords.

The keyword "cursor" is a common programming term (database cursors, UI cursors, text cursors) that could trigger false positives for legitimate PRs. Consider using more specific patterns like "cursor ai" or "cursor.sh".

Suggested refinement
 AI_DETECTION_KEYWORDS = [
     "generated by claude",
-    "cursor",
+    "cursor ai",
+    "cursor.sh",
     "copilot",
     "chatgpt",
     "ai-generated",
-    "llm",
+    "generated by llm",
     "i am an ai",
     "as an ai",
 ]

104-115: CI skip detection only checks first commit.

The code checks nodes[0] but the comment says "last commit message". Additionally, if any commit in the PR contains [skip ci], CI would be skipped for that commit's push, but this logic only checks one commit. Consider checking all commits or clarifying the intended behavior.

Check all commits for CI skip directives
     # 6. CI Skip Rate
-    # Metric: Percentage of recent commits containing [skip ci] or similar.
-    # Note: Using last commit message from each PR as a proxy.
+    # Metric: Percentage of PRs with any commit containing [skip ci] or similar.
     ci_skip_count = 0
     for pr in prs:
         if not pr.commits.nodes:
             continue
-        message = pr.commits.nodes[0].commit.message.lower()
-        if "[skip ci]" in message or "[ci skip]" in message or "[no ci]" in message:
-            ci_skip_count += 1
+        has_skip = any(
+            any(skip in node.commit.message.lower() for skip in ["[skip ci]", "[ci skip]", "[no ci]"])
+            for node in pr.commits.nodes
+        )
+        if has_skip:
+            ci_skip_count += 1
tests/unit/test_metrics.py (1)

44-48: Move imports to module level.

The imports for CommitMessage and CommitNode inside the helper function add overhead on each call and reduce code clarity.

Move imports to top of file
 from src.integrations.github.models import (
     Actor,
     CommentConnection,
     CommitConnection,
+    CommitMessage,
+    CommitNode,
     FileConnection,
     FileEdge,
     FileNode,
     IssueConnection,
     IssueNode,
     PullRequest,
     ReviewConnection,
     ReviewNode,
 )

Then remove lines 46-48 inside the function.

tests/unit/rules/test_validators.py (1)

134-161: Consider simplifying inline comments.

The extensive comments (lines 144-159) explaining the validator's inverted return logic are helpful but verbose. Consider moving this explanation to the test's docstring or a brief inline comment.

Simplified version
 @pytest.mark.asyncio
 async def test_code_owners_condition():
+    """
+    CodeOwnersCondition returns False (violation) when critical files are changed,
+    indicating code owner review is required.
+    """
     condition = CodeOwnersCondition()
 
-    # Needed mocks
     with (
         patch("src.rules.validators.FilePatternCondition._glob_to_regex", return_value=".*"),
         patch("src.rules.utils.codeowners.is_critical_file", return_value=True),
         patch.object(condition, "_get_changed_files", return_value=["critical.py"]),
     ):
-        # If critical file changes, return False (review needed)
-        # Wait, validate returns NOT requires_code_owner_review
-        # requires_code_owner_review is True if any file is critical
-        # So returns False (violation because review IS needed but condition checks "is valid"?)
-        # Usually validators return True if PASS (no violation).
-        # If review is needed, and we assume it's NOT provided?
-        # The condition is "code_owners". Logic:
-        # requires_code_owner_review = any(...)
-        # return not requires_code_owner_review
-        # So if review is REQUIRED, it returns False (Validation Failed).
-        # This implies the condition asserts "No code owner validation errors" or "No critical files changed"?
-        # Description: "Validates if changes to files require review from code owners"
-        # If it returns False, it means "Code owner review REQUIRED (and presumably not present?)"
-        # Actually the validator does not check if review is GIVEN. Just if it's needed.
-        # So if it's needed, it returns False -> Trigger Violation.
-        # Violation message would be "Code owner review required".
-
+        # Critical file changed -> returns False (review required)
         assert await condition.validate({}, {}) is False
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a426e2 and 154b56d.

📒 Files selected for processing (14)
  • scripts/migrate_to_structlog.py
  • src/agents/repository_analysis_agent/metrics.py
  • src/api/auth.py
  • src/integrations/github/graphql.py
  • src/integrations/github/models.py
  • src/integrations/github/rule_loader.py
  • src/integrations/github/rules_service.py
  • tests/integration/test_github_graphql.py
  • tests/unit/agents/test_rule_schema_compliance.py
  • tests/unit/event_processors/test_violation_acknowledgment.py
  • tests/unit/integrations/github/test_api.py
  • tests/unit/rules/test_validators.py
  • tests/unit/test_metrics.py
  • tests/unit/test_rule_schema_compliance.py
🧰 Additional context used
🧬 Code graph analysis (9)
tests/unit/test_rule_schema_compliance.py (2)
src/agents/repository_analysis_agent/models.py (1)
  • RuleRecommendation (90-100)
src/api/recommendations.py (1)
  • AnalysisResponse (45-63)
tests/integration/test_github_graphql.py (1)
src/integrations/github/graphql.py (2)
  • GitHubGraphQLClient (9-44)
  • execute_query (14-44)
tests/unit/agents/test_rule_schema_compliance.py (1)
src/agents/repository_analysis_agent/models.py (1)
  • RuleRecommendation (90-100)
tests/unit/rules/test_validators.py (1)
src/rules/validators.py (13)
  • AllowedHoursCondition (603-637)
  • AuthorTeamCondition (117-147)
  • CodeOwnersCondition (687-734)
  • DaysCondition (305-345)
  • FilePatternCondition (150-206)
  • MaxFileSizeCondition (445-482)
  • MinApprovalsCondition (279-302)
  • MinDescriptionLengthCondition (381-409)
  • PastContributorApprovalCondition (737-807)
  • RequiredLabelsCondition (412-442)
  • RequireLinkedIssueCondition (979-1049)
  • TitlePatternCondition (348-378)
  • WeekendCondition (246-260)
tests/unit/test_metrics.py (2)
src/agents/repository_analysis_agent/metrics.py (1)
  • calculate_hygiene_metrics (20-136)
src/integrations/github/models.py (13)
  • Actor (4-7)
  • CommentConnection (58-59)
  • CommitConnection (42-43)
  • FileConnection (54-55)
  • FileEdge (50-51)
  • FileNode (46-47)
  • IssueConnection (28-29)
  • IssueNode (21-25)
  • PullRequest (62-80)
  • ReviewConnection (17-18)
  • ReviewNode (10-14)
  • CommitMessage (32-33)
  • CommitNode (36-39)
tests/unit/integrations/github/test_api.py (1)
src/integrations/github/api.py (7)
  • get_installation_access_token (94-124)
  • get_repository (126-140)
  • list_directory_any_auth (142-160)
  • get_file_content (162-193)
  • create_check_run (200-219)
  • get_pull_request (221-236)
  • list_pull_requests (238-260)
src/integrations/github/rule_loader.py (3)
src/core/models.py (1)
  • EventType (24-38)
src/rules/interface.py (1)
  • RuleLoader (6-26)
src/rules/models.py (3)
  • Rule (44-53)
  • RuleAction (37-41)
  • RuleSeverity (9-17)
src/agents/repository_analysis_agent/metrics.py (2)
src/core/models.py (1)
  • HygieneMetrics (66-123)
src/integrations/github/models.py (1)
  • PullRequest (62-80)
src/integrations/github/rules_service.py (2)
src/core/models.py (1)
  • repo_full_name (56-58)
src/integrations/github/api.py (2)
  • create_pull_request_comment (379-408)
  • get_file_content (162-193)
🔇 Additional comments (24)
src/api/auth.py (3)

9-17: LGTM!

The request/response models are well-structured with appropriate optional fields and sensible defaults.


20-24: LGTM!

The installation check response model is well-designed with appropriate defaults.


57-92: LGTM!

The endpoint correctly:

  • Re-raises HTTPException for expected errors
  • Logs full error details server-side with stack trace (exc_info=True)
  • Returns a generic message to clients

The in-function import of github_client (line 62) may be intentional to avoid circular imports; if not, consider moving it to module level along with the structlog import.

scripts/migrate_to_structlog.py (1)

30-84: Approach is reasonable for a one-time migration script.

The hardcoded file list gives explicit control over which files are migrated, which is appropriate for a deliberate migration. Consider adding a --dry-run flag or running with the warning detection suggested above before committing changes.

src/integrations/github/graphql.py (1)

9-12: LGTM!

The client initialization is straightforward. Hardcoding the GitHub GraphQL endpoint is reasonable for this use case.

tests/integration/test_github_graphql.py (1)

24-37: LGTM!

The unauthorized test correctly validates that HTTP errors are propagated as HTTPStatusError.

src/integrations/github/models.py (2)

83-98: LGTM with minor note.

The response wrapper models are well-structured. The pull_requests: dict | None typing on line 87 is loose compared to other typed connections — this may be intentional for flexibility with different query shapes.


1-60: LGTM!

The Pydantic models are well-structured with appropriate field aliases mapping GraphQL camelCase fields to Python snake_case. Documentation strings provide good context.

tests/unit/event_processors/test_violation_acknowledgment.py (9)

1-5: Imports look good.

Clean and minimal for the test needs.


8-20: Fixture setup is solid.

The payload mirrors the expected webhook shape and keeps tests readable.


23-32: Good coverage of GitHub client methods.

Mocks are comprehensive for the processor workflow.


64-70: Rule provider fixture is clear and focused.

Matches the expected event type usage.


73-98: Processor wiring is clean.

Patching + direct injection keeps tests deterministic.


101-112: Happy-path test is concise and effective.

Checks both success output and response comment content.


114-135: Negative-path coverage looks right.

The test validates rejection messaging and retained violations.


137-149: Edge-case handling is well covered.

The help response assertions are on point.


152-158: Extraction tests are clear and targeted.

Covers the key formats succinctly.

tests/unit/test_rule_schema_compliance.py (1)

6-21: LGTM!

The test correctly validates that RuleConfig enforces the expected schema and excludes internal fields like reasoning and rationale from serialization.

tests/unit/agents/test_rule_schema_compliance.py (1)

69-77: LGTM!

The test correctly validates that RuleParameters handles optional fields and that model_dump(exclude_none=True) properly omits unset fields.

src/agents/repository_analysis_agent/metrics.py (1)

20-46: LGTM!

The function signature, documentation, and empty-input handling are well-structured. The early return for empty input is a good defensive pattern.

tests/unit/test_metrics.py (1)

66-151: LGTM!

Comprehensive test coverage for all hygiene metrics. The test cases cover edge cases (empty input), boundary conditions, and expected calculations with clear documentation in comments.

tests/unit/rules/test_validators.py (3)

25-45: LGTM!

Good test coverage for AuthorTeamCondition and RequireLinkedIssueCondition. The tests appropriately validate both direct linked issues and body keyword detection.


47-116: LGTM!

Solid test coverage for pattern matching and validation conditions (FilePatternCondition, MinApprovalsCondition, DaysCondition, TitlePatternCondition, MinDescriptionLengthCondition, RequiredLabelsCondition). The use of patch.object to mock _get_changed_files is appropriate.


164-196: LGTM!

Thorough test coverage for PastContributorApprovalCondition with proper mocking of is_new_contributor. The three test cases (not new, new with old reviewer, new with new reviewer) cover the key scenarios.


Comment on lines +10 to +27
def migrate_file(file_path: Path) -> bool:
"""Migrate a single file from logging to structlog"""
content = file_path.read_text(encoding="utf-8")
original = content

# Replace import statements
content = re.sub(r"^import logging$", "import structlog", content, flags=re.MULTILINE)

# Replace getLogger calls
content = re.sub(r"logging\.getLogger\(__name__\)", "structlog.get_logger()", content)
content = re.sub(r'logging\.getLogger\("([^"]+)"\)', r"structlog.get_logger()", content)
content = re.sub(r"logging\.getLogger\('([^']+)'\)", r"structlog.get_logger()", content)

# Write back if changed
if content != original:
file_path.write_text(content, encoding="utf-8")
return True
return False

⚠️ Potential issue | 🟠 Major

Incomplete migration may break files with other logging.* usages.

The script replaces import logging with import structlog but doesn't handle other logging module references that may exist in the same file:

  • logging.debug(), logging.info(), logging.warning(), logging.error(), etc.
  • logging.basicConfig(), logging.setLevel(), logging.Formatter, etc.
  • Log level constants like logging.INFO, logging.DEBUG

After migration, any remaining logging.* calls will raise NameError at runtime.

Additionally, file I/O operations lack error handling—encoding issues or permission errors will crash the script mid-migration.

Proposed fix with validation and error handling
 def migrate_file(file_path: Path) -> bool:
     """Migrate a single file from logging to structlog"""
-    content = file_path.read_text(encoding="utf-8")
-    original = content
-
-    # Replace import statements
-    content = re.sub(r"^import logging$", "import structlog", content, flags=re.MULTILINE)
-
-    # Replace getLogger calls
-    content = re.sub(r"logging\.getLogger\(__name__\)", "structlog.get_logger()", content)
-    content = re.sub(r'logging\.getLogger\("([^"]+)"\)', r"structlog.get_logger()", content)
-    content = re.sub(r"logging\.getLogger\('([^']+)'\)", r"structlog.get_logger()", content)
-
-    # Write back if changed
-    if content != original:
-        file_path.write_text(content, encoding="utf-8")
-        return True
-    return False
+    try:
+        content = file_path.read_text(encoding="utf-8")
+        original = content
+
+        # Replace import statements
+        content = re.sub(r"^import logging$", "import structlog", content, flags=re.MULTILINE)
+
+        # Replace getLogger calls
+        content = re.sub(r"logging\.getLogger\(__name__\)", "structlog.get_logger()", content)
+        content = re.sub(r'logging\.getLogger\("([^"]+)"\)', r"structlog.get_logger()", content)
+        content = re.sub(r"logging\.getLogger\('([^']+)'\)", r"structlog.get_logger()", content)
+
+        # Warn if there are remaining logging.* usages that weren't migrated
+        remaining = re.findall(r"logging\.\w+", content)
+        if remaining:
+            print(f"  [WARN] Unmigrated logging usages: {set(remaining)}")
+
+        # Write back if changed
+        if content != original:
+            file_path.write_text(content, encoding="utf-8")
+            return True
+        return False
+    except (OSError, UnicodeDecodeError) as e:
+        print(f"  [ERROR] Failed to process {file_path}: {e}")
+        return False
🤖 Prompt for AI Agents
In `@scripts/migrate_to_structlog.py` around lines 10 - 27, The migration
currently removes the logging import but only converts getLogger calls, which
leaves other logging.* references broken and file I/O unprotected; update
migrate_file to (1) scan the file for other logging usages (regex for
logging\.(debug|info|warning|error|critical|exception|log)\(|logging\.(basicConfig|setLevel|Formatter)\b|logging\.(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b),
(2) if those method/constant usages exist, do NOT drop the original import
logging — instead insert or preserve both imports (keep "import logging" and add
"import structlog") and convert call sites of logging.debug/info/... to
structlog.get_logger().debug/info/... while leaving logging.<CONSTANT> and
functions like basicConfig/setLevel untouched, and (3) add try/except around
file_path.read_text(...) and file_path.write_text(...) in migrate_file to catch
IO/encoding/permission errors and return False (or surface the error) so the
script won’t crash.

Comment on lines 33 to 40
try:
async with httpx.AsyncClient() as client:
response = await client.post(
self.endpoint, json={"query": query, "variables": variables}, headers=headers
)
response.raise_for_status()

return response.json()

⚠️ Potential issue | 🟠 Major

Missing timeout and GraphQL-level error handling.

Two concerns:

  1. No timeout configured: The httpx.AsyncClient has no timeout, risking indefinite hangs. The PR objectives mention "default agent execution timeout set to 60s" — consider aligning HTTP timeouts.

  2. GraphQL errors not checked: GitHub returns errors in the response body with HTTP 200. The errors field should be inspected.

🔧 Proposed fix
         try:
-            async with httpx.AsyncClient() as client:
+            async with httpx.AsyncClient(timeout=30.0) as client:
                 response = await client.post(
                     self.endpoint, json={"query": query, "variables": variables}, headers=headers
                 )
                 response.raise_for_status()

-                return response.json()
+                data = response.json()
+                if "errors" in data and data["errors"]:
+                    logger.error("graphql_query_errors", errors=data["errors"])
+                return data
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
async with httpx.AsyncClient() as client:
response = await client.post(
self.endpoint, json={"query": query, "variables": variables}, headers=headers
)
response.raise_for_status()
return response.json()
try:
async with httpx.AsyncClient(timeout=30.0) as client:
response = await client.post(
self.endpoint, json={"query": query, "variables": variables}, headers=headers
)
response.raise_for_status()
data = response.json()
if "errors" in data and data["errors"]:
logger.error("graphql_query_errors", errors=data["errors"])
return data
🤖 Prompt for AI Agents
In `@src/integrations/github/graphql.py` around lines 33 - 40, The request code
using httpx.AsyncClient lacks a timeout and doesn't inspect GraphQL-level
errors; update the call that posts to self.endpoint (the block using
httpx.AsyncClient, response, query, variables, headers) to use a 60-second
timeout (e.g., pass timeout=60 to AsyncClient or the post call) and after
obtaining response.json() check for an "errors" key and raise or return a
meaningful error containing those details (include context like the
query/variables or endpoint) instead of silently returning the payload when
errors are present; ensure response.raise_for_status() remains to handle HTTP
errors and add a clear exception path for GraphQL errors.

Comment on lines +62 to +81
class PullRequest(BaseModel):
"""
GitHub Pull Request Data Representation.
Maps GraphQL response fields to domain logic requirements.
"""

number: int
title: str
body: str
changed_files: int = Field(alias="changedFiles")
additions: int
deletions: int
merged_at: str | None = Field(None, alias="mergedAt")
author: Actor | None
comments: CommentConnection = Field(default_factory=lambda: CommentConnection(totalCount=0))
closing_issues_references: IssueConnection = Field(alias="closingIssuesReferences")
reviews: ReviewConnection = Field(alias="reviews")
commits: CommitConnection = Field(alias="commits")
files: FileConnection = Field(default_factory=lambda: FileConnection(edges=[]))


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check Pydantic version
rg "pydantic" pyproject.toml requirements.txt setup.py 2>/dev/null | head -10

Repository: warestack/watchflow

Length of output: 100


🏁 Script executed:

# Examine the models.py file to understand its structure
wc -l src/integrations/github/models.py

Repository: warestack/watchflow

Length of output: 99


🏁 Script executed:

# Search for ConfigDict and Pydantic configurations
rg -n "ConfigDict|populate_by_name|allow_population_by_field_name|model_config" src/integrations/github/models.py -A 2

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Look at the CommentConnection model definition
rg -n "class CommentConnection" src/integrations/github/models.py -A 10

Repository: warestack/watchflow

Length of output: 353


🏁 Script executed:

# Check all Field definitions with aliases to see patterns
rg -n "Field\(.*alias" src/integrations/github/models.py

Repository: warestack/watchflow

Length of output: 571


🏁 Script executed:

# Get the exact content around line 76 to verify the issue
sed -n '70,81p' src/integrations/github/models.py

Repository: warestack/watchflow

Length of output: 630


🏁 Script executed:

# Check if there are any Pydantic ConfigDict usages elsewhere in the codebase
rg "ConfigDict|populate_by_name|model_config" src/integrations/ -l

Repository: warestack/watchflow

Length of output: 97


🏁 Script executed:

# Get the full CommentConnection definition and surrounding context
sed -n '55,85p' src/integrations/github/models.py

Repository: warestack/watchflow

Length of output: 1007


🏁 Script executed:

# Check the FileConnection model definition
sed -n '40,65p' src/integrations/github/models.py

Repository: warestack/watchflow

Length of output: 514


🏁 Script executed:

# Look for all BaseModel definitions and check if any have ConfigDict
sed -n '1,30p' src/integrations/github/models.py

Repository: warestack/watchflow

Length of output: 508


🏁 Script executed:

# Check the schemas.py file to see how ConfigDict is used there
cat src/integrations/github/schemas.py

Repository: warestack/watchflow

Length of output: 872


Fix Pydantic alias configuration for model instantiation in default factories.

Line 76 instantiates CommentConnection(totalCount=0) using the alias name. In Pydantic v2 a field with an alias accepts only the alias during instantiation by default, so that particular call succeeds; it is call sites using the field name (total_count=0) that will fail with a validation error unless populate_by_name=True is configured.

Add model_config = ConfigDict(populate_by_name=True) to CommentConnection (and other models with aliases if they're similarly instantiated):

from pydantic import BaseModel, ConfigDict, Field

class CommentConnection(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    
    total_count: int = Field(alias="totalCount")

With populate_by_name=True in place, call sites may also use the field name directly: CommentConnection(total_count=0).

🤖 Prompt for AI Agents
In `@src/integrations/github/models.py` around lines 62 - 81, The default
factories instantiate Pydantic models with alias names (e.g.,
CommentConnection(totalCount=0)); under Pydantic v2 defaults the alias is
accepted but the Python field name is not, so any call site using
total_count=0 would fail validation; either add model_config =
ConfigDict(populate_by_name=True) to the affected models (e.g.,
CommentConnection, FileConnection, IssueConnection, ReviewConnection,
CommitConnection) so they accept both spellings on instantiation, or
standardize all construction sites on the alias spelling; update the model
definitions (CommentConnection, FileConnection, etc.) or the default_factory
calls accordingly.

Comment on lines +52 to +60
            rules = []
            for rule_data in rules_data["rules"]:
                try:
                    rule = GitHubRuleLoader._parse_rule(rule_data)
                    if rule:
                        rules.append(rule)
                except Exception as e:
                    rule_description = rule_data.get("description", "unknown")
                    logger.error(f"Error parsing rule {rule_description}: {e}")

⚠️ Potential issue | 🟠 Major

Guard malformed rule entries to prevent parser crash.

If a rule entry isn't a dict, the exception handler itself raises (rule_data.get), which aborts the whole load instead of skipping the bad entry. Add a type guard before parsing.

🛠️ Proposed fix
             rules = []
             for rule_data in rules_data["rules"]:
+                if not isinstance(rule_data, dict):
+                    logger.error("Invalid rule entry; expected mapping", entry=rule_data)
+                    continue
                 try:
                     rule = GitHubRuleLoader._parse_rule(rule_data)
                     if rule:
                         rules.append(rule)
                 except Exception as e:
                     rule_description = rule_data.get("description", "unknown")
                     logger.error(f"Error parsing rule {rule_description}: {e}")
                     continue
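The failure mode is easy to reproduce outside the loader: calling `.get` on a non-dict entry raises inside the `except` block itself, so the guard must come before parsing. A minimal stdlib sketch — `load_rules` and the entry shapes here are hypothetical stand-ins, not Watchflow code:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def load_rules(entries: list) -> list[dict]:
    """Parse rule entries, skipping malformed ones instead of aborting."""
    rules = []
    for entry in entries:
        # Type guard: a str/int entry has no .get(), so the except block
        # below would itself raise AttributeError without this check.
        if not isinstance(entry, dict):
            logger.error("Invalid rule entry; expected mapping, got %r", entry)
            continue
        try:
            if entry.get("description"):
                rules.append(entry)
        except Exception as e:
            logger.error("Error parsing rule %s: %s", entry.get("description", "unknown"), e)
    return rules


# A malformed string entry is skipped rather than crashing the whole load.
print(load_rules([{"description": "require reviews"}, "not-a-rule"]))
```

With the guard, one bad entry costs a log line; without it, it aborts every rule that follows.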
🤖 Prompt for AI Agents
In `@src/integrations/github/rule_loader.py` around lines 52 - 60, The loader
iterates rules_data["rules"] but assumes each rule_data is a dict; when it's
not, calling rule_data.get in the except block raises and aborts loading. Add a
type guard before parsing inside the loop: skip non-dict entries (e.g., if not
isinstance(rule_data, dict): log a warning with the raw value/context and
continue), only call GitHubRuleLoader._parse_rule for dicts, and in the except
block reference safe fields (like rule_data.get("description") only after
checking type) or include the repr(rule_data) in logger.error to avoid further
errors.

Comment on lines +16 to +22
        await github_client.create_pull_request_comment(
            repo=repo_full_name,
            pr_number=pr_number,
            comment=validation_result["message"],
            installation_id=installation_id,
        )
        logger.info(f"Posted validation result to PR #{pr_number} in {repo_full_name}")

⚠️ Potential issue | 🟡 Minor

Only log success if the PR comment was created.

Line 22 logs success unconditionally, but create_pull_request_comment can return {} on failure. Consider logging based on the response to avoid misleading telemetry.

🛠️ Proposed fix
-        await github_client.create_pull_request_comment(
+        result = await github_client.create_pull_request_comment(
             repo=repo_full_name,
             pr_number=pr_number,
             comment=validation_result["message"],
             installation_id=installation_id,
         )
-        logger.info(f"Posted validation result to PR #{pr_number} in {repo_full_name}")
+        if result:
+            logger.info(f"Posted validation result to PR #{pr_number} in {repo_full_name}")
+        else:
+            logger.error(f"Failed to post validation result to PR #{pr_number} in {repo_full_name}")
🤖 Prompt for AI Agents
In `@src/integrations/github/rules_service.py` around lines 16 - 22, The
logger.info call logs success unconditionally even though
github_client.create_pull_request_comment may return {} on failure; update the
code that calls create_pull_request_comment (the call to
github_client.create_pull_request_comment and the subsequent logger.info) to
inspect the response (e.g., check truthiness or presence of an id/key) and only
log "Posted validation result..." when the response indicates a successful
creation, otherwise log an error or warning including the failed response and
the original validation_result["message"] for diagnostics.

Comment on lines 199 to 222
@pytest.mark.asyncio
async def test_allowed_hours_condition():
    condition = AllowedHoursCondition()

    # Mock datetime
    # We can't easily mock datetime.now() because it's a built-in type method.
    # But the code does: datetime.now(tz)
    # We can patch datetime in the module.

    with patch("src.rules.validators.datetime") as mock_datetime:
        mock_datetime.now.return_value.hour = 10
        mock_datetime.side_effect = datetime  # To allow other usage if needed? No, generic mock is risky.
        # Better: mock the whole class but that's hard.
        # Alternative: use freezegun or simple patch of 'src.rules.validators.datetime' reference?
        # The file imports `datetime` classes: `from datetime import datetime`.
        # So we patch `src.rules.validators.datetime`.

        mock_dt = MagicMock()
        mock_dt.now.return_value.hour = 10

        with patch("src.rules.validators.datetime", mock_dt):
            assert await condition.validate({"allowed_hours": [9, 10, 11]}, {}) is True
            assert await condition.validate({"allowed_hours": [12, 13]}, {}) is False


⚠️ Potential issue | 🟡 Minor

Dead code and problematic mock setup in test_allowed_hours_condition.

Lines 208-218 have issues:

  1. The outer patch context at line 208 is immediately overwritten by the inner patch at line 219
  2. Line 210's mock_datetime.side_effect = datetime doesn't make sense for mocking

Only the inner patch (lines 219-221) takes effect.

Simplified test
 `@pytest.mark.asyncio`
 async def test_allowed_hours_condition():
     condition = AllowedHoursCondition()
 
-    # Mock datetime
-    # We can't easily mock datetime.now() because it's a built-in type method.
-    # But the code does: datetime.now(tz)
-    # We can patch datetime in the module.
-
-    with patch("src.rules.validators.datetime") as mock_datetime:
-        mock_datetime.now.return_value.hour = 10
-        mock_datetime.side_effect = datetime  # To allow other usage if needed? No, generic mock is risky.
-        # Better: mock the whole class but that's hard.
-        # Alternative: use freezegun or simple patch of 'src.rules.validators.datetime' reference?
-        # The file imports `datetime` classes: `from datetime import datetime`.
-        # So we patch `src.rules.validators.datetime`.
-
-        mock_dt = MagicMock()
-        mock_dt.now.return_value.hour = 10
-
-        with patch("src.rules.validators.datetime", mock_dt):
-            assert await condition.validate({"allowed_hours": [9, 10, 11]}, {}) is True
-            assert await condition.validate({"allowed_hours": [12, 13]}, {}) is False
+    mock_dt = MagicMock()
+    mock_dt.now.return_value.hour = 10
+
+    with patch("src.rules.validators.datetime", mock_dt):
+        assert await condition.validate({"allowed_hours": [9, 10, 11]}, {}) is True
+        assert await condition.validate({"allowed_hours": [12, 13]}, {}) is False
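The core trick the simplified test relies on — patching a module-level `datetime` name with a `MagicMock` whose `now()` result is pinned — works with plain `unittest.mock`. A self-contained sketch against a hypothetical stand-in module (the real test patches `src.rules.validators` instead):

```python
import types
from datetime import datetime
from unittest.mock import MagicMock, patch

# Stand-in for src.rules.validators: a module that did
# `from datetime import datetime` at import time.
validators = types.ModuleType("validators")
validators.datetime = datetime


def current_hour() -> int:
    # Resolves `datetime` through the module attribute, so patching
    # that attribute changes what this function sees.
    return validators.datetime.now().hour


mock_dt = MagicMock()
mock_dt.now.return_value.hour = 10

with patch.object(validators, "datetime", mock_dt):
    assert current_hour() == 10  # the module reference now resolves to the mock
```

One patch of the module-level name is sufficient; nesting a second patch inside, as the original test did, only discards the first.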
🤖 Prompt for AI Agents
In `@tests/unit/rules/test_validators.py` around lines 199 - 222, The
test_allowed_hours_condition has redundant/incorrect mocking: remove the outer
with patch("src.rules.validators.datetime") block and the line setting
mock_datetime.side_effect = datetime; instead, patch the module-level datetime
reference once (patch("src.rules.validators.datetime", mock_dt)) where mock_dt
is a MagicMock whose now.return_value.hour is set (e.g.,
mock_dt.now.return_value.hour = 10), then call AllowedHoursCondition.validate
with the two assertions as shown; this ensures the module's datetime used by
AllowedHoursCondition is correctly mocked and eliminates the dead/overwritten
mock setup.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/integrations/providers/bedrock_provider.py (1)

145-177: Remove Amazon and Meta profile matching until provider-aware routing is implemented.

The code currently returns Amazon and Meta inference profile ARNs from _find_inference_profile, but these are unconditionally passed to _get_anthropic_inference_profile_client at line 35, which always creates an Anthropic-only client. The AnthropicBedrockWrapper._generate method hardcodes Anthropic's message API (self.anthropic_client.messages.create()) and will fail at runtime for Amazon Titan or Meta Llama models.

While provider-aware routing exists in _get_standard_bedrock_client (lines 119–124), it is never called for inference profiles. Until routing is added for inference profiles, restrict matching to Anthropic profiles to prevent silent failures.

🛠️ Safe interim fix
                is_amazon = any(k in profile_name for k in ["amazon", "titan", "nova"])
                if is_amazon and ("amazon" in model_lower or "titan" in model_lower or "nova" in model_lower):
                    return profile_arn

                is_meta = any(k in profile_name for k in ["meta", "llama"])
                if is_meta and ("meta" in model_lower or "llama" in model_lower):
                    return profile_arn

Replace with comment noting provider routing is needed.

src/tasks/scheduler/deployment_scheduler.py (1)

117-133: Fix timezone mismatch when created_at is a timestamp.

current_time is UTC-aware, but datetime.fromtimestamp(created_at) is naive, which will raise on subtraction.

🛠️ Suggested fix
-                    if isinstance(created_at, int | float):
-                        # Convert timestamp to datetime
-                        created_at = datetime.fromtimestamp(created_at)
+                    if isinstance(created_at, int | float):
+                        # Convert timestamp to UTC-aware datetime
+                        created_at = datetime.fromtimestamp(created_at, tz=UTC)
+                    elif isinstance(created_at, datetime) and created_at.tzinfo is None:
+                        created_at = created_at.replace(tzinfo=UTC)
src/webhooks/handlers/check_run.py (1)

21-35: Handle enqueue return value and duplicate path.

TaskQueue.enqueue returns bool. The current response implies a task ID and “queued” even when enqueue returns False (duplicate).

💡 Suggested fix
-        task_id = await task_queue.enqueue(
+        enqueued = await task_queue.enqueue(
             CheckRunProcessor().process,
             event_type="check_run",
             repo_full_name=event.repo_full_name,
             installation_id=event.installation_id,
             payload=event.payload,
         )
-
-        logger.info(f"✅ Check run event enqueued with task ID: {task_id}")
+        if not enqueued:
+            return WebhookResponse(
+                status="ignored",
+                detail=f"Check run event for {event.repo_full_name} already queued",
+                event_type=EventType.CHECK_RUN,
+            )
+        logger.info("check_run_event_enqueued", repo_full_name=event.repo_full_name)
 
         return WebhookResponse(
             status="ok",
-            detail=f"Check run event has been queued for processing with task ID: {task_id}",
+            detail="Check run event queued for processing",
             event_type=EventType.CHECK_RUN,
         )
src/webhooks/handlers/deployment_status.py (1)

38-52: Handle enqueue return value and duplicate path.

TaskQueue.enqueue returns bool; the current logic logs a “task ID” and returns “ok” even if enqueue returns False.

💡 Suggested fix
-        task_id = await task_queue.enqueue(
+        enqueued = await task_queue.enqueue(
             DeploymentStatusProcessor().process,
             event_type="deployment_status",
             repo_full_name=repo_full_name,
             installation_id=installation_id,
             payload=payload,
         )
-
-        logger.info(f"✅ Deployment status event enqueued with task ID: {task_id}")
+        if not enqueued:
+            return WebhookResponse(
+                status="ignored",
+                detail=f"Deployment status event for {repo_full_name} already queued",
+                event_type=EventType.DEPLOYMENT_STATUS,
+            )
+        logger.info("deployment_status_event_enqueued", repo_full_name=repo_full_name)
src/event_processors/violation_acknowledgment.py (1)

408-456: Type mismatch: agent returns dicts but code expects Violation objects.

The _evaluate_acknowledgment method returns acknowledgable_violations and require_fixes from the agent response (lines 371-372), which are lists of dicts. However:

  1. Type hints on lines 412-413 declare list[Violation]
  2. Lines 430-432 and 447-456 use getattr(violation, "message", ...) which will always return the default value for dicts (dicts don't have attributes)

This means violation details won't be correctly extracted in comments.

Proposed fix

Either convert dicts to Violation objects in _evaluate_acknowledgment:

             acknowledgable_violations = evaluation_data.get("acknowledgable_violations", [])
             require_fixes = evaluation_data.get("require_fixes", [])
+            # Convert dicts to Violation objects
+            acknowledgable_violations = [Violation.model_validate(v) if isinstance(v, dict) else v for v in acknowledgable_violations]
+            require_fixes = [Violation.model_validate(v) if isinstance(v, dict) else v for v in require_fixes]

Or update the helper methods to handle both dicts and objects:

-                message = getattr(violation, "message", "Rule violation detected")
+                message = violation.get("message") if isinstance(violation, dict) else getattr(violation, "message", "Rule violation detected")
🤖 Fix all issues with AI agents
In `@src/agents/engine_agent/nodes.py`:
- Around line 163-167: The loop over validator_rules currently skips rules with
strategy VALIDATOR that lack conditions, causing silent misses; update the
handling in the block that builds validator_tasks (the loop iterating
validator_rules that calls _execute_conditions and appends to validator_tasks)
to detect rules with strategy == "VALIDATOR" but empty or missing
rule_desc.conditions and handle them explicitly: either (A) log a warning via
the module logger referencing the rule (e.g., rule_desc.id or name) and the
strategy selector decision, or (B) assert/fallback (raise a clear exception) so
the error is surfaced, or (C) add a prompt-guidance flag/path to ensure the LLM
never selects VALIDATOR without conditions—implement one of these behaviors so
that any VALIDATOR rule without conditions is not silently skipped.

In `@src/core/models.py`:
- Around line 107-112: There are two incompatible WebhookResponse models
(WebhookResponse in src/core/models.py and another in src/webhooks/models.py)
causing type/validation conflicts; pick one canonical shape (or rename one to a
distinct class) and update usages: unify the status literal values (choose
either Literal["ok","error","ignored"] or the router’s
"success"/"received"/"error" set) and normalize event_type to the same type
(EventType | None or str | None), then update all imports in handler functions,
BaseHandler, and router to reference the single model (or the newly renamed
types) so the schema and validation are consistent across modules.

In `@src/core/utils/logging.py`:
- Around line 17-19: The runtime NameError happens because Callable is only
imported under TYPE_CHECKING but is used in the runtime variable annotation for
log_fn (the line with "log_fn: Callable[..., Any] = getattr(logger_obj, level,
logger_obj.info)"); fix by making Callable available at runtime — either move
"from collections.abc import Callable" out of the TYPE_CHECKING guard so it is
imported unconditionally, or change the variable annotation to a string
("Callable[..., Any]") or remove the annotation entirely; update the annotation
near the getattr call (log_fn) and any other in-function variable annotations
that reference Callable.

In `@src/integrations/github/service.py`:
- Around line 108-131: The analyze_repository_rules loop currently treats every
non-200 as "missing" and never reaches the exception handlers; update the
per-file response handling in analyze_repository_rules (the loop over
files_to_check that builds found_files and hits self.BASE_URL) to explicitly
branch on resp.status_code: if 200 append filepath to found_files; if 404
continue quietly (file not found); if 401/403/429 log a descriptive error
including repo, filepath, status_code and response text (use the same logger
pattern as get_repository_metadata) and continue or bubble up if you prefer
rate-limit/auth to abort; for any other unexpected status log an error with
repo/file/status/response and do not silently swallow it; keep the existing
try/except for network/timeouts but ensure these new per-file branches are
present so auth and rate-limit failures are visible.

In `@src/rules/acknowledgment.py`:
- Around line 100-113: The info-level logging currently emits the full
comment_body and extracted reason (see logger.info call and the loop over
ACKNOWLEDGMENT_PATTERNS that computes reason from match.group), which can leak
sensitive data; change these to safe logs by moving full content to debug-level
and/or logging only metadata or a truncated/hashed snippet at info-level (e.g.,
use logger.debug for the initial "Extracting acknowledgment reason" message and
the "Pattern matched" message that includes reason, and at info only emit
non-sensitive status like "acknowledgment reason extracted" or a
sanitized/truncated reason). Ensure you update both the initial logger.info(f"🔍
Extracting acknowledgment reason from: '{comment_body}'") and the
logger.info(f"✅ Pattern {i + 1} matched! Reason: '{reason}'") usages, keeping
ACKNOWLEDGMENT_PATTERNS, comment_body, reason and the loop logic unchanged.

In `@src/rules/conditions/access_control.py`:
- Around line 134-140: The call to is_critical_file (which calls load_codeowners
and reads CODEOWNERS from disk) will always fail in webhook contexts; update the
code and flow so is_critical_file can operate on provided CODEOWNERS content
instead of reading files: create a new API-compatible function (e.g.,
is_critical_file_from_content or extend is_critical_file to accept a
codeowners_content parameter) in src.rules.utils.codeowners, change
CodeOwnersCondition (the block building critical_files using is_critical_file)
to accept and pass the CODEOWNERS content (fetched during event enrichment via
the GitHub API) or, if content is not provided, log/warn and treat the condition
as unsupported; ensure callers reference the new parameter and update any
imports/usages accordingly.

In `@src/rules/conditions/filesystem.py`:
- Around line 118-128: Update _get_changed_files to actually extract files from
the event instead of returning empty lists: for pull_request and push events
read enriched event fields (e.g., event.get("files") or event.get("commits") as
used by MaxFileSizeCondition) and return a list of file paths; ensure
FilePatternCondition.evaluate and FilePatternCondition.validate receive the
populated list so they don't always report "No files available" or return False;
alternatively, if enrichment may be missing, add a clear placeholder log/error
and guard clauses to avoid masking real validation with empty lists. References:
_get_changed_files, FilePatternCondition.evaluate,
FilePatternCondition.validate, MaxFileSizeCondition.

In `@src/rules/conditions/pull_request.py`:
- Around line 118-122: The code currently does
int(parameters.get("min_description_length", 1)) which will raise ValueError for
non-numeric config values; update the logic around parameters/context handling
to validate and coerce min_description_length safely (e.g., read raw =
parameters.get("min_description_length", 1), attempt to int(raw) inside a
try/except or use str.isdigit-like check) and fallback to a safe default (1)
when conversion fails, logging or otherwise handling the bad value; apply the
same defensive change both where min_length is set (the block using
min_description_length around the creation of min_length) and the second
occurrence near lines 160-161 so both uses are protected.

In `@src/rules/registry.py`:
- Around line 68-95: get_conditions_for_parameters currently instantiates
condition classes without passing any values so condition logic that depends on
configured parameters is lost; update it to extract the subset of parameters
that match condition_cls.parameter_patterns and pass them into the condition
(e.g., via condition_cls(**matched_params)) or call a factory like
condition_cls.from_parameters(matched_params) when available, falling back to
parameterless instantiation only if no matching keys are present; reference
get_conditions_for_parameters, AVAILABLE_CONDITIONS,
condition_cls.parameter_patterns, and consider adding/using a from_parameters
classmethod on condition classes to support structured initialization.

In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 352-365: When converting old rule format in the block that
inspects first_rule and builds converted_rules, include the original rule's
"enabled" flag (or a sensible default) so conversion preserves behavior;
specifically, update the conversion for each rule in converted_rules to set
"enabled": rule.get("enabled", True) (i.e., carry through the existing enabled
value if present, otherwise default to True) to avoid enforcement/default
behavior drift.

In `@src/webhooks/handlers/pull_request.py`:
- Around line 15-44: The handler currently only logs and returns success; wire
it to the real processor or task queue before returning by invoking the
pull-request processing routine and awaiting its result (or enqueuing the event)
instead of immediately returning WebhookResponse; specifically replace the no-op
section in handle with a call to the module referenced in comments (e.g.,
event_processors.process_pull_request(event) or TaskQueue.enqueue(task=event))
and propagate success/error into the returned WebhookResponse (use
EventType.PULL_REQUEST and include error details on failure), ensuring
exceptions remain caught by the existing try/except around handle.

In `@src/webhooks/handlers/push.py`:
- Around line 15-43: The push handler currently logs then returns without
invoking any processing; update the handle method to call the appropriate
processor or enqueue a processing task (e.g., call
event_processors.process_push_event(event) or push_queue.enqueue(event)) before
returning success: locate async def handle(self, event: WebhookEvent) in
push.py, add the await to call the existing processor function (or create and
import one from event_processors) or push the event to your task queue API,
handle ImportError as before if the processor module is absent, and surface
processor errors into the existing exception branch so the response accurately
reflects processing outcome.
- Around line 1-6: There are two incompatible WebhookResponse model definitions;
standardize on the core definition in src.core.models by removing or deprecating
the duplicate in src.webhooks.models, ensure WebhookResponse uses status:
Literal["ok","error","ignored"] with default "ok" and event_type: EventType |
None (not str), and update all imports (e.g., in the router and any handler
files that currently import from src.webhooks.models) to import WebhookResponse
and EventType from src.core.models; also run tests/type checks to confirm no
remaining references to the old model.

In `@tests/unit/core/test_models.py`:
- Around line 47-61: The test passes extra fields not present on the
Acknowledgment model; update test_valid_acknowledgment to construct
Acknowledgment using only the actual model fields (rule_id, reason, commenter)
and rely on timestamp being set by the model, e.g., remove violations and
pull_request_id from the constructor, and keep test_required_fields (which
asserts ValidationError for missing required fields) as-is to validate missing
reason/commenter behavior; reference Acknowledgment, test_valid_acknowledgment,
and test_required_fields when making the change.
🟡 Minor comments (23)
src/integrations/providers/openai_provider.py-26-28 (1)

26-28: # type: ignore placement is incorrect and won't suppress the mypy error.

The # type: ignore[call-arg] directive must be on the same line as the code it's suppressing. Placing it on a separate line (line 27) above max_tokens=self.max_tokens, (line 28) has no effect—mypy will still report the error.

🐛 Proposed fix
         return ChatOpenAI(
             model=self.model,
-            # mypy complains about max_tokens but it is valid for ChatOpenAI
-            # type: ignore[call-arg]
-            max_tokens=self.max_tokens,
+            max_tokens=self.max_tokens,  # type: ignore[call-arg]  # valid for ChatOpenAI
             temperature=self.temperature,
             api_key=self.kwargs.get("api_key"),
             **{k: v for k, v in self.kwargs.items() if k != "api_key"},
         )
src/core/utils/caching.py-161-163 (1)

161-163: Edge case: ttl=0 is treated as "no TTL provided".

The condition if ttl treats ttl=0 as falsy, falling through to TTLCache with a 3600s TTL. If ttl=0 is ever passed intentionally (e.g., for immediate expiry or testing), this behavior may be unexpected.

Consider using explicit None check:

Proposed fix
     if cache is None:
-        # SIM108: Use ternary operator
-        cache = AsyncCache(maxsize=maxsize, ttl=ttl) if ttl else TTLCache(maxsize=maxsize, ttl=ttl or 3600)
+        # SIM108: Use ternary operator
+        cache = AsyncCache(maxsize=maxsize, ttl=ttl) if ttl is not None else TTLCache(maxsize=maxsize, ttl=3600)
src/tasks/task_queue.py-153-159 (1)

153-159: Incorrect attempt count in failure log.

Line 157 uses task.retry_count + 1 which is always 1 since retry_count is never incremented. This makes the failure log misleading. Use attempt + 1 to reflect the actual number of attempts made.

🐛 Proposed fix
         logger.error(
             "task_failed",
             task_id=task.task_id,
             error=str(last_error),
-            attempts=min(task.retry_count + 1, MAX_RETRIES + 1),
+            attempts=attempt + 1,
             exc_info=True,
         )
src/tasks/task_queue.py-161-173 (1)

161-173: Misleading task_completed log for failed tasks.

_execute_with_retry logs task_failed when all retries are exhausted but doesn't raise an exception. The worker then logs task_completed at line 168 regardless of outcome. This produces contradictory logs: task_failed followed by task_completed for the same task.

Consider having _execute_with_retry return a success boolean or raise on permanent failure so the worker can log appropriately.

♻️ Suggested approach
-    async def _execute_with_retry(self, task: Task) -> None:
+    async def _execute_with_retry(self, task: Task) -> bool:
         """Execute task with exponential backoff retry for transient failures."""
         last_error: Exception | None = None

         for attempt in range(MAX_RETRIES + 1):
             try:
                 await task.func(*task.args, **task.kwargs)
                 if attempt > 0:
                     logger.info("task_retry_succeeded", task_id=task.task_id, attempt=attempt + 1)
-                return
+                return True
             except Exception as e:
                 last_error = e
                 if attempt < MAX_RETRIES and _is_transient_error(e):
                     # ... backoff logic ...
                     await asyncio.sleep(backoff)
                 else:
                     break

         logger.error(
             "task_failed",
             task_id=task.task_id,
             error=str(last_error),
-            attempts=min(task.retry_count + 1, MAX_RETRIES + 1),
+            attempts=attempt + 1,
             exc_info=True,
         )
+        return False

Then in _worker:

     async def _worker(self) -> None:
         """Background worker loop."""
         while True:
             task = await self.queue.get()
             try:
                 logger.info("task_started", task_id=task.task_id, event_type=task.event_type)
-                await self._execute_with_retry(task)
-                logger.info("task_completed", task_id=task.task_id)
+                success = await self._execute_with_retry(task)
+                if success:
+                    logger.info("task_completed", task_id=task.task_id)
             except Exception as e:
                 logger.error("task_worker_error", task_id=task.task_id, error=str(e), exc_info=True)
             finally:
                 self.queue.task_done()
src/core/config/settings.py-148-157 (1)

148-157: Normalize provider to lowercase before validation.
If AI_PROVIDER is set as Vertex_AI or OpenAI, the current checks can be skipped. Normalize once and use the normalized value for all provider validations.

🔧 Proposed normalization
-        if self.ai.provider == "openai" and not self.ai.api_key:
+        provider = self.ai.provider.lower()
+        if provider == "openai" and not self.ai.api_key:
             errors.append("OPENAI_API_KEY is required for OpenAI provider")
@@
-        if self.ai.provider == "bedrock" and not self.ai.bedrock_model_id:
+        if provider == "bedrock" and not self.ai.bedrock_model_id:
             errors.append("BEDROCK_MODEL_ID is required for Bedrock provider")
@@
-        if self.ai.provider in vertex_aliases and not self.ai.vertex_ai_model:
+        if provider in vertex_aliases and not self.ai.vertex_ai_model:
             errors.append("VERTEX_AI_MODEL is required for Google Vertex AI provider")
src/core/config/settings.py-114-119 (1)

114-119: Validate new rate-limit/timeout values are positive.
These values are now configurable, but invalid (≤0) values can break rate limiting or timeouts. Add validation in validate() to fail fast on bad env input.

✅ Proposed validation guard
         if self.ai.provider == "openai" and not self.ai.api_key:
             errors.append("OPENAI_API_KEY is required for OpenAI provider")
+
+        if self.anonymous_rate_limit <= 0:
+            errors.append("ANONYMOUS_RATE_LIMIT must be a positive integer")
+        if self.authenticated_rate_limit <= 0:
+            errors.append("AUTHENTICATED_RATE_LIMIT must be a positive integer")
+        if self.analysis_timeout <= 0:
+            errors.append("ANALYSIS_TIMEOUT must be a positive integer")
src/core/config/provider_config.py-6-6 (1)

6-6: Remove redundant typing.cast import and usage.

cast("int"/"float", ...) uses non-idiomatic string type arguments (reserved for forward references, not built-in types). More importantly, the cast is unnecessary—agent_config.max_tokens and agent_config.temperature are already correctly typed as int and float by the AgentConfig dataclass definition, and the isinstance() check confirms the type. Remove the import and cast entirely; int() and float() calls alone are sufficient (or can be removed as well).

Proposed fix
-from typing import cast
-            if agent_config and isinstance(agent_config, AgentConfig) and hasattr(agent_config, "max_tokens"):
-                return int(cast("int", agent_config.max_tokens))
+            if agent_config and isinstance(agent_config, AgentConfig) and hasattr(agent_config, "max_tokens"):
+                return int(agent_config.max_tokens)
-            if agent_config and isinstance(agent_config, AgentConfig) and hasattr(agent_config, "temperature"):
-                return float(cast("float", agent_config.temperature))
+            if agent_config and isinstance(agent_config, AgentConfig) and hasattr(agent_config, "temperature"):
+                return float(agent_config.temperature)
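As a reminder of why the cast adds nothing at runtime: `typing.cast` is an identity function that returns its argument untouched, so `int()`/`float()` alone carry the conversion. A quick stdlib check:

```python
from typing import cast

value = "not an int"
# cast performs no conversion or validation; it only informs the type checker.
assert cast(int, value) is value

# Actual coercion comes from the constructor, not the cast:
assert int("42") == 42
assert isinstance(float(3), float)
```

Note the idiomatic form is `cast(int, x)` with the type object itself; the string form `cast("int", x)` is reserved for forward references.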
src/agents/feasibility_agent/agent.py-2-2 (1)

2-2: Minor typo in docstring.

There's an extra space: "error handling  and" should be "error handling and".

✏️ Proposed fix
-Rule Feasibility Agent implementation with error handling  and retry logic.
+Rule Feasibility Agent implementation with error handling and retry logic.
tests/conftest.py-55-62 (1)

55-62: Remove the deprecated event_loop fixture.

Custom event_loop fixtures are deprecated in pytest-asyncio >= 0.23. Since asyncio_mode = "auto" is already configured in pyproject.toml (line 113), this fixture is redundant and will trigger deprecation warnings in future versions.

Remove this fixture entirely. If you need session-scoped async fixtures, explicitly set loop_scope = "session" on those fixtures instead of relying on a custom event loop fixture.

src/presentation/github_formatter.py-77-90 (1)

77-90: Include INFO severity instead of folding it into LOW.

Severity.INFO currently gets bucketed into low (and omitted from the summary), which misrepresents severity in both check runs and comments.

🔧 Suggested fix (apply to both formatters)
-    severity_order = ["critical", "high", "medium", "low"]
+    severity_order = ["critical", "high", "medium", "low", "info"]
     severity_groups: dict[str, list[Violation]] = {s: [] for s in severity_order}
@@
-        else:
-            # Fallback for unexpected severities
-            if "low" not in severity_groups:
-                severity_groups["low"] = []
-            severity_groups["low"].append(violation)
+        else:
+            # Fallback for unexpected severities
+            severity_groups.setdefault("info", []).append(violation)

Also applies to: 125-133

tests/unit/integrations/github/test_check_runs.py-48-60 (1)

48-60: Test uses incorrect acknowledgments type but would pass at runtime.

The test passes acknowledgments = {"Rule 1": "Reason"} (a dict[str, str]), but the formatter function in src/presentation/github_formatter.py line 187 expects dict[str, Acknowledgment] based on the model in src/core/models.py. However, the acknowledgments parameter is never accessed in the formatter's implementation, so the test would pass at runtime. Update the test to use proper Acknowledgment instances for type consistency and to match production expectations.

src/agents/acknowledgment_agent/agent.py-212-234 (1)

212-234: Truthiness check may reject valid empty collections.

Line 222 uses if ... and pr_data and ... and rules: which will short-circuit to False if pr_data={} or rules=[] are passed. If these are valid inputs (e.g., no rules to evaluate), the current check incorrectly rejects them.

Consider using explicit is not None checks for optional arguments:

Suggested fix using explicit None checks
-        if acknowledgment_reason and violations and pr_data and commenter and rules:
+        if (
+            acknowledgment_reason is not None
+            and violations is not None
+            and pr_data is not None
+            and commenter is not None
+            and rules is not None
+        ):

Alternatively, if empty collections should indeed be rejected, consider adding explicit validation with clearer error messages indicating which argument was empty vs. missing.
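To see why this matters, a self-contained sketch (names are illustrative): an empty list or dict is falsy, so a truthiness guard conflates "no rules to evaluate" with "rules argument missing".

```python
def guard_truthy(pr_data, rules):
    # Rejects pr_data={} and rules=[] even when those are valid inputs.
    return bool(pr_data and rules)


def guard_none(pr_data, rules):
    # Only rejects arguments that were actually omitted.
    return pr_data is not None and rules is not None


print(guard_truthy({}, []))    # False: valid empty inputs rejected
print(guard_none({}, []))      # True: empty collections accepted
print(guard_none(None, []))    # False: missing argument rejected
```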

src/agents/acknowledgment_agent/models.py-46-46 (1)

46-46: Type mismatch between model and agent method signature.

AcknowledgmentContext.violations is typed as list[Violation], but the evaluate_acknowledgment method (line 80 of agent.py) accepts violations: list[dict[str, Any]]. This creates a type inconsistency: the _evaluate_node method passes state.violations (which is list[Violation] from the AcknowledgmentContext state) directly to evaluate_acknowledgment, which expects dicts. The _find_violation_by_rule_description method also expects dicts and calls .get() on each violation, which would fail on Violation objects.

Either update evaluate_acknowledgment and _find_violation_by_rule_description to accept list[Violation], or convert Violation objects to dicts before passing them to these methods.

src/rules/conditions/workflow.py-35-39 (1)

35-39: Coerce minutes to numeric to avoid TypeError.

If parameters["minutes"] is a string (common for YAML), duration_minutes > max_minutes will raise.

🔧 Suggested fix
-        max_minutes = parameters.get("minutes", 3)
+        try:
+            max_minutes = float(parameters.get("minutes", 3))
+        except (TypeError, ValueError):
+            logger.debug(
+                "WorkflowDurationCondition: Invalid minutes parameter",
+                value=parameters.get("minutes"),
+            )
+            max_minutes = 3.0
src/integrations/github/check_runs.py-52-59 (1)

52-59: Inconsistent check run naming.

create_check_run uses "Watchflow Rules" (line 55) while create_acknowledgment_check_run uses "watchflow-rules" (line 108). This inconsistency will display as separate check runs in GitHub's UI, potentially confusing users.

🛠️ Proposed fix - standardize naming
             await self.github_client.create_check_run(
                 repo=repo,
                 sha=sha,
-                name="watchflow-rules",
+                name="Watchflow Rules",
                 status="completed",

Also applies to: 105-117

src/integrations/github/rules_service.py-97-97 (1)

97-97: rule_data.get('id') references a field that doesn't exist in Rule model.

The Rule model has no id field—description is the primary identifier. Consider using description for error reporting.

🛠️ Proposed fix
-                        f"❌ **Rule #{i + 1} (`{rule_data.get('id', 'N/A')}`) failed validation**\n\n"
+                        f"❌ **Rule #{i + 1} (`{rule_data.get('description', 'N/A')[:50]}`) failed validation**\n\n"
src/integrations/github/rules_service.py-39-41 (1)

39-41: YAML example references non-existent id and name fields.

The example YAML shows id: and name: fields, but according to src/rules/models.py, the Rule model uses description as the primary identifier and has no id or name fields. This inconsistency will confuse users.

🛠️ Proposed fix
-                    "   ```yaml\n   rules:\n     - id: pr-approval-required\n       name: PR Approval Required\n       description: All pull requests must have at least 2 approvals\n       enabled: true\n       severity: high\n       event_types: [pull_request]\n       parameters:\n         min_approvals: 2\n   ```\n\n"
+                    "   ```yaml\n   rules:\n     - description: All pull requests must have at least 2 approvals\n       enabled: true\n       severity: high\n       event_types: [pull_request]\n       parameters:\n         min_approvals: 2\n   ```\n\n"
src/rules/models.py-56-61 (1)

56-61: Verify that Rule.rules serialization with BaseCondition objects is never attempted.

BaseCondition is an ABC (not a Pydantic model), so while arbitrary_types_allowed=True permits storing these objects, calling model_dump_json() on Rule or any state containing BaseCondition instances will fail with a PydanticSerializationError. Currently, state.model_dump() is called in the workflow's final node (nodes.py:397), but the rules field is never extracted or exposed in API responses, so the issue doesn't manifest. However, this is a latent bug—if the rules field is ever serialized to JSON (e.g., for caching, logging, or future API changes), it will fail. Consider either excluding the rules field from serialization via model_config, or storing condition type + parameters separately for serialization purposes.

src/webhooks/handlers/issue_comment.py-87-96 (1)

87-96: WebhookResponse should reflect enqueue outcome.

enqueue returns False on duplicates, but the response always says “enqueued,” which can mislead users.

✅ Align response with enqueue result
                 result = await task_queue.enqueue(
                     process_acknowledgment,
                     "violation_acknowledgment",
                     ack_payload,
                 )
                 logger.info(f"✅ Acknowledgment comment enqueued: {result}")
-                return WebhookResponse(
-                    status="ok",
-                    detail=f"Acknowledgment enqueued with reason: {ack_reason}",
-                )
+                if not result:
+                    return WebhookResponse(status="ignored", detail="Duplicate acknowledgment ignored")
+                return WebhookResponse(
+                    status="ok",
+                    detail=f"Acknowledgment enqueued with reason: {ack_reason}",
+                )
src/event_processors/push.py-80-101 (1)

80-101: Guard against non‑dict or non‑string violation payloads.

A malformed violation can trigger AttributeError on .get/.lower, silently dropping the entry. Add a type guard and cast.

🛡️ Harden violation parsing
         for v in raw_violations:
+            if not isinstance(v, dict):
+                logger.error("Invalid violation payload: %r", v)
+                continue
             try:
-                severity_str = v.get("severity", "medium").lower()
+                severity_raw = v.get("severity", "medium")
+                severity_str = str(severity_raw).lower()
src/event_processors/deployment_protection_rule.py-119-123 (1)

119-123: Inconsistent violation extraction pattern compared to other processors.

Line 123 uses Violation.model_validate(v) to convert violations, but PullRequestProcessor (line 116) and ViolationAcknowledgmentProcessor (line 170) simply cast with list(eval_result.violations).

If eval_result.violations already contains Violation objects, model_validate is redundant. If it contains dicts, the other processors would fail. This inconsistency should be unified.

Proposed fix - align with other processors
             if hasattr(eval_result, "violations"):
-                    violations = [Violation.model_validate(v) for v in eval_result.violations]
+                    violations = list(eval_result.violations)

Or if conversion is needed, apply it consistently across all processors.

src/event_processors/pull_request/processor.py-65-66 (1)

65-66: Redundant installation_id check—already validated earlier.

This check on lines 65-66 is unreachable dead code. If installation_id were falsy, the function would have already returned at line 42-50.

Proposed fix
             # 1. Enrich event data
-            if not installation_id:
-                raise ValueError("Installation ID is required")
-
             event_data = await self.enricher.enrich_event_data(task, github_token)
src/event_processors/violation_acknowledgment.py-17-18 (1)

17-18: Module-level mutable acknowledged_prs set is unused and poses concurrency risks.

This set is declared at module level but never used in the class. If intended for future use, module-level mutable state can cause issues with concurrent requests in async contexts.

Proposed fix

If unused, remove it:

-# Add at the top
-acknowledged_prs: set[str] = set()

If needed, move it to instance state or use a proper cache with TTL.
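If the set is kept, a minimal TTL-based membership cache (an illustrative stdlib-only sketch, not the project's cache) bounds growth via lazy eviction, though it would still need instance scoping and locking for concurrent use:

```python
import time


class TTLSet:
    """Membership cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._expiry: dict[str, float] = {}

    def add(self, key: str) -> None:
        self._expiry[key] = time.monotonic() + self._ttl

    def __contains__(self, key: str) -> bool:
        deadline = self._expiry.get(key)
        if deadline is None:
            return False
        if time.monotonic() >= deadline:
            del self._expiry[key]  # lazily evict expired entries
            return False
        return True


acked = TTLSet(ttl_seconds=60.0)
acked.add("repo#42")
print("repo#42" in acked)
```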

Comment on lines +107 to +112
class WebhookResponse(BaseModel):
"""Standardized response from webhook handlers."""

status: Literal["ok", "error", "ignored"] = "ok"
detail: str | None = None
event_type: EventType | None = None

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all WebhookResponse definitions and usages
echo "=== WebhookResponse definitions ==="
rg -n "class WebhookResponse" --type=py

echo ""
echo "=== WebhookResponse usages ==="
rg -n "WebhookResponse" --type=py -C2

echo ""
echo "=== Check imports/exports from src/core/models.py ==="
rg -n "from.*models import|import.*models" src/core/ --type=py | head -20

echo ""
echo "=== Check imports/exports from src/webhooks/models.py ==="
rg -n "from.*models import|import.*models" src/webhooks/ --type=py | head -20

Repository: warestack/watchflow

Length of output: 22841


🏁 Script executed:

#!/bin/bash
# Get both WebhookResponse definitions
echo "=== src/core/models.py WebhookResponse ==="
sed -n '107,115p' src/core/models.py

echo ""
echo "=== src/webhooks/models.py WebhookResponse ==="
sed -n '31,42p' src/webhooks/models.py

echo ""
echo "=== Check if src/webhooks/models.py exports WebhookResponse ==="
rg -n "__all__|from.*WebhookResponse|import.*WebhookResponse" src/webhooks/models.py

Repository: warestack/watchflow

Length of output: 813


Consolidate duplicate WebhookResponse models to avoid type conflicts and confusion.

Two incompatible WebhookResponse models exist:

  • src/core/models.py (lines 107–112): status Literal["ok", "error", "ignored"], event_type EventType | None
  • src/webhooks/models.py (line 31): status str with values "success", "received", "error", event_type str | None

Handler functions and the base handler class import from src.core.models, while the router imports from src/webhooks/models. The incompatible status values and event_type types will cause silent failures or validation errors. Merge these models into a single definition or clearly separate them with distinct names if they serve different purposes.

🤖 Prompt for AI Agents
In `@src/core/models.py` around lines 107 - 112, There are two incompatible
WebhookResponse models (WebhookResponse in src/core/models.py and another in
src/webhooks/models.py) causing type/validation conflicts; pick one canonical
shape (or rename one to a distinct class) and update usages: unify the status
literal values (choose either Literal["ok","error","ignored"] or the router’s
"success"/"received"/"error" set) and normalize event_type to the same type
(EventType | None or str | None), then update all imports in handler functions,
BaseHandler, and router to reference the single model (or the newly renamed
types) so the schema and validation are consistent across modules.

Comment on lines +108 to +131
try:
async with httpx.AsyncClient(timeout=60.0) as client:
for filepath in files_to_check:
# Tricky: Public repos can be read without auth, Private need auth
# We use the 'contents' API
check_url = f"{self.BASE_URL}/repos/{owner}/{repo}/contents/{filepath}"
resp = await client.get(check_url, headers=headers)
if resp.status_code == 200:
found_files.append(filepath)
except httpx.HTTPStatusError as e:
logger.error(
"github_files_check_failed",
repo=f"{owner}/{repo}",
status_code=e.response.status_code,
response_body=e.response.text,
)
# Continue with empty found_files on error
except httpx.TimeoutException as e:
logger.error("github_files_check_timeout", repo=f"{owner}/{repo}", error=str(e))
# Continue with empty found_files on error
except httpx.RequestError as e:
logger.error("github_files_check_request_error", repo=f"{owner}/{repo}", error=str(e))
# Continue with empty found_files on error


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "service.py" | grep github

Repository: warestack/watchflow

Length of output: 99


🏁 Script executed:

if [ -f "src/integrations/github/service.py" ]; then 
  wc -l src/integrations/github/service.py
fi

Repository: warestack/watchflow

Length of output: 101


🏁 Script executed:

sed -n '108,131p' src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 1298


🏁 Script executed:

# Get more context: see the full function
sed -n '80,140p' src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 2724


🏁 Script executed:

# Check if httpx.AsyncClient.get() raises HTTPStatusError by default
python3 << 'EOF'
# Check httpx documentation/behavior
import inspect

# Look for any raise_for_status calls in the file
with open('src/integrations/github/service.py', 'r') as f:
    content = f.read()
    has_raise_for_status = 'raise_for_status' in content
    print(f"File contains 'raise_for_status': {has_raise_for_status}")
EOF

Repository: warestack/watchflow

Length of output: 101


🏁 Script executed:

# Check the complete function to see return behavior
sed -n '70,150p' src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 3539


🏁 Script executed:

# Check if raise_for_status is called in the analyze_repository_rules method
sed -n '70,160p' src/integrations/github/service.py | grep -n "raise_for_status"

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Get full method to understand flow
sed -n '73,160p' src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 3779


🏁 Script executed:

# Check httpx behavior: does AsyncClient.get() raise HTTPStatusError by default?
python3 << 'EOF'
# httpx by default does NOT raise HTTPStatusError on non-2xx status codes
# You must explicitly call raise_for_status() to trigger it
# So the exception handlers for HTTPStatusError won't catch status code errors from client.get()

# Let's verify if raise_for_status is called anywhere in the loop
with open('src/integrations/github/service.py', 'r') as f:
    lines = f.readlines()
    # Look specifically for the loop section (around line 108-131)
    for i in range(107, min(131, len(lines))):
        if 'raise_for_status' in lines[i]:
            print(f"Line {i+1}: {lines[i].rstrip()}")
        if 'resp.get' in lines[i] or 'await client' in lines[i]:
            # Print context around these lines
            print(f"Line {i+1}: {lines[i].rstrip()}")
            if i+1 < len(lines):
                print(f"Line {i+2}: {lines[i+1].rstrip()}")
EOF

Repository: warestack/watchflow

Length of output: 202


🏁 Script executed:

# Check what status codes GitHub API returns for missing files
# and understand the full logic
sed -n '100,145p' src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 2123


🏁 Script executed:

# Verify if the HTTPStatusError exception handler is actually reachable
# by checking if raise_for_status or similar is used in the loop context
rg -A 5 -B 5 "async with httpx.AsyncClient" src/integrations/github/service.py | head -30

Repository: warestack/watchflow

Length of output: 1100


🏁 Script executed:

# Check if there are integration tests or other methods that might reveal intent
rg -n "analyze_repository_rules" src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 176


🏁 Script executed:

# Verify httpx behavior: confirm raise_for_status is not called
grep -n "raise_for_status\|raise_on_http_error" src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 109


🏁 Script executed:

# Check the full context of get_repository_metadata to see the contrast
sed -n '40,75p' src/integrations/github/service.py

Repository: warestack/watchflow

Length of output: 1552


🏁 Script executed:

# Verify if this inconsistency pattern is documented or intentional
grep -n "404\|403\|status_code" src/integrations/github/service.py | head -20

Repository: warestack/watchflow

Length of output: 369


Non‑200 responses are silently treated as "file missing," masking auth failures and rate limits.

The analyze_repository_rules method ignores all non-200 status codes without distinction. Unlike get_repository_metadata in the same file, it lacks explicit handling for 404 (file not found), 401/403 (auth), or 429 (rate limit). This silently treats all failures as missing files and leaves exception handlers unreachable. Add explicit status-code branching to log unexpected failures per file.

Suggested handling
                    resp = await client.get(check_url, headers=headers)
                    if resp.status_code == 200:
                        found_files.append(filepath)
+                        continue
+                    if resp.status_code == 404:
+                        continue
+                    logger.warning(
+                        "github_files_check_failed",
+                        repo=f"{owner}/{repo}",
+                        filepath=filepath,
+                        status_code=resp.status_code,
+                        response_body=resp.text,
+                    )
🤖 Prompt for AI Agents
In `@src/integrations/github/service.py` around lines 108 - 131, The
analyze_repository_rules loop currently treats every non-200 as "missing" and
never reaches the exception handlers; update the per-file response handling in
analyze_repository_rules (the loop over files_to_check that builds found_files
and hits self.BASE_URL) to explicitly branch on resp.status_code: if 200 append
filepath to found_files; if 404 continue quietly (file not found); if
401/403/429 log a descriptive error including repo, filepath, status_code and
response text (use the same logger pattern as get_repository_metadata) and
continue or bubble up if you prefer rate-limit/auth to abort; for any other
unexpected status log an error with repo/file/status/response and do not
silently swallow it; keep the existing try/except for network/timeouts but
ensure these new per-file branches are present so auth and rate-limit failures
are visible.

Comment on lines +100 to +113
logger.info(f"🔍 Extracting acknowledgment reason from: '{comment_body}'")

for i, pattern in enumerate(ACKNOWLEDGMENT_PATTERNS):
match = re.search(pattern, comment_body, re.IGNORECASE | re.DOTALL)
if match:
# Patterns 0-2 have (acknowledge|ack) as group 1 and reason as group 2
# Patterns 3-7 have reason as group 1
reason = match.group(2).strip() if i < 3 else match.group(1).strip()

logger.info(f"✅ Pattern {i + 1} matched! Reason: '{reason}'")
if reason:
return reason
else:
logger.debug(f"❌ Pattern {i + 1} did not match")

⚠️ Potential issue | 🟠 Major

Avoid logging full comment body/reason at info level.

User comments can contain sensitive data; logging them verbatim is a compliance risk. Log lengths or move to debug.

🧹 Safer logging
-    logger.info(f"🔍 Extracting acknowledgment reason from: '{comment_body}'")
+    logger.info("extracting_acknowledgment_reason body_length=%d", len(comment_body))
...
-            logger.info(f"✅ Pattern {i + 1} matched! Reason: '{reason}'")
+            logger.info("ack_reason_matched pattern=%d reason_length=%d", i + 1, len(reason))
🤖 Prompt for AI Agents
In `@src/rules/acknowledgment.py` around lines 100 - 113, The info-level logging
currently emits the full comment_body and extracted reason (see logger.info call
and the loop over ACKNOWLEDGMENT_PATTERNS that computes reason from
match.group), which can leak sensitive data; change these to safe logs by moving
full content to debug-level and/or logging only metadata or a truncated/hashed
snippet at info-level (e.g., use logger.debug for the initial "Extracting
acknowledgment reason" message and the "Pattern matched" message that includes
reason, and at info only emit non-sensitive status like "acknowledgment reason
extracted" or a sanitized/truncated reason). Ensure you update both the initial
logger.info(f"🔍 Extracting acknowledgment reason from: '{comment_body}'") and
the logger.info(f"✅ Pattern {i + 1} matched! Reason: '{reason}'") usages,
keeping ACKNOWLEDGMENT_PATTERNS, comment_body, reason and the loop logic
unchanged.

Comment on lines +352 to +365
# Check if conversion is needed by inspecting the first rule
first_rule = rules[0]
if "rule_description" in first_rule and "event_types" not in first_rule:
# This looks like the old format
logger.info("Converting old rule format to new format")
converted_rules = []
for rule in rules:
converted_rules.append(
{
"description": rule.get("rule_description", ""),
"severity": rule.get("severity", "medium"),
"event_types": rule.get("event_types", ["deployment"]),
"parameters": rule.get("parameters", {}),
}

⚠️ Potential issue | 🟠 Major

Rule conversion should preserve enabled to avoid behavior drift.

Other conversion helpers include enabled; omitting it can change enforcement defaults.

🛠️ Suggested fix
                 converted_rules.append(
                     {
                         "description": rule.get("rule_description", ""),
+                        "enabled": rule.get("enabled", True),
                         "severity": rule.get("severity", "medium"),
                         "event_types": rule.get("event_types", ["deployment"]),
                         "parameters": rule.get("parameters", {}),
                     }
                 )
🤖 Prompt for AI Agents
In `@src/tasks/scheduler/deployment_scheduler.py` around lines 352 - 365, When
converting old rule format in the block that inspects first_rule and builds
converted_rules, include the original rule's "enabled" flag (or a sensible
default) so conversion preserves behavior; specifically, update the conversion
for each rule in converted_rules to set "enabled": rule.get("enabled", True)
(i.e., carry through the existing enabled value if present, otherwise default to
True) to avoid enforcement/default behavior drift.

Comment on lines +47 to +61
class TestAcknowledgment:
def test_valid_acknowledgment(self) -> None:
"""Test creating a valid acknowledgment."""
ack = Acknowledgment(
rule_id="rule-1", reason="False positive", commenter="user1", violations=[], pull_request_id=1
)
assert ack.rule_id == "rule-1"
assert ack.reason == "False positive"
assert ack.commenter == "user1"
assert isinstance(ack.timestamp, datetime)

def test_required_fields(self) -> None:
"""Test missing required fields raises error."""
with pytest.raises(ValidationError):
Acknowledgment(rule_id="rule-1") # type: ignore

⚠️ Potential issue | 🟠 Major

Test uses fields not present in the Acknowledgment model.

According to src/core/models.py (lines 32-40), the Acknowledgment model only has fields: rule_id, reason, commenter, and timestamp. However, this test passes violations=[] and pull_request_id=1, which are not defined in the model and will cause a Pydantic validation error.

🐛 Proposed fix to align with the actual model
     def test_valid_acknowledgment(self) -> None:
         """Test creating a valid acknowledgment."""
         ack = Acknowledgment(
-            rule_id="rule-1", reason="False positive", commenter="user1", violations=[], pull_request_id=1
+            rule_id="rule-1", reason="False positive", commenter="user1"
         )
         assert ack.rule_id == "rule-1"
         assert ack.reason == "False positive"
         assert ack.commenter == "user1"
         assert isinstance(ack.timestamp, datetime)
🤖 Prompt for AI Agents
In `@tests/unit/core/test_models.py` around lines 47 - 61, The test passes extra
fields not present on the Acknowledgment model; update test_valid_acknowledgment
to construct Acknowledgment using only the actual model fields (rule_id, reason,
commenter) and rely on timestamp being set by the model, e.g., remove violations
and pull_request_id from the constructor, and keep test_required_fields (which
asserts ValidationError for missing required fields) as-is to validate missing
reason/commenter behavior; reference Acknowledgment, test_valid_acknowledgment,
and test_required_fields when making the change.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@src/core/utils/logging.py`:
- Around line 146-147: The comment above the local variable assignment for
log_fn is misleading about from __future__ import annotations; update the
comment to state that PEP 563 only defers evaluation of function/method
signature annotations and does not affect local variable annotations, and note
that Callable is safe here because it is imported unconditionally (not due to
__future__); specifically edit the comment that precedes "log_fn: Callable[...,
Any] = getattr(logger_obj, level, logger_obj.info)" to accurately mention
logger_obj, level, and that Callable is available due to the import at the top
of the module.
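For context, a runnable sketch of the pattern that comment describes (the function name is illustrative): local variable annotations are never evaluated at runtime, so `Callable` only needs to be imported for static checkers.

```python
import logging
from collections.abc import Callable
from typing import Any


def resolve_log_fn(logger_obj: logging.Logger, level: str) -> Callable[..., Any]:
    # A local annotation like this one is never evaluated at runtime
    # (with or without `from __future__ import annotations`, which only
    # defers evaluation of signature annotations); Callable is imported
    # unconditionally for the benefit of type checkers.
    log_fn: Callable[..., Any] = getattr(logger_obj, level, logger_obj.info)
    return log_fn


print(resolve_log_fn(logging.getLogger("demo"), "warning").__name__)
```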

In `@src/rules/conditions/access_control.py`:
- Around line 64-66: Replace use of the hardcoded DEFAULT_TEAM_MEMBERSHIPS dict
when computing team_memberships/is_member with a dynamic call to the GitHub API:
call the org/team members endpoint (e.g., GET
/orgs/{org}/teams/{team_slug}/members) using the project's GitHub client, await
and cache the response, then check author_login against the returned member
logins instead of DEFAULT_TEAM_MEMBERSHIPS; update the logic around
team_memberships, is_member, author_login and team_name to use the fetched list
and handle API errors/rate limits (fallback or reject) and empty team responses.
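The endpoint named above is GitHub's documented REST endpoint; the membership check itself reduces to matching logins in its JSON response. A sketch of that pure part (the HTTP call, caching, and error handling are left out; the helper name is hypothetical):

```python
def is_team_member(members_payload: list[dict], author_login: str) -> bool:
    """Check an author against the JSON array returned by
    GET /orgs/{org}/teams/{team_slug}/members.

    GitHub logins are case-insensitive, so compare case-folded.
    """
    return any(
        member.get("login", "").casefold() == author_login.casefold()
        for member in members_payload
    )


members = [{"login": "alice"}, {"login": "Bob"}]
print(is_team_member(members, "bob"))
print(is_team_member(members, "mallory"))
```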

In `@src/rules/conditions/pull_request.py`:
- Line 270: The min_approvals parameter (read in pull_request condition code
where min_approvals = parameters.get("min_approvals", 1)) must be validated and
normalized to an integer before any numeric comparison; update the condition
evaluation to check isinstance(min_approvals, (int, float, str)) and attempt to
convert string-like values to int (raising or treating invalid values as a
validation error), and mirror the same validation in the validate method (the
block around line 292) so YAML/string inputs are rejected or coerced
consistently; ensure comparisons later in the function use the normalized int
value and return/raise a clear validation error if conversion fails.

In `@src/tasks/task_queue.py`:
- Around line 128-173: Change the contract of _execute_with_retry to return a
success flag and the actual attempts used (e.g., return (True, attempts) on
success and (False, attempts) on failure), compute attempts as the number of
attempts executed (attempt index + 1 while retrying with MAX_RETRIES and
INITIAL_BACKOFF_SECONDS), and replace the current final logger.error uses of
task.retry_count with the returned attempts; then update _worker to call
success, attempts = await self._execute_with_retry(task) and only emit
logger.info("task_completed", ...) when success is True, otherwise log failure
using the attempts value returned; keep existing transient/backoff logic and
exc_info usage but surface the attempts and boolean result through the new
return value.
- Around line 106-125: The enqueue method has a race between _is_duplicate and
_add_to_dedup_cache because await self.queue.put yields control; wrap the dedup
check + reserve in an asyncio.Lock and reserve the task_id in the dedup cache
before awaiting queue.put so the operation is atomic. Specifically, create/use
an asyncio.Lock (e.g., self._dedup_lock), acquire it around calling
_is_duplicate and then _add_to_dedup_cache (or a new _reserve_in_dedup method)
so no two coroutines can pass the check concurrently, release the lock, then
await self.queue.put(task); also add try/except around queue.put to
remove/revert the reserved entry from the dedup cache if queue.put fails
(ensuring consistency), and keep _generate_task_id, _is_duplicate, and
_add_to_dedup_cache names as the reference points.

In `@src/webhooks/handlers/push.py`:
- Around line 4-11: The module-level import and instantiation of PushProcessor
(push_processor = PushProcessor()) makes ImportError handling unreachable;
change to a lazy/optional import inside the handler (e.g., import PushProcessor
within handle() and instantiate there if not already created) or remove the
unused except ImportError block so errors can be caught at runtime, and correct
the mistaken comment that reads "Deployment processor" to "Push processor";
update references to PushProcessor and push_processor in the handle() flow so
the processor is only created after a successful import.
🧹 Nitpick comments (7)
src/integrations/github/models.py (1)

84-99: Consider more specific typing for better IDE support and validation.

The pull_requests: dict | None (Line 88) and errors: list[dict] | None (Line 99) use loose typing. For GraphQL errors, a typed model would provide better validation and IDE autocomplete.

♻️ Optional: Define typed error model
+class GraphQLError(BaseModel):
+    """GraphQL Error representation."""
+    message: str
+    locations: list[dict] | None = None
+    path: list[str | int] | None = None
+    extensions: dict | None = None
+

 class GraphQLResponse(BaseModel):
     """Standard GraphQL Response Wrapper."""

     data: GraphQLResponseData
-    errors: list[dict] | None = None
+    errors: list[GraphQLError] | None = None
src/core/constants.py (1)

5-10: Track the TODO for production sourcing.
Since this is a runtime policy control, consider wiring it to config/DB before release and track the TODO explicitly.

If you want, I can draft a lightweight config-backed loader or open an issue to track the production integration.

src/webhooks/handlers/pull_request.py (1)

17-18: Prefer direct enum comparison over string-based name check.

Using event.event_type.name == "PULL_REQUEST" is more fragile than direct enum comparison. Consider using event.event_type == EventType.PULL_REQUEST for type safety and clarity.

♻️ Suggested fix
     async def can_handle(self, event: WebhookEvent) -> bool:
-        return event.event_type.name == "PULL_REQUEST"
+        return event.event_type == EventType.PULL_REQUEST
src/event_processors/pull_request/processor.py (2)

63-66: Redundant installation_id check.

installation_id is already validated at lines 41-49 and would have caused an early return. This second check is unreachable.

♻️ Suggested fix
             # 1. Enrich event data
-            if not installation_id:
-                raise ValueError("Installation ID is required")
-
             event_data = await self.enricher.enrich_event_data(task, github_token)

199-201: Potential mutation issue when updating parameters from conditions.

rule_dict["parameters"] may be empty dict from the same reference. If rule.parameters returns a shared/cached object, mutating it here could cause side effects. Consider creating a new dict instead of updating in place.

🛠️ Suggested fix
             if not rule_dict["parameters"] and hasattr(rule, "conditions"):
+                merged_params = {}
                 for condition in rule.conditions:
-                    rule_dict["parameters"].update(condition.parameters)
+                    merged_params.update(condition.parameters)
+                rule_dict["parameters"] = merged_params
src/agents/engine_agent/nodes.py (1)

115-115: Redundant import statement.

json is already imported at module level (line 6). This inner import is unnecessary.

♻️ Suggested fix
                 else:
                     # It's a BaseMessage, try to parse the content
-                    import json
-
                     try:
                         content = json.loads(strategy_result.content)
src/agents/engine_agent/agent.py (1)

55-55: Consider preserving type specificity for _build_graph.

The return type was changed from StateGraph to Any. If the compiled graph has a known type, preserving it would improve type safety and IDE support.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c26009 and e313021.

📒 Files selected for processing (18)
  • src/agents/engine_agent/agent.py
  • src/agents/engine_agent/models.py
  • src/agents/engine_agent/nodes.py
  • src/api/dependencies.py
  • src/core/constants.py
  • src/core/utils/logging.py
  • src/event_processors/pull_request/processor.py
  • src/integrations/github/graphql.py
  • src/integrations/github/models.py
  • src/rules/conditions/access_control.py
  • src/rules/conditions/pull_request.py
  • src/rules/registry.py
  • src/tasks/task_queue.py
  • src/webhooks/handlers/pull_request.py
  • src/webhooks/handlers/push.py
  • tests/unit/agents/test_engine_agent.py
  • tests/unit/rules/conditions/test_access_control.py
  • tests/unit/rules/conditions/test_pull_request.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/integrations/github/graphql.py
🧰 Additional context used
🧬 Code graph analysis (12)
src/api/dependencies.py (3)
src/core/models.py (1)
  • User (44-55)
src/integrations/github/service.py (1)
  • GitHubService (15-188)
src/core/utils/caching.py (1)
  • get (44-66)
src/rules/conditions/access_control.py (3)
src/core/models.py (2)
  • Severity (8-17)
  • Violation (20-30)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/utils/codeowners.py (1)
  • is_critical_file (208-231)
src/webhooks/handlers/push.py (5)
src/core/models.py (4)
  • EventType (90-104)
  • WebhookEvent (115-137)
  • WebhookResponse (107-112)
  • repo_full_name (130-132)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/event_processors/push.py (2)
  • PushProcessor (14-187)
  • process (27-144)
src/webhooks/handlers/base.py (2)
  • EventHandler (6-25)
  • handle (15-25)
src/tasks/task_queue.py (2)
  • repo_full_name (33-38)
  • enqueue (106-126)
tests/unit/rules/conditions/test_pull_request.py (1)
src/rules/conditions/pull_request.py (12)
  • MinApprovalsCondition (257-300)
  • MinDescriptionLengthCondition (100-177)
  • RequiredLabelsCondition (180-254)
  • TitlePatternCondition (17-97)
  • validate (77-97)
  • validate (158-177)
  • validate (234-254)
  • validate (290-300)
  • evaluate (26-75)
  • evaluate (109-156)
  • evaluate (189-232)
  • evaluate (266-288)
src/webhooks/handlers/pull_request.py (5)
src/core/models.py (4)
  • EventType (90-104)
  • WebhookEvent (115-137)
  • WebhookResponse (107-112)
  • repo_full_name (130-132)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/webhooks/handlers/base.py (2)
  • EventHandler (6-25)
  • handle (15-25)
src/webhooks/handlers/check_run.py (2)
  • can_handle (14-15)
  • handle (17-35)
src/tasks/task_queue.py (2)
  • repo_full_name (33-38)
  • enqueue (106-126)
src/agents/engine_agent/models.py (3)
src/core/models.py (1)
  • Violation (20-30)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/models.py (1)
  • Rule (50-61)
src/rules/registry.py (7)
src/rules/acknowledgment.py (1)
  • RuleID (20-32)
src/rules/conditions/access_control.py (3)
  • AuthorTeamCondition (18-96)
  • CodeOwnersCondition (99-184)
  • NoForcePushCondition (262-298)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/conditions/filesystem.py (2)
  • FilePatternCondition (17-134)
  • MaxFileSizeCondition (137-220)
src/rules/conditions/pull_request.py (4)
  • MinApprovalsCondition (257-300)
  • MinDescriptionLengthCondition (100-177)
  • RequiredLabelsCondition (180-254)
  • TitlePatternCondition (17-97)
src/rules/conditions/temporal.py (3)
  • AllowedHoursCondition (65-150)
  • DaysCondition (153-246)
  • WeekendCondition (22-62)
src/rules/conditions/workflow.py (1)
  • WorkflowDurationCondition (17-99)
src/tasks/task_queue.py (4)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
src/core/models.py (1)
  • repo_full_name (130-132)
tests/unit/tasks/test_queue.py (1)
  • queue (13-15)
tests/unit/event_processors/test_push_processor.py (1)
  • task (42-52)
tests/unit/agents/test_engine_agent.py (4)
src/agents/engine_agent/models.py (1)
  • EngineRequest (17-24)
src/core/models.py (2)
  • Severity (8-17)
  • Violation (20-30)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/models.py (2)
  • Rule (50-61)
  • RuleSeverity (12-20)
tests/unit/rules/conditions/test_access_control.py (1)
src/rules/conditions/access_control.py (12)
  • AuthorTeamCondition (18-96)
  • CodeOwnersCondition (99-184)
  • NoForcePushCondition (262-298)
  • ProtectedBranchesCondition (187-259)
  • validate (81-96)
  • validate (150-170)
  • validate (239-259)
  • validate (295-298)
  • evaluate (27-79)
  • evaluate (112-148)
  • evaluate (196-237)
  • evaluate (270-293)
src/rules/conditions/pull_request.py (2)
src/core/models.py (2)
  • Severity (8-17)
  • Violation (20-30)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/agents/engine_agent/nodes.py (6)
src/agents/engine_agent/models.py (4)
  • LLMEvaluationResponse (55-64)
  • RuleDescription (100-114)
  • StrategySelectionResponse (47-52)
  • ValidationStrategy (27-32)
src/agents/engine_agent/prompts.py (2)
  • create_llm_evaluation_prompt (79-108)
  • create_validation_strategy_prompt (37-76)
src/integrations/providers/openai_provider.py (1)
  • get_chat_model (15-32)
src/integrations/providers/factory.py (1)
  • get_chat_model (97-130)
src/rules/conditions/access_control.py (4)
  • evaluate (27-79)
  • evaluate (112-148)
  • evaluate (196-237)
  • evaluate (270-293)
src/rules/conditions/pull_request.py (4)
  • evaluate (26-75)
  • evaluate (109-156)
  • evaluate (189-232)
  • evaluate (266-288)
🔇 Additional comments (42)
src/core/utils/logging.py (2)

13-13: Good fix for the runtime NameError.

Moving Callable out of the TYPE_CHECKING guard correctly resolves the previous critical issue. The # noqa: TCH003 suppression is appropriate here since the import must remain at runtime for the variable annotation on line 147.


8-10: Typing improvements look good.

The additions align well with MyPy strict mode:

  • from __future__ import annotations enables PEP 563 deferred evaluation for signatures.
  • Explicit -> Any return types and *args: Any, **kwargs: Any annotations provide proper coverage.
  • The # AsyncGenerator[None, None] comment on line 26 documents the actual contextmanager yield type while using pragmatic Any for the complex wrapped return.

Also applies to: 16-16, 26-26, 68-68, 84-84, 88-88, 106-106

src/integrations/github/models.py (1)

1-61: Well-structured models with appropriate GraphQL alias mapping.

The base models (Actor, ReviewNode, ReviewConnection, IssueNode, IssueConnection, CommitMessage, CommitNode, CommitConnection, FileNode, FileEdge, FileConnection, CommentConnection) are cleanly defined. The CommentConnection model correctly includes model_config = ConfigDict(populate_by_name=True) to support instantiation via alias in the default factory (Line 77).

src/api/dependencies.py (2)

13-17: LGTM for DI helper.
Simple factory is fine and keeps DI surface clean.


57-65: Auth gate looks good.
401 on missing user is clear and consistent.

src/webhooks/handlers/push.py (1)

20-49: Clean enqueue/duplicate flow.
The structured response and dedup-aware enqueue outcomes are clear and consistent with the handler contract.

src/tasks/task_queue.py (4)

19-47: Task model looks solid.
The typed fields and payload-derived helpers are clear and useful.


50-61: Transient error classifier is straightforward.
The type+message checks are practical for retry heuristics.


72-99: Dedup cache setup is clear and bounded.
The LRU-style eviction avoids unbounded growth.
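A bounded LRU-style dedup cache of the sort described can be built on `collections.OrderedDict`; the class name, default size, and method names below are illustrative, not the project's `_dedup_cache` implementation.

```python
from collections import OrderedDict


class DedupCache:
    """Bounded LRU-style deduplication cache (illustrative sketch)."""

    def __init__(self, max_size: int = 1000) -> None:
        self._cache: OrderedDict[str, bool] = OrderedDict()
        self._max_size = max_size

    def add(self, task_id: str) -> None:
        self._cache[task_id] = True
        self._cache.move_to_end(task_id)  # mark as most recently used
        while len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # evict the oldest entry

    def __contains__(self, task_id: str) -> bool:
        return task_id in self._cache
```

Evicting from the front keeps memory bounded while recent task IDs, the ones most likely to be redelivered by a webhook retry, stay resident.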


175-203: Worker lifecycle and stats hooks look good.
Clean start/stop semantics and helpful stats for health checks.

src/agents/engine_agent/models.py (3)

17-24: EngineRequest model is clear and well-typed.
Matches the intended execute input shape.


76-83: RuleViolation inheritance is clean.
Avoids field duplication while preserving the extended metadata.


100-136: RuleDescription/EngineState updates look consistent.
Conditions typing and forward-reference rebuilds align with the new rule model.

tests/unit/rules/conditions/test_access_control.py (4)

18-63: NoForcePushCondition tests are thorough.
Covers validate/evaluate paths for both forced and normal pushes.


65-128: AuthorTeamCondition coverage looks solid.
Exercises team membership, missing team, and missing sender cases.


130-174: CodeOwnersCondition tests are clear and focused.
Patch-based critical file checks are well-isolated.


176-237: ProtectedBranchesCondition tests look good.
Covers protected vs non-protected and missing PR details.

tests/unit/rules/conditions/test_pull_request.py (4)

16-83: MinApprovalsCondition tests look solid.
Covers sufficient, insufficient, and missing review scenarios.


85-157: TitlePatternCondition tests are comprehensive.
Includes matching, mismatch, missing details, and empty title cases.


159-231: MinDescriptionLengthCondition tests are well-rounded.
Nice coverage for empty/whitespace and threshold behavior.


233-306: RequiredLabelsCondition tests are clear and complete.
Verifies required/missing labels and no-requirement cases.

tests/unit/agents/test_engine_agent.py (6)

12-40: MockCondition test double is straightforward.
Captures evaluation context and supports both evaluate/validate paths.


43-45: Fixture setup is clean and minimal.
Keeps agent instantiation consistent across tests.


48-78: Condition execution test is solid.
Validates context propagation and violation mapping end-to-end.


80-92: EngineRequest acceptance test looks good.
Covers strict-typed request path with basic assertions.


94-116: LLM-skip behavior test is clear.
Ensures condition evaluation is triggered when present.


118-155: Legacy dict conversion test is useful.
Directly verifies the conversion path without over-mocking.

src/rules/registry.py (2)

35-43: LGTM!

The RULE_ID_TO_CONDITION mapping correctly associates RuleIDs with their corresponding condition classes.


65-68: LGTM!

Simple and correct lookup implementation.

src/webhooks/handlers/pull_request.py (1)

20-65: Handler now properly delegates to processor via task queue.

The previous review flagged that the handler returned success without processing. This is now addressed with proper enqueue logic, action filtering, and error handling. The delegation pattern with duplicate detection and contextual logging is well implemented.

src/rules/conditions/pull_request.py (2)

17-75: LGTM!

TitlePatternCondition properly handles edge cases including empty titles and invalid regex patterns with appropriate error handling.


180-254: LGTM!

RequiredLabelsCondition correctly validates required labels with proper missing label detection and informative violation messages.

src/event_processors/pull_request/processor.py (2)

29-92: LGTM!

The process method has a well-structured flow: validation, enrichment, rule loading with graceful handling of missing rules, and proper error handling with check-run updates.


205-217: LGTM!

_post_violations_to_github properly handles missing PR number and installation ID, with error handling for the GitHub API call.

src/rules/conditions/access_control.py (2)

187-259: LGTM!

ProtectedBranchesCondition correctly validates if a PR targets protected branches with proper handling of missing data and informative violation messages.


262-298: LGTM!

NoForcePushCondition correctly detects force pushes on push events with appropriate severity and guidance.

src/agents/engine_agent/nodes.py (3)

163-174: LGTM - Past issue addressed!

Rules with VALIDATOR strategy but no conditions are now explicitly logged as configuration errors and added to analysis_steps, rather than being silently skipped. This addresses the previously flagged silent failure path.


282-313: LGTM!

_execute_conditions properly evaluates attached conditions, aggregates violations, and converts them to dicts with execution metadata. Error handling ensures the function doesn't propagate exceptions.


30-60: LGTM!

analyze_rule_descriptions correctly filters rules by event type and tracks analysis steps.

src/agents/engine_agent/agent.py (3)

76-94: LGTM!

The execute method now validates kwargs via Pydantic's EngineRequest, providing strict typing while supporting both direct EngineRequest objects and keyword arguments. Error handling returns a clear AgentResult on validation failure.


191-230: LGTM!

_convert_rules_to_descriptions handles both Rule objects and dict representations with multiple fallback keys for description extraction, providing good backward compatibility.


248-266: LGTM!

Legacy evaluate method correctly delegates to the new execute method while maintaining backward compatibility for existing callers.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +146 to 147
# Callable is now only needed for typing, so it's safe to use the string name or handled by __future__
log_fn: Callable[..., Any] = getattr(logger_obj, level, logger_obj.info)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Misleading comment: __future__ annotations does not defer local variable annotations.

The comment incorrectly implies that from __future__ import annotations makes the Callable type safe to use at runtime. In fact, PEP 563 only defers evaluation of function/method signature annotations, not variable annotations inside function bodies. Line 147's annotation is evaluated at runtime.

The code works correctly because Callable is now imported unconditionally on line 13—not because of the __future__ import.

Suggested fix: update comment for accuracy
-    # Callable is now only needed for typing, so it's safe to use the string name or handled by __future__
+    # Callable is imported at module level (line 13) so this runtime annotation is valid
     log_fn: Callable[..., Any] = getattr(logger_obj, level, logger_obj.info)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    # Callable is now only needed for typing, so it's safe to use the string name or handled by __future__
    log_fn: Callable[..., Any] = getattr(logger_obj, level, logger_obj.info)
    # Callable is imported at module level (line 13) so this runtime annotation is valid
    log_fn: Callable[..., Any] = getattr(logger_obj, level, logger_obj.info)
🤖 Prompt for AI Agents
In `@src/core/utils/logging.py` around lines 146 - 147, The comment above the
local variable assignment for log_fn is misleading about from __future__ import
annotations; update the comment to state that PEP 563 only defers evaluation of
function/method signature annotations and does not affect local variable
annotations, and note that Callable is safe here because it is imported
unconditionally (not due to __future__); specifically edit the comment that
precedes "log_fn: Callable[..., Any] = getattr(logger_obj, level,
logger_obj.info)" to accurately mention logger_obj, level, and that Callable is
available due to the import at the top of the module.

Comment on lines +64 to +66
        # Use constants from src.core.constants
        team_memberships = DEFAULT_TEAM_MEMBERSHIPS
        is_member = author_login in team_memberships.get(team_name, [])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what DEFAULT_TEAM_MEMBERSHIPS contains
rg -n "DEFAULT_TEAM_MEMBERSHIPS" --type=py -A10 src/core/constants.py

Repository: warestack/watchflow

Length of output: 216


🏁 Script executed:

#!/bin/bash
# Also examine the context around lines 64-66 in access_control.py
sed -n '50,75p' src/rules/conditions/access_control.py

Repository: warestack/watchflow

Length of output: 1191


Team membership should use GitHub API instead of hardcoded constants.

Lines 64–66 use DEFAULT_TEAM_MEMBERSHIPS, a static dict with hardcoded members (["devops-user", "admin-user"], etc.). This will not reflect actual GitHub org/team membership and requires code changes to update team lists. For production use, fetch team membership dynamically via GitHub API (e.g., GET /orgs/{org}/teams/{team_slug}/members).

🤖 Prompt for AI Agents
In `@src/rules/conditions/access_control.py` around lines 64 - 66, Replace use of
the hardcoded DEFAULT_TEAM_MEMBERSHIPS dict when computing
team_memberships/is_member with a dynamic call to the GitHub API: call the
org/team members endpoint (e.g., GET /orgs/{org}/teams/{team_slug}/members)
using the project's GitHub client, await and cache the response, then check
author_login against the returned member logins instead of
DEFAULT_TEAM_MEMBERSHIPS; update the logic around team_memberships, is_member,
author_login and team_name to use the fetched list and handle API errors/rate
limits (fallback or reject) and empty team responses.

        parameters = context.get("parameters", {})
        event = context.get("event", {})

        min_approvals = parameters.get("min_approvals", 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard against non-numeric min_approvals parameter.

Similar to min_description_length, the min_approvals parameter should be validated. If a string or invalid value is passed from YAML config, the comparison on line 279 may fail or behave unexpectedly.

🛠️ Suggested fix
-        min_approvals = parameters.get("min_approvals", 1)
+        try:
+            min_approvals = int(parameters.get("min_approvals", 1))
+        except (TypeError, ValueError):
+            logger.error(
+                "MinApprovalsCondition: invalid min_approvals",
+                extra={"value": parameters.get("min_approvals")},
+            )
+            return [
+                Violation(
+                    rule_description=self.description,
+                    severity=Severity.MEDIUM,
+                    message="Invalid min_approvals parameter (expected integer).",
+                    how_to_fix="Set min_approvals to a whole number.",
+                )
+            ]

Also applies to line 292 in the validate method.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        min_approvals = parameters.get("min_approvals", 1)
        try:
            min_approvals = int(parameters.get("min_approvals", 1))
        except (TypeError, ValueError):
            logger.error(
                "MinApprovalsCondition: invalid min_approvals",
                extra={"value": parameters.get("min_approvals")},
            )
            return [
                Violation(
                    rule_description=self.description,
                    severity=Severity.MEDIUM,
                    message="Invalid min_approvals parameter (expected integer).",
                    how_to_fix="Set min_approvals to a whole number.",
                )
            ]
🤖 Prompt for AI Agents
In `@src/rules/conditions/pull_request.py` at line 270, The min_approvals
parameter (read in pull_request condition code where min_approvals =
parameters.get("min_approvals", 1)) must be validated and normalized to an
integer before any numeric comparison; update the condition evaluation to check
isinstance(min_approvals, (int, float, str)) and attempt to convert string-like
values to int (raising or treating invalid values as a validation error), and
mirror the same validation in the validate method (the block around line 292) so
YAML/string inputs are rejected or coerced consistently; ensure comparisons
later in the function use the normalized int value and return/raise a clear
validation error if conversion fails.

Comment on lines 106 to 125
    async def enqueue(
        self,
        func: Callable[..., Coroutine[Any, Any, Any]],
        event_type: str,
        payload: dict[str, Any],
        *args: Any,
        **kwargs: Any,
    ) -> bool:
        """Adds a task to the queue if it is not a duplicate."""
        task_id = self._generate_task_id(event_type, payload)

        if self._is_duplicate(task_id):
            logger.info("task_skipped_duplicate", task_id=task_id, event_type=event_type)
            return False

        task = Task(task_id=task_id, event_type=event_type, payload=payload, func=func, args=args, kwargs=kwargs)
        await self.queue.put(task)
        self._add_to_dedup_cache(task_id)

        logger.info("task_enqueued", task_id=task_id, event_type=event_type)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd "task_queue.py" -t f

Repository: warestack/watchflow

Length of output: 86


🏁 Script executed:

cat -n src/tasks/task_queue.py

Repository: warestack/watchflow

Length of output: 9164


🏁 Script executed:

# Check if there are any tests related to concurrent dedup behavior
find . -name "test*.py" -o -name "*test.py" | head -20

Repository: warestack/watchflow

Length of output: 994


🏁 Script executed:

find . -name "*test*task*" -o -name "*task*test*" | head -20

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

rg "TaskQueue|task_queue" tests/ --type py -l

Repository: warestack/watchflow

Length of output: 251


🏁 Script executed:

cat -n tests/unit/tasks/test_queue.py

Repository: warestack/watchflow

Length of output: 7775


🏁 Script executed:

# Check if there are any concurrent/stress tests for the dedup logic
rg "asyncio.gather|concurrent|race|stress" tests/unit/tasks/test_queue.py

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Verify the structure of enqueue method one more time to confirm the await point
sed -n '106,126p' src/tasks/task_queue.py | cat -n

Repository: warestack/watchflow

Length of output: 990


Dedup check/add race condition under concurrency—duplicates can bypass the cache.

The dedup logic is not atomic. Between the _is_duplicate() check at line 117 and the _add_to_dedup_cache() call at line 123, the await self.queue.put() at line 122 yields control. Two concurrent callers with the same task_id can both pass the duplicate check before either adds to the cache, allowing both to enqueue.

Make the check and cache add atomic using asyncio.Lock, and reserve the task_id in the cache before awaiting queue.put(). Add exception handling to revert the cache entry if queue.put() fails.

🧩 Suggested fix (atomic dedup)
     def __init__(self, max_dedup_size: int = MAX_DEDUP_CACHE_SIZE) -> None:
         self.queue: asyncio.Queue[Task] = asyncio.Queue(maxsize=100)
         # LRU-based deduplication cache (prevents memory leaks)
         self._dedup_cache: OrderedDict[str, bool] = OrderedDict()
         self._max_dedup_size = max_dedup_size
+        self._dedup_lock = asyncio.Lock()
         self.workers: list[asyncio.Task[None]] = []
@@
     async def enqueue(
         self,
         func: Callable[..., Coroutine[Any, Any, Any]],
         event_type: str,
         payload: dict[str, Any],
         *args: Any,
         **kwargs: Any,
     ) -> bool:
         """Adds a task to the queue if it is not a duplicate."""
         task_id = self._generate_task_id(event_type, payload)
-
-        if self._is_duplicate(task_id):
-            logger.info("task_skipped_duplicate", task_id=task_id, event_type=event_type)
-            return False
+        async with self._dedup_lock:
+            if self._is_duplicate(task_id):
+                logger.info("task_skipped_duplicate", task_id=task_id, event_type=event_type)
+                return False
+            # Reserve the task_id before awaiting
+            self._add_to_dedup_cache(task_id)
 
         task = Task(task_id=task_id, event_type=event_type, payload=payload, func=func, args=args, kwargs=kwargs)
-        await self.queue.put(task)
-        self._add_to_dedup_cache(task_id)
+        try:
+            await self.queue.put(task)
+        except Exception:
+            async with self._dedup_lock:
+                self._dedup_cache.pop(task_id, None)
+            raise
 
         logger.info("task_enqueued", task_id=task_id, event_type=event_type)
         return True
🤖 Prompt for AI Agents
In `@src/tasks/task_queue.py` around lines 106 - 125, The enqueue method has a
race between _is_duplicate and _add_to_dedup_cache because await self.queue.put
yields control; wrap the dedup check + reserve in an asyncio.Lock and reserve
the task_id in the dedup cache before awaiting queue.put so the operation is
atomic. Specifically, create/use an asyncio.Lock (e.g., self._dedup_lock),
acquire it around calling _is_duplicate and then _add_to_dedup_cache (or a new
_reserve_in_dedup method) so no two coroutines can pass the check concurrently,
release the lock, then await self.queue.put(task); also add try/except around
queue.put to remove/revert the reserved entry from the dedup cache if queue.put
fails (ensuring consistency), and keep _generate_task_id, _is_duplicate, and
_add_to_dedup_cache names as the reference points.

Comment on lines +128 to +173
    async def _execute_with_retry(self, task: Task) -> None:
        """Execute task with exponential backoff retry for transient failures."""
        last_error: Exception | None = None

        for attempt in range(MAX_RETRIES + 1):
            try:
                await task.func(*task.args, **task.kwargs)
                if attempt > 0:
                    logger.info("task_retry_succeeded", task_id=task.task_id, attempt=attempt + 1)
                return
            except Exception as e:
                last_error = e
                if attempt < MAX_RETRIES and _is_transient_error(e):
                    backoff = INITIAL_BACKOFF_SECONDS * (2**attempt)
                    logger.warning(
                        "task_retry_scheduled",
                        task_id=task.task_id,
                        attempt=attempt + 1,
                        backoff_seconds=backoff,
                        error=str(e),
                    )
                    await asyncio.sleep(backoff)
                else:
                    break

        logger.error(
            "task_failed",
            task_id=task.task_id,
            error=str(last_error),
            attempts=min(task.retry_count + 1, MAX_RETRIES + 1),
            exc_info=True,
        )

async def _worker(self) -> None:
"""Background worker loop."""
while True:
task = await self.queue.get()
try:
logger.info("task_started", task_id=task.task_id, event_type=task.event_type)
await self._execute_with_retry(task)
logger.info("task_completed", task_id=task.task_id)
except Exception as e:
logger.error(f"Worker {worker_name} error: {e}")
await asyncio.sleep(5)

logger.info(f"Worker {worker_name} stopped")

async def _process_task(self, task: Task, worker_name: str):
"""Process a single task."""
try:
task.status = TaskStatus.RUNNING
task.started_at = datetime.now()

logger.info(f"Worker {worker_name} processing task {task.id}")

# Get the appropriate processor
processor = self._get_processor(task.event_type)
result = await processor.process(task)

task.status = TaskStatus.COMPLETED
task.completed_at = datetime.now()
task.result = result.__dict__ if hasattr(result, "__dict__") else result

logger.info(f"Task {task.id} completed successfully")

except Exception as e:
task.status = TaskStatus.FAILED
task.completed_at = datetime.now()
task.error = str(e)
logger.error(f"Task {task.id} failed: {e}")

def cleanup_old_tasks(self, max_age_hours: int = 24):
"""Clean up old completed tasks and their event hashes to prevent memory leaks."""
cutoff_time = datetime.now().timestamp() - (max_age_hours * 3600)

# Find old completed tasks
old_task_ids = [
task_id
for task_id, task in self.tasks.items()
if task.status in [TaskStatus.COMPLETED, TaskStatus.FAILED] and task.created_at.timestamp() < cutoff_time
]

# Remove old tasks and their event hashes
for task_id in old_task_ids:
task = self.tasks[task_id]
if task.event_hash and task.event_hash in self.event_hashes:
del self.event_hashes[task.event_hash]
del self.tasks[task_id]

if old_task_ids:
logger.info(f"Cleaned up {len(old_task_ids)} old tasks")

def _get_processor(self, event_type: str):
"""Get the appropriate processor for the event type."""
from src.event_processors.factory import EventProcessorFactory

return EventProcessorFactory.create_processor(event_type)
logger.error("task_worker_error", task_id=task.task_id, error=str(e), exc_info=True)

finally:
self.queue.task_done()

⚠️ Potential issue | 🟡 Minor

task_completed is logged even when retries are exhausted.
Since _execute_with_retry() logs failures but doesn’t signal success/failure, _worker() always logs completion. Return a boolean and log completion only on success; also track the actual attempt count for failure logs.

🔧 Suggested fix (success flag + attempts)
-    async def _execute_with_retry(self, task: Task) -> None:
+    async def _execute_with_retry(self, task: Task) -> bool:
         """Execute task with exponential backoff retry for transient failures."""
         last_error: Exception | None = None
+        attempts = 0
 
         for attempt in range(MAX_RETRIES + 1):
+            attempts = attempt + 1
             try:
                 await task.func(*task.args, **task.kwargs)
                 if attempt > 0:
                     logger.info("task_retry_succeeded", task_id=task.task_id, attempt=attempt + 1)
-                return
+                return True
             except Exception as e:
                 last_error = e
                 if attempt < MAX_RETRIES and _is_transient_error(e):
                     backoff = INITIAL_BACKOFF_SECONDS * (2**attempt)
                     logger.warning(
                         "task_retry_scheduled",
                         task_id=task.task_id,
                         attempt=attempt + 1,
                         backoff_seconds=backoff,
                         error=str(e),
                     )
                     await asyncio.sleep(backoff)
                 else:
                     break
 
         logger.error(
             "task_failed",
             task_id=task.task_id,
             error=str(last_error),
-            attempts=min(task.retry_count + 1, MAX_RETRIES + 1),
+            attempts=attempts,
             exc_info=True,
         )
+        return False
@@
     async def _worker(self) -> None:
         """Background worker loop."""
         while True:
             task = await self.queue.get()
             try:
                 logger.info("task_started", task_id=task.task_id, event_type=task.event_type)
-                await self._execute_with_retry(task)
-                logger.info("task_completed", task_id=task.task_id)
+                success = await self._execute_with_retry(task)
+                if success:
+                    logger.info("task_completed", task_id=task.task_id)
             except Exception as e:
                 logger.error("task_worker_error", task_id=task.task_id, error=str(e), exc_info=True)
             finally:
                 self.queue.task_done()
🤖 Prompt for AI Agents
In `@src/tasks/task_queue.py` around lines 128 - 173, change the contract of
_execute_with_retry to return a success flag and the actual attempts used (e.g.,
return (True, attempts) on success and (False, attempts) on failure), compute
attempts as the number of attempts executed (attempt index + 1 while retrying
with MAX_RETRIES and INITIAL_BACKOFF_SECONDS), and replace the current final
logger.error uses of task.retry_count with the returned attempts; then update
_worker to call success, attempts = await self._execute_with_retry(task) and
only emit logger.info("task_completed", ...) when success is True, otherwise log
failure using the attempts value returned; keep existing transient/backoff logic
and exc_info usage but surface the attempts and boolean result through the new
return value.

Comment on lines +4 to +11
from src.event_processors.push import PushProcessor
from src.tasks.task_queue import task_queue
from src.webhooks.handlers.base import EventHandler

logger = logging.getLogger(__name__)
logger = structlog.get_logger()

# Instantiate processor once
push_processor = PushProcessor()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/webhooks/handlers/push.py

Repository: warestack/watchflow

Length of output: 2663


ImportError handler is unreachable with the current module-level import.

Because PushProcessor is imported and instantiated at module load (lines 4, 11), a missing processor fails module import before handle() runs, so the except ImportError handler never executes. Either make the import lazy/optional or remove the unreachable handler. Also note: the comment at line 52 says "Deployment processor" but should reference the "Push processor" (copy-paste error).

🔧 Suggested fix (lazy/optional import)
-from src.event_processors.push import PushProcessor
+try:
+    from src.event_processors.push import PushProcessor
+except ImportError:  # Optional processor
+    PushProcessor = None
@@
-# Instantiate processor once
-push_processor = PushProcessor()
+# Instantiate processor once if available
+push_processor: PushProcessor | None = PushProcessor() if PushProcessor else None
@@
-        try:
+        try:
+            if push_processor is None:
+                log.warning("push_processor_not_found")
+                return WebhookResponse(
+                    status="ok",
+                    detail="Push acknowledged (no processor)",
+                    event_type=EventType.PUSH,
+                )
             # Enqueue the processing task
             enqueued = await task_queue.enqueue(
                 func=push_processor.process,
                 event_type="push",
                 payload=event.payload,
             )
@@
-        except ImportError:
-            # Deployment processor may not exist yet
-            log.warning("deployment_processor_not_found")
-            return WebhookResponse(status="ok", detail="Push acknowledged (no processor)", event_type=EventType.PUSH)
         except Exception as e:
             log.error("push_processing_failed", error=str(e), exc_info=True)
             return WebhookResponse(
                 status="error", detail=f"Push processing failed: {str(e)}", event_type=EventType.PUSH
             )
🤖 Prompt for AI Agents
In `@src/webhooks/handlers/push.py` around lines 4 - 11, The module-level import
and instantiation of PushProcessor (push_processor = PushProcessor()) makes
ImportError handling unreachable; change to a lazy/optional import inside the
handler (e.g., import PushProcessor within handle() and instantiate there if not
already created) or remove the unused except ImportError block so errors can be
caught at runtime, and correct the mistaken comment that reads "Deployment
processor" to "Push processor"; update references to PushProcessor and
push_processor in the handle() flow so the processor is only created after a
successful import.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/webhooks/handlers/pull_request.py`:
- Around line 47-53: The enqueue call to task_queue.enqueue is not passing the
Task object, so get_pr_processor().process (PullRequestProcessor.process) will
be called without the required task parameter; update the enqueue invocation in
the pull request handler to pass the task when enqueueing (e.g., include
task=task or pass task as a positional arg to task_queue.enqueue) so that the
processor's process(self, task: Task) receives the Task instance.

In `@tests/conftest.py`:
- Around line 33-49: The autouse fixture mock_settings and Helpers.mock_env are
set too late to prevent import-time side effects (e.g., module-level env reads
or calls like get_chat_model during import); move the environment patching into
a pytest_configure hook so the environment variables are set before test
collection/imports, and keep/mock_settings (or a new fixture) only to expose a
handle/cleanup if needed; update or add a pytest_configure function that calls
Helpers.mock_env with the same keys (APP_CLIENT_ID_GITHUB,
APP_CLIENT_SECRET_GITHUB, WEBHOOK_SECRET_GITHUB, PRIVATE_KEY_BASE64_GITHUB,
AI_PROVIDER, OPENAI_API_KEY, ENVIRONMENT) and ensure teardown/restoration occurs
after tests.
🧹 Nitpick comments (2)
src/webhooks/handlers/pull_request.py (1)

22-23: Prefer direct enum comparison over string comparison.

Using event.event_type == EventType.PULL_REQUEST is more type-safe and idiomatic than comparing .name to a string literal; static type checkers can then catch mismatched enum members and typos.

♻️ Suggested refactor
     async def can_handle(self, event: WebhookEvent) -> bool:
-        return event.event_type.name == "PULL_REQUEST"
+        return event.event_type == EventType.PULL_REQUEST
tests/conftest.py (1)

19-31: Tighten typing for Helpers.mock_env to fit strict mode.

♻️ Suggested typing improvements
-from typing import Any
+from collections.abc import Mapping
+from typing import ContextManager
@@
 class Helpers:
     @staticmethod
-    def mock_env(env_vars) -> "Any":
+    def mock_env(env_vars: Mapping[str, str]) -> ContextManager[Mapping[str, str]]:
         from unittest.mock import patch
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e313021 and 63b3e2b.

📒 Files selected for processing (3)
  • .github/workflows/tests.yaml
  • src/webhooks/handlers/pull_request.py
  • tests/conftest.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/webhooks/handlers/pull_request.py (3)
src/core/models.py (4)
  • EventType (90-104)
  • WebhookEvent (115-137)
  • WebhookResponse (107-112)
  • repo_full_name (130-132)
src/event_processors/pull_request/processor.py (2)
  • PullRequestProcessor (17-228)
  • process (29-186)
src/tasks/task_queue.py (2)
  • repo_full_name (33-38)
  • enqueue (106-126)
🔇 Additional comments (7)
src/webhooks/handlers/pull_request.py (4)

1-10: LGTM on imports and logger setup.

Clean imports with structlog for structured logging. The lazy import pattern and explicit type imports from src.core.models are well organized.


13-16: LGTM on lazy singleton pattern.

Using @lru_cache(maxsize=1) for lazy processor instantiation is a clean approach that ensures the processor is created only when first needed and reused thereafter.
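The cached-singleton behaviour praised here can be sketched in isolation. The stub `PullRequestProcessor` below is illustrative, not the real class; the point is that a zero-argument function under `@lru_cache(maxsize=1)` constructs its result once and returns the same instance thereafter.

```python
from functools import lru_cache


class PullRequestProcessor:
    """Stand-in for the real processor; body is illustrative."""

    def __init__(self) -> None:
        self.ready = True


@lru_cache(maxsize=1)
def get_pr_processor() -> PullRequestProcessor:
    # First call constructs the processor; every later call returns
    # the cached instance, so construction is deferred until first use.
    return PullRequestProcessor()
```

This defers any import-time cost (and import-time failures) of the processor to the first webhook that actually needs it.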


37-43: Good action filtering implementation.

The action filter appropriately limits processing to meaningful PR events (opened, synchronize, reopened, edited), reducing unnecessary processing and improving observability with the pr_action_ignored log entry.


66-70: Good exception handling with appropriate error response.

The try-except block properly catches exceptions, logs them with exc_info=True for stack traces, and returns a structured error response. This ensures the webhook handler degrades gracefully without crashing.

.github/workflows/tests.yaml (1)

41-43: CI mock env setup looks good.

Sets explicit mock values to keep tests isolated.

tests/conftest.py (2)

12-16: Path bootstrap is fine for tests.


55-62: Event loop fixture looks fine.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +33 to +49
# 3. Mock Environment Variables (Security First)
# We do this BEFORE importing app code to ensure no real secrets are read
@pytest.fixture(autouse=True)
def mock_settings():
"""Forces the test environment to use dummy values."""
with Helpers.mock_env(
{
"APP_CLIENT_ID_GITHUB": "mock-client-id",
"APP_CLIENT_SECRET_GITHUB": "mock-client-secret",
"WEBHOOK_SECRET_GITHUB": "mock-webhook-secret",
"PRIVATE_KEY_BASE64_GITHUB": "bW9jay1rZXk=", # "mock-key" in base64 # gitleaks:allow
"AI_PROVIDER": "openai",
"OPENAI_API_KEY": "sk-mock-key", # gitleaks:allow
"ENVIRONMENT": "test",
}
):
yield

⚠️ Potential issue | 🟠 Major

Autouse fixtures won’t protect import-time side effects.

These fixtures run after test collection, so module-level env reads or get_chat_model calls during import can still see real values or trigger network calls. Start the patches before collection (e.g., pytest_configure) and keep fixtures only to expose handles if needed.

🛠️ One way to apply mocks before collection
+from unittest.mock import MagicMock, patch
@@
-@pytest.fixture(autouse=True)
-def mock_settings():
-    """Forces the test environment to use dummy values."""
-    with Helpers.mock_env(
-        {
-            "APP_CLIENT_ID_GITHUB": "mock-client-id",
-            "APP_CLIENT_SECRET_GITHUB": "mock-client-secret",
-            "WEBHOOK_SECRET_GITHUB": "mock-webhook-secret",
-            "PRIVATE_KEY_BASE64_GITHUB": "bW9jay1rZXk=",  # "mock-key" in base64 # gitleaks:allow
-            "AI_PROVIDER": "openai",
-            "OPENAI_API_KEY": "sk-mock-key",  # gitleaks:allow
-            "ENVIRONMENT": "test",
-        }
-    ):
-        yield
+MOCK_ENV = {
+    "APP_CLIENT_ID_GITHUB": "mock-client-id",
+    "APP_CLIENT_SECRET_GITHUB": "mock-client-secret",
+    "WEBHOOK_SECRET_GITHUB": "mock-webhook-secret",
+    "PRIVATE_KEY_BASE64_GITHUB": "bW9jay1rZXk=",  # "mock-key" in base64 # gitleaks:allow
+    "AI_PROVIDER": "openai",
+    "OPENAI_API_KEY": "sk-mock-key",  # gitleaks:allow
+    "ENVIRONMENT": "test",
+}
+
+_env_patcher = patch.dict(os.environ, MOCK_ENV, clear=False)
+_mock_model = MagicMock()
+_mock_model.invoke.return_value.content = "Mocked LLM response"
+_chat_patcher = patch(
+    "src.integrations.providers.factory.get_chat_model",
+    return_value=_mock_model,
+)
+
+def pytest_configure():
+    _env_patcher.start()
+    _chat_patcher.start()
+
+def pytest_unconfigure(config):
+    _chat_patcher.stop()
+    _env_patcher.stop()

Also applies to: 65-80

🤖 Prompt for AI Agents
In `@tests/conftest.py` around lines 33 - 49, The autouse fixture mock_settings
and Helpers.mock_env are set too late to prevent import-time side effects (e.g.,
module-level env reads or calls like get_chat_model during import); move the
environment patching into a pytest_configure hook so the environment variables
are set before test collection/imports, and keep/mock_settings (or a new
fixture) only to expose a handle/cleanup if needed; update or add a
pytest_configure function that calls Helpers.mock_env with the same keys
(APP_CLIENT_ID_GITHUB, APP_CLIENT_SECRET_GITHUB, WEBHOOK_SECRET_GITHUB,
PRIVATE_KEY_BASE64_GITHUB, AI_PROVIDER, OPENAI_API_KEY, ENVIRONMENT) and ensure
teardown/restoration occurs after tests.

…, webhook dedup, welcome comment, installation auth

- rules: pass Rule objects to engine so .conditions are preserved; remove _convert_rules_to_new_format
- rules: add RequireCodeOwnerReviewersCondition and PathHasCodeOwnerCondition; enricher fetches CODEOWNERS
- rules: add MaxPrLocCondition, RequireLinkedIssueCondition; loader normalizes max_changed_lines -> max_lines
- webhooks: add delivery_id (X-GitHub-Delivery) to WebhookEvent; task_id uses delivery_id+func for dedup so processor runs and comments post
- pr: when rules not configured, post welcome comment with watchflow.dev/analyze?installation_id=&repo=
- api: when installation_id in payload and no token, use installation token so PAT not required from install link
- dev: DEVELOPMENT.md test instructions, justfile test-local, pyproject respx, .watchflow/rules.yaml examples; unit tests for new conditions and task_queue

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Line 41: The ruff dependency constraint "ruff>=0.1.0" in pyproject.toml is too
permissive; update that requirement to a tighter range reflecting the minimum
tested version and an upper bound to avoid pre-1.0 breaking changes (e.g.,
replace "ruff>=0.1.0" with something like "ruff>=0.5.0,<1.0.0" or set the exact
minimum you verified), by editing the dependency string in pyproject.toml so the
project installs a supported Ruff version.

In `@src/event_processors/pull_request/processor.py`:
- Around line 106-134: previous_acknowledgments is keyed by rule_id but the code
compares against violation.rule_description, so acknowledgments never match; fix
by ensuring each Violation carries the originating Rule.rule_id (e.g., populate
violation.details["rule_id"] when violations are constructed in the rule
evaluation path) and then change the lookup in processor.py to check
previous_acknowledgments using that stored rule_id (use
violation.details.get("rule_id") or a new violation.rule_id property) instead of
violation.rule_description; alternatively implement a reverse mapping function
from Rule.description to Rule.rule_id and use it inside the loop, but the
preferred fix is to store Rule.rule_id on the Violation at creation and
reference that in the acknowledgable check for previous_acknowledgments.

In `@src/presentation/github_formatter.py`:
- Around line 77-90: The grouping currently omits the "info" severity so
INFO-level violations get misclassified or dropped; update the severity_order
and grouping logic used in format_violations_comment and format_check_run_output
to include "info" (e.g., add "info" into the severity_order list and ensure
severity_groups initialization covers "info"), and adjust the fallback logic
that maps unknown severities so INFO is not forced into "low" — reference the
severity_order variable, severity_groups dict, and the loop over violations to
make the change consistently in both locations.

In `@src/webhooks/handlers/pull_request.py`:
- Around line 49-55: The handler enqueues get_pr_processor().process with a
WebhookEvent causing a signature mismatch because PullRequestProcessor.process
expects a Task; fix by constructing and passing a Task object to
task_queue.enqueue (instead of the raw event) so the processor receives the
expected Task containing event.payload and delivery_id, e.g. create a Task with
the name "pull_request", payload=event.payload and delivery_id=getattr(event,
"delivery_id", None) and pass that Task as the argument to task_queue.enqueue
calling task_queue.enqueue(get_pr_processor().process, task), then ensure
PullRequestProcessor.process(task: Task) continues to read task.payload and
task.delivery_id.

In `@src/webhooks/router.py`:
- Around line 77-92: The response is using the raw header variable event_name
instead of the normalized event type from the created Event; update both
WebhookResponse constructions to use the normalized value (e.g., replace
event_type=event_name with event_type=event.event_type or the normalized
property returned by _create_event_from_request) in the success branch and the
202-except branch so the response.event_type matches the documented normalized
value.

In `@tests/unit/rules/conditions/test_access_control.py`:
- Around line 71-130: Patch DEFAULT_TEAM_MEMBERSHIPS in the AuthorTeamCondition
tests to make membership deterministic: in each test (or in a fixture)
monkeypatch the module-level DEFAULT_TEAM_MEMBERSHIPS to include the mapping
that makes "devops-user" a member of "devops" before constructing
AuthorTeamCondition and running validate/evaluate; reference the constant name
DEFAULT_TEAM_MEMBERSHIPS and the class AuthorTeamCondition so the tests no
longer rely on external default constants changing.
🧹 Nitpick comments (8)
justfile (1)

44-47: Target is Windows-only; consider cross-platform support.

The test-local target uses Windows-specific path syntax and will fail on Unix-like systems. While the comment documents the Unix alternative, a cross-platform target would improve developer experience.

♻️ Proposed cross-platform fix using justfile's `os()` function
 # Run pytest with this repo's venv (avoids wrong interpreter from another project)
-# Windows: just test-local   |  Unix: ./.venv/bin/python -m pytest tests/ -v
 test-local *args:
-    .\.venv\Scripts\python.exe -m pytest {{args}}
+    {{ if os() == "windows" { ".\.venv\Scripts\python.exe" } else { "./.venv/bin/python" } }} -m pytest {{args}}
src/agents/repository_analysis_agent/prompts.py (1)

28-31: Make JSON-only output explicit to avoid parser failures.

Consider adding an explicit “JSON only, no markdown/prose” directive so downstream parsers don’t choke on extra text.

Proposed tweak
-Return JSON matching the RuleRecommendation schema. For each rule include: validator name, configuration (if applicable), triggering hygiene metric(s), and a short rationale (1–2 sentences).
+Return JSON matching the RuleRecommendation schema. For each rule include: validator name, configuration (if applicable), triggering hygiene metric(s), and a short rationale (1–2 sentences). Output only raw JSON—no markdown, no code fences, no extra text.
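Even with a JSON-only directive in the prompt, a defensive parse on the consumer side is cheap insurance against models that still wrap output in code fences. A sketch under assumptions: `parse_llm_json` is a hypothetical helper, not part of this codebase, and it only handles the common fenced-or-bare cases.

```python
import json


def parse_llm_json(text: str) -> dict:
    """Parse model output that may arrive bare or wrapped in ``` fences.

    Hypothetical helper for illustration; not part of the repository.
    """
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop an opening fence like ``` or ```json, then the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)
```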
.watchflow/rules.yaml (1)

29-31: Empty critical_owners list renders this rule ineffective.

The rule is enabled but critical_owners: [] is empty, meaning no files will be treated as critical. Consider either:

  1. Adding actual critical owner identifiers to make the rule functional, or
  2. Setting enabled: false until the list is populated.
src/agents/feasibility_agent/nodes.py (1)

83-99: Consider building a lookup dict to avoid O(n×m) iteration.

The nested loop performs a linear search through AVAILABLE_CONDITIONS for each chosen validator. While likely negligible for small catalogs, a dict lookup would be more efficient and slightly cleaner.

♻️ Optional optimization
+        # Build lookup for efficient access
+        conditions_by_name = {c.name: c for c in AVAILABLE_CONDITIONS}
+
         validator_parameters_lines = []
         for name in state.chosen_validators:
-            for condition_cls in AVAILABLE_CONDITIONS:
-                if condition_cls.name == name:
-                    keys = getattr(condition_cls, "parameter_patterns", []) or []
-                    examples = getattr(condition_cls, "examples", None) or []
-                    line = f"- {name}: use only parameter keys {keys}"
-                    if examples:
-                        line += f"; example(s): {examples[0]}"
-                    validator_parameters_lines.append(line)
-                    break
+            condition_cls = conditions_by_name.get(name)
+            if condition_cls:
+                keys = getattr(condition_cls, "parameter_patterns", []) or []
+                examples = getattr(condition_cls, "examples", None) or []
+                line = f"- {name}: use only parameter keys {keys}"
+                if examples:
+                    line += f"; example(s): {examples[0]}"
+                validator_parameters_lines.append(line)
DEVELOPMENT.md (1)

272-304: Minor duplication in test instructions.

The test instructions section has some redundancy. Lines 280-282 show uv sync --all-extras and uv run pytest, then lines 293-303 repeat essentially the same commands with slightly different formatting. Consider consolidating into a single clear block to avoid confusion.

♻️ Suggested consolidation
-```bash
-# Install deps (matches CI)
-uv sync --all-extras
-
-# Run all tests (same as GitHub Action)
-uv run pytest tests/unit/ tests/integration/ -v
-
-# Run only unit tests
-uv run pytest tests/unit/ -v
-
-# Run only integration tests
-uv run pytest tests/integration/ -v
-```
+The commands above (`uv sync --all-extras` and `uv run pytest ...`) are the same as CI. See the previous section for the full workflow.
src/rules/loaders/github_loader.py (1)

94-102: Consider removing the original alias after normalization.

The alias normalization adds max_lines but leaves max_changed_lines in the parameters dict. This could cause confusion or unexpected behavior if downstream code checks for max_changed_lines.

♻️ Suggested fix
         # Normalize aliases so conditions match (e.g. max_changed_lines -> max_lines for MaxPrLocCondition)
         if "max_changed_lines" in parameters and "max_lines" not in parameters:
             parameters["max_lines"] = parameters["max_changed_lines"]
+            del parameters["max_changed_lines"]
src/core/models.py (1)

83-85: RuleConfig severity doesn't use the Severity enum.

RuleConfig.severity uses a Literal with mixed severity scales ("info", "warning", "error" AND "low", "medium", "high", "critical"), while a Severity enum is defined at lines 8-18 with only INFO, LOW, MEDIUM, HIGH, CRITICAL. Consider using the Severity enum for consistency, or align the Literal values with a single scale.

♻️ Suggested fix using enum
-    severity: Literal["info", "warning", "error", "low", "medium", "high", "critical"] = Field(
-        "medium", description="Severity level of the rule"
-    )
+    severity: Severity = Field(
+        default=Severity.MEDIUM, description="Severity level of the rule"
+    )
src/event_processors/pull_request/processor.py (1)

196-211: Consider removing _convert_rules_to_new_format if unused.
It’s currently not invoked; if it’s dead code, dropping it reduces maintenance surface.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63b3e2b and 61cfbcd.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • .watchflow/rules.yaml
  • DEVELOPMENT.md
  • justfile
  • pyproject.toml
  • src/agents/feasibility_agent/nodes.py
  • src/agents/feasibility_agent/prompts.py
  • src/agents/repository_analysis_agent/prompts.py
  • src/api/recommendations.py
  • src/core/models.py
  • src/event_processors/pull_request/enricher.py
  • src/event_processors/pull_request/processor.py
  • src/presentation/github_formatter.py
  • src/rules/acknowledgment.py
  • src/rules/conditions/__init__.py
  • src/rules/conditions/access_control.py
  • src/rules/conditions/filesystem.py
  • src/rules/conditions/pull_request.py
  • src/rules/loaders/github_loader.py
  • src/rules/registry.py
  • src/rules/utils/__init__.py
  • src/rules/utils/codeowners.py
  • src/tasks/task_queue.py
  • src/webhooks/dispatcher.py
  • src/webhooks/handlers/pull_request.py
  • src/webhooks/handlers/push.py
  • src/webhooks/router.py
  • tests/unit/rules/conditions/test_access_control.py
  • tests/unit/rules/conditions/test_filesystem.py
  • tests/unit/rules/conditions/test_pull_request.py
  • tests/unit/rules/test_acknowledgment.py
  • tests/unit/tasks/test_queue.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/rules/registry.py
  • tests/unit/tasks/test_queue.py
  • src/rules/conditions/__init__.py
  • src/rules/utils/codeowners.py
🧰 Additional context used
🧬 Code graph analysis (19)
src/rules/utils/__init__.py (1)
src/rules/utils/codeowners.py (1)
  • path_has_owner (164-176)
src/webhooks/router.py (3)
src/core/models.py (2)
  • EventType (90-104)
  • WebhookResponse (107-112)
src/webhooks/auth.py (1)
  • verify_github_signature (23-58)
src/webhooks/models.py (2)
  • GitHubEventModel (23-28)
  • WebhookResponse (31-36)
tests/unit/rules/conditions/test_pull_request.py (1)
src/rules/conditions/pull_request.py (14)
  • MinApprovalsCondition (257-300)
  • MinDescriptionLengthCondition (100-177)
  • RequiredLabelsCondition (180-254)
  • RequireLinkedIssueCondition (310-367)
  • validate (77-97)
  • validate (158-177)
  • validate (234-254)
  • validate (290-300)
  • validate (355-367)
  • evaluate (26-75)
  • evaluate (109-156)
  • evaluate (189-232)
  • evaluate (266-288)
  • evaluate (319-353)
tests/unit/rules/conditions/test_filesystem.py (1)
src/rules/conditions/filesystem.py (10)
  • FilePatternCondition (17-134)
  • MaxFileSizeCondition (137-220)
  • MaxPrLocCondition (223-275)
  • validate (95-116)
  • validate (193-220)
  • validate (267-275)
  • evaluate (29-93)
  • evaluate (146-191)
  • evaluate (232-265)
  • _glob_to_regex (131-134)
src/rules/loaders/github_loader.py (1)
src/rules/registry.py (2)
  • ConditionRegistry (78-115)
  • get_conditions_for_parameters (87-115)
tests/unit/rules/conditions/test_access_control.py (1)
src/rules/conditions/access_control.py (18)
  • AuthorTeamCondition (18-96)
  • CodeOwnersCondition (99-184)
  • NoForcePushCondition (438-474)
  • PathHasCodeOwnerCondition (205-269)
  • ProtectedBranchesCondition (363-435)
  • RequireCodeOwnerReviewersCondition (319-360)
  • validate (81-96)
  • validate (150-170)
  • validate (251-269)
  • validate (353-360)
  • validate (415-435)
  • validate (471-474)
  • evaluate (27-79)
  • evaluate (112-148)
  • evaluate (214-249)
  • evaluate (328-351)
  • evaluate (372-413)
  • evaluate (446-469)
src/tasks/task_queue.py (4)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
src/core/models.py (1)
  • repo_full_name (136-138)
src/core/utils/caching.py (2)
  • get (44-66)
  • clear (91-95)
tests/unit/tasks/test_queue.py (1)
  • queue (13-15)
src/webhooks/dispatcher.py (3)
src/core/models.py (2)
  • EventType (90-104)
  • WebhookEvent (115-143)
src/tasks/task_queue.py (1)
  • enqueue (121-142)
src/core/utils/caching.py (1)
  • get (44-66)
src/api/recommendations.py (7)
src/agents/repository_analysis_agent/agent.py (2)
  • RepositoryAnalysisAgent (14-115)
  • execute (59-115)
src/api/dependencies.py (1)
  • get_current_user_optional (23-54)
src/api/rate_limit.py (1)
  • rate_limiter (25-46)
src/core/models.py (2)
  • User (44-55)
  • repo_full_name (136-138)
src/agents/repository_analysis_agent/nodes.py (1)
  • generate_analysis_report (630-698)
tests/integration/test_rules_api.py (1)
  • client (21-23)
src/integrations/github/api.py (6)
  • get_installation_access_token (98-128)
  • get_repository (130-145)
  • get_git_ref_sha (836-850)
  • create_git_ref (852-888)
  • create_or_update_file (918-966)
  • create_pull_request (968-1002)
src/webhooks/handlers/push.py (4)
src/core/models.py (4)
  • EventType (90-104)
  • WebhookEvent (115-143)
  • WebhookResponse (107-112)
  • repo_full_name (136-138)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/event_processors/push.py (2)
  • PushProcessor (14-187)
  • process (27-144)
src/tasks/task_queue.py (2)
  • repo_full_name (33-38)
  • enqueue (121-142)
src/presentation/github_formatter.py (1)
src/core/models.py (3)
  • Acknowledgment (33-41)
  • Severity (8-17)
  • Violation (20-30)
src/webhooks/handlers/pull_request.py (4)
src/core/models.py (4)
  • EventType (90-104)
  • WebhookEvent (115-143)
  • WebhookResponse (107-112)
  • repo_full_name (136-138)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/event_processors/pull_request/processor.py (2)
  • PullRequestProcessor (17-236)
  • process (29-194)
src/tasks/task_queue.py (2)
  • repo_full_name (33-38)
  • enqueue (121-142)
src/event_processors/pull_request/enricher.py (4)
src/core/models.py (2)
  • Acknowledgment (33-41)
  • repo_full_name (136-138)
src/rules/acknowledgment.py (2)
  • is_acknowledgment_comment (89-99)
  • parse_acknowledgment_comment (163-212)
src/tasks/task_queue.py (2)
  • repo_full_name (33-38)
  • installation_id (41-47)
src/integrations/github/api.py (4)
  • get_pull_request_reviews (473-499)
  • get_pull_request_files (501-527)
  • get_file_content (167-198)
  • get_issue_comments (665-691)
src/rules/acknowledgment.py (1)
src/core/models.py (1)
  • Acknowledgment (33-41)
src/rules/conditions/filesystem.py (2)
src/core/models.py (2)
  • Severity (8-17)
  • Violation (20-30)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/conditions/pull_request.py (2)
src/core/models.py (2)
  • Severity (8-17)
  • Violation (20-30)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
tests/unit/rules/test_acknowledgment.py (2)
src/core/models.py (1)
  • Acknowledgment (33-41)
src/rules/acknowledgment.py (6)
  • RuleID (20-36)
  • extract_acknowledgment_reason (102-128)
  • is_acknowledgment_comment (89-99)
  • map_violation_text_to_rule_description (147-160)
  • map_violation_text_to_rule_id (131-144)
  • parse_acknowledgment_comment (163-212)
src/event_processors/pull_request/processor.py (5)
src/agents/factory.py (1)
  • get_agent (20-52)
src/core/models.py (2)
  • Violation (20-30)
  • repo_full_name (136-138)
src/integrations/github/check_runs.py (2)
  • CheckRunManager (11-121)
  • create_acknowledgment_check_run (66-121)
src/tasks/task_queue.py (3)
  • Task (19-47)
  • repo_full_name (33-38)
  • installation_id (41-47)
src/presentation/github_formatter.py (2)
  • format_rules_not_configured_comment (121-149)
  • format_violations_comment (152-182)
src/rules/conditions/access_control.py (2)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/utils/codeowners.py (4)
  • is_critical_file (223-246)
  • path_has_owner (164-176)
  • CodeOwnersParser (15-161)
  • get_owners_for_file (54-78)
🪛 LanguageTool
DEVELOPMENT.md

[uncategorized] ~272-~272: The official name of this software platform is spelled with a capital “H”.
Context: ...ests CI runs tests the same way (see [.github/workflows/tests.yaml](.github/workflows...

(GITHUB)


[uncategorized] ~272-~272: The official name of this software platform is spelled with a capital “H”.
Context: ...e way (see .github/workflows/tests.yaml). To run tests lo...

(GITHUB)

🔇 Additional comments (83)
src/agents/repository_analysis_agent/prompts.py (1)

34-57: Prompt content is clear and well-scoped.

.watchflow/rules.yaml (2)

32-37: LGTM!

Well-structured rule enforcing CODEOWNERS involvement in PR reviews. This aligns with the RequireCodeOwnerReviewersCondition mentioned in the PR objectives.


44-44: Good: positive boolean naming improves readability.

Using no_force_push: true is clearer than the previous double negative allow_force_push: false. Verify that the rule loader and any associated conditions have been updated to recognize this renamed parameter.

[approve_code_changes, request_verification]

#!/bin/bash
# Description: Verify that the backend recognizes the `no_force_push` parameter.
# Expected: Find references to `no_force_push` in the rule loader or condition implementations.

echo "=== Searching for no_force_push parameter handling ==="
rg -n -C3 'no_force_push'

echo "=== Searching for allow_force_push (old parameter) to check migration ==="
rg -n 'allow_force_push'
src/agents/feasibility_agent/prompts.py (1)

22-50: LGTM! Parameter-key-driven prompt updates are well-structured.

The revised prompt clearly separates the validator metadata from the parameter keys the LLM should use, and the guidelines explicitly instruct the model not to add validator keys. This aligns well with the engine inferring validators from parameter names.

src/agents/feasibility_agent/nodes.py (1)

27-39: LGTM! Validator catalog construction is clean.

The defensive getattr for examples correctly handles optional attributes while relying on core attributes (name, event_types, parameter_patterns, description) being guaranteed by the condition base class.

src/rules/utils/__init__.py (1)

12-12: LGTM!

The new path_has_owner utility is correctly imported from the codeowners module and properly exported via __all__, following the established pattern in this file.

Also applies to: 28-28

src/rules/loaders/github_loader.py (1)

49-61: Good defensive validation added.

The added checks for proper YAML structure (dict with "rules" key, rules being a list, skipping non-dict entries) improve robustness when loading malformed rule files.

src/webhooks/handlers/push.py (1)

20-51: LGTM on the core handler logic.

The handler correctly:

  • Binds structured log context with relevant push event details
  • Enqueues processing via task queue with delivery_id for deduplication
  • Returns appropriate WebhookResponse statuses for success/duplicate cases
pyproject.toml (2)

52-85: Well-structured Ruff configuration.

The consolidated Ruff configuration is well-documented with clear comments explaining each rule category. The migration from black/isort/flake8 to Ruff simplifies the toolchain while maintaining equivalent functionality.


98-100: Good use of MyPy overrides for boto3.

The override to ignore missing imports for boto3.* is appropriate since boto3 has incomplete type stubs, avoiding false positive errors during type checking.

src/core/models.py (1)

146-203: Well-designed HygieneMetrics model.

The model is well-documented with clear field descriptions, and the field_validator correctly enforces the 0.0-1.0 range constraint across all rate fields. Good use of mode="before" to validate before Pydantic's internal processing.
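
The validator pattern praised here can be sketched as follows; the field names are illustrative, not the actual HygieneMetrics fields, and this is a minimal approximation of the described approach:

```python
# Hypothetical sketch of a mode="before" range validator on several
# rate fields; field names are invented for illustration.
from pydantic import BaseModel, field_validator


class RateMetrics(BaseModel):
    approval_rate: float
    merge_rate: float

    @field_validator("approval_rate", "merge_rate", mode="before")
    @classmethod
    def check_unit_interval(cls, v):
        # mode="before" runs on the raw input, before Pydantic's own coercion
        value = float(v)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"rate must be between 0.0 and 1.0, got {value}")
        return value
```

Because the validator runs before coercion, string inputs like `"1.0"` are accepted and normalized, while out-of-range values fail fast with a clear message.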

src/webhooks/handlers/pull_request.py (2)

13-16: Good lazy singleton pattern.

Using @lru_cache(maxsize=1) for lazy instantiation of the processor is a clean pattern that defers initialization until first use while ensuring only one instance exists.
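
The pattern can be reduced to a few lines; the class name below is hypothetical:

```python
# Minimal sketch of the lazy-singleton pattern: the processor is built
# on first call and every later call returns the same cached instance.
from functools import lru_cache


class ExpensiveProcessor:
    def __init__(self) -> None:
        self.initialized = True  # imagine costly setup here


@lru_cache(maxsize=1)
def get_processor() -> ExpensiveProcessor:
    return ExpensiveProcessor()
```

This keeps module import cheap and avoids global mutable state, while `get_processor() is get_processor()` always holds.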


37-43: Good action filtering.

Filtering to only process relevant PR actions (opened, synchronize, reopened, edited) reduces unnecessary processing noise and follows good webhook handling practices.
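
A sketch of the filtering idea, using the action set named in this comment (the function name and payload shape are assumptions):

```python
# Early-exit filter so noisy webhook actions ("labeled", "assigned", ...)
# never reach the processor.
RELEVANT_ACTIONS = {"opened", "synchronize", "reopened", "edited"}


def should_process(payload: dict) -> bool:
    return payload.get("action") in RELEVANT_ACTIONS
```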

src/rules/acknowledgment.py (6)

20-67: Clear, centralized RuleID definitions and mappings.
Nice consolidation for consistency and type safety.


69-86: Acknowledgment indicators/patterns look comprehensive.


89-99: LGTM — simple and readable indicator check.


131-144: Mapping logic is straightforward and efficient.


147-160: Description fallback behavior looks good.


163-211: Parsing flow is clear and matches expected comment structure.

tests/unit/rules/test_acknowledgment.py (6)

28-52: Good coverage of RuleID contract expectations.


54-80: Indicator detection tests are solid.


83-135: Nice coverage across acknowledgment reason patterns.


137-171: Mapping tests look thorough and representative.


173-185: Description mapping tests are clean and focused.


187-251: Parsing tests cover key scenarios and delimiters well.

src/rules/conditions/pull_request.py (4)

17-98: TitlePatternCondition logic is clear and robust.


180-255: RequiredLabelsCondition is well-structured and readable.


303-307: Issue reference pattern is clear and well-scoped.


310-367: RequireLinkedIssueCondition reads well and covers both title/body.

src/presentation/github_formatter.py (5)

8-23: Severity-to-emoji mappings are clean and consistent.


121-149: Rules-not-configured comment is clear and user-friendly.


185-198: Acknowledgment summary formatting is concise.


201-212: Check-run violation list formatting looks good.


215-266: Acknowledgment check-run output is well-structured.

src/api/recommendations.py (11)

24-45: AnalyzeRepoRequest schema is clear and well-documented.


47-66: AnalysisResponse fields cover the frontend needs well.


68-105: ProceedWithPRRequest schema is comprehensive and clear.


107-125: ProceedWithPRResponse is concise and complete.


131-158: Metric config and helper utilities read cleanly.


161-315: Analysis report generation is structured and readable.


318-385: PR body assembly is clear and consistent.


388-401: PR title logic is concise and appropriate.


404-420: URL parsing helper is simple and well-scoped.


608-797: Proceed-with-PR flow is coherent and well-structured.


492-495: The is_public=True flag is hardcoded but does not actually gate authentication—user_token is the real auth parameter.

The nodes (fetch_repository_metadata, fetch_pr_signals) properly use state.user_token for API calls, not is_public. Passing user_token (which the code does obtain for private repos) will work regardless of the hardcoded is_public=True. However, the flag is semantically incorrect and unused—consider either removing it or computing it based on repo visibility for clarity.

Likely an incorrect or invalid review comment.

src/rules/conditions/filesystem.py (4)

29-117: File pattern logic is solid (assuming changed files are populated).


130-134: Glob-to-regex helper is straightforward.


137-221: MaxFileSizeCondition looks good and well-scoped.


223-275: MaxPrLocCondition is clear and consistent.

src/rules/conditions/access_control.py (5)

187-269: PathHasCodeOwnerCondition logic is clear and consistent.


272-317: Required code owner calculation is clean and readable.


319-361: RequireCodeOwnerReviewersCondition is well-structured.


363-436: ProtectedBranchesCondition logic is straightforward.


438-474: NoForcePushCondition is concise and correct.

tests/unit/rules/conditions/test_access_control.py (5)

21-65: LGTM: NoForcePushCondition coverage looks solid.
Clear validate/evaluate coverage for forced vs normal push.


133-177: LGTM: CodeOwnersCondition tests cover critical vs non-critical paths.


179-235: LGTM: PathHasCodeOwnerCondition cases are well covered.


237-314: LGTM: RequireCodeOwnerReviewersCondition test matrix is thorough.


316-377: LGTM: ProtectedBranchesCondition coverage is complete.

tests/unit/rules/conditions/test_filesystem.py (3)

17-98: LGTM: FilePatternCondition coverage is comprehensive.


100-194: LGTM: MaxFileSizeCondition tests look solid.


196-292: LGTM: MaxPrLocCondition edge cases are well exercised.

tests/unit/rules/conditions/test_pull_request.py (5)

17-84: LGTM: MinApprovalsCondition tests cover core approval flows.


86-158: LGTM: TitlePatternCondition tests are well rounded.


160-232: LGTM: MinDescriptionLengthCondition tests cover empty/whitespace cases.


234-308: LGTM: RequiredLabelsCondition coverage is strong.


310-377: LGTM: RequireLinkedIssueCondition tests exercise the expected paths.

src/webhooks/router.py (1)

19-40: LGTM: Event normalization + unsupported handling is clean.

src/event_processors/pull_request/enricher.py (4)

22-90: LGTM: API fetch + enrichment flow is clear and practical.


92-113: LGTM: Acknowledgment parsing path is well-contained.


115-146: LGTM: Webhook data extraction is clean and structured.


148-171: LGTM: Diff summarization is concise and bounded.

src/event_processors/pull_request/processor.py (3)

17-28: LGTM: Processor init wiring is straightforward.


213-226: LGTM: Violation comment posting is guarded and safe.


227-236: LGTM: Webhook/API data delegation is clean.

src/webhooks/dispatcher.py (1)

24-27: No normalization needed; EventType is already a string Enum.

EventType inherits from both str and Enum, so EventType.PUSH is the string value "push". When callers register handlers with EventType enum values, they're already strings—the dict keys match what dispatch extracts. The concern about handler misses doesn't apply here.
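
This behavior is easy to demonstrate with a reduced sketch (member set abbreviated):

```python
# A str-mixin Enum member hashes and compares as its string value, so
# dict keys registered with the enum are found by plain-string lookup.
from enum import Enum


class EventType(str, Enum):
    PUSH = "push"


handlers = {EventType.PUSH: "push_handler"}
```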

src/tasks/task_queue.py (7)

1-16: LGTM!

Imports and configuration constants are appropriate. The LRU cache size, retry count, and backoff values are reasonable defaults for an in-memory task queue.


50-61: LGTM!

The transient error detection covers common failure modes (connection errors, timeouts, rate limits, server errors) using both type checking and message inspection.
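
A simplified sketch of that two-pronged classification; the exact type tuple and marker strings here are assumptions, not the queue's actual lists:

```python
# Classify an error as transient via (1) exception type, (2) message content.
TRANSIENT_TYPES = (ConnectionError, TimeoutError)
TRANSIENT_MARKERS = ("rate limit", "timeout", "502", "503")


def is_transient(error: Exception) -> bool:
    if isinstance(error, TRANSIENT_TYPES):
        return True
    message = str(error).lower()
    return any(marker in message for marker in TRANSIENT_MARKERS)
```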


72-98: LGTM!

The LRU-based deduplication cache with bounded size addresses memory leak concerns. The maxsize=100 on the queue provides appropriate backpressure, and the _add_to_dedup_cache correctly implements LRU eviction.
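
The bounded LRU idea can be sketched with an `OrderedDict` (class and method names are hypothetical):

```python
# Bounded LRU dedup cache: membership checks refresh recency, inserts
# evict the oldest entry once the size cap is exceeded.
from collections import OrderedDict


class DedupCache:
    def __init__(self, max_size: int = 100) -> None:
        self._cache: OrderedDict[str, bool] = OrderedDict()
        self._max_size = max_size

    def seen(self, task_id: str) -> bool:
        if task_id in self._cache:
            self._cache.move_to_end(task_id)  # refresh recency
            return True
        return False

    def add(self, task_id: str) -> None:
        self._cache[task_id] = True
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # evict least-recently added
```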


100-119: LGTM!

The task ID generation strategy is well-designed: using delivery_id with function qualname allows the same webhook to be processed by different handlers, while the fallback to payload hash handles cases without delivery IDs.
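
The scheme can be sketched as below; the function signature and hash truncation are assumptions for illustration:

```python
# Task-ID generation: prefer delivery_id + handler qualname (so one
# delivery can feed several handlers), fall back to a payload digest.
import hashlib
import json


def generate_task_id(event_type, payload, delivery_id=None, func=None) -> str:
    if delivery_id and func is not None:
        return f"{delivery_id}:{func.__qualname__}"
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    return f"{event_type}:{digest[:16]}"
```

Sorting keys before hashing keeps the fallback ID stable across dict orderings of the same payload.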


121-142: Race condition between dedup check and cache add remains unaddressed.

The _is_duplicate() check at line 133 and _add_to_dedup_cache() at line 139 are not atomic. The await self.queue.put() at line 138 yields control, allowing concurrent callers with the same task_id to both pass the duplicate check before either adds to the cache.

Use an asyncio.Lock to make the check-and-reserve operation atomic, reserving the task_id before yielding to queue.put().

[raise_major_issue, duplicate_comment]

🔒 Proposed fix
     def __init__(self, max_dedup_size: int = MAX_DEDUP_CACHE_SIZE) -> None:
         self.queue: asyncio.Queue[Task] = asyncio.Queue(maxsize=100)
         self._dedup_cache: OrderedDict[str, bool] = OrderedDict()
         self._max_dedup_size = max_dedup_size
+        self._dedup_lock = asyncio.Lock()
         self.workers: list[asyncio.Task[None]] = []
     async def enqueue(
         self,
         func: Callable[..., Coroutine[Any, Any, Any]],
         event_type: str,
         payload: dict[str, Any],
         *args: Any,
         delivery_id: str | None = None,
         **kwargs: Any,
     ) -> bool:
         """Adds a task to the queue if it is not a duplicate."""
         task_id = self._generate_task_id(event_type, payload, delivery_id=delivery_id, func=func)
 
-        if self._is_duplicate(task_id):
-            logger.info("task_skipped_duplicate", task_id=task_id, event_type=event_type)
-            return False
+        async with self._dedup_lock:
+            if self._is_duplicate(task_id):
+                logger.info("task_skipped_duplicate", task_id=task_id, event_type=event_type)
+                return False
+            self._add_to_dedup_cache(task_id)
 
         task = Task(task_id=task_id, event_type=event_type, payload=payload, func=func, args=args, kwargs=kwargs)
-        await self.queue.put(task)
-        self._add_to_dedup_cache(task_id)
+        try:
+            await self.queue.put(task)
+        except Exception:
+            async with self._dedup_lock:
+                self._dedup_cache.pop(task_id, None)
+            raise
 
         logger.info("task_enqueued", task_id=task_id, event_type=event_type)
         return True

177-189: task_completed logged even when all retries are exhausted.

_execute_with_retry() handles exceptions internally and always returns (doesn't re-raise). Therefore, line 184 logs task_completed regardless of whether the task succeeded or failed after retries.

[raise_minor_issue, duplicate_comment]

🛠️ Proposed fix

Return a success indicator from _execute_with_retry:

-    async def _execute_with_retry(self, task: Task) -> None:
+    async def _execute_with_retry(self, task: Task) -> bool:
         """Execute task with exponential backoff retry for transient failures."""
         last_error: Exception | None = None
 
         for attempt in range(MAX_RETRIES + 1):
             try:
                 await task.func(*task.args, **task.kwargs)
                 if attempt > 0:
                     logger.info("task_retry_succeeded", task_id=task.task_id, attempt=attempt + 1)
-                return
+                return True
             except Exception as e:
                 # ... existing retry logic ...
                 else:
                     break
 
         logger.error(
             "task_failed",
             task_id=task.task_id,
             error=str(last_error),
-            attempts=min(task.retry_count + 1, MAX_RETRIES + 1),
+            attempts=attempt + 1,
             exc_info=True,
         )
+        return False

Then in _worker:

             logger.info("task_started", task_id=task.task_id, event_type=task.event_type)
-            await self._execute_with_retry(task)
-            logger.info("task_completed", task_id=task.task_id)
+            success = await self._execute_with_retry(task)
+            if success:
+                logger.info("task_completed", task_id=task.task_id)

191-219: LGTM!

Worker lifecycle management is well-implemented with proper cancellation handling. The get_stats method provides useful observability data for health checks.


"respx>=0.20.0",
"mypy>=1.7.0",
"pre-commit>=3.5.0",
"ruff>=0.1.0", # Replaces black, isort, flake8

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Ruff Python linter current version and release history

💡 Result:

Current Ruff version (as of January 31, 2026)

  • ruff 0.14.14, released 2026-01-22 (GitHub “Latest” + PyPI latest). [1][2]

Recent release history (latest first)

  • 0.14.14 (2026-01-22) [1][2]
  • 0.14.13 (2026-01-15) (note: 0.14.12 was published to PyPI but did not get a Git tag/release; 0.14.13 is identical to 0.14.12). [1]
  • 0.14.11 (2026-01-08) [1]
  • 0.14.10 (2025-12-18) [1]
  • 0.14.9 (2025-12-11) [1]
  • 0.14.8 (2025-12-04) [1]
  • 0.14.7 (2025-11-28) [1]
  • 0.14.6 (2025-11-21) [1]

Where to see the full release history / changelog

  • GitHub releases (release notes for every version): [1]
  • PyPI project page (published package versions): [2]

Sources

  • [1] astral-sh/ruff GitHub Releases page
  • [2] PyPI: ruff project page

🏁 Script executed:

# Check if we can find any information about Ruff versions locally or via pip
python3 << 'EOF'
import json
try:
    import urllib.request
    import urllib.error
    # Fetch Ruff package info from PyPI
    url = "https://pypi.org/pypi/ruff/json"
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            data = json.loads(response.read().decode())
            current_version = data['info']['version']
            releases = sorted(data['releases'].keys())
            print(f"Current Ruff version: {current_version}")
            print(f"Total releases: {len(releases)}")
            print(f"First 5 versions: {releases[:5]}")
            print(f"Last 5 versions: {releases[-5:]}")
    except urllib.error.URLError as e:
        print(f"Network error: {e}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: warestack/watchflow

Length of output: 205


Tighten Ruff version constraint—>=0.1.0 is overly permissive.

Ruff is currently at version 0.14.14 (as of January 2026), making v0.1.0 significantly outdated. The constraint >=0.1.0 will accept nearly any modern Ruff version, including those with potential breaking changes in minor version bumps (since Ruff is still pre-1.0). Update to a version that reflects the minimum tested with your code—for example, >=0.5.0 or higher—to ensure consistent behavior across developer environments.

🤖 Prompt for AI Agents
In `@pyproject.toml` at line 41, The ruff dependency constraint "ruff>=0.1.0" in
pyproject.toml is too permissive; update that requirement to a tighter range
reflecting the minimum tested version and an upper bound to avoid pre-1.0
breaking changes (e.g., replace "ruff>=0.1.0" with something like
"ruff>=0.5.0,<1.0.0" or set the exact minimum you verified), by editing the
dependency string in pyproject.toml so the project installs a supported Ruff
version.

Comment on lines +77 to +90
# Group violations by severity
severity_order = ["critical", "high", "medium", "low"]
severity_groups: dict[str, list[Violation]] = {s: [] for s in severity_order}

for violation in violations:
sev = violation.severity.value if hasattr(violation.severity, "value") else str(violation.severity)
if sev in severity_groups:
severity_groups[sev].append(violation)
else:
# Fallback for unexpected severities
if "low" not in severity_groups:
severity_groups["low"] = []
severity_groups["low"].append(violation)


⚠️ Potential issue | 🟡 Minor

Include INFO severity in grouping to avoid dropping it.
format_violations_comment ignores INFO today, and format_check_run_output buckets it as low.

🛠️ Suggested adjustment
-    severity_order = ["critical", "high", "medium", "low"]
+    severity_order = ["critical", "high", "medium", "low", "info"]
...
-    severity_order = ["critical", "high", "medium", "low"]
+    severity_order = ["critical", "high", "medium", "low", "info"]
...
-        if sev in severity_groups:
-            severity_groups[sev].append(violation)
+        if sev in severity_groups:
+            severity_groups[sev].append(violation)
+        else:
+            severity_groups["info"].append(violation)

Also applies to: 156-164

🤖 Prompt for AI Agents
In `@src/presentation/github_formatter.py` around lines 77 - 90, The grouping
currently omits the "info" severity so INFO-level violations get misclassified
or dropped; update the severity_order and grouping logic used in
format_violations_comment and format_check_run_output to include "info" (e.g.,
add "info" into the severity_order list and ensure severity_groups
initialization covers "info"), and adjust the fallback logic that maps unknown
severities so INFO is not forced into "low" — reference the severity_order
variable, severity_groups dict, and the loop over violations to make the change
consistently in both locations.

- README, docs/index, concepts/overview: immune system framing, supported params
- features, configuration, quick-start: condition-based rules, welcome comment
- DEVELOPMENT, CONTRIBUTING, LOCAL_SETUP: maintainer-first tone
- docs/README, benchmarks: consistency and mkdocs.yml fix (remove extra.js)
…lization

- Add rule_id to Violation/Rule and propagate from loader/engine for ack lookup
- Build Task in PR/push handlers so process(task: Task) receives correct type
- Return normalized event_type in WebhookResponse (success and 202 paths)
- Use direct attribute access for WebhookEvent.delivery_id and Violation fields
- Patch DEFAULT_TEAM_MEMBERSHIPS in AuthorTeamCondition tests for determinism

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/agents/engine_agent/nodes.py (1)

162-203: ⚠️ Potential issue | 🟠 Major

Fix task/result misalignment when some VALIDATOR rules lack conditions

validator_rules includes rules without conditions, but validator_tasks only includes those with conditions. The enumerate(results) indexing then maps results to the wrong rule when any rule was skipped, which can mis-attribute violations and mutate the wrong rule’s strategy.

🛠️ Suggested fix (keep task→rule mapping aligned)
-        validator_tasks = []
+        validator_tasks = []
+        validator_task_rules = []
         for rule_desc in validator_rules:
             if rule_desc.conditions:
                 # NEW: Use attached conditions
-                task = _execute_conditions(rule_desc, state.event_data)
-                validator_tasks.append(task)
+                validator_tasks.append(_execute_conditions(rule_desc, state.event_data))
+                validator_task_rules.append(rule_desc)
             else:
                 logger.error(
                     f"❌ Rule '{rule_desc.description[:50]}...' set to VALIDATOR strategy but has no conditions attached."
                 )
                 state.analysis_steps.append(
                     f"❌ Configuration Error: Rule '{rule_desc.description[:30]}...' has VALIDATOR strategy but no conditions."
                 )
+                # Optional: ensure it can still be evaluated downstream
+                rule_desc.validation_strategy = ValidationStrategy.HYBRID
+                rule_desc.validator_name = None

         if validator_tasks:
             results = await asyncio.gather(*validator_tasks, return_exceptions=True)

             # Process results
             for i, result in enumerate(results):
+                rule_desc = validator_task_rules[i]
                 if isinstance(result, Exception):
-                    logger.error(f"❌ Validator failed for rule '{validator_rules[i].description[:50]}...': {result}")
+                    logger.error(f"❌ Validator failed for rule '{rule_desc.description[:50]}...': {result}")
                     # Fallback to LLM if validator fails
-                    validator_rules[i].validation_strategy = ValidationStrategy.LLM_REASONING
+                    rule_desc.validation_strategy = ValidationStrategy.LLM_REASONING
                 else:
                     result_dict = cast("dict[str, Any]", result)
                     if result_dict.get("is_violated", False):
                         if "violations" in result_dict:
                             # From _execute_conditions (returns list of violations)
                             state.violations.extend(result_dict["violations"])
                         elif "violation" in result_dict:
                             # From _execute_single_validator (returns single violation dict)
                             state.violations.append(result_dict["violation"])

-                        state.analysis_steps.append(f"⚡ Validator violation: {validator_rules[i].description[:50]}...")
+                        state.analysis_steps.append(f"⚡ Validator violation: {rule_desc.description[:50]}...")
                     else:
-                        state.analysis_steps.append(f"⚡ Validator passed: {validator_rules[i].description[:50]}...")
+                        state.analysis_steps.append(f"⚡ Validator passed: {rule_desc.description[:50]}...")

                     # Track validator usage
-                    validator_name = validator_rules[i].validator_name
+                    validator_name = rule_desc.validator_name
                     if validator_name:
                         state.validator_usage[validator_name] = state.validator_usage.get(validator_name, 0) + 1
src/event_processors/violation_acknowledgment.py (1)

331-388: ⚠️ Potential issue | 🟠 Major

Coerce agent output to Violation objects before downstream use.

The agent receives violations as dicts and returns them as dicts in acknowledgable_violations and require_fixes. However, _approve_violations_selectively (line 246, 430) and _reject_acknowledgment (line 494) access attributes like violation.message, violation.rule_id, and violation.rule_description, which will fail at runtime with AttributeError.

🔧 Suggested fix
-            acknowledgable_violations = evaluation_data.get("acknowledgable_violations", [])
-            require_fixes = evaluation_data.get("require_fixes", [])
+            def _coerce_violations(items: list[Any]) -> list[Violation]:
+                return [
+                    v if isinstance(v, Violation) else Violation.model_validate(v)
+                    for v in items
+                ]
+
+            acknowledgable_violations = _coerce_violations(
+                evaluation_data.get("acknowledgable_violations", [])
+            )
+            require_fixes = _coerce_violations(
+                evaluation_data.get("require_fixes", [])
+            )
🤖 Fix all issues with AI agents
In `@docs/benchmarks.md`:
- Line 9: The numeric formatting is inconsistent: change the spaced "70 +
enterprise policies" to "70+ enterprise policies" so it matches the later usage;
locate the paragraph string in docs/benchmarks.md and replace "70 + enterprise
policies" with "70+ enterprise policies" (ensure no extra spaces around the
plus).

In `@docs/getting-started/configuration.md`:
- Around line 13-208: Convert all bold section headers (e.g., "**Linked
issue**", "**Title pattern**", "**Description length**", "**Required labels**",
"**Min approvals**", "**Max PR size (lines)**", "**CODEOWNERS: require owners as
reviewers**", "**CODEOWNERS: path must have owner**", "**Critical paths / code
owners**", "**Protected branches**", "**No force push**", "**Max file size**",
"**File pattern**", "**Allowed hours, days, weekend**", and the example
headings) into proper markdown headings (suggested level: #### HeaderName)
throughout the Parameter reference and Example rules blocks, and fix the initial
rules table header and separator (the row starting "| Field | Required |
Description |" and the separator "|-------|----------|-------------|") so pipes
have consistent surrounding spaces and the separator uses balanced dashes for
each column to satisfy MD060/MD036.

In `@README.md`:
- Around line 37-53: The table cell for the title regex (the value of
title_pattern: "^feat\|^fix\|...") contains unescaped pipe characters which
break Markdown table parsing; update that cell to escape each pipe as \\| (e.g.,
title_pattern: "^feat\\|^fix\\|...") and scan other table cells for bare '|'
characters to escape them similarly so the README.md table renders cleanly.
- Around line 108-112: The table separator row in the README command table uses
compact pipes without surrounding spaces (the line showing |--------|--------|),
which triggers MD060; update that separator row to use spaced pipes (| --------
| -------- |) so the markdownlint "compact" style is satisfied and the table
aligns correctly.

In `@src/event_processors/pull_request/processor.py`:
- Around line 119-123: Replace the loose cast of eval_result.violations with
explicit model validation: when extracting eval_result from result.data (the
block that sets violations and checks hasattr(eval_result, "violations")),
iterate eval_result.violations and call Violation.model_validate(v) for each
item so violations becomes a list of validated Violation instances; update the
variable referenced as violations and ensure imports/reference to
Violation.model_validate are available.
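The validation the comment asks for can be sketched with a stdlib stand-in for the Pydantic model (the real `Violation` lives in `src/core/models.py`; the field names and `model_validate` behavior here are illustrative, mimicking `pydantic.BaseModel.model_validate` for dicts and existing instances):

```python
from dataclasses import dataclass


@dataclass
class Violation:
    # Stdlib stand-in for the Pydantic Violation model; fields are illustrative.
    rule_id: str
    message: str

    @classmethod
    def model_validate(cls, obj):
        # Mimic pydantic's model_validate: pass instances through, parse dicts.
        if isinstance(obj, cls):
            return obj
        if isinstance(obj, dict):
            return cls(**obj)
        raise TypeError(f"cannot validate {type(obj).__name__} as Violation")


# Raw items as they might arrive in eval_result.violations: a mix of dicts
# and already-constructed instances.
raw_violations = [
    {"rule_id": "max_pr_loc", "message": "PR exceeds line limit"},
    Violation(rule_id="linked_issue", message="no linked issue found"),
]

# Validate each entry instead of a loose cast.
violations = [Violation.model_validate(v) for v in raw_violations]
```

This way a malformed entry raises at the extraction site rather than surfacing later as an attribute error on an unvalidated object.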

In `@src/rules/loaders/github_loader.py`:
- Around line 104-109: The current logic in github_loader.py sets rule_id_val
from only the first matched condition (variables: conditions, rule_id_val) using
CONDITION_CLASS_TO_RULE_ID, which is ambiguous when a rule matches multiple
conditions and misses mappings for several AVAILABLE_CONDITIONS; update the
logic to explicitly derive rule IDs by mapping all entries in conditions through
CONDITION_CLASS_TO_RULE_ID, collect non-None IDs into a list (e.g., rule_ids),
and then resolve a single rule_id_val deterministically (either by returning the
full list, selecting the highest-priority ID based on AVAILABLE_CONDITIONS
order, or using a defined priority map), and ensure unmapped conditions are
handled/logged so the behavior is clear and reproducible.
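The deterministic resolution described above can be sketched as follows; the condition names and both mappings are placeholders standing in for the real `CONDITION_CLASS_TO_RULE_ID` and `AVAILABLE_CONDITIONS` in `src/rules/loaders/github_loader.py`:

```python
# Hypothetical stand-ins for the real mappings in github_loader.py.
AVAILABLE_CONDITIONS = ["LinkedIssueCondition", "MaxPrLocCondition", "TitlePatternCondition"]
CONDITION_CLASS_TO_RULE_ID = {
    "LinkedIssueCondition": "linked_issue",
    "MaxPrLocCondition": "max_pr_loc",
    # TitlePatternCondition intentionally unmapped, to show the warning path.
}
# Priority = position in AVAILABLE_CONDITIONS, so resolution is reproducible.
PRIORITY = {name: i for i, name in enumerate(AVAILABLE_CONDITIONS)}


def resolve_rule_id(conditions):
    """Map every matched condition, warn on unmapped ones, pick the
    highest-priority (lowest-index) rule ID deterministically."""
    rule_ids = []
    for cond in conditions:
        rid = CONDITION_CLASS_TO_RULE_ID.get(cond)
        if rid is None:
            print(f"warning: no rule_id mapping for condition {cond}")
        else:
            rule_ids.append((PRIORITY.get(cond, len(AVAILABLE_CONDITIONS)), rid))
    if not rule_ids:
        return None
    return min(rule_ids)[1]
```

The key point is that the result no longer depends on which condition happened to match first.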

In `@src/webhooks/dispatcher.py`:
- Around line 18-27: register_handler currently stores the incoming event_type
as-is which can be an Enum (EventType) causing dispatch to miss handlers that
lookup by the Enum.value string; update register_handler to normalize the key by
extracting a string value (e.g., use event_key = event_type.value if
hasattr(event_type, "value") else str(event_type)) and store handlers[event_key]
= handler so dispatch (which expects the string key) will find the handler;
reference the register_handler method and the handlers dict for the change.
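The normalization is small enough to sketch in full; the `EventType` members and handler signatures here are illustrative, not the real ones from `src/core/models.py`:

```python
from enum import Enum


class EventType(Enum):
    # Illustrative stand-in for src/core/models.EventType.
    PUSH = "push"
    PULL_REQUEST = "pull_request"


class WebhookDispatcher:
    def __init__(self):
        self.handlers = {}

    def register_handler(self, event_type, handler):
        # Normalize Enum members to their string value so dispatch, which
        # looks up by the string key, always finds the handler.
        event_key = event_type.value if hasattr(event_type, "value") else str(event_type)
        self.handlers[event_key] = handler


dispatcher = WebhookDispatcher()
dispatcher.register_handler(EventType.PUSH, lambda event: "handled-push")
dispatcher.register_handler("pull_request", lambda event: "handled-pr")
```

Both registration styles now land under the same string keys, so `dispatch` needs no changes.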

In `@src/webhooks/router.py`:
- Around line 78-93: The handler returns WebhookResponse.status values
("success" and "received") that don't match the WebhookResponse Literal types;
change the success branch to return status="ok" and change the HTTPException 202
branch to return status="ignored" (keep event_type and other fields the same).
Update the returns around _create_event_from_request /
dispatcher_instance.dispatch and the HTTPException except block to use these
allowed literals so model validation won't fail.
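The constraint can be illustrated with stdlib typing alone; this assumes the allowed literals include at least "ok" and "ignored" (the real `WebhookResponse` is a Pydantic model in `src/webhooks/models.py`, which enforces the same thing via its `Literal`-typed field):

```python
from typing import Literal, get_args

# Assumed subset of the allowed status literals.
Status = Literal["ok", "ignored"]
ALLOWED = set(get_args(Status))


def make_response(status, event_type):
    # Reject values like "success" or "received" that are not in the Literal.
    if status not in ALLOWED:
        raise ValueError(f"status {status!r} not in {sorted(ALLOWED)}")
    return {"status": status, "event_type": event_type}
```

With the Pydantic model, passing `status="success"` fails validation the same way this sketch raises `ValueError`.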
🧹 Nitpick comments (9)
src/rules/loaders/github_loader.py (2)

60-61: Consider logging a warning when skipping non-dict rule entries.

Silently skipping malformed rule entries may make debugging YAML configuration issues harder. Other validation failures in this method log warnings.

📝 Suggested improvement
             for rule_data in rules_data["rules"]:
                 try:
                     if not isinstance(rule_data, dict):
+                        logger.warning(f"Skipping non-dict rule entry in {repository}/{rules_file_path}")
                         continue

97-99: Consider removing the aliased key after normalization.

After normalization, parameters will contain both max_changed_lines and max_lines with the same value. If downstream code or the Rule model doesn't expect the original alias, this could cause confusion or unintended behavior.

♻️ Suggested improvement to remove the alias after copying
         # Normalize aliases so conditions match (e.g. max_changed_lines -> max_lines for MaxPrLocCondition)
         if "max_changed_lines" in parameters and "max_lines" not in parameters:
             parameters["max_lines"] = parameters["max_changed_lines"]
+            del parameters["max_changed_lines"]
src/event_processors/push.py (2)

112-114: Redundant installation_id check.

Line 113 checks if task.installation_id is None, but this condition was already validated at line 55 with an early return. This check is unreachable in normal execution.

♻️ Proposed simplification
         if not sha or sha == "0000000000000000000000000000000000000000":
             logger.warning("No valid commit SHA found, skipping check run")
         else:
-            # Ensure installation_id is not None before passing to check_run_manager
-            if task.installation_id is None:
-                logger.warning("Missing installation_id for push event, cannot create check run")
-            else:
-                if violations:
+            if violations:
+                await self.check_run_manager.create_check_run(
+                    repo=task.repo_full_name,
+                    sha=sha,
+                    installation_id=task.installation_id,
+                    violations=violations,
+                )
+                api_calls += 1
+            else:
+                # Create passing check run if no violations
+                await self.check_run_manager.create_check_run(
+                    repo=task.repo_full_name,
+                    sha=sha,
+                    installation_id=task.installation_id,
+                    violations=[],
+                    conclusion="success",
+                )
+                api_calls += 1

147-168: _convert_rules_to_new_format is duplicated in PullRequestProcessor.

This method has identical logic in src/event_processors/pull_request/processor.py (lines 198-213). Consider extracting to a shared utility to maintain DRY principles.
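One way to extract the duplicate is a module-level helper that both processors bind, sketched below; the module path and the conversion body are placeholders (the real logic would move over verbatim from either processor):

```python
# Hypothetical shared module, e.g. src/event_processors/rule_utils.py.
def convert_rules_to_new_format(rules):
    """Placeholder conversion showing the shape of the shared helper; the
    real _convert_rules_to_new_format body would live here instead."""
    return [
        {"rule_id": r.get("id", r.get("rule_id")), "parameters": r.get("parameters", {})}
        for r in rules
    ]


class PushProcessor:
    # Both processors reuse the single implementation.
    _convert_rules_to_new_format = staticmethod(convert_rules_to_new_format)


class PullRequestProcessor:
    _convert_rules_to_new_format = staticmethod(convert_rules_to_new_format)
```

Any future change to the conversion then lands in one place instead of drifting between the two copies.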

src/webhooks/handlers/push.py (1)

36-48: Task passed to enqueue but processor expects task as first argument.

The handler builds a Task object and passes it as a positional argument to enqueue() (line 46). _execute_with_retry calls await task.func(*task.args, **task.kwargs), so the processor's process(task) receives the Task object.

However, enqueue() also rebuilds the task internally at line 155 of task_queue.py, creating a different Task instance. The processor receives the original task from args, not the internally created one. This works, but it is confusing: the build_task call appears redundant since enqueue creates its own Task.

♻️ Simplify by removing redundant build_task
         try:
-            # Build Task so process(task: Task) receives the correct type (not WebhookEvent)
-            task = task_queue.build_task(
-                "push",
-                event.payload,
-                push_processor.process,
-                delivery_id=event.delivery_id,
-            )
+            # Build Task for the processor
+            task = task_queue.build_task(
+                "push",
+                event.payload,
+                push_processor.process,
+                delivery_id=event.delivery_id,
+            )
+            # Enqueue directly without calling enqueue's internal task creation
             enqueued = await task_queue.enqueue(
                 push_processor.process,
                 "push",
                 event.payload,
                 task,
                 delivery_id=event.delivery_id,
             )

Or update TaskQueue to accept a pre-built Task directly, avoiding the dual construction pattern.

src/agents/engine_agent/agent.py (1)

86-94: Broad exception catch may mask argument validation errors.

Catching Exception when constructing EngineRequest (line 93) could mask unexpected errors. Consider catching ValidationError specifically for Pydantic validation failures.

♻️ Proposed refinement
+        from pydantic import ValidationError
+
         try:
             # If request object is passed directly (future proofing)
             if "request" in kwargs and isinstance(kwargs["request"], EngineRequest):
                 request = kwargs["request"]
             else:
                 # Validate kwargs against EngineRequest
                 request = EngineRequest(**kwargs)
-        except Exception as e:
+        except ValidationError as e:
             return AgentResult(success=False, message=f"Invalid arguments for EngineAgent: {e}", data={})
src/event_processors/pull_request/processor.py (2)

63-65: Redundant installation_id check.

Line 64 checks if not installation_id but this was already validated at line 41 with an early return. This check is unreachable.

♻️ Remove redundant check
             github_token = github_token_optional
 
             # 1. Enrich event data
-            if not installation_id:
-                raise ValueError("Installation ID is required")
-
             event_data = await self.enricher.enrich_event_data(task, github_token)

198-213: _convert_rules_to_new_format duplicated from PushProcessor.

This method is identical to the one in src/event_processors/push.py (lines 147-168). Extract to a shared utility module to avoid drift and reduce maintenance burden.

src/webhooks/handlers/pull_request.py (1)

48-62: Same dual task construction issue as push handler.

The handler builds a Task via build_task() and passes it to enqueue(), which internally creates another Task with the same ID. The processor receives the original task from args, making the internal task construction redundant.

This mirrors the issue in src/webhooks/handlers/push.py and suggests the TaskQueue API could be simplified to either:

  1. Accept a pre-built Task directly, or
  2. Remove build_task and have enqueue be the sole task constructor
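Option 1 can be sketched as a queue whose enqueue accepts the pre-built Task, giving a single construction point; the Task fields and queue internals here are simplified stand-ins for `src/tasks/task_queue.py`:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    # Simplified stand-in for src/tasks/task_queue.Task.
    event_type: str
    payload: dict
    func: object
    args: tuple = ()


class TaskQueue:
    """Sketch of enqueue() taking a pre-built Task, so build_task is the
    one and only place a Task is constructed."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def build_task(self, event_type, payload, func):
        task = Task(event_type=event_type, payload=payload, func=func)
        task.args = (task,)  # processor receives the Task itself
        return task

    async def enqueue(self, task):
        await self._queue.put(task)
        return True


async def demo():
    queue = TaskQueue()
    results = []

    async def process(task):
        results.append(task.event_type)

    task = queue.build_task("push", {"ref": "refs/heads/main"}, process)
    await queue.enqueue(task)
    queued = await queue._queue.get()
    await queued.func(*queued.args)  # mirrors _execute_with_retry's call shape
    return results


results = asyncio.run(demo())
```

The dual-construction ambiguity disappears because the task put on the queue is exactly the one the processor receives.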
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61cfbcd and aab5d42.

📒 Files selected for processing (28)
  • CONTRIBUTING.md
  • DEVELOPMENT.md
  • LOCAL_SETUP.md
  • README.md
  • docs/README.md
  • docs/benchmarks.md
  • docs/concepts/overview.md
  • docs/features.md
  • docs/getting-started/configuration.md
  • docs/getting-started/quick-start.md
  • docs/index.md
  • mkdocs.yml
  • src/agents/engine_agent/agent.py
  • src/agents/engine_agent/models.py
  • src/agents/engine_agent/nodes.py
  • src/core/models.py
  • src/event_processors/pull_request/processor.py
  • src/event_processors/push.py
  • src/event_processors/violation_acknowledgment.py
  • src/rules/loaders/github_loader.py
  • src/rules/models.py
  • src/rules/registry.py
  • src/tasks/task_queue.py
  • src/webhooks/dispatcher.py
  • src/webhooks/handlers/pull_request.py
  • src/webhooks/handlers/push.py
  • src/webhooks/router.py
  • tests/unit/rules/conditions/test_access_control.py
✅ Files skipped from review due to trivial changes (1)
  • LOCAL_SETUP.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/rules/registry.py
🧰 Additional context used
🧬 Code graph analysis (14)
src/event_processors/pull_request/processor.py (10)
src/agents/factory.py (1)
  • get_agent (20-52)
src/core/models.py (2)
  • Violation (20-31)
  • repo_full_name (137-139)
src/event_processors/base.py (2)
  • BaseEventProcessor (26-61)
  • ProcessingResult (16-23)
src/event_processors/pull_request/enricher.py (4)
  • enrich_event_data (39-90)
  • fetch_acknowledgments (92-113)
  • prepare_webhook_data (115-146)
  • fetch_api_data (22-37)
src/integrations/github/check_runs.py (2)
  • CheckRunManager (11-121)
  • create_acknowledgment_check_run (66-121)
src/rules/loaders/github_loader.py (2)
  • RulesFileNotFoundError (22-25)
  • get_rules (37-77)
src/tasks/task_queue.py (3)
  • Task (19-47)
  • repo_full_name (33-38)
  • installation_id (41-47)
src/core/utils/caching.py (1)
  • get (44-66)
src/presentation/github_formatter.py (2)
  • format_rules_not_configured_comment (121-149)
  • format_violations_comment (152-182)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
src/rules/models.py (2)
src/core/models.py (1)
  • EventType (91-105)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/webhooks/handlers/pull_request.py (4)
src/core/models.py (4)
  • EventType (91-105)
  • WebhookEvent (116-144)
  • WebhookResponse (108-113)
  • repo_full_name (137-139)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/webhooks/handlers/push.py (2)
  • can_handle (17-18)
  • handle (20-67)
src/tasks/task_queue.py (3)
  • repo_full_name (33-38)
  • build_task (121-137)
  • enqueue (139-160)
src/event_processors/push.py (6)
src/core/models.py (3)
  • Severity (8-17)
  • Violation (20-31)
  • repo_full_name (137-139)
src/event_processors/base.py (2)
  • BaseEventProcessor (26-61)
  • ProcessingResult (16-23)
src/integrations/github/check_runs.py (1)
  • CheckRunManager (11-121)
src/agents/factory.py (1)
  • get_agent (20-52)
src/tasks/task_queue.py (2)
  • installation_id (41-47)
  • repo_full_name (33-38)
src/rules/loaders/github_loader.py (1)
  • get_rules (37-77)
src/webhooks/router.py (3)
src/core/models.py (3)
  • EventType (91-105)
  • WebhookEvent (116-144)
  • WebhookResponse (108-113)
src/webhooks/dispatcher.py (2)
  • WebhookDispatcher (12-52)
  • dispatch (29-52)
src/webhooks/models.py (2)
  • GitHubEventModel (23-28)
  • WebhookResponse (31-36)
src/rules/loaders/github_loader.py (1)
src/rules/registry.py (2)
  • ConditionRegistry (81-118)
  • get_conditions_for_parameters (90-118)
src/event_processors/violation_acknowledgment.py (8)
src/core/models.py (3)
  • Acknowledgment (34-42)
  • EventType (91-105)
  • Violation (20-31)
src/event_processors/base.py (5)
  • BaseEventProcessor (26-61)
  • ProcessingResult (16-23)
  • prepare_webhook_data (44-46)
  • prepare_api_data (49-51)
  • _get_rule_provider (53-55)
src/integrations/github/check_runs.py (2)
  • CheckRunManager (11-121)
  • create_acknowledgment_check_run (66-121)
src/rules/acknowledgment.py (1)
  • extract_acknowledgment_reason (102-128)
src/tasks/task_queue.py (2)
  • Task (19-47)
  • installation_id (41-47)
src/integrations/github/api.py (2)
  • get_pull_request (226-241)
  • create_check_run (205-224)
src/event_processors/push.py (2)
  • prepare_webhook_data (170-184)
  • prepare_api_data (186-188)
src/event_processors/deployment.py (2)
  • prepare_webhook_data (50-52)
  • prepare_api_data (54-56)
src/agents/engine_agent/agent.py (1)
src/agents/engine_agent/models.py (5)
  • EngineRequest (17-24)
  • EngineState (118-132)
  • RuleDescription (100-115)
  • ValidationStrategy (27-32)
  • ValidatorDescription (35-44)
src/webhooks/dispatcher.py (4)
src/core/models.py (2)
  • EventType (91-105)
  • WebhookEvent (116-144)
src/tasks/task_queue.py (2)
  • TaskQueue (64-233)
  • enqueue (139-160)
tests/unit/tasks/test_queue.py (1)
  • queue (13-15)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
src/tasks/task_queue.py (4)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
src/core/models.py (1)
  • repo_full_name (137-139)
src/core/utils/caching.py (2)
  • get (44-66)
  • clear (91-95)
tests/unit/tasks/test_queue.py (1)
  • queue (13-15)
src/webhooks/handlers/push.py (4)
src/core/models.py (4)
  • EventType (91-105)
  • WebhookEvent (116-144)
  • WebhookResponse (108-113)
  • repo_full_name (137-139)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/event_processors/push.py (2)
  • PushProcessor (14-188)
  • process (27-145)
src/tasks/task_queue.py (3)
  • repo_full_name (33-38)
  • build_task (121-137)
  • enqueue (139-160)
src/agents/engine_agent/nodes.py (4)
src/agents/engine_agent/models.py (4)
  • LLMEvaluationResponse (55-64)
  • RuleDescription (100-115)
  • StrategySelectionResponse (47-52)
  • ValidationStrategy (27-32)
src/agents/engine_agent/prompts.py (3)
  • create_llm_evaluation_prompt (79-108)
  • create_validation_strategy_prompt (37-76)
  • get_llm_evaluation_system_prompt (149-171)
src/integrations/providers/openai_provider.py (1)
  • get_chat_model (15-32)
src/integrations/providers/vertex_ai_provider.py (1)
  • get_chat_model (23-47)
src/agents/engine_agent/models.py (3)
src/core/models.py (1)
  • Violation (20-31)
src/rules/conditions/base.py (1)
  • BaseCondition (15-74)
src/rules/models.py (1)
  • Rule (50-64)
src/core/models.py (4)
src/webhooks/models.py (1)
  • WebhookResponse (31-36)
src/webhooks/handlers/issue_comment.py (1)
  • event_type (18-19)
src/core/utils/caching.py (1)
  • get (44-66)
src/tasks/task_queue.py (2)
  • installation_id (41-47)
  • repo_full_name (33-38)
🪛 LanguageTool
docs/concepts/overview.md

[uncategorized] ~48-~48: The official name of this software platform is spelled with a capital “H”.
Context: ...iews, CODEOWNERS file content (from .github/CODEOWNERS, CODEOWNERS, or `docs/COD...

(GITHUB)

docs/features.md

[style] ~114-~114: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... in conditions; you only edit YAML. - No separate dashboard — Everything stays...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

DEVELOPMENT.md

[uncategorized] ~274-~274: The official name of this software platform is spelled with a capital “H”.
Context: ...ests CI runs tests the same way (see [.github/workflows/tests.yaml](.github/workflows...

(GITHUB)


[uncategorized] ~274-~274: The official name of this software platform is spelled with a capital “H”.
Context: ...e way (see .github/workflows/tests.yaml). To run tests lo...

(GITHUB)

🪛 markdownlint-cli2 (0.20.0)
docs/features.md

[warning] 12-12: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 12-12: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 12-12: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 12-12: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 12-12: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 12-12: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 28-28: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 28-28: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 34-34: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 34-34: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 35-35: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 35-35: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 35-35: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 49-49: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 49-49: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 49-49: Table column style
Table pipe does not align with header for style "aligned"

(MD060, table-column-style)


[warning] 87-87: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 87-87: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 87-87: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 87-87: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)

docs/getting-started/configuration.md

[warning] 14-14: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 29-29: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 38-38: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 47-47: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 56-56: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 65-65: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 74-74: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 83-83: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 92-92: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 101-101: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 110-110: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 121-121: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 132-132: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 141-141: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 159-159: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 185-185: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 198-198: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/getting-started/quick-start.md

[warning] 25-25: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 25-25: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 25-25: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 25-25: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 31-31: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 31-31: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 32-32: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 32-32: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 33-33: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 33-33: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 37-37: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 37-37: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 38-38: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 38-38: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 39-39: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 39-39: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 40-40: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 40-40: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 44-44: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 44-44: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 45-45: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 45-45: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 46-46: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 46-46: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 47-47: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 47-47: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 73-73: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 73-73: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 74-74: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 74-74: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 75-75: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)


[warning] 75-75: Strong style
Expected: underscore; Actual: asterisk

(MD050, strong-style)

README.md

[warning] 38-38: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 38-38: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 109-109: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 109-109: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 109-109: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 109-109: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)

🔇 Additional comments (68)
src/rules/loaders/github_loader.py (3)

17-17: LGTM!

The new imports and updated docstring accurately reflect the integration with the ConditionRegistry for parameter-to-condition mapping.

Also applies to: 31-31


118-128: LGTM!

The Rule construction correctly incorporates the new conditions and rule_id fields, aligning with the PR objective to support acknowledgment lookup.


49-56: LGTM on the enhanced YAML validation.

The defensive validation ensuring rules_data is a dict with a "rules" key that is a list provides good protection against malformed YAML files and clear logging for debugging.

mkdocs.yml (1)

79-81: LGTM: remove unused extra JS reference

Nice cleanup to align the config with actual assets.

CONTRIBUTING.md (1)

3-83: Clear, engineer-focused contributor guidance

The structure and concrete commands are easy to follow and align well with the maintainer-first tone.

DEVELOPMENT.md (2)

5-6: Direction section reads well

Concise and consistent with the maintainer-first framing.


274-305: CI-like testing guidance is helpful

Good callouts around venv isolation and explicit interpreter usage.

src/rules/models.py (1)

23-68: LGTM: rule model updates and forward-ref rebuild

The new RuleCategory and BaseCondition-backed Rule typing look consistent, and the model_rebuild call is the right follow‑through.

src/agents/engine_agent/models.py (1)

17-137: Model surface updates look consistent

EngineRequest, RuleViolation inheritance, and RuleDescription/EngineState typing align cleanly with the updated Rule/Condition model.

src/tasks/task_queue.py (3)

19-48: LGTM on Task model design.

The Task Pydantic model with helper properties repo_full_name and installation_id provides clean, type-safe access to payload data. The arbitrary_types_allowed config properly supports the callable func field.


72-98: LGTM on LRU-based deduplication cache.

The OrderedDict-based LRU cache with configurable max size and proper eviction logic prevents unbounded memory growth. The _add_to_dedup_cache method correctly handles both new entries and existing key refreshes.
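The eviction and refresh behavior praised here can be sketched compactly; this is a minimal stand-in for the real cache in `src/tasks/task_queue.py`, with a tiny max size to make eviction visible:

```python
from collections import OrderedDict


class DedupCache:
    """Minimal OrderedDict-based LRU dedup cache, illustrative of the
    pattern described above (not the production implementation)."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self._seen = OrderedDict()

    def add(self, task_id):
        """Return True if the ID is new, False if it is a duplicate."""
        if task_id in self._seen:
            self._seen.move_to_end(task_id)  # refresh existing key
            return False
        self._seen[task_id] = True
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict least-recently-used
        return True


cache = DedupCache(max_size=2)
first = cache.add("t1")   # new entry
dup = cache.add("t1")     # duplicate, refreshes t1
cache.add("t2")
cache.add("t3")           # exceeds max_size, evicts t1
```

Because evicted IDs are forgotten, a long-gone delivery can be processed again, which is the intended trade-off for bounded memory.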


100-119: Task ID generation correctly supports delivery-based deduplication.

Including delivery_id and func.__qualname__ in the hash ensures distinct task IDs for different handlers processing the same webhook delivery, while falling back to payload hash when delivery ID is unavailable.
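The ID scheme described can be sketched like this; the key format and truncation length are assumptions for illustration, not the exact production hashing:

```python
import hashlib
import json


def make_task_id(event_type, payload, func, delivery_id=None):
    """Sketch of delivery-based task IDs: combine delivery_id with the
    handler's qualname when available, else fall back to a payload hash."""
    if delivery_id:
        key = f"{event_type}:{delivery_id}:{func.__qualname__}"
    else:
        payload_digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        key = f"{event_type}:{payload_digest}"
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def push_handler(task): ...
def pr_handler(task): ...


payload = {"ref": "refs/heads/main"}
a = make_task_id("push", payload, push_handler, delivery_id="d-123")
b = make_task_id("push", payload, pr_handler, delivery_id="d-123")
c = make_task_id("push", payload, push_handler)  # no delivery id: payload hash
```

Two handlers for the same delivery get distinct IDs, while the same handler and delivery always hash to the same ID, which is exactly what dedup needs.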

src/core/models.py (5)

8-18: LGTM on Severity enum.

The StrEnum provides clean serialization and comparison for severity levels, with a sensible progression from INFO to CRITICAL.


20-32: LGTM on Violation model.

The standardized Violation model with rule_id for acknowledgment lookup, Severity enum, and optional how_to_fix field provides a solid foundation for rule violation tracking.


45-57: LGTM on User model with secure token handling.

Using SecretStr with exclude=True for github_token ensures the token is never accidentally serialized or logged.


147-204: LGTM on HygieneMetrics with rate validation.

The centralized field_validator enforcing 0.0-1.0 range across rate fields prevents invalid metric values. Handling None for ai_generated_rate is appropriate since it's optional.


116-144: LGTM on WebhookEvent with delivery_id support.

Adding delivery_id from the X-GitHub-Delivery header enables proper deduplication. The safe accessor properties repo_full_name and sender_login with fallback to empty strings prevent KeyError on malformed payloads.
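The safe-accessor pattern can be sketched with a plain class; the real `WebhookEvent` is Pydantic-based in `src/core/models.py`, and the repository name below is a made-up example:

```python
class WebhookEvent:
    """Stdlib sketch of the safe payload accessors described above."""

    def __init__(self, payload, delivery_id=None):
        self.payload = payload
        self.delivery_id = delivery_id  # from the X-GitHub-Delivery header

    @property
    def repo_full_name(self):
        # Fall back to "" so malformed payloads never raise KeyError.
        return (self.payload.get("repository") or {}).get("full_name", "")

    @property
    def sender_login(self):
        return (self.payload.get("sender") or {}).get("login", "")


ok = WebhookEvent({"repository": {"full_name": "octo/example"}}, delivery_id="d-1")
bad = WebhookEvent({})  # malformed: no repository or sender keys
```

Callers can log `event.repo_full_name` unconditionally without guarding every access.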

src/event_processors/push.py (2)

55-63: LGTM on early validation for missing installation_id.

Returning early with a clear error when installation_id is missing prevents downstream failures and provides actionable error context.


80-102: LGTM on resilient violation conversion.

Wrapping each violation conversion in try/except ensures a single malformed violation doesn't break the entire processing pipeline. The fallback to Severity.MEDIUM for invalid severity strings is appropriate.

src/agents/engine_agent/agent.py (4)

76-94: LGTM on execute with EngineRequest validation.

Validating **kwargs against EngineRequest provides strict typing while maintaining flexibility. Supporting both direct request object and keyword arguments enables future-proofing.


192-234: LGTM on _convert_rules_to_descriptions with dual-format support.

The method cleanly handles both Rule objects (preferred) and legacy dicts, extracting rule_id, conditions, and normalizing enum values. This ensures backward compatibility during the migration period.


252-270: LGTM on legacy evaluate method for backward compatibility.

The evaluate method delegates to execute and transforms the result, maintaining API compatibility for existing callers while internally using the new request-based flow.


236-250: No action needed. All condition classes in AVAILABLE_CONDITIONS properly define the required class attributes (name, description, parameter_patterns, event_types, examples) by inheriting from BaseCondition and overriding them with actual values. No AttributeError risk exists.

Likely an incorrect or invalid review comment.

src/event_processors/pull_request/processor.py (3)

29-49: LGTM on process method structure with early validation.

The method properly validates installation_id upfront and returns a clear error result when missing. Extracting common data (repo, sha, pr_number) at the start keeps the flow readable.


129-136: Acknowledgment matching now uses rule_id with fallback to rule_description.

Line 131 uses violation.rule_id or violation.rule_description as the lookup key, which addresses the previously flagged mismatch issue. This ensures violations with a proper rule_id will match acknowledgments, while still providing a fallback for violations without one.


179-196: LGTM on error handling with failure check run.

Creating a failure check run on exceptions ensures GitHub reflects the processing failure state, providing visibility to users.

src/webhooks/handlers/pull_request.py (3)

13-16: LGTM on lazy singleton processor with lru_cache.

Using @lru_cache(maxsize=1) provides lazy instantiation while ensuring a single processor instance is reused, avoiding repeated initialization overhead.
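The pattern in miniature (class and function names are illustrative):

```python
from functools import lru_cache


class PullRequestProcessor:
    instances = 0

    def __init__(self) -> None:
        # Expensive setup (clients, config loading) would go here.
        PullRequestProcessor.instances += 1


@lru_cache(maxsize=1)
def get_processor() -> PullRequestProcessor:
    # First call constructs the processor; every later call
    # returns the same cached instance.
    return PullRequestProcessor()


assert get_processor() is get_processor()
assert PullRequestProcessor.instances == 1
```

Compared with a module-level instance, the `lru_cache` approach defers construction until first use, which keeps import time cheap and simplifies testing (`get_processor.cache_clear()` resets the singleton).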


37-43: LGTM on action filtering.

Filtering to only process opened, synchronize, reopened, and edited actions reduces noise and unnecessary processing for actions like labeled or assigned.


25-35: LGTM on structured logging with context binding.

Binding repo, pr_number, and action to the logger provides excellent observability for debugging and monitoring PR processing flows.
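A stdlib stand-in for the binding pattern (structlog's real `bind` returns an immutable bound logger; this `BoundLogger` is a simplified illustration, not the library's API):

```python
from __future__ import annotations


class BoundLogger:
    """Toy bound logger: each bind() returns a new logger with merged context."""

    def __init__(self, context: dict | None = None) -> None:
        self._context = context or {}

    def bind(self, **kwargs) -> BoundLogger:
        # Merge without mutating: the original logger keeps its own context.
        return BoundLogger({**self._context, **kwargs})

    def info(self, event: str) -> dict:
        # Real structlog would render this through processors; here we
        # just return the event dict so the bound fields are visible.
        return {"event": event, **self._context}


log = BoundLogger().bind(repo="octo/demo", pr_number=42, action="opened")
assert log.info("pr.received")["pr_number"] == 42
```

Because every log line emitted through the bound logger carries `repo`, `pr_number`, and `action`, downstream log search can filter a single PR's full processing flow without per-call-site string formatting.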

src/webhooks/dispatcher.py (2)

1-9: Import/logging setup looks good.
Clean structlog integration and typed Callable/Coroutine imports.


29-52: Queue handoff + dedup flow reads clean.
The enqueue/duplicate handling and status returns are straightforward.

docs/concepts/overview.md (4)

3-10: Design principles section is crisp and aligned.
Clear positioning of deterministic rule enforcement and optional intelligence.


12-32: Flow diagram + steps are clear and actionable.
The end-to-end sequence is easy to follow.


33-53: Core components breakdown reads well.
Nice separation of loader/registry/enricher/queue responsibilities.


54-75: AI usage boundaries and use cases are well framed.
The “where AI is used” section keeps expectations clear.

docs/index.md (4)

5-20: Navigation cards text is concise and scannable.
Good improvement in clarity for quick entry points.


25-41: Problem statement and value prop read clearly.
Strong, repo-centric framing.


42-70: Key features + quick example are actionable.
The example YAML and CTA flow are solid.


71-79: Community links and closing tagline look good.
Nice compact ending.

docs/getting-started/quick-start.md (3)

3-20: Intro + prerequisites are clear and concise.
The “What you get” list reads well.


84-102: Commands + next steps are well structured.
Nice, compact reference section.


23-75: No markdown linting issue exists in this file. The quick-start.md uses ** consistently throughout (40 instances, 0 underscores), which satisfies the MD050 rule default of "consistent" style. The markdownlint configuration in coderabbit.markdownlint-cli2.jsonc does not explicitly configure MD050, so it uses the default behavior that allows any consistent style.

Likely an incorrect or invalid review comment.

docs/features.md (3)

3-7: Intro framing is concise and consistent with the new narrative.


92-116: Post-table sections read well and are easy to scan.


11-90: The tables in this section are properly formatted and consistent. The markdownlint configuration does not explicitly specify a table-column-style requirement (MD060), which defaults to style: "any", accepting any consistent formatting style. All tables in lines 11-90 use the same compact format throughout, so no MD060 violation exists.

Likely an incorrect or invalid review comment.

README.md (3)

5-31: Overview/How-it-works sections are strong and clear.
Good, concrete framing without extra fluff.


59-103: Rule format + quick start steps are well organized.
The example YAML and step list are easy to follow.


116-138: API + docs/support sections look good.
Clean and concise.

docs/getting-started/configuration.md (2)

3-6: Intro + pro tip are clear and useful.


212-248: Severity/event types/best practices sections read well.
Clear and actionable guidance.

docs/README.md (1)

3-3: Doc tone and link update look good.

src/webhooks/router.py (3)

1-9: Structlog + model imports align with the new structured responses.


13-40: Event normalization and 202 handling are solid.


48-75: Payload validation/logging flow looks clean and consistent.

src/event_processors/violation_acknowledgment.py (14)

3-10: Typed imports and centralized acknowledgment parsing are a good move.


12-18: TYPE_CHECKING guard and module-level state are fine.


24-34: Constructor wiring for agents and check-run manager is clear.


54-58: PR data/sha placeholders improve flow clarity.


108-122: PR fetch guard + SHA extraction look solid.


165-175: Typed violations list and logging are good.


204-208: Check-run update on no-violations case is sensible.


228-260: Acknowledgment check-run construction is straightforward.


284-292: Returning typed violations keeps the processor output consistent.


325-329: Centralized acknowledgment parsing is a nice consolidation.


413-455: Acknowledgment comment formatting reads well.


470-503: Rejection message formatting is consistent with approval flow.


512-516: Explicit return type on _post_comment is good for clarity.


527-531: Typed _get_rule_provider keeps mypy happy.


### Context Dependency in Enterprise Policies

Our analysis of 70+ enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
Our analysis of 70 + enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.

⚠️ Potential issue | 🟡 Minor

Use consistent numeric formatting for “70+”

The spaced plus reads like arithmetic and is inconsistent with later “70+ enterprise policies.” Consider normalizing to “70+”.

✏️ Suggested edit
-Our analysis of 70 + enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
+Our analysis of 70+ enterprise policies from major tech companies revealed a critical insight: **85% of real-world governance policies require context** and cannot be effectively enforced with traditional static rules.
🤖 Prompt for AI Agents
In `@docs/benchmarks.md` at line 9, The numeric formatting is inconsistent: change
the spaced "70 + enterprise policies" to "70+ enterprise policies" so it matches
the later usage; locate the paragraph string in docs/benchmarks.md and replace
"70 + enterprise policies" with "70+ enterprise policies" (ensure no extra
spaces around the plus).

Comment on lines +13 to 208
| Field | Required | Description |
|-------|----------|-------------|
| `description` | Yes | Short human-readable description (used in check runs and comments). |
| `enabled` | No | Default `true`. Set `false` to disable without deleting. |
| `severity` | No | `low` \| `medium` \| `high` \| `critical`. Drives presentation, not logic. |
| `event_types` | Yes | Events this rule runs on: `pull_request`, `push`, `deployment`, etc. |
| `parameters` | Yes | Key-value map. **Keys determine which condition runs** (e.g. `require_linked_issue`, `max_lines`). |

The loader reads `.watchflow/rules.yaml` from the repo default branch and builds `Rule` objects with condition instances from the **condition registry**. Parameter names must match what the conditions expect; see below.

---

## Parameter reference (supported logic)

### Pull request conditions

**Linked issue**

```yaml
parameters:
  require_linked_issue: true
```

PR must reference an issue (e.g. “Fixes #123”) in title or body.

**Title pattern**

```yaml
parameters:
  title_pattern: "^feat|^fix|^docs|^style|^refactor|^test|^chore|^perf|^ci|^build|^revert"
```

PR title must match the regex (e.g. conventional commits).

**Description length**

```yaml
parameters:
  min_description_length: 50
```

PR body length must be ≥ N characters.

**Required labels**

```yaml
parameters:
  required_labels: ["Type/Bug", "Type/Feature", "Status/Review"]
```

PR must have all of these labels.

**Min approvals**

```yaml
parameters:
  min_approvals: 2
```

At least N approvals required.

**Max PR size (lines)**

```yaml
parameters:
  max_lines: 500
```

Total additions + deletions must be ≤ N. The loader also accepts `max_changed_lines` as an alias.

**CODEOWNERS: require owners as reviewers**

```yaml
parameters:
  require_code_owner_reviewers: true
```

For every file changed, the corresponding CODEOWNERS entries must be in the requested reviewers (users or teams). If CODEOWNERS is missing, the condition skips (no violation).

**CODEOWNERS: path must have owner**

```yaml
parameters:
  require_path_has_code_owner: true
```

Every changed path must have at least one owner in CODEOWNERS. If CODEOWNERS is missing, the condition skips.

**Critical paths / code owners**

```yaml
parameters:
  critical_owners: [] # or list of path patterns if supported
```

Changes to critical paths require code-owner review. (See registry for exact semantics.)

**Protected branches**

```yaml
parameters:
  protected_branches: ["main", "master"]
```

Blocks targeting these branches (e.g. merge without going through PR flow as configured).

### Push conditions

**No force push**

```yaml
parameters:
  no_force_push: true
```

Reject force pushes. Typically used with `event_types: ["push"]`.

### File conditions

**Max file size**

```yaml
parameters:
  max_file_size_mb: 1
```

No single file in the PR may exceed N MB.

**File pattern**

```yaml
parameters:
  pattern: "tests/.*\\.py$|test_.*\\.py$"
  condition_type: "files_match_pattern"
```

Changed files must (or must not) match the pattern; exact behavior depends on condition.

### Time and deployment

**Allowed hours, days, weekend** — See condition registry and examples in repo for `allowed_hours`, `timezone`, `days`, and deployment-related parameters.

---

## Example rules

**Linked issue + PR size + CODEOWNERS reviewers**

```yaml
rules:
  - description: "PRs must reference a linked issue (e.g. Fixes #123)"
    enabled: true
    severity: high
    event_types: ["pull_request"]
    parameters:
      require_linked_issue: true

  - description: "PR total lines changed must not exceed 500"
    enabled: true
    severity: medium
    event_types: ["pull_request"]
    parameters:
      max_lines: 500

  - description: "When a PR modifies paths with CODEOWNERS, those owners must be added as reviewers"
    enabled: true
    severity: high
    event_types: ["pull_request"]
    parameters:
      require_code_owner_reviewers: true
```

**Title pattern + description length**

```yaml
rules:
  - description: "PR titles must follow conventional commit format; descriptions must be at least 50 chars"
    enabled: true
    severity: medium
    event_types: ["pull_request"]
    parameters:
      title_pattern: "^feat|^fix|^docs|^style|^refactor|^test|^chore"
      min_description_length: 50
```

**No force push to main**

```yaml
rules:
  - description: "No direct pushes to main - all changes must go through PRs"
    enabled: true
    severity: critical
    event_types: ["push"]
    parameters:
      no_force_push: true
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's check if the file exists and read the relevant section
head -210 docs/getting-started/configuration.md | tail -200

Repository: warestack/watchflow

Length of output: 4648


🏁 Script executed:

# Check for markdownlint in the repo and run it on the file
# First, find if there's a markdownlint config
find . -name ".markdownlint*" -o -name "markdownlint*" | head -5

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Try to run markdownlint if available, otherwise check for npm/yarn
npm list markdownlint 2>/dev/null || echo "Checking global or alternative installation"

Repository: warestack/watchflow

Length of output: 138


🏁 Script executed:

# Install and run markdownlint to verify the violations
pip install mdformat mdformat-tables 2>/dev/null || echo "pip install attempt"
cd docs/getting-started && markdownlint configuration.md 2>/dev/null || echo "markdownlint not in PATH"

Repository: warestack/watchflow

Length of output: 1184


🏁 Script executed:

# Let's read the actual file content to analyze it manually
cat -n docs/getting-started/configuration.md | head -250

Repository: warestack/watchflow

Length of output: 8731


🏁 Script executed:

npm install -g markdownlint-cli 2>&1 | tail -5
markdownlint docs/getting-started/configuration.md

Repository: warestack/watchflow

Length of output: 475


🏁 Script executed:

# Also try with node package if global install failed
npm install markdownlint-cli && npx markdownlint docs/getting-started/configuration.md

Repository: warestack/watchflow

Length of output: 5822


Convert bold section headers to markdown headings and fix table column spacing.

Markdownlint flags 16 instances of MD036 (bold text used as headings instead of markdown heading levels) throughout the parameter reference and example sections, plus MD060 table-column-style violations on the rule-structure table separator line. Convert bold labels like **Linked issue** to proper heading levels (e.g., `#### Linked issue`), and align table pipes with consistent spacing.

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 14-14: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 14-14: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 29-29: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 38-38: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 47-47: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 56-56: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 65-65: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 74-74: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 83-83: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 92-92: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 101-101: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 110-110: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 121-121: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 132-132: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 141-141: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 159-159: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 185-185: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


[warning] 198-198: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In `@docs/getting-started/configuration.md` around lines 13 - 208, Convert all
bold section headers (e.g., "**Linked issue**", "**Title pattern**",
"**Description length**", "**Required labels**", "**Min approvals**", "**Max PR
size (lines)**", "**CODEOWNERS: require owners as reviewers**", "**CODEOWNERS:
path must have owner**", "**Critical paths / code owners**", "**Protected
branches**", "**No force push**", "**Max file size**", "**File pattern**",
"**Allowed hours, days, weekend**", and the example headings) into proper
markdown headings (suggested level: #### HeaderName) throughout the Parameter
reference and Example rules blocks, and fix the initial rules table header and
separator (the row starting "| Field | Required | Description |" and the
separator "|-------|----------|-------------|") so pipes have consistent
surrounding spaces and the separator uses balanced dashes for each column to
satisfy MD060/MD036.

Comment on lines 104 to 109
# Set rule_id from first condition for acknowledgment lookup
rule_id_val: str | None = None
if conditions:
    rid = CONDITION_CLASS_TO_RULE_ID.get(type(conditions[0]))
    if rid is not None:
        rule_id_val = rid.value

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find the definition of AVAILABLE_CONDITIONS and CONDITION_CLASS_TO_RULE_ID
rg -n "AVAILABLE_CONDITIONS\s*=" src/

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Search for CONDITION_CLASS_TO_RULE_ID definition
rg -n "CONDITION_CLASS_TO_RULE_ID" src/ -A 3

Repository: warestack/watchflow

Length of output: 950


🏁 Script executed:

# Check the full context of the file to understand the flow
wc -l src/rules/loaders/github_loader.py

Repository: warestack/watchflow

Length of output: 101


🏁 Script executed:

# Find where rule_id_val is used after it's assigned
rg -n "rule_id_val" src/rules/loaders/github_loader.py -A 2 -B 2

Repository: warestack/watchflow

Length of output: 569


🏁 Script executed:

# Search for acknowledgment lookup logic that uses rule_id
rg -n "acknowledgment" src/ -i -B 3 -A 3 | head -100

Repository: warestack/watchflow

Length of output: 7685


🏁 Script executed:

# See the full AVAILABLE_CONDITIONS definition
sed -n '60,100p' src/rules/registry.py

Repository: warestack/watchflow

Length of output: 1305


🏁 Script executed:

# Check how violations match rules and use rule_id for acknowledgments
rg -n "rule_id" src/core/models.py -B 2 -A 2 | head -60

Repository: warestack/watchflow

Length of output: 715


🏁 Script executed:

# Check the Violation model to understand rule_id usage
rg -n "class Violation" src/ -A 20

Repository: warestack/watchflow

Length of output: 3522


🏁 Script executed:

# Find how acknowledgments are retrieved/matched
rg -n "acknowledgments\[" src/ -B 3 -A 1

Repository: warestack/watchflow

Length of output: 1103


🏁 Script executed:

# See the full get_conditions_for_parameters implementation
sed -n '77,130p' src/rules/registry.py

Repository: warestack/watchflow

Length of output: 1732


🏁 Script executed:

# Check if there are cases where multiple conditions could match the same parameters
rg -n "get_conditions_for_parameters" src/ -B 2 -A 5

Repository: warestack/watchflow

Length of output: 1251


🏁 Script executed:

# Look at actual rule definitions to see if rules typically have multiple conditions
rg -n "conditions:" src/rules/ -A 3 | head -100

Repository: warestack/watchflow

Length of output: 719


🏁 Script executed:

# Check if there are test cases showing expected behavior with multiple conditions
find . -name "*test*.py" -type f | xargs grep -l "get_conditions_for_parameters\|rule_id" 2>/dev/null | head -5

Repository: warestack/watchflow

Length of output: 236


🏁 Script executed:

# Look for RULE_ID_TO_CONDITION to understand the mapping
rg -n "RULE_ID_TO_CONDITION" src/ -B 2 -A 15

Repository: warestack/watchflow

Length of output: 3361


🏁 Script executed:

# Check how violations are actually created and if multiple conditions per rule is normal
rg -n "Violation(" src/ -B 3 -A 3 | head -100

Repository: warestack/watchflow

Length of output: 133


🏁 Script executed:

# Search for test cases with multiple conditions
rg -n "conditions\s*=" tests/ src/agents/ -A 3 | head -80

Repository: warestack/watchflow

Length of output: 1924


🏁 Script executed:

# Check the test acknowledgment file
cat tests/unit/rules/test_acknowledgment.py | head -100

Repository: warestack/watchflow

Length of output: 3803


🏁 Script executed:

# Look for example rules in tests or YAML to see if multiple conditions per rule is normal
find . -name "*.yaml" -o -name "*.yml" | xargs grep -l "conditions" 2>/dev/null | head -3

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Check how violations are created from conditions to understand the expected flow
rg -n "rule_id_val\|rule_id =" src/ -B 2 -A 2

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Check parameter_patterns for conditions to understand if overlap is possible
rg -n "parameter_patterns" src/rules/ -A 2 | head -100

Repository: warestack/watchflow

Length of output: 5896


🏁 Script executed:

# Look for real rule examples with parameters in YAML or tests
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -exec cat {} \; | head -200

Repository: warestack/watchflow

Length of output: 7274


🏁 Script executed:

# Check if conditions ever produce multiple matches in tests
rg -n "conditions" tests/ | grep -i "multiple\|length.*>\|len"

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Look at the Rule model to understand if multiple conditions are intended
cat src/rules/models.py | head -80

Repository: warestack/watchflow

Length of output: 2424


🏁 Script executed:

# Check how violations are generated from rule evaluation
rg -n "Violation\|violation_result" src/agents/engine_agent/ -A 3 -B 1 | head -150

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Search for how rules with multiple parameters are actually used
rg -n "parameters.*=" src/rules/loaders/ -A 3 | head -60

Repository: warestack/watchflow

Length of output: 1509


🏁 Script executed:

# Check if a condition can only match one rule_id
rg -n "class.*Condition" src/rules/conditions/ | wc -l

Repository: warestack/watchflow

Length of output: 65


🏁 Script executed:

# Check if parameter_patterns actually overlap between conditions
python3 << 'EOF'
import re

# Get parameter patterns from the grep output
patterns = {
    "RequiredLabelsCondition": ["required_labels"],
    "TitlePatternCondition": ["title_pattern"],
    "MinDescriptionLengthCondition": ["min_description_length"],
    "RequireLinkedIssueCondition": ["require_linked_issue"],
    "MaxFileSizeCondition": ["max_file_size_mb"],
    "MaxPrLocCondition": ["max_lines"],
    "MinApprovalsCondition": ["min_approvals"],
    "ProtectedBranchesCondition": ["protected_branches"],
    "AuthorTeamCondition": ["team"],
    "CodeOwnersCondition": ["critical_owners"],
    "PathHasCodeOwnerCondition": ["require_path_has_code_owner"],
    "RequireCodeOwnerReviewersCondition": ["require_code_owner_reviewers"],
    "FilePatternCondition": ["pattern", "condition_type"],
    "AllowedHoursCondition": ["allowed_hours", "timezone"],
    "DaysCondition": ["days"],
    "WeekendCondition": [],
    "WorkflowDurationCondition": ["minutes"],
}

# Check for overlaps
all_patterns = {}
overlaps = []
for condition, params in patterns.items():
    for param in params:
        if param in all_patterns:
            overlaps.append(f"{param}: {all_patterns[param]} + {condition}")
        else:
            all_patterns[param] = condition

if overlaps:
    print("Parameter pattern overlaps found:")
    for o in overlaps:
        print(f"  {o}")
else:
    print("No parameter pattern overlaps found - each parameter maps to exactly ONE condition")

print(f"\nTotal unique parameters: {len(all_patterns)}")
print(f"Total conditions: {len(patterns)}")
EOF

Repository: warestack/watchflow

Length of output: 195


🏁 Script executed:

# Verify that a rule typically has only one set of parameters
rg -n "parameters:" tests/ -A 5 | grep -A 3 "^[0-9]*-.*parameters:" | head -60

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Check the actual flow: rule->conditions->violations to confirm rule_id is used correctly
rg -n "rule\.rule_id\|rule_id_val" src/ -B 3 -A 3

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Find which conditions are actually in RULE_ID_TO_CONDITION mapping
python3 << 'EOF'
# From the grep output earlier, RULE_ID_TO_CONDITION maps:
rule_id_to_condition = {
    "REQUIRED_LABELS": "RequiredLabelsCondition",
    "PR_TITLE_PATTERN": "TitlePatternCondition",
    "PR_DESCRIPTION_REQUIRED": "MinDescriptionLengthCondition",
    "FILE_SIZE_LIMIT": "MaxFileSizeCondition",
    "MAX_PR_LOC": "MaxPrLocCondition",
    "REQUIRE_LINKED_ISSUE": "RequireLinkedIssueCondition",
    "PROTECTED_BRANCH_PUSH": "ProtectedBranchesCondition",
    "NO_FORCE_PUSH": "NoForcePushCondition",
    "MIN_PR_APPROVALS": "MinApprovalsCondition",
    "PATH_HAS_CODE_OWNER": "PathHasCodeOwnerCondition",
    "REQUIRE_CODE_OWNER_REVIEWERS": "RequireCodeOwnerReviewersCondition",
}

available_conditions = [
    "RequiredLabelsCondition",
    "TitlePatternCondition",
    "MinDescriptionLengthCondition",
    "RequireLinkedIssueCondition",
    "MaxFileSizeCondition",
    "MaxPrLocCondition",
    "MinApprovalsCondition",
    "ProtectedBranchesCondition",
    "AuthorTeamCondition",
    "CodeOwnersCondition",
    "PathHasCodeOwnerCondition",
    "RequireCodeOwnerReviewersCondition",
    "FilePatternCondition",
    "AllowedHoursCondition",
    "DaysCondition",
    "WeekendCondition",
    "WorkflowDurationCondition",
]

mapped_conditions = set(rule_id_to_condition.values())
available_set = set(available_conditions)

unmapped = available_set - mapped_conditions
print(f"Conditions in AVAILABLE_CONDITIONS but NOT in RULE_ID_TO_CONDITION:")
for c in sorted(unmapped):
    print(f"  - {c}")

print(f"\nConditions in RULE_ID_TO_CONDITION but NOT in AVAILABLE_CONDITIONS:")
extra = mapped_conditions - available_set
for c in sorted(extra):
    print(f"  - {c}")

print(f"\nTotal in AVAILABLE_CONDITIONS: {len(available_conditions)}")
print(f"Total mapped: {len(mapped_conditions)}")
print(f"Unmapped from available: {len(unmapped)}")
EOF

Repository: warestack/watchflow

Length of output: 471


🏁 Script executed:

# Check if rules actually do have multiple different conditions
rg -n "get_conditions_for_parameters" src/ -A 20 | grep -E "matched_conditions|append|return"

Repository: warestack/watchflow

Length of output: 198


🏁 Script executed:

# Look at actual violations to see if they ever have multiple violations from one rule
rg -n "violations" src/agents/engine_agent/ -B 2 -A 2 | head -100

Repository: warestack/watchflow

Length of output: 7490


🏁 Script executed:

# Verify the claim about conditions matching multiple rules - test a specific case
python3 << 'EOF'
# Simulate get_conditions_for_parameters for a rule with multiple parameter keys

parameters_example_1 = {"max_lines": 500}
parameters_example_2 = {"required_labels": ["bug"], "max_lines": 500}  # TWO different parameter keys

# Parameter mappings
param_to_condition = {
    "required_labels": "RequiredLabelsCondition",
    "title_pattern": "TitlePatternCondition",
    "min_description_length": "MinDescriptionLengthCondition",
    "require_linked_issue": "RequireLinkedIssueCondition",
    "max_file_size_mb": "MaxFileSizeCondition",
    "max_lines": "MaxPrLocCondition",
    "min_approvals": "MinApprovalsCondition",
    "protected_branches": "ProtectedBranchesCondition",
    "team": "AuthorTeamCondition",
    "critical_owners": "CodeOwnersCondition",
    "require_path_has_code_owner": "PathHasCodeOwnerCondition",
    "require_code_owner_reviewers": "RequireCodeOwnerReviewersCondition",
    "pattern": "FilePatternCondition",
    "condition_type": "FilePatternCondition",
    "allowed_hours": "AllowedHoursCondition",
    "timezone": "AllowedHoursCondition",
    "days": "DaysCondition",
    "minutes": "WorkflowDurationCondition",
}

def simulate_get_conditions(params):
    matched = []
    for param_key in params.keys():
        if param_key in param_to_condition:
            condition = param_to_condition[param_key]
            if condition not in matched:
                matched.append(condition)
    return matched

result1 = simulate_get_conditions(parameters_example_1)
result2 = simulate_get_conditions(parameters_example_2)

print(f"Example 1 (max_lines only):")
print(f"  Parameters: {list(parameters_example_1.keys())}")
print(f"  Matched conditions: {result1}")
print(f"  Count: {len(result1)}")

print(f"\nExample 2 (required_labels + max_lines):")
print(f"  Parameters: {list(parameters_example_2.keys())}")
print(f"  Matched conditions: {result2}")
print(f"  Count: {len(result2)}")

print(f"\nConclusion: Multiple DIFFERENT conditions can match if a rule has multiple different parameter keys")
EOF

Repository: warestack/watchflow

Length of output: 445


🏁 Script executed:

# Now check if any rules in the codebase have multiple different parameter keys
rg -n "parameters:" tests/ -A 15 | grep -B 5 -A 10 "parameters:.*{" | head -200

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Check if any actual rule definitions have multiple different parameter keys
find . -type f \( -name "*.yaml" -o -name "*.yml" \) -path "*/rules/*" | xargs cat 2>/dev/null | head -500

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Look at test rules to see if multiple parameters per rule exist
rg -n "parameters.*:" tests/unit/agents/test_engine_agent.py -A 10

Repository: warestack/watchflow

Length of output: 1705


🏁 Script executed:

# Check if there's documentation about rule definition expectations
find . -name "*.md" -o -name "README*" | xargs grep -l "rule\|parameter" 2>/dev/null | head -3

Repository: warestack/watchflow

Length of output: 114


Clarify the rule_id derivation strategy when rules have multiple conditions.

When a rule's parameters match multiple different conditions, rule_id is derived from only the first condition in the matched list (lines 107-109). While parameter patterns don't overlap—each maps to exactly one condition—a rule with multiple different parameter keys will match multiple conditions. The order depends on AVAILABLE_CONDITIONS, which is deterministic but semantically arbitrary.

Additionally, only 10 of the 17 conditions in AVAILABLE_CONDITIONS are registered in CONDITION_CLASS_TO_RULE_ID (NoForcePushCondition is mapped but absent from AVAILABLE_CONDITIONS), meaning 7 conditions (e.g., AuthorTeamCondition, FilePatternCondition, DaysCondition) have no rule IDs and will always result in rule_id_val = None.

Consider documenting whether rules are expected to have a single parameter (and thus single condition) or whether the first-condition-only strategy is intentional for multi-condition scenarios. If multi-condition rules are possible, consider a more explicit strategy (e.g., collecting all rule IDs or selecting based on a priority).

🤖 Prompt for AI Agents
In `@src/rules/loaders/github_loader.py` around lines 104 - 109, The current logic
in github_loader.py sets rule_id_val from only the first matched condition
(variables: conditions, rule_id_val) using CONDITION_CLASS_TO_RULE_ID, which is
ambiguous when a rule matches multiple conditions and misses mappings for
several AVAILABLE_CONDITIONS; update the logic to explicitly derive rule IDs by
mapping all entries in conditions through CONDITION_CLASS_TO_RULE_ID, collect
non-None IDs into a list (e.g., rule_ids), and then resolve a single rule_id_val
deterministically (either by returning the full list, selecting the
highest-priority ID based on AVAILABLE_CONDITIONS order, or using a defined
priority map), and ensure unmapped conditions are handled/logged so the behavior
is clear and reproducible.
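A minimal, framework-free sketch of the suggested derivation, collecting rule IDs from all matched conditions instead of only the first. The mapping, condition names, and `derive_rule_ids` helper below are illustrative stand-ins, not the actual loader code:

```python
# Hypothetical sketch: map *every* matched condition through the rule-ID
# mapping, keep the full list for auditing, and surface unmapped conditions
# instead of silently producing rule_id_val = None.

CONDITION_CLASS_TO_RULE_ID = {
    "RequiredLabelsCondition": "REQUIRED_LABELS",
    "MaxPrLocCondition": "MAX_PR_LOC",
    "MinApprovalsCondition": "MIN_PR_APPROVALS",
}

# Priority follows AVAILABLE_CONDITIONS order, as the review suggests.
AVAILABLE_CONDITIONS = [
    "RequiredLabelsCondition",
    "MaxPrLocCondition",
    "MinApprovalsCondition",
    "AuthorTeamCondition",  # unmapped: reported, not silently dropped
]


def derive_rule_ids(conditions: list[str]) -> tuple[list[str], list[str]]:
    """Map every matched condition to a rule ID; report unmapped ones."""
    rule_ids, unmapped = [], []
    for cond in sorted(conditions, key=AVAILABLE_CONDITIONS.index):
        rule_id = CONDITION_CLASS_TO_RULE_ID.get(cond)
        if rule_id is None:
            unmapped.append(cond)  # log these in the real loader
        elif rule_id not in rule_ids:
            rule_ids.append(rule_id)
    return rule_ids, unmapped


rule_ids, unmapped = derive_rule_ids(
    ["MinApprovalsCondition", "RequiredLabelsCondition", "AuthorTeamCondition"]
)
# rule_ids is ordered by AVAILABLE_CONDITIONS, so rule_ids[0] is a
# deterministic "primary" rule_id_val while the full list stays available.
```

Because the ordering is pinned to AVAILABLE_CONDITIONS rather than dict iteration, the choice of primary ID is reproducible across runs.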

Comment on lines +18 to +27
def __init__(self, queue: TaskQueue | None = None) -> None:
    # Map event types to their specific business logic handlers
    self.handlers: dict[str, Callable[..., Coroutine[Any, Any, Any]]] = {}
    # Use provided queue or default to singleton
    self.queue = queue or task_queue

def register_handler(self, event_type: str, handler: Callable[..., Coroutine[Any, Any, Any]]) -> None:
    """Registers a handler for a specific GitHub event."""
    self.handlers[event_type] = handler
    logger.debug("handler_registered", event_type=event_type)

⚠️ Potential issue | 🟠 Major

Normalize EventType keys in register_handler to prevent missed handlers.
Handlers commonly expose EventType (e.g., src/webhooks/handlers/issue_comment.py Line 17-18). If callers pass EventType, it’s stored as an enum key, while dispatch() looks up the .value string — resulting in handler_not_found and skipped processing.

🔧 Proposed fix
-def register_handler(self, event_type: str, handler: Callable[..., Coroutine[Any, Any, Any]]) -> None:
+def register_handler(
+    self,
+    event_type: str | EventType,
+    handler: Callable[..., Coroutine[Any, Any, Any]],
+) -> None:
     """Registers a handler for a specific GitHub event."""
-    self.handlers[event_type] = handler
-    logger.debug("handler_registered", event_type=event_type)
+    event_key = event_type.value if isinstance(event_type, EventType) else str(event_type)
+    self.handlers[event_key] = handler
+    logger.debug("handler_registered", event_type=event_key)
🤖 Prompt for AI Agents
In `@src/webhooks/dispatcher.py` around lines 18 - 27, register_handler currently
stores the incoming event_type as-is which can be an Enum (EventType) causing
dispatch to miss handlers that lookup by the Enum.value string; update
register_handler to normalize the key by extracting a string value (e.g., use
event_key = event_type.value if hasattr(event_type, "value") else
str(event_type)) and store handlers[event_key] = handler so dispatch (which
expects the string key) will find the handler; reference the register_handler
method and the handlers dict for the change.
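The proposed fix above can be exercised in isolation. This is a self-contained sketch of the key-normalization behavior; the `EventType` enum, `handlers` dict, and `noop_handler` below are stand-ins for the real dispatcher, not the project's classes:

```python
# Demonstrates the bug class: registering under an Enum key while dispatch
# looks up the .value string, and the normalization that fixes it.
from collections.abc import Callable, Coroutine
from enum import Enum
from typing import Any


class EventType(Enum):
    PULL_REQUEST = "pull_request"


handlers: dict[str, Callable[..., Coroutine[Any, Any, Any]]] = {}


def register_handler(
    event_type: "str | EventType",
    handler: Callable[..., Coroutine[Any, Any, Any]],
) -> None:
    # Normalize: accept either an EventType member or a raw string key,
    # so dispatch() (which looks up by the .value string) always matches.
    event_key = event_type.value if isinstance(event_type, EventType) else str(event_type)
    handlers[event_key] = handler


async def noop_handler(payload: dict[str, Any]) -> str:
    return "handled"


register_handler(EventType.PULL_REQUEST, noop_handler)
# Without normalization this key would be EventType.PULL_REQUEST (an enum),
# and a dispatch lookup for "pull_request" would report handler_not_found.
assert "pull_request" in handlers
```

String callers are unaffected, since `str(event_type)` is a no-op for plain strings.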

- README: escape pipes in title_pattern cell, MD060 separator rows (spaced pipes)
- pull_request processor: Violation.model_validate(v) for type consistency
- github_loader: rule_id from first mapped condition, document multi-condition
- router: WebhookResponse status 'ok'/'ignored' to match Literal contract
- tests: expect status ok/ignored in router and webhook flow tests

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/event_processors/pull_request/processor.py`:
- Around line 198-213: The method _convert_rules_to_new_format is dead code and
should be removed; delete the entire _convert_rules_to_new_format function from
the file so Rule objects are passed intact (as done in process()), ensuring no
other callers reference that function; run tests/static analysis to confirm no
remaining usages of _convert_rules_to_new_format.
- Around line 64-65: Remove the redundant installation_id falsy check and
ValueError raise in src/event_processors/pull_request/processor.py since an
earlier early-return already handles falsy installation_id; locate the block
that does "if not installation_id: raise ValueError(...)" (in the pull request
processing function/class handling installation_id, e.g., the pull request
processor function) and delete it so behavior remains governed by the existing
early return and no duplicate guard remains.

In `@src/webhooks/router.py`:
- Line 60: Wrap the await request.json() call (where payload = cast("dict[str,
Any]", await request.json())) in a try/except that catches json.JSONDecodeError
(imported as JSONDecodeError) and return a 400 JSON response with a clear error
message; keep using the same payload variable on success, and ensure any
JSONDecodeError is logged or included in the response body for debugging instead
of letting it raise a 500.
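A framework-free sketch of that guard: parse the raw body and translate a `JSONDecodeError` into a 400-style response instead of letting it surface as a 500. The `parse_webhook_body` helper and the response shape are illustrative, not the actual router code:

```python
# Translate malformed JSON into a 400 with a debuggable message,
# mirroring the try/except the prompt asks for around request.json().
import json
from json import JSONDecodeError
from typing import Any


def parse_webhook_body(raw_body: bytes) -> tuple[int, dict[str, Any]]:
    """Return (status_code, payload-or-error) for a raw webhook body."""
    try:
        payload = json.loads(raw_body)
    except JSONDecodeError as e:
        # In the real router: log e, then return a 400 JSON response.
        return 400, {"error": f"Invalid JSON payload: {e.msg}"}
    return 200, payload


status, body = parse_webhook_body(b'{"action": "opened"}')
# -> 200 with the decoded payload
status_bad, body_bad = parse_webhook_body(b"{not json")
# -> 400 with an error message instead of an unhandled exception
```

In FastAPI the same pattern wraps `await request.json()`, raising `HTTPException(status_code=400, ...)` from the except branch.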
🧹 Nitpick comments (2)
src/webhooks/router.py (2)

58-59: Move imports to module level.

Importing Any and cast inside the function body is unconventional. These are type-only imports with no runtime cost.

♻️ Suggested refactor
+from typing import Any, cast
+
 import structlog
 from fastapi import APIRouter, Depends, HTTPException, Request

Then remove lines 58-59 from the function body.


65-75: Narrow the exception catch to ValidationError.

Catching bare Exception is overly broad and could mask unexpected errors (e.g., internal bugs) as 400 validation failures. Catching pydantic.ValidationError specifically would be more precise.

♻️ Suggested refactor
+from pydantic import ValidationError
+
 # ... in function body:
     try:
         github_event = GitHubEventModel(**payload)
         logger.info(
             "webhook_validated",
             event_type=event_name,
             repository=github_event.repository.full_name,
             sender=github_event.sender.login,
         )
-    except Exception as e:
+    except ValidationError as e:
         logger.error("webhook_validation_failed", event_type=event_name, error=str(e))
         raise HTTPException(status_code=400, detail="Invalid webhook payload structure") from e
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aab5d42 and 89fc0c4.

📒 Files selected for processing (6)
  • README.md
  • src/event_processors/pull_request/processor.py
  • src/rules/loaders/github_loader.py
  • src/webhooks/router.py
  • tests/integration/webhooks/test_webhook_flow.py
  • tests/unit/webhooks/test_router.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/webhooks/test_webhook_flow.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/rules/loaders/github_loader.py (1)
src/rules/registry.py (2)
  • ConditionRegistry (81-118)
  • get_conditions_for_parameters (90-118)
src/webhooks/router.py (4)
src/core/models.py (3)
  • EventType (91-105)
  • WebhookEvent (116-144)
  • WebhookResponse (108-113)
src/webhooks/auth.py (1)
  • verify_github_signature (23-58)
src/webhooks/dispatcher.py (2)
  • WebhookDispatcher (12-52)
  • dispatch (29-52)
src/webhooks/models.py (2)
  • GitHubEventModel (23-28)
  • WebhookResponse (31-36)
🪛 markdownlint-cli2 (0.20.0)
README.md

[warning] 40-40: Table column count
Expected: 4; Actual: 6; Too many cells, extra data will be missing

(MD056, table-column-count)


[warning] 40-40: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)


[warning] 40-40: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)

🔇 Additional comments (12)
src/rules/loaders/github_loader.py (1)

48-56: Defensive YAML structure validation is solid.

Early returns for invalid top-level structures and non-list rules should prevent downstream parsing errors.

src/webhooks/router.py (4)

1-10: LGTM!

Clean imports with structlog migration. The separation of concerns between core models and webhook-specific models is clear.


13-16: LGTM!

Clean DI pattern for testability.


19-40: LGTM!

Solid implementation with defensive 202 handling for unsupported events. The normalization and structured logging provide good observability.


77-93: LGTM on dispatch logic and response handling.

The status values "ok" and "ignored" are correctly aligned with the WebhookResponse contract. The 202 exception handling appropriately distinguishes unsupported events from actual errors.

src/event_processors/pull_request/processor.py (7)

1-14: LGTM!

Imports are well-organized and appropriate for the processor's functionality.


17-27: LGTM!

Clean class initialization with proper component wiring.


70-104: LGTM!

Good defensive handling for missing rules configuration—creating a neutral check run and posting helpful guidance rather than failing.


118-136: LGTM!

Both previously identified issues are properly addressed:

  • Line 123: Explicit type validation via Violation.model_validate(v)
  • Line 131: Acknowledgment lookup now uses rule_id with fallback to rule_description

143-196: LGTM!

Solid error handling ensures GitHub check runs reflect the actual outcome, and non-critical failures (like posting comments) don't crash the entire processing flow.


215-227: LGTM!

Defensive early return and exception logging without re-raising is appropriate for this non-critical operation.


229-238: LGTM!

Methods correctly delegate to the enricher. The async signature on prepare_webhook_data is likely for interface consistency even though the underlying call is synchronous.


- pr: remove redundant installation_id check and unused _convert_rules_to_new_format
- webhooks: wrap request.json() in try/except JSONDecodeError, return 400 on invalid payload

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>
@dkargatzis dkargatzis merged commit 8302582 into warestack:main Jan 31, 2026
4 of 5 checks passed

3 participants