
Improve staged pipeline reliability and runtime gating#215

Merged
ComBba merged 17 commits into main from
feat/staged-pipeline-reliability-gates
Mar 18, 2026
Conversation

@ComBba
Contributor

ComBba commented Mar 18, 2026

Summary

  • harden the staged pipeline around contract, build, runtime, and final verdict aggregation so successful local-first runs land as real GO results instead of stale NO-GO verdicts
  • make generation deterministic where the contract already defines the shape (backend files, frontend components, api client) and keep LLM generation focused on page.tsx plus targeted repair paths
  • improve repair and observability with file-level build recovery, staged SSE events, dashboard visibility for local runs, and runtime validation that exercises real API endpoints

Verification

  • cd agent && python -m pytest tests/test_staged_pipeline_nodes.py tests/test_contract_validator.py tests/test_build_error_feedback.py tests/test_code_evaluator.py -q --tb=short
  • cd agent && python -m pytest tests/test_runtime_config.py tests/test_store.py -q --tb=short
  • full staged local-first run: CONTRACT 2/2 -> CODE_EVAL 92.4 -> BUILD PASS -> RUNTIME PASS -> DEPLOY_GATE PASS -> SESSION verdict=GO
  • verified deployed local app and API from the successful run: frontend 200, backend /api/plan 200, backend /api/insights 200

Notes

  • left untracked local document docs/20260318/nutriplan-reliability-proposal.md out of this PR intentionally

Summary by CodeRabbit

Release Notes

  • New Features

    • File-level automatic recovery on frontend build failures (new repair path)
    • Expanded per-stage pipeline progress events (multiple generation/validation stages surfaced)
    • Zero-prompt start response now returns JSON instead of SSE
  • Improvements

    • Build validation now tracks the list of failing files and a frontend-only failure flag
    • Consistent retry handling and Responses API routing for LLM calls
    • Local runtime validation now performs POST checks against endpoints
    • API contract validation auto-detects router prefixes and expands paths
  • Tests

    • Strengthened tests for staged checks and zero-prompt routes

ComBba added 15 commits March 19, 2026 04:19
…g gates over legacy heuristics

Staged mode detection via spec_frozen + wiring_validation presence.
Consistency score uses wiring_validation results (0.7) blended with legacy (0.3).
Drops 'deterministic fallback scaffold detected' blocker in staged mode.
Relaxed pass criteria: wiring_passed + match_rate>=70 + runnability>=70.
Adds provenance summary (LLM vs deterministic file counts) to eval output.
8 new tests, 158 related tests pass, 0 regressions.
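The blend and pass criteria above can be sketched roughly as follows (helper names and signatures are illustrative; the real logic lives in `agent/nodes/code_evaluator.py` and may differ):

```python
def staged_consistency(wiring_score: float, legacy_score: float) -> float:
    """Blend wiring_validation results (weight 0.7) with the legacy
    heuristic score (weight 0.3), per the commit description."""
    return 0.7 * wiring_score + 0.3 * legacy_score


def staged_pass(wiring_passed: bool, match_rate: float, runnability: float) -> bool:
    """Relaxed staged-mode pass criteria:
    wiring_passed + match_rate >= 70 + runnability >= 70."""
    return wiring_passed and match_rate >= 70 and runnability >= 70
```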
…events

- llm.py: use_responses_api for gpt-5.3-codex/gpt-5.4 via model_endpoint_type(),
  max_retries via LangChain built-in, remove fallback model switching
- per_file_code_generator: inject wiring_validation.repair_instructions and
  build_errors into LLM prompt on retry, skip already-generated files unless
  they caused the build failure
- sse.py: register all staged pipeline nodes (api_contract_generator through
  deploy_gate) in NODE_EVENTS for dashboard visibility
- pipeline_runtime.py: emit structured SSE events for spec_freeze, backend_gen,
  frontend_gen, contract_validation, runtime_validation, deploy_gate results
…rompt

Sort frontend specs so non-page files generate first, then extract actual
exports (default/named), Props interface signatures, and api-client functions
from already_generated + foundation files before generating page.tsx.
Inject exact import statements with props signatures and CRITICAL IMPORT RULE
into LLM system message to prevent non-existent export references.
PARALLEL:
- frontend_generator_node: tier-based asyncio.gather for component specs
  (up to VIBEDEPLOY_MAX_PARALLEL_LLM=4 concurrent LLM calls)
- page.tsx always generated last, after all components complete

FILE-LEVEL REPAIR:
- build_validator extracts failing file paths + frontend_only_failure flag
- route_after_build_staged: frontend-only failures route to new
  frontend_file_repairer node instead of full backend+frontend rerun
- frontend_file_repairer_node: regenerates only the specific files
  identified in build errors, then re-runs build_validator
- state: build_failing_files, build_frontend_only_failure, build_errors_full
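The failing-file extraction step can be sketched like this (the pattern is one of the shapes flagged in review below; the actual `build_validator` extractor combines several patterns for different toolchain error formats):

```python
import re

# Illustrative pattern: capture src/ paths from build stderr, with or
# without a leading "./". The real implementation may use more patterns.
_PATH_PATTERN = re.compile(r"\.?/?(src/[^\s:>]+\.[a-z]{2,4})")


def extract_failing_file_paths(stderr: str) -> list[str]:
    """Return unique src/ file paths mentioned in build error output,
    preserving first-seen order."""
    seen: dict[str, None] = {}
    for match in _PATH_PATTERN.finditer(stderr):
        seen.setdefault(match.group(1))
    return list(seen)
```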

JSX TRUNCATION GUARD:
- _has_truncated_jsx: detects unclosed tags / truncated files
- _generate_file_with_llm: retries up to 3x on truncation detection

STATE:
- build_failing_files: list of failing file paths
- build_frontend_only_failure: bool for routing decision
- build_errors_full: full error text (3000 chars)
Add concise generation directive on truncation retry: shorter names,
fewer comments, properly closed JSX, explicit end-of-file requirements.
…n max_tokens limit

Root cause: LangChain ChatOpenAI(max_tokens=12000) passes max_output_tokens=12000
to Responses API, truncating large components mid-file.

Fix: _generate_file_via_responses_api() calls openai.AsyncOpenAI.responses.create()
directly without max_output_tokens. Uses reasoning={'effort':'medium'} per official
code generation best practices. LangChain path remains as fallback.

Model config: gpt-5.4 for frontend (official recommendation for one-shot UI gen)
The tag-counting approach (open < - close </ - self-close />) was
producing false positives since JSX nesting naturally has many opens.
Replace with last-line-ending check: valid if line ends with ; } ) > />
or is a comment. This matches actual truncation patterns accurately.
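Based on the description above, the last-line check could look roughly like this (a sketch; the real `_has_truncated_jsx` may handle additional cases):

```python
_VALID_ENDINGS = (";", "}", ")", ">", "/>")


def has_truncated_jsx(source: str) -> bool:
    """Heuristic truncation check: a file is considered complete only if
    its last non-empty line ends like a finished statement or closed tag
    (; } ) > />), or is a comment line."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return True  # empty output is treated as truncated
    last = lines[-1]
    if last.startswith("//") or last.startswith("/*") or last.endswith("*/"):
        return False
    return not last.endswith(_VALID_ENDINGS)
```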
…enerate_file_with_llm

Responses API path handles retries internally. LangChain fallback just uses
ainvoke_with_retry(max_attempts=3) directly without truncation re-detection
loop that could cause infinite cycling.
All non-page frontend files already had stable templates:
- Hero, WorkspacePanel, StatePanel, InsightPanel, CollectionPanel
- generic component fallback
- api-client and config/style files

Use deterministic generation for all component/api/config/style files and keep
LLM only for page.tsx. This removes the main truncation hotspot and cuts most
frontend LLM calls to one.
- build_validator: infer src/app/page.tsx for prerender '/' and src_app_page errors
- frontend_file_repairer: when repairing page.tsx after build failure, use
  deterministic _page_template instead of another LLM attempt

This keeps the fast staged pipeline while ensuring page-level build/runtime
errors converge to a known-good template instead of looping through more
fragile LLM retries.
Previously session.completed always used council scoring.decision, which stays
NO-GO when skip_council=true. For staged local-first runs that successfully pass
code_eval + build + local_runtime + deploy_gate + deployer(local_running), the
final meeting result should be GO.

Also fall back score to code_eval.match_rate when council scoring is absent.
When showcase apps exist, dashboard snapshot was replacing the full meeting list
with only showcase-matched deployed apps. This hid successful local_running runs
like staged verification threads.

Keep unmatched local_running/local_error meetings in reconciled results and add
limit query support to /dashboard/results and /dashboard/brainstorms.
- contract_validator now applies include_router(prefix=...) when comparing
  FastAPI decorators against OpenAPI paths
- deterministic backend route template strips /api from decorators because
  main.py already mounts router at prefix=/api
- local_runtime_validator now calls contract POST endpoints (/api/plan,
  /api/insights, etc.) instead of only checking /health

This closes the gap where a build could pass and local runtime could still
report success even though the generated API routes were actually 404ing.
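The prefix handling above can be sketched as a pure function (illustrative only; see `apply_router_prefixes` in agent/nodes/contract_validator.py for the real version):

```python
def apply_router_prefixes(routes: list[dict], prefixes: list[str]) -> list[dict]:
    """Expand FastAPI decorator paths with include_router(prefix=...) so
    they can be compared against OpenAPI paths. Routes already starting
    with a known prefix are left unchanged."""
    norm = ["/" + p.strip("/") for p in prefixes if p.strip("/")]
    expanded: list[dict] = []
    for route in routes:
        path = str(route.get("path") or "")
        if any(path == p or path.startswith(p + "/") for p in norm):
            expanded.append(route)  # already prefixed
            continue
        for p in norm:
            expanded.append({**route, "path": p + path})
    return expanded or routes
```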
Route/service/api backend files are contract-driven and already have stable
templates. Generating them via LLM caused path drift (/api/api/...) and runtime
instability.

Use deterministic generation for backend files so runtime/API validation can
converge reliably. Keep LLM only for page.tsx.

coderabbitai bot commented Mar 18, 2026

Walkthrough

Introduces a frontend file-level repair node and adds a repair path by extracting failing files during build validation; unifies retry and Responses API routing for LLM calls; and expands the staged pipeline logic and event streaming.

Changes

Cohort / File(s) Summary
Graph & per-file repair
agent/graph.py, agent/nodes/per_file_code_generator.py
Adds frontend_file_repairer_node and wires it into the staged graph; implements a frontend file repair flow driven by failing build files (parallel LLM calls with a deterministic fallback).
LLM routing & retries
agent/llm.py
Introduces max_retries in get_llm and adds use_responses_api decision logic; consistently passes retry/use_responses arguments through the registry, direct OpenAI, and DO Inference paths.
Build validation & state
agent/nodes/build_validator.py, agent/state.py
Adds a utility to extract failing files from build stderr (_extract_failing_file_paths) and a tsc check (_run_tsc_check); extends state with build_errors_full, build_failing_files, and build_frontend_only_failure fields and includes them in event payloads.
Staged pipeline gating
agent/nodes/code_evaluator.py, agent/tests/test_code_evaluator.py
Adds staged-detection, consistency-adjustment, and blocker-filtering helpers (_is_staged_pipeline, _staged_consistency, _staged_quality_blockers); includes staged metadata in evaluation and events.
Contract router prefixes
agent/nodes/contract_validator.py
Adds extract_router_prefixes and apply_router_prefixes to apply main.py router prefixes to endpoints.
Local runtime checks
agent/nodes/local_runtime_validator.py
Adds _http_json; performs POST checks when an API contract exists and records failing endpoints.
Pipeline runtime & SSE
agent/pipeline_runtime.py, agent/sse.py
Integrates build/runtime/deploy/code-eval results into the pipeline success verdict; adds new events (spec_freeze_gate, backend_generator, frontend_generator, frontend_file_repairer, etc.) to the SSE event map and expands streamed events.
Server & API parsing
agent/server.py, web/src/lib/zero-prompt-api.ts, agent/tests/test_zp_routes.py
Adds a limit parameter to dashboard endpoints, conditions zero-prompt start behavior and hardens response parsing (parseStartSessionResponse), and adjusts tests to JSON-based responses.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BuildValidator
    participant FileRepairer
    participant BackendGenerator
    participant LLMProvider

    Client->>BuildValidator: request (build validation)
    BuildValidator->>BuildValidator: extract failing files
    alt frontend_only failures & attempts <= 3
        BuildValidator->>FileRepairer: route failing files
        FileRepairer->>FileRepairer: map files → specs
        FileRepairer->>LLMProvider: request (repair, use_responses_api?, max_retries)
        LLMProvider-->>FileRepairer: generated code / failure
        FileRepairer->>FileRepairer: deterministic fallback (if needed)
        FileRepairer-->>Client: return repaired files
    else other failures or max attempts
        BuildValidator->>BackendGenerator: route to backend generator
        BackendGenerator->>LLMProvider: request (generation, max_retries)
        LLMProvider-->>BackendGenerator: generated code
        BackendGenerator-->>Client: return generated files
    end
sequenceDiagram
    participant Caller
    participant GetLLM
    participant ProviderRegistry
    participant OpenAIAPI
    participant DOInference

    Caller->>GetLLM: request (get_llm, model, max_retries, use_responses_api)
    GetLLM->>GetLLM: routing decision
    alt registry available
        GetLLM->>ProviderRegistry: request (timeout, max_retries)
        ProviderRegistry->>OpenAIAPI: delegate (timeout, max_retries)
        OpenAIAPI-->>ProviderRegistry: response
        ProviderRegistry-->>GetLLM: LLM instance
    else use_responses_api path
        GetLLM->>OpenAIAPI: direct call (use_responses_api, max_retries)
        OpenAIAPI-->>GetLLM: LLM instance
    else DO Inference
        GetLLM->>DOInference: call (max_retries, use_responses_api)
        DOInference-->>GetLLM: LLM instance
    end
    GetLLM-->>Caller: return LLM instance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #209: Introduced the staged pipeline and per-file generation changes — directly related to the frontend_file_repairer wiring and per-file generation extensions here.
  • PR #85: Covered get_llm routing/configuration changes in agent/llm.py — related to this PR's max_retries and Responses API routing.
  • PR #206: Expanded SSE events and surfaced pipeline events — possibly related to the NODE_EVENTS expansion.

Poem

🐰 Files rescued under a glowing keyboard,
hunting the failing line, one hop more.
Spin the retries, ask the answer again,
down the staged path, repairs held close.
A little carrot dance to celebrate. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning — Docstring coverage is 10.94%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Description Check — ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed — The PR title clearly and concisely describes the primary objective: improving reliability and runtime gating for the staged pipeline, which aligns with the main changes across multiple files.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the staged pipeline's robustness and feedback mechanisms. It introduces targeted repair capabilities for frontend build issues, enhances the reliability of LLM-driven code generation through retries and parallel processing, and improves the accuracy of both contract and runtime validations. The changes also provide more granular observability into the pipeline's execution and better integration of local development runs into the dashboard, ultimately leading to a more dependable and transparent development workflow.

Highlights

  • Enhanced Staged Pipeline Reliability: Introduced a new frontend_file_repairer_node into the agent graph, allowing for targeted repair of frontend files after build failures, and updated routing logic to leverage this new repair mechanism. This hardens the pipeline against contract, build, and runtime issues, ensuring successful local-first runs are accurately reflected.
  • Improved LLM Call Robustness and Efficiency: Implemented configurable max_retries for LLM calls to handle transient errors more gracefully. The per_file_code_generator now supports parallel LLM generation and includes retry logic for truncated JSX output, enhancing the reliability and speed of code generation.
  • Advanced Build and Contract Validation: Added functionality to build_validator to extract specific failing file paths from build errors and perform tsc checks for frontend code. The contract_validator was enhanced to correctly handle FastAPI router prefixes, improving the accuracy of API endpoint detection and validation.
  • Staged Pipeline-Specific Code Evaluation: Modified the code_evaluator to incorporate distinct logic for the 'staged pipeline' mode, including adjusted consistency scoring, selective blocker filtering (e.g., allowing deterministic fallbacks), and detailed provenance tracking for code generation, providing a more nuanced assessment of code quality in this context.
  • Real-time Runtime API Endpoint Validation: The local_runtime_validator now performs actual HTTP POST requests to API endpoints defined in the contract, verifying their functionality during local runtime, which significantly improves the depth of runtime gating.
  • Enhanced Observability with Staged SSE Events: Expanded the Server-Sent Events (SSE) stream to include detailed events for various stages of the staged pipeline, such as spec_freeze.result, backend_gen.complete, contract_validation.result, and runtime_validation.result, offering better real-time feedback on pipeline progress.
  • Improved Dashboard Visibility for Local Runs: Updated the dashboard's reconciliation logic to include local-running applications in the showcase meetings, and added a limit parameter to dashboard result endpoints, making it easier to monitor and review local development progress.

gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to the staged pipeline's reliability and observability. Key changes include adding a file-level repair loop for frontend build failures, making code generation more deterministic, and enhancing runtime validation. The introduction of a frontend_file_repairer node and more granular SSE events for the staged pipeline are excellent additions for robustness and monitoring. My review identified a few areas for improvement, including a duplicate regular expression, a redundant conditional check, some dead code, and a bug where available component exports are not being added to the LLM prompt, which could impact code generation quality.

sig = props_map.get(n, "")
props_note = f" // props: {sig}" if sig else ""
export_lines.append(f' import {{ {n} }} from "{module}";{props_note}')

Severity: high

The export_lines variable is calculated to list available component exports, but it's never used. It seems this information was intended to be added to the prompt to guide the LLM in generating correct import statements for page components. Without it, the LLM might generate incorrect imports. You should add this information to the prompt to improve generation quality.

Suggested change
+if export_lines:
+    prompt += "\n\n## Available Component Exports\n" + "\n".join(export_lines)

Comment on lines +42 to +43
re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),
re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),

Severity: medium

This list contains a duplicate regular expression pattern for extracting file paths. Line 43 is identical to line 42. Removing the duplicate will make the code cleaner and prevent potential maintenance issues.

Suggested change
-re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),
-re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),
+re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),

for route in routes:
    path = str(route.get("path") or "")
    method = str(route.get("method") or "GET")
    if path.startswith("/") and path.startswith("/api"):

Severity: medium

The condition path.startswith("/") is redundant here because path.startswith("/api") already implies it. Simplifying the condition will improve readability.

Suggested change
-if path.startswith("/") and path.startswith("/api"):
+if path.startswith("/api"):

Comment on lines +168 to +207
async def _generate_tier_parallel(
    specs: list,
    context: dict,
    code_store: dict,
    warnings: list,
    file_type_filter: set[str],
    is_frontend: bool,
) -> None:
    model_key = "code_gen_frontend" if is_frontend else "code_gen_backend"
    model = MODEL_CONFIG.get(model_key, MODEL_CONFIG["code_gen"])
    semaphore = asyncio.Semaphore(_MAX_PARALLEL_LLM)

    async def _generate_one(spec) -> tuple[str, dict[str, str]]:
        async with semaphore:
            if _use_llm_per_file_generation() and spec.file_type in file_type_filter:
                if not llm_credentials_available(model):
                    warnings.append(f"per_file_llm_unavailable:{model}")
                    return spec.path, _generate_file_from_spec(spec, context)
                try:
                    content = await _generate_file_with_llm(spec, context)
                    route = llm_auth_route_for_model(model) or "unknown"
                    target = "frontend" if is_frontend else "backend"
                    warnings.append(f"per_file_{target}_llm_used:{model}:{route}")
                    return spec.path, {spec.path: content}
                except Exception as exc:
                    target = "frontend" if is_frontend else "backend"
                    logger.warning("[PER_FILE_LLM] %s fallback for %s: %s", target, spec.path, str(exc)[:200])
                    warnings.append(f"per_file_{target}_llm_fallback:{spec.path}")
                    return spec.path, _generate_file_from_spec(spec, context)
            else:
                try:
                    return spec.path, _generate_file_from_spec(spec, context)
                except Exception:
                    return spec.path, _generate_file_from_spec(spec, context)

    results = await asyncio.gather(*[_generate_one(spec) for spec in specs])
    for _, generated in results:
        code_store.update(generated)
        context["already_generated"].update(generated)

Severity: medium

The _generate_tier_parallel function appears to be unused within the file. If it's not being called from other parts of the codebase, it should be removed to eliminate dead code.

Revert /zero-prompt/start to emit a short SSE stream again so route tests and
consumers receive zp.session.start consistently. Update the web client to accept
both SSE and JSON payloads when starting a session, and skip launching the ZP
background pipeline during test-mode requests to keep route tests deterministic.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
agent/llm.py (1)

435-440: ⚠️ Potential issue | 🟠 Major

The registry path is missing the max_retries parameter.

The direct OpenAI/DO paths (llm.py 452, 465) pass max_retries=effective_retries, but the registry path (llm.py 435) passes only timeout. Even though effective_retries is computed, it is not included in the registry.get_llm call.

Since neither openai_adapter nor anthropic_adapter accepts max_retries via **kwargs, the retry policy is not applied when routing goes through the registry, so behavior differs per model.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent/llm.py` around lines 435 - 440, The registry path is missing the
max_retries parameter so retry policies aren't applied; update the
registry.get_llm call to pass max_retries=effective_retries (i.e., change
registry.get_llm(canonical, temperature=temperature,
max_tokens=effective_max_tokens, timeout=effective_timeout) to include
max_retries=effective_retries) so that effective_retries computed earlier is
honored for registry-routed LLMs; this aligns behavior with the direct OpenAI/DO
branches and ensures openai_adapter and anthropic_adapter receive the retry
setting via kwargs.
agent/server.py (1)

1248-1286: ⚠️ Potential issue | 🔴 Critical

Changing the /zero-prompt/start response to SSE breaks session startup in the current web client.

web/src/hooks/use-zero-prompt.ts:183-230 takes the startSession() result as a JSON session object and passes it to normalizeSession(), and lines 20-56 of the same file assume an object with session_id/cards/status. Returning a StreamingResponse here breaks session initialization, and this stream closes right after the two start events, so it is no substitute for /zero-prompt/events. It is safer to keep the existing JSON payload as the default response and, if streaming is needed, split it into a separate SSE endpoint (which already exists) or an Accept: text/event-stream branch.

💡 Minimal fix
@app.post("/api/zero-prompt/start")
@app.post("/zero-prompt/start")
async def zero_prompt_start(request: ZPStartRequest):
    orch = _get_zp_orchestrator()
    session, _start_event = orch.create_session(goal=request.goal)
    session_id = session.session_id
    goal = request.goal or 5

    if not _test_api_enabled():
        asyncio.create_task(_run_zp_pipeline(orch, session_id, goal))
    push_zp_event(
        {"type": "zp.session.start", "session_id": session_id, "goal_go_cards": goal, "session_status": session.status}
    )
    push_zp_event({"type": "zp.pipeline.started", "session_id": session_id, "goal": goal})

-    async def event_stream() -> AsyncGenerator[str, None]:
-        yield _sse(
-            "zp.session.start",
-            {
-                "type": "zp.session.start",
-                "session_id": session_id,
-                "goal_go_cards": goal,
-                "session_status": session.status,
-            },
-        )
-        yield _sse(
-            "zp.pipeline.started",
-            {"type": "zp.pipeline.started", "session_id": session_id, "goal": goal},
-        )
-
-    return StreamingResponse(
-        event_stream(),
-        media_type="text/event-stream",
-        headers={
-            "Cache-Control": "no-cache",
-            "Connection": "keep-alive",
-            "X-Accel-Buffering": "no",
-        },
-    )
+    return session.model_dump()
🧹 Nitpick comments (1)
web/src/lib/zero-prompt-api.ts (1)

7-7: Missing error handling and validation for JSON parsing

If the payload starts with "{" but is not valid JSON, JSON.parse throws an unclear error. The parsed object is also never validated for required fields (session_id, status, cards).

🛡️ Suggested error handling and validation
-  if (trimmed.startsWith("{")) return JSON.parse(trimmed) as ZPSession;
+  if (trimmed.startsWith("{")) {
+    try {
+      const parsed = JSON.parse(trimmed);
+      if (!parsed.session_id || !parsed.status) {
+        throw new Error("Invalid session response: missing required fields");
+      }
+      return parsed as ZPSession;
+    } catch (e) {
+      throw new Error(`Failed to parse session JSON: ${e instanceof Error ? e.message : String(e)}`);
+    }
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/lib/zero-prompt-api.ts` at line 7, The current quick-parse branch "if
(trimmed.startsWith("{")) return JSON.parse(trimmed) as ZPSession;" can throw on
invalid JSON and doesn't validate required fields; wrap the parse in a try/catch
to catch JSON.parse errors and return/throw a controlled error, then validate
the parsed object (the ZPSession) contains the required keys (session_id,
status, cards) with expected types/structure before returning it; reference the
local variable trimmed, the ZPSession type and JSON.parse when adding the
try/catch and add explicit checks for session_id, status and cards, returning a
clear error or fallback when validation fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent/nodes/build_validator.py`:
- Around line 395-399: The new top-level build state keys (build_errors_full,
build_failing_files, build_frontend_only_failure) are only being set on failure
paths, leaving stale values on success/skip paths; update the state emission in
the function that prepares the build-state fragment (the block that currently
sets "build_errors_full": combined_stderr[:3000], "build_failing_files":
failing_paths, "build_frontend_only_failure": frontend_only_failure, and
"build_attempt_count") to explicitly overwrite those keys with empty/falsey
values on non-failure branches (e.g., set build_errors_full="",
build_failing_files=[], build_frontend_only_failure=False) so reducer-friendly
merges won't retain old data; keep build_attempt_count logic as-is and ensure
the same fragment shape is returned for all outcomes so merge_dicts/reducer
merges behave predictably.

In `@agent/nodes/contract_validator.py`:
- Around line 61-76: The current apply_router_prefixes function incorrectly
special-cases paths starting with "/api"; instead, remove the hardcoded "/api"
check and treat a route as already prefixed if its path starts with any of the
provided prefixes (normalize prefixes to begin with "/" and compare against path
prefixes), so that routes already containing one of the configured prefixes are
left unchanged; ensure prefix normalization (leading slash, trim trailing slash
for comparisons) and keep the rest of the expansion logic (building full_path
from prefix + path) and the final return behavior (expanded if non-empty,
otherwise original routes).

In `@agent/nodes/local_runtime_validator.py`:
- Around line 103-114: The POST validator is sending a fixed payload and not
honoring OpenAPI path parameters or requestBody schema, causing false failures;
update the loop that iterates over paths (the block using `for endpoint, methods
in list(paths.items())[:3]:`) to: detect and substitute any path parameters in
`endpoint` with dummy safe values (e.g., "test" or "1"); inspect the OpenAPI
operation object for "requestBody" and the schema for "application/json" (or
form data) and construct a minimal valid payload matching required properties
instead of the fixed `{"query":"test","preferences":"test"}` (and keep the
special-case for "insight" only if that matches the schema); then call
`_http_json` with the constructed URL and payload and keep appending failures to
`errors` as before (`post_ok, post_detail = await asyncio.to_thread(_http_json,
...)`) so the validator skips or correctly tests endpoints with path params and
proper request schemas.

In `@agent/nodes/per_file_code_generator.py`:
- Around line 298-320: The code inside the target == "frontend" &&
_is_page_file(spec.path) block builds import hints in export_lines from
available_exports but never injects them into the prompt/context, so the import
constraint is not applied; after building export_lines (use the
available_exports, defaults, named, props_map logic already present), join them
into a single string (with a brief header comment) and append or merge that
string into the same prompt/context payload used later to generate the page (the
variable that holds the prompt or the context passed to the generator), ensuring
the import hints are included when _is_page_file(spec.path) is true.
- Around line 1195-1198: The _is_page_file function misclassifies files because
it checks for substrings; change it to inspect only the final filename (use
os.path.basename or Path(path).name) and return True only when the basename
exactly equals "page.tsx" or "page.ts" (after normalizing slashes), so files
like "homepage.tsx" won't be treated as page files; update the _is_page_file
implementation to use the basename comparison accordingly.

In `@agent/pipeline_runtime.py`:
- Around line 203-205: The current fallback uses "if not score" which treats 0
as missing; change the logic around the score variable so you only fallback when
final_score is absent or None (not when it is 0). Retrieve score from scoring
via scoring.get("final_score") and then replace the "if not score" check with a
strict None/type check (e.g., "if score is None and
isinstance(code_eval_result.get('match_rate'), (int, float))") before assigning
the match_rate; update references to scoring, score, and code_eval_result
accordingly.
- Around line 199-201: The current logic unconditionally sets verdict = "GO"
whenever pipeline_succeeded is true, which can override a failing/hard-gate
decision; change it to first map the raw decision (use
verdict_map.get(decision_raw, "NO-GO") into a local variable like
mapped_verdict) and only set verdict = "GO" if pipeline_succeeded is true AND
mapped_verdict == "GO" (otherwise keep mapped_verdict). Update the code around
verdict_map, decision_raw and pipeline_succeeded so hard-gate or scoring
failures (scoring.decision / decision_raw) cannot be overridden by
pipeline_succeeded.
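
The two pipeline_runtime fixes above can be sketched together; `resolve_verdict` and `resolve_score` are hypothetical helper names used only for illustration:

```python
def resolve_verdict(decision_raw: str, pipeline_succeeded: bool,
                    verdict_map: dict) -> str:
    # Map the raw decision first; pipeline success alone must not
    # override a hard-gate NO-GO.
    mapped = verdict_map.get(decision_raw, "NO-GO")
    return "GO" if pipeline_succeeded and mapped == "GO" else mapped

def resolve_score(scoring: dict, code_eval_result: dict) -> float:
    # A final_score of 0 is a valid value; fall back to match_rate only
    # when final_score is absent or non-numeric.
    score = scoring.get("final_score")
    if not isinstance(score, (int, float)):
        match_rate = code_eval_result.get("match_rate")
        score = match_rate if isinstance(match_rate, (int, float)) else 0
    return score
```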

In `@web/src/lib/zero-prompt-api.ts`:
- Around line 9-22: Wrap the JSON.parse call inside the loop (the invocation
using JSON.parse(line.slice(6))) with a try-catch so malformed SSE lines are
skipped/logged instead of throwing; continue the loop on parse failure. Also
update the ZPSession interface to include goal_go_cards: number (and remove
build_queue and active_build from the returned object or add them to the
interface only if they will be used) so the returned object shape matches the
ZPSession type used elsewhere (see use-zero-prompt.ts for consumers).

---

Outside diff comments:
In `@agent/llm.py`:
- Around line 435-440: The registry path is missing the max_retries parameter so
retry policies aren't applied; update the registry.get_llm call to pass
max_retries=effective_retries (i.e., change registry.get_llm(canonical,
temperature=temperature, max_tokens=effective_max_tokens,
timeout=effective_timeout) to include max_retries=effective_retries) so that
effective_retries computed earlier is honored for registry-routed LLMs; this
aligns behavior with the direct OpenAI/DO branches and ensures openai_adapter
and anthropic_adapter receive the retry setting via kwargs.
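
The call-site change amounts to forwarding one more keyword argument. A sketch with a stand-in registry (`FakeRegistry` is hypothetical, used only to show the kwargs reaching the adapter):

```python
class FakeRegistry:
    # Stand-in for the real LLM registry; records the kwargs it receives.
    def get_llm(self, model: str, **kwargs):
        return {"model": model, **kwargs}

def get_registry_llm(registry, canonical, temperature,
                     effective_max_tokens, effective_timeout,
                     effective_retries):
    # Include max_retries so registry-routed adapters honor the computed
    # retry policy, matching the direct OpenAI/DO branches.
    return registry.get_llm(
        canonical,
        temperature=temperature,
        max_tokens=effective_max_tokens,
        timeout=effective_timeout,
        max_retries=effective_retries,
    )
```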

---

Nitpick comments:
In `@web/src/lib/zero-prompt-api.ts`:
- Line 7: The current quick-parse branch "if (trimmed.startsWith("{")) return
JSON.parse(trimmed) as ZPSession;" can throw on invalid JSON and doesn't
validate required fields; wrap the parse in a try/catch to catch JSON.parse
errors and return/throw a controlled error, then validate the parsed object (the
ZPSession) contains the required keys (session_id, status, cards) with expected
types/structure before returning it; reference the local variable trimmed, the
ZPSession type and JSON.parse when adding the try/catch and add explicit checks
for session_id, status and cards, returning a clear error or fallback when
validation fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 83fab7fd-9b16-43d1-b046-b2d68433d8c1

📥 Commits

Reviewing files that changed from the base of the PR and between 3be9d66 and 3479732.

📒 Files selected for processing (13)
  • agent/graph.py
  • agent/llm.py
  • agent/nodes/build_validator.py
  • agent/nodes/code_evaluator.py
  • agent/nodes/contract_validator.py
  • agent/nodes/local_runtime_validator.py
  • agent/nodes/per_file_code_generator.py
  • agent/pipeline_runtime.py
  • agent/server.py
  • agent/sse.py
  • agent/state.py
  • agent/tests/test_code_evaluator.py
  • web/src/lib/zero-prompt-api.ts

Comment on lines +39 to +56
def _extract_failing_file_paths(error_text: str) -> list[str]:
    paths: list[str] = []
    patterns = [
        re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),
        re.compile(r"\./?(src/[^\s:>]+\.[a-z]{2,4})"),
        re.compile(r"Module not found.*['\"](@/[^'\"]+)['\"]"),
        re.compile(r"Export\s+\w+\s+doesn't exist.*['\"](@/[^'\"]+)['\"]"),
    ]
    for pattern in patterns:
        for match in pattern.finditer(error_text):
            p = match.group(1)
            if p not in paths:
                paths.append(p)
    lowered = error_text.lower()
    if 'prerendering page "/"' in lowered or "src_app_page" in lowered:
        if "src/app/page.tsx" not in paths:
            paths.append("src/app/page.tsx")
    return paths[:5]

⚠️ Potential issue | 🟠 Major

build_failing_files fails to hand common frontend-only failures to the repair path.

The @/… aliases extracted here are not normalized to repo paths (src/...), and design errors keep only the basename, so failing_paths ends up empty or holding unmatchable values. Yet agent/nodes/per_file_code_generator.py:620-647 looks up specs with this list, and agent/graph.py:88-99 only routes to frontend_file_repairer when build_failing_files is non-empty. So cases like Module not found '@/…' or design-only failures drop out of targeted repair immediately. Alias→repo-path normalization and preserving the original filepath in the design check are needed.

Also applies to: 367-399

Comment on lines +395 to +399
"build_errors_full": combined_stderr[:3000],
"build_repair_prompt": repair_prompt,
"build_attempt_count": state.get("build_attempt_count", 0) + 1,
"build_failing_files": failing_paths,
"build_frontend_only_failure": frontend_only_failure,

⚠️ Potential issue | 🟠 Major

Writing the new top-level build state only on failure leaves stale values behind.

If build_errors_full, build_failing_files, and build_frontend_only_failure are only populated here, the previous values survive the merge on the next success/skip/different failure. Later routing and the dashboard can then keep reading a stale frontend-only failure, so explicitly overwrite these keys with empty values on non-failure paths as well. As per coding guidelines, return state fragments keyed for reducer-friendly merges; graph state uses merge_dicts for council_analysis and scoring.
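
The reducer-friendly shape this guideline asks for can be sketched as a fragment builder that always emits the same keys (illustrative only; the key names mirror the state keys above):

```python
def build_state_fragment(failed: bool, combined_stderr: str = "",
                         failing_paths=None, frontend_only: bool = False,
                         prev_attempts: int = 0) -> dict:
    # Emit every key on every path so a merge_dicts-style reducer
    # overwrites stale values from an earlier failed attempt.
    if not failed:
        return {
            "build_errors_full": "",
            "build_failing_files": [],
            "build_frontend_only_failure": False,
            "build_attempt_count": prev_attempts,
        }
    return {
        "build_errors_full": combined_stderr[:3000],
        "build_failing_files": list(failing_paths or []),
        "build_frontend_only_failure": frontend_only,
        "build_attempt_count": prev_attempts + 1,
    }
```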


Comment on lines +61 to +76
def apply_router_prefixes(routes: list[dict], prefixes: list[str]) -> list[dict]:
    if not prefixes:
        return routes
    expanded: list[dict] = []
    for route in routes:
        path = str(route.get("path") or "")
        method = str(route.get("method") or "GET")
        if path.startswith("/") and path.startswith("/api"):
            expanded.append({"method": method, "path": path})
            continue
        for prefix in prefixes:
            if not prefix.startswith("/"):
                prefix = "/" + prefix
            full_path = prefix.rstrip("/") + (path if path.startswith("/") else f"/{path}")
            expanded.append({"method": method, "path": full_path})
    return expanded or routes

⚠️ Potential issue | 🟡 Minor

The hardcoded /api exception makes the prefix-handling logic fragile.

The already-prefixed check is tied to a fixed string (/api), so contracts that use a different prefix can produce incorrect endpoint matching.

🔧 Suggested fix
 def apply_router_prefixes(routes: list[dict], prefixes: list[str]) -> list[dict]:
     if not prefixes:
         return routes
+    normalized_prefixes = [p if p.startswith("/") else f"/{p}" for p in prefixes]
     expanded: list[dict] = []
     for route in routes:
         path = str(route.get("path") or "")
         method = str(route.get("method") or "GET")
-        if path.startswith("/") and path.startswith("/api"):
+        if any(path == p or path.startswith(f"{p.rstrip('/')}/") for p in normalized_prefixes):
             expanded.append({"method": method, "path": path})
             continue
-        for prefix in prefixes:
-            if not prefix.startswith("/"):
-                prefix = "/" + prefix
+        for prefix in normalized_prefixes:
             full_path = prefix.rstrip("/") + (path if path.startswith("/") else f"/{path}")
             expanded.append({"method": method, "path": full_path})
     return expanded or routes

Comment on lines +103 to +114
                        for endpoint, methods in list(paths.items())[:3]:
                            if not isinstance(methods, dict):
                                continue
                            if "post" in methods:
                                payload = {"query": "test", "preferences": "test"}
                                if "insight" in endpoint.lower():
                                    payload = {"selection": "test", "context": "test"}
                                post_ok, post_detail = await asyncio.to_thread(
                                    _http_json, f"http://127.0.0.1:{port}{endpoint}", payload
                                )
                                if not post_ok:
                                    errors.append(f"backend_endpoint_failed:{endpoint}:{post_detail}")

⚠️ Potential issue | 🟠 Major

The OpenAPI-based POST validation ignores paths/schemas and can falsely fail healthy APIs.

It currently sends a fixed payload to every POST, so URLs with path parameters or endpoints with a different request schema can produce excessive backend_endpoint_failed errors.

🔧 Suggested fix
-                        for endpoint, methods in list(paths.items())[:3]:
+                        for endpoint, methods in list(paths.items())[:3]:
+                            # Path-parameter endpoints risk false positives when called without sample-value mapping
+                            if "{" in endpoint or "}" in endpoint:
+                                continue
                             if not isinstance(methods, dict):
                                 continue
-                            if "post" in methods:
-                                payload = {"query": "test", "preferences": "test"}
-                                if "insight" in endpoint.lower():
-                                    payload = {"selection": "test", "context": "test"}
+                            post_op = methods.get("post")
+                            if isinstance(post_op, dict):
+                                # TODO: read the requestBody schema and build a payload from the required fields
+                                payload = {"query": "test", "preferences": "test"}
+                                if "insight" in endpoint.lower():
+                                    payload = {"selection": "test", "context": "test"}
                                 post_ok, post_detail = await asyncio.to_thread(
                                     _http_json, f"http://127.0.0.1:{port}{endpoint}", payload
                                 )
                                 if not post_ok:
                                     errors.append(f"backend_endpoint_failed:{endpoint}:{post_detail}")

Comment on lines +298 to +320
    if target == "frontend" and _is_page_file(spec.path):
        available_exports = context.get("available_exports") or {}
        if available_exports:
            export_lines = []
            for file_path, info in sorted(available_exports.items()):
                module = (
                    "@/" + file_path.replace("src/", "", 1).rsplit(".", 1)[0]
                    if file_path.startswith("src/")
                    else file_path.rsplit(".", 1)[0]
                )
                defaults = info.get("default") or []
                named = info.get("named") or []
                props_map = info.get("props") or {}
                if defaults:
                    sig = props_map.get(defaults[0], "")
                    props_note = f"  // props: {sig}" if sig else "  // default export"
                    export_lines.append(f'  import {defaults[0]} from "{module}";{props_note}')
                if named:
                    for n in named:
                        sig = props_map.get(n, "")
                        props_note = f"  // props: {sig}" if sig else ""
                        export_lines.append(f'  import {{ {n} }} from "{module}";{props_note}')


⚠️ Potential issue | 🟠 Major

available_exports is only collected and never injected into the prompt, so the import constraint has no actual effect.

export_lines is built but never appended to the prompt, so the intent of preventing broken imports during page generation is not applied.

🔧 Suggested fix
     if target == "frontend" and _is_page_file(spec.path):
         available_exports = context.get("available_exports") or {}
         if available_exports:
             export_lines = []
             for file_path, info in sorted(available_exports.items()):
                 module = (
                     "@/" + file_path.replace("src/", "", 1).rsplit(".", 1)[0]
                     if file_path.startswith("src/")
                     else file_path.rsplit(".", 1)[0]
                 )
                 defaults = info.get("default") or []
                 named = info.get("named") or []
                 props_map = info.get("props") or {}
                 if defaults:
                     sig = props_map.get(defaults[0], "")
                     props_note = f"  // props: {sig}" if sig else "  // default export"
                     export_lines.append(f'  import {defaults[0]} from "{module}";{props_note}')
                 if named:
                     for n in named:
                         sig = props_map.get(n, "")
                         props_note = f"  // props: {sig}" if sig else ""
                         export_lines.append(f'  import {{ {n} }} from "{module}";{props_note}')
+            if export_lines:
+                prompt = f"{prompt}\n\n## Available Component Exports\n" + "\n".join(export_lines)

Comment on lines +1195 to +1198
def _is_page_file(path: str) -> bool:
    normalized = path.replace("\\", "/")
    return "page.tsx" in normalized or "page.ts" in normalized


⚠️ Potential issue | 🟠 Major

Page-file detection is substring-based and produces false positives.

The current logic can classify files like homepage.tsx as pages, so the wrong generation/repair path may be chosen.

🔧 Suggested fix
 def _is_page_file(path: str) -> bool:
-    normalized = path.replace("\\", "/")
-    return "page.tsx" in normalized or "page.ts" in normalized
+    normalized = path.replace("\\", "/")
+    file_name = Path(normalized).name.lower()
+    return file_name in {"page.tsx", "page.ts"}

Comment on lines +199 to +201
    verdict = verdict_map.get(decision_raw, "NO-GO")
    if pipeline_succeeded:
        verdict = "GO"

⚠️ Potential issue | 🟠 Major

The final verdict can remain GO even when a hard gate fails.

Currently only the success case overwrites the verdict to GO, while failure cases use scoring.decision as-is. Because of this, the session verdict can end up GO even though a gate failed.

🔧 Suggested fix
-    verdict = verdict_map.get(decision_raw, "NO-GO")
-    if pipeline_succeeded:
-        verdict = "GO"
+    verdict = "GO" if pipeline_succeeded else "NO-GO"

Comment on lines +203 to +205
    score = scoring.get("final_score", 0)
    if not score and isinstance(code_eval_result.get("match_rate"), (int, float)):
        score = code_eval_result.get("match_rate", 0)

⚠️ Potential issue | 🟡 Minor

The score fallback condition can overwrite a valid value of 0.

`if not score` treats final_score == 0 as missing as well. It is safer to branch on whether the value exists and on its type.

🔧 Suggested fix
-    score = scoring.get("final_score", 0)
-    if not score and isinstance(code_eval_result.get("match_rate"), (int, float)):
-        score = code_eval_result.get("match_rate", 0)
+    score = scoring.get("final_score")
+    if not isinstance(score, (int, float)):
+        score = code_eval_result.get("match_rate", 0) if isinstance(code_eval_result.get("match_rate"), (int, float)) else 0

Comment on lines +9 to +22
  for (const line of trimmed.split(/\r?\n/)) {
    if (!line.startsWith("data: ")) continue;
    const payload = JSON.parse(line.slice(6));
    if (payload.type === "zp.session.start") {
      return {
        session_id: String(payload.session_id),
        status: String(payload.session_status || "exploring"),
        goal_go_cards: Number(payload.goal_go_cards || 0),
        cards: [],
        build_queue: [],
        active_build: null,
      } as ZPSession;
    }
  }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the ZPSession interface definition
echo "=== ZPSession TypeScript interface ==="
cat web/src/types/zero-prompt.ts | head -50

echo ""
echo "=== Usage of goal_go_cards, build_queue, active_build in codebase ==="
rg -n "goal_go_cards|build_queue|active_build" web/src/ --type ts

Repository: Two-Weeks-Team/vibeDeploy

Length of output: 2739


Missing JSON parse error handling and a ZPSession interface type mismatch

There are two real issues here:

  1. JSON.parse(line.slice(6)) on line 11 throws an unhandled exception on malformed JSON.

  2. The returned object's goal_go_cards, build_queue, and active_build fields are not defined on the ZPSession interface. Since goal_go_cards is actually consumed in use-zero-prompt.ts, the interface must be updated.

♻️ Suggested improvements
  1. Add a try-catch around the JSON parse:
-    const payload = JSON.parse(line.slice(6));
+    let payload;
+    try {
+      payload = JSON.parse(line.slice(6));
+    } catch {
+      continue;
+    }
  2. Update the ZPSession interface to add the goal_go_cards field (reflecting actual usage):
export interface ZPSession {
  session_id: string;
  status: "exploring" | "paused" | "completed";
  goal_go_cards: number;
  cards: ZPCard[];
}

build_queue and active_build are currently unused, so either remove them or add them to the interface later when they are actually needed.


Keep /zero-prompt/start on JSON session bootstrap for the current merge-ref
contract and update route tests to match the actual API shape used in CI.
Retain the test-mode guard so zero-prompt route tests stay deterministic and do
not launch the background exploration pipeline.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agent/server.py (1)

1256-1261: ⚠️ Potential issue | 🟡 Minor

Conditionally emitting the zp.pipeline.started event in test mode would be more consistent.

Currently zp.pipeline.started is published even when pipeline execution is skipped. In test mode, omitting that event (or emitting a separate skipped event) makes the state easier to interpret.

🔧 Suggested patch
-    if not _test_api_enabled():
-        asyncio.create_task(_run_zp_pipeline(orch, session_id, goal))
+    if not _test_api_enabled():
+        asyncio.create_task(_run_zp_pipeline(orch, session_id, goal))
+        push_zp_event({"type": "zp.pipeline.started", "session_id": session_id, "goal": goal})
     push_zp_event(
         {"type": "zp.session.start", "session_id": session_id, "goal_go_cards": goal, "session_status": session.status}
     )
-    push_zp_event({"type": "zp.pipeline.started", "session_id": session_id, "goal": goal})
🧹 Nitpick comments (3)
agent/tests/test_zp_routes.py (1)

18-20: Validate the response shape on the /api path too, to catch route-parity regressions.

Checking only session_id can miss response-schema mismatches between /zero-prompt/start and /api/zero-prompt/start.

🔧 Suggested patch
 async def test_zp_start_api_prefix(app_client):
     resp = await app_client.post("/api/zero-prompt/start", json={})
     assert resp.status_code == 200
     body = resp.json()
     assert body["session_id"]
+    assert body["goal_go_cards"] == 5
+    assert body["status"] == "exploring"
+    assert body["build_queue"] == []
+    assert body["active_build"] is None

As per coding guidelines: Keep route parity between /api/... and local bare paths when changing request/response shapes.

agent/server.py (2)

326-330: Local deployment URL detection can miss some fields.

It currently only checks localUrl/local_url, but depending on the stored format only local_frontend_url/local_backend_url may be populated, so local entries can be dropped.

🔧 Suggested patch
-            local_url = str(deployment.get("localUrl") or deployment.get("local_url") or "").strip()
+            local_url = str(
+                deployment.get("localUrl")
+                or deployment.get("local_url")
+                or deployment.get("local_frontend_url")
+                or deployment.get("local_backend_url")
+                or ""
+            ).strip()

1269-1270: Using the session object directly instead of hardcoded response fields reduces drift.

The defaults in create_session and the response values may diverge in the future, so it is safer to return the actual session values.

🔧 Suggested patch
-            "build_queue": [],
-            "active_build": None,
+            "build_queue": list(session.build_queue),
+            "active_build": session.active_build,
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@agent/tests/test_zp_routes.py`:
- Around line 53-57: The test currently allows resp.status_code == 200 which is
too permissive; update the assertions in the test for queue_build to require a
422 response and validate the error payload shape: assert resp.status_code ==
422 and then parse resp.json() and assert that body["type"] == "zp.action.error"
and that body["detail"] is one of the expected error identifiers (e.g.,
"session_not_found", "card_not_found", "card_not_go_ready"); ensure you remove
the {200, 422} set check and the conditional branch so the test fails on
unexpected 200 responses.

---

Outside diff comments:
In `@agent/server.py`:
- Around line 1256-1261: The code currently always pushes the
"zp.pipeline.started" event even when pipeline execution is skipped in test
mode; update the logic around _test_api_enabled(), _run_zp_pipeline, and
push_zp_event so that when _test_api_enabled() is true you either omit emitting
"zp.pipeline.started" or emit a distinct "zp.pipeline.skipped" event instead;
specifically, move or conditionally call push_zp_event({"type":
"zp.pipeline.started", ...}) only in the branch where you create the asyncio
task (the path that invokes _run_zp_pipeline), or add an else that pushes
{"type":"zp.pipeline.skipped", "session_id": session_id, "goal": goal} to make
the state explicit.

---

Nitpick comments:
In `@agent/server.py`:
- Around line 326-330: The local deployment detection only checks
deployment.get("localUrl") and deployment.get("local_url"), which misses cases
where only deployment keys like "local_frontend_url" or "local_backend_url" are
set; update the logic around the deployment, status, and local_url variables in
server.py (the block that defines deployment = dict(...), status = ...,
local_url = ...) to consider any of these keys as indicating a local URL (e.g.,
treat local_frontend_url or local_backend_url as valid local_url values) before
deciding to append meeting to reconciled so meetings with those fields are not
missed.
- Around line 1269-1270: Do not hardcode response values like "build_queue": []
and "active_build": None; return the actual session object created by
create_session instead. Locate the response-building code in the server (see
create_session and the session variable), serialize the session (e.g.,
session.to_dict(), pydantic .dict(), or JSON serialization) so the current
session state is returned as-is, and filter sensitive fields if needed.

In `@agent/tests/test_zp_routes.py`:
- Around lines 18-20: The test only asserts body["session_id"] for one
endpoint, which misses schema differences between /zero-prompt/start and
/api/zero-prompt/start. Update agent/tests/test_zp_routes.py to call both
endpoints ('/zero-prompt/start' and '/api/zero-prompt/start'), parse each
resp.json() into its own variable (e.g., body and api_body), and assert full
shape parity: either compare key sets and required fields (including
"session_id") or assert body == api_body, so the test fails on any
response-schema regression between the two routes.
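The parity check could look like the sketch below; fetch is a stand-in for client.post(path).json(), since in the real test both calls would go through FastAPI's TestClient:

```python
def fetch(path: str) -> dict:
    # Stand-in for client.post(path).json(); both routes are expected
    # to serve the same payload in the real app.
    return {"session_id": "s-1", "status": "created"}

body = fetch("/zero-prompt/start")
api_body = fetch("/api/zero-prompt/start")

# Shape parity: required field present, same keys, same values.
assert "session_id" in body
assert body.keys() == api_body.keys()
assert body == api_body
```

Comparing the full payloads (not just one field) is what turns a silent schema drift between the aliased routes into a test failure.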
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6da19de-7a3e-46e4-926b-46047a3555c3

📥 Commits

Reviewing files that changed from the base of the PR and between 3479732 and 162a568.

📒 Files selected for processing (2)
  • agent/server.py
  • agent/tests/test_zp_routes.py

Comment on lines +53 to +57
    assert resp.status_code in {200, 422}
    if resp.status_code == 200:
        body = resp.json()
        assert body["type"] == "zp.action.error"
        assert body["error"] in {"session_not_found", "card_not_found", "card_not_go_ready"}

⚠️ Potential issue | 🟠 Major

Allowing 200 in the error case makes the test overly permissive.

In queue_build, the server converts zp.action.error into a 422 (server handler lines 1367-1368), so accepting 200 hides a contract regression. This case should pin the status to 422 and the detail value explicitly.

🔧 Suggested patch
-    assert resp.status_code in {200, 422}
-    if resp.status_code == 200:
-        body = resp.json()
-        assert body["type"] == "zp.action.error"
-        assert body["error"] in {"session_not_found", "card_not_found", "card_not_go_ready"}
+    assert resp.status_code == 422
+    body = resp.json()
+    assert body["detail"] == "card_not_found"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    assert resp.status_code in {200, 422}
    if resp.status_code == 200:
        body = resp.json()
        assert body["type"] == "zp.action.error"
        assert body["error"] in {"session_not_found", "card_not_found", "card_not_go_ready"}
    assert resp.status_code == 422
    body = resp.json()
    assert body["detail"] == "card_not_found"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent/tests/test_zp_routes.py` around lines 53-57: the test currently
allows resp.status_code == 200, which is too permissive. Update the queue_build
assertions to require a 422 response and validate the error payload: assert
resp.status_code == 422, parse resp.json(), and assert that body["detail"] is
one of the expected error identifiers (e.g., "session_not_found",
"card_not_found", "card_not_go_ready"). Remove the {200, 422} set check and the
conditional branch so the test fails on unexpected 200 responses.

@ComBba ComBba merged commit d31b55f into main Mar 18, 2026
9 checks passed