Phase 1 Live Validation Results — All 15 Live Tests Pass (Epic #291) #349

umyunsang · 2026-04-13T14:37:11Z

umyunsang
Apr 13, 2026
Maintainer

Phase 1 Live API Validation Results

Epic: #291 — Phase 1 Final Validation & Stabilization (Live)
PR: #348
Date: 2026-04-13
Commit: 1b5e57e

Test Results Summary

Category	Tests	Status
Unit + Integration + E2E	870	✅ ALL PASS
Live — KOROAD (data.go.kr)	3	✅ ALL PASS
Live — KMA (data.go.kr)	4	✅ ALL PASS
Live — LLM (FriendliAI K-EXAONE)	3	✅ ALL PASS
Live — Composite (road_risk_score)	3	✅ ALL PASS
Live — E2E (full Scenario 1 flow)	2	✅ ALL PASS
Total	885	✅ 0 FAILURES

Live-Only Defects Discovered & Fixed

These defects were invisible behind mocks and only surfaced when hitting real APIs:

1. KOROAD API `type` parameter (not `_type`)

Symptom: KOROAD API returned XML instead of JSON (Content-Type: text/xml;charset=UTF-8)
Root cause: Code used _type=json (KMA convention) but KOROAD uses type=json
Fix: Changed request parameter from "_type": "json" → "type": "json"
Why mocks couldn't catch it: Fixtures always contained pre-recorded JSON; never tested actual HTTP parameter naming
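
The naming difference is easy to miss, so here is a minimal sketch of how a per-provider format parameter could be kept in one place. It uses only the standard library; `build_query`, `is_xml_response`, and the `guGun` value are illustrative assumptions, not the adapter's actual code.

```python
from urllib.parse import urlencode

# Per the finding above: KMA expects `_type`, KOROAD expects `type`.
JSON_FORMAT_PARAM = {"kma": "_type", "koroad": "type"}


def build_query(provider: str, params: dict[str, str]) -> str:
    """Build a query string that requests JSON with the provider's own parameter name."""
    query = dict(params)  # copy so the caller's dict is not mutated
    query[JSON_FORMAT_PARAM[provider]] = "json"
    return urlencode(query)


def is_xml_response(content_type: str) -> bool:
    """Detect the silent XML fallback from the Content-Type header."""
    return "xml" in content_type.lower()


# Placeholder parameter value, for illustration only.
print(build_query("koroad", {"guGun": "680"}))
print(is_xml_response("text/xml;charset=UTF-8"))
```

Checking the Content-Type before parsing would turn this class of mistake into an explicit error instead of a downstream JSON decode failure.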

2. KOROAD flat JSON response structure

Symptom: KeyError parsing response — expected nested response.header/body structure
Root cause: KOROAD type=json returns flat structure {"resultCode":"00","items":{...},"totalCount":N}, not nested like KMA
Fix: Rewrote _parse_response() to read from top-level keys
Why mocks couldn't catch it: Fixtures were authored with assumed (incorrect) structure
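
To make the difference concrete, the sketch below contrasts the two access patterns. It is not the real `_parse_response()`; the function names are hypothetical, the exact nested key path under `response.header` is an assumption, and the contents of `items` are left empty because the post does not show them.

```python
from typing import Any

# Flat shape reported for KOROAD with type=json (item contents omitted).
flat_payload: dict[str, Any] = {"resultCode": "00", "items": {}, "totalCount": 0}


def read_nested(payload: dict[str, Any]) -> str:
    """KMA-style access (assumed key path); raises KeyError on the flat payload."""
    return payload["response"]["header"]["resultCode"]


def read_flat(payload: dict[str, Any]) -> tuple[str, Any, int]:
    """Read the result code, items, and count from top-level keys."""
    return payload["resultCode"], payload["items"], payload["totalCount"]


print(read_flat(flat_payload))
```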

3. KOROAD `afos_fid` returned as integer

Symptom: pydantic_core.ValidationError: Input should be a valid string for afos_fid
Root cause: Real API returns "afos_fid": 7192978 (int), fixtures had "afos_fid": "0001" (str)
Fix: Added coerce_numbers_to_str=True to AccidentHotspot Pydantic model
Why mocks couldn't catch it: Fixtures only contained string values; real API wire format differs
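
A minimal sketch of the coercion setting, assuming Pydantic v2 (2.4 or later, where `coerce_numbers_to_str` is available). Only `afos_fid` is shown; the real `AccidentHotspot` model's other fields are omitted here.

```python
from pydantic import BaseModel, ConfigDict


class AccidentHotspot(BaseModel):
    # Reduced stand-in for the real model: only the affected field is declared.
    model_config = ConfigDict(coerce_numbers_to_str=True)

    afos_fid: str


# The live API sends an int; the model now stores it as a string.
hotspot = AccidentHotspot.model_validate({"afos_fid": 7192978})
print(repr(hotspot.afos_fid))
```

Without the config flag, Pydantic v2 rejects an int for a `str` field even in its default lax mode, which matches the reported symptom.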

4. KOROAD NODATA_ERROR treated as hard error

Symptom: E2E multi-turn test failed when the Turn 2 KOROAD call returned resultCode='03' and the adapter raised ToolExecutionError
Root cause: _parse_response() raised on any resultCode != "00", but code 03 (NODATA_ERROR) means "no matching records" — a valid empty result
Fix: Handle resultCode == "03" as empty result set (0 hotspots)
Why mocks couldn't catch it: Fixtures never returned NODATA_ERROR; real queries with specific parameters can legitimately find no data
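
The result-code handling described above might look roughly like this. It is a sketch, not the adapter's code: `has_records` is a hypothetical helper, and `ToolExecutionError` is redefined locally as a stand-in for the project's own exception.

```python
class ToolExecutionError(Exception):
    """Local stand-in for the adapter's error type."""


RESULT_OK = "00"
RESULT_NODATA = "03"  # NODATA_ERROR: the query matched no records


def has_records(payload: dict) -> bool:
    """Return True if records are present, False for NODATA; raise on any other code."""
    code = payload.get("resultCode")
    if code == RESULT_OK:
        return True
    if code == RESULT_NODATA:
        return False  # a valid empty result, not a failure
    raise ToolExecutionError(f"KOROAD API error: resultCode={code!r}")
```

A caller can then return an empty hotspot list when `has_records` is False and only parse `items` otherwise.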

5. K-EXAONE reasoning token budget exhaustion

Symptom: LLM stream test got 0 content_delta events
Root cause: K-EXAONE emits reasoning_content tokens before content tokens, and both draw on the same max_tokens budget. With max_tokens=1024, reasoning consumed the entire budget, leaving no tokens for actual content
Fix: Increased max_tokens from 1024 → 4096 for streaming tests
Why mocks couldn't catch it: MockLLM returns fixed responses immediately; real model has variable reasoning depth
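
The shared-budget arithmetic can be illustrated with a small sketch. The reasoning token count used here (1500) is a made-up number for illustration; real reasoning depth varies per request.

```python
def content_budget(max_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for visible content when reasoning and content share one budget."""
    return max(0, max_tokens - reasoning_tokens)


# Hypothetical request that needs 1500 reasoning tokens before answering.
print(content_budget(1024, 1500))  # 0: reasoning uses the whole budget
print(content_budget(4096, 1500))  # 2596: room left for content
```

Because reasoning depth is variable, a larger `max_tokens` lowers the chance of an empty response but does not rule it out.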

6. `gu_gun` parameter was optional (should be required)

Symptom: KOROAD API returned errors when guGun parameter was missing
Root cause: gu_gun was defined as Optional with a default of None in the Pydantic input model, but the KOROAD API documents it as required (item classification: @1)
Fix: Made gu_gun: GugunCode required across KoroadAccidentSearchInput and RoadRiskScoreInput
Why mocks couldn't catch it: Test fixtures didn't validate parameter completeness against the real API spec

Files Changed (15 files, +222/-153)

File	Change
`src/kosmos/tools/koroad/koroad_accident_search.py`	`type=json`, flat parse, coerce, NODATA handling
`src/kosmos/tools/composite/road_risk_score.py`	`gu_gun` required
`tests/tools/koroad/fixtures/*.json` (4 files)	Rewritten to flat JSON format
`tests/tools/koroad/test_koroad_accident_search.py`	+2 test cases (coercion, NODATA)
`tests/live/test_live_*.py` (4 files)	`gu_gun` parameter, max_tokens
`tests/tools/composite/test_road_risk_score.py`	`gu_gun` required
`tests/tools/test_search_integration.py`	`gu_gun` required
`tests/e2e/conftest.py`	`gu_gun` in E2E fixtures
`tests/live/conftest.py`	Rate-limit delay fixture

Validation of Epic #291 Risk Matrix

Layer	Risk	Result
LLM Client	FriendliAI SSE chunk boundary	✅ Streaming works correctly
LLM Client	K-EXAONE reasoning token budget	⚠️ Fixed (max_tokens increase)
API Adapters	data.go.kr XML-in-JSON gateway	⚠️ Fixed (`type` parameter)
API Adapters	Response schema drift	⚠️ Fixed (flat structure + int coercion)
API Adapters	NODATA_ERROR handling	⚠️ Fixed (graceful empty result)
Recovery	Real network timeout + retry	✅ Works with 30s timeout
Context	Actual K-EXAONE token count	✅ UsageTracker records real counts
CLI	Korean I/O encoding	✅ Rich rendering works

Key Takeaway

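Four of the eight predicted live-only risk points in the matrix above manifested as real defects, accounting for five of the six defects fixed in this round; the sixth (`gu_gun` required) fell outside the predicted risks. This supports the Epic #291 hypothesis: mock-based tests create a false sense of security for external API integrations. Live validation is essential before closing Phase 1.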

Quality Gates

uv run pytest — 870 passed, 15 skipped
uv run pytest -m live — 15 passed
uv run mypy src/kosmos — no issues
uv run ruff check src/ tests/ — all checks passed
