[feat/#63] Add golden dataset and evaluation script for commit matching by wantkdd · Pull Request #68 · WhyLog-App/Whylog-AI

wantkdd · 2026-06-01T11:03:45Z

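✨ Overview

Adds a golden dataset built from real Whylog-AI commit hashes, plus a script that evaluates /api/commit/match responses.

This PR does not read production logs directly. It is an offline evaluation tool that takes saved /api/commit/match response JSON as input and quantifies whether the expected commits are included and whether false positives occur.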

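📄 Changes

Add golden case fixtures based on real commits
Add metric calculation logic for Recall@K, Precision@K, MRR, no-match accuracy, and false positives
Add aggregation of distractor_commit_hashes for checking hard negatives
Add a separate count of false positives at or above the confidence threshold
Add the --fail-on-false-positive, --confidence-threshold, --fail-on-failure, and --json CLI options
Document how to run the evaluation and what each metric means
Add regression tests for the evaluation logic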
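
The per-case metrics listed above can be sketched roughly as follows. This is a minimal illustration, not the code in matching_evaluation.py: the `GoldenCase` shape, the `expected_commit_hashes` field name, and the `evaluate_case` helper are assumptions made for the example (only `distractor_commit_hashes` is named in this PR). Precision@K here divides by the number of results actually returned in the top K, which is one of several common conventions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GoldenCase:
    # Hypothetical shape; the real fixture schema may differ.
    expected_commit_hashes: set[str]
    distractor_commit_hashes: set[str] = field(default_factory=set)


def recall_at_k(ranked: list[str], expected: set[str], k: int) -> float:
    """Fraction of expected commits that appear in the top-k results."""
    if not expected:
        return 0.0
    return len(set(ranked[:k]) & expected) / len(expected)


def precision_at_k(ranked: list[str], expected: set[str], k: int) -> float:
    """Fraction of the returned top-k results that are expected commits."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    return sum(1 for h in top_k if h in expected) / len(top_k)


def reciprocal_rank(ranked: list[str], expected: set[str]) -> float:
    """1 / rank of the first expected commit, or 0.0 if none is found."""
    for rank, commit_hash in enumerate(ranked, start=1):
        if commit_hash in expected:
            return 1.0 / rank
    return 0.0


def evaluate_case(case: GoldenCase, ranked: list[str], k: int) -> dict:
    """Score one golden case against a ranked list of returned hashes."""
    top_k = ranked[:k]
    expected = case.expected_commit_hashes
    result: dict = {
        # Anything returned in the top k that is not an expected commit.
        "false_positives": [h for h in top_k if h not in expected],
        # Hard negatives: returned hashes that the case marks as distractors.
        "distractor_hits": [
            h for h in top_k if h in case.distractor_commit_hashes
        ],
    }
    if not expected:
        # No-match case: correct only when nothing is returned.
        result["no_match_correct"] = not top_k
    else:
        result["recall_at_k"] = recall_at_k(ranked, expected, k)
        result["precision_at_k"] = precision_at_k(ranked, expected, k)
        result["reciprocal_rank"] = reciprocal_rank(ranked, expected)
    return result
```

MRR is then the mean of `reciprocal_rank` over all cases that have at least one expected commit, and no-match accuracy is the share of no-match cases where `no_match_correct` is true.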

📌 Related issue

close [feat] Add golden dataset and evaluation script for commit matching #63

🔌 API changes (if applicable)

No API changes
Only an offline evaluation script and fixtures are added

✅ Verification

uv run pytest tests/test_commit_matching_evaluation.py -q
uv run ruff check app/domains/commit/services/matching_evaluation.py scripts/evaluate_commit_matching.py tests/test_commit_matching_evaluation.py
pre-commit: ruff, ruff-format, trim trailing whitespace, fix end of files, check for added large files

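💬 Notes

The evaluation input is /api/commit/match response JSON, not logs.
The script makes no API calls, ChromaDB queries, or Gemini/LLM requests.
Responses captured from real production data can be saved and passed to this script to compare matching quality before and after a change to the scoring formula or threshold.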
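
As a rough illustration of how the threshold-gated false positive count and the CLI flags might fit together, consider the sketch below. The flag names come from this PR, but everything else is an assumption: the `commit_hash` and `confidence` response fields, the default threshold of 0.5, the flag types, and the exit-code rules are hypothetical and may not match scripts/evaluate_commit_matching.py.

```python
from __future__ import annotations

import argparse


def count_confident_false_positives(
    matches: list[dict], expected: set[str], threshold: float
) -> int:
    """Count returned commits that are wrong AND scored at or above threshold.

    Assumes each match is a dict with hypothetical "commit_hash" and
    "confidence" keys; the real response schema may differ.
    """
    return sum(
        1
        for m in matches
        if m["commit_hash"] not in expected and m["confidence"] >= threshold
    )


def build_parser() -> argparse.ArgumentParser:
    """Wire up the four flags named in the PR (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Offline commit-match evaluation")
    parser.add_argument("--confidence-threshold", type=float, default=0.5)
    parser.add_argument("--fail-on-false-positive", action="store_true")
    parser.add_argument("--fail-on-failure", action="store_true")
    parser.add_argument("--json", action="store_true")
    return parser


def exit_code(
    args: argparse.Namespace, failed_cases: int, confident_false_positives: int
) -> int:
    """Return 1 only when a requested gate is violated (assumed semantics)."""
    if args.fail_on_failure and failed_cases > 0:
        return 1
    if args.fail_on_false_positive and confident_false_positives > 0:
        return 1
    return 0
```

Running the same saved responses through `count_confident_false_positives` with two different thresholds gives a simple before/after comparison without calling the API again.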

Add golden cases based on real commit hashes and an evaluator for saved /api/commit/match responses, so that threshold and hard-negative regressions can be compared.

Constraint: Keep the existing branch/commit conventions while recording verification context in Lore trailers
Rejected: Calling the LLM or the matching API directly at evaluation time | excluded because reproducible offline response evaluation takes priority
Confidence: high
Scope-risk: narrow
Directive: Use saved /api/commit/match response JSON, not production logs, as the evaluation input
Tested: uv run pytest tests/test_commit_matching_evaluation.py -q; uv run ruff check app/domains/commit/services/matching_evaluation.py scripts/evaluate_commit_matching.py tests/test_commit_matching_evaluation.py
Not-tested: End-to-end matching quality against real production ChromaDB responses

coderabbitai · 2026-06-01T11:03:53Z

Warning

Review limit reached

@wantkdd, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 12 minutes and 51 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1ef1da61-0c87-4faa-812f-6417ccab1100

📥 Commits

Reviewing files that changed from the base of the PR and between bf510d2 and 074b661.

📒 Files selected for processing (5)

app/domains/commit/services/matching_evaluation.py
docs/commit_matching_evaluation.md
scripts/evaluate_commit_matching.py
tests/fixtures/commit_matching_golden_cases.json
tests/test_commit_matching_evaluation.py


wantkdd self-assigned this Jun 1, 2026

wantkdd requested a review from Yujin1219 June 1, 2026 11:19

wantkdd added the 🚀 Feat 기능 구현 및 수정 (feature implementation and changes) label Jun 1, 2026

wantkdd merged commit 3930965 into develop Jun 1, 2026
2 checks passed

wantkdd merged 1 commit into develop from feat/wantkdd-match-eval#63
