[feat/#63] Add golden dataset and evaluation script for commit matching #68
Add golden cases based on real commit hashes and an evaluator for stored /api/commit/match responses, so that threshold and hard-negative regressions can be compared.

Constraint: Keep the existing branch/commit conventions while recording verification context in Lore trailers
Rejected: Calling the LLM or the matching API directly at evaluation time | excluded because reproducible offline response evaluation takes priority
Confidence: high
Scope-risk: narrow
Directive: Use stored /api/commit/match response JSON, not production logs, as the evaluation input
Tested: uv run pytest tests/test_commit_matching_evaluation.py -q; uv run ruff check app/domains/commit/services/matching_evaluation.py scripts/evaluate_commit_matching.py tests/test_commit_matching_evaluation.py
Not-tested: End-to-end matching quality against real production ChromaDB responses
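✨ Overview
This PR adds a golden dataset built from real Whylog-AI commit hashes and an evaluation script for /api/commit/match responses.

This PR does not read production logs directly. It is an offline evaluation tool that takes stored /api/commit/match response JSON as input and quantifies whether the correct commits are included and whether false positives occur.

📄 Work details
- Added logic to compute the Recall@K, Precision@K, MRR, no-match accuracy, and false positive metrics
- Added aggregation of distractor_commit_hashes
- Added the --fail-on-false-positive, --confidence-threshold, --fail-on-failure, and --json CLI options

📌 Related issues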
🔌 API changes (if applicable)

✅ Verification
- uv run pytest tests/test_commit_matching_evaluation.py -q
- uv run ruff check app/domains/commit/services/matching_evaluation.py scripts/evaluate_commit_matching.py tests/test_commit_matching_evaluation.py
- ruff, ruff-format, trim trailing whitespace, fix end of files, check for added large files

💬 Other notes
- The evaluation input is the stored /api/commit/match response JSON.
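
For reviewers less familiar with the metrics named in this PR, here is a minimal sketch of how Recall@K, Precision@K, MRR, and no-match accuracy are commonly computed. It is not the code in matching_evaluation.py: the `Case` shape, the field names, and the function names are all hypothetical, and the false positive and distractor_commit_hashes aggregation is left out. Precision@K here divides by the number of results actually returned in the top K, which is one of several conventions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Case:
    """One golden case paired with a stored response (hypothetical shape)."""

    expected: set[str]  # golden commit hashes; empty means "no match expected"
    ranked: list[str]   # commit hashes from the stored response, best first


def recall_at_k(case: Case, k: int) -> float:
    """Fraction of expected commits that appear in the top k results."""
    if not case.expected:
        return 0.0
    return len(case.expected & set(case.ranked[:k])) / len(case.expected)


def precision_at_k(case: Case, k: int) -> float:
    """Fraction of the returned top k results that are expected commits."""
    top = case.ranked[:k]
    if not top:
        return 0.0
    return sum(1 for h in top if h in case.expected) / len(top)


def reciprocal_rank(case: Case) -> float:
    """1 / rank of the first expected commit, or 0.0 if none is returned."""
    for rank, commit_hash in enumerate(case.ranked, start=1):
        if commit_hash in case.expected:
            return 1.0 / rank
    return 0.0


def _mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def summarize(cases: list[Case], k: int) -> dict[str, float]:
    """Average ranking metrics over positive cases; score negatives separately."""
    positives = [c for c in cases if c.expected]
    negatives = [c for c in cases if not c.expected]
    return {
        "recall_at_k": _mean([recall_at_k(c, k) for c in positives]),
        "precision_at_k": _mean([precision_at_k(c, k) for c in positives]),
        "mrr": _mean([reciprocal_rank(c) for c in positives]),
        # A no-match case counts as correct only if nothing was returned.
        "no_match_accuracy": _mean(
            [1.0 if not c.ranked else 0.0 for c in negatives]
        ),
    }
```

Scoring no-match cases separately keeps a matcher that always returns something from looking good on ranking metrics alone, which is presumably why the PR tracks false positives alongside recall.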