[feat/#63] Add golden dataset and evaluation script for commit matching evaluation #68

Merged
wantkdd merged 1 commit into develop from feat/wantkdd-match-eval#63
Jun 1, 2026

Conversation

@wantkdd
Collaborator

@wantkdd wantkdd commented Jun 1, 2026

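✨ Overview

Adds a golden dataset built from real Whylog-AI commit hashes and a script that evaluates /api/commit/match responses.

This PR does not read production logs directly. It is an offline evaluation tool that takes saved /api/commit/match response JSON as input and quantitatively evaluates whether the correct commit is included and whether false positives occur.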
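
As a rough illustration of that offline check, the sketch below evaluates one saved response against one golden case. The response shape (`matches`, `commit_hash`, `confidence`) and the function name `evaluate_case` are assumptions made for this example, not the actual /api/commit/match schema or the code in this PR.

```python
import json
from typing import Dict, List, Set

# Hypothetical saved response; the real /api/commit/match schema may differ.
SAVED_RESPONSE = """
{"matches": [
  {"commit_hash": "aaa111", "confidence": 0.91},
  {"commit_hash": "bbb222", "confidence": 0.40}
]}
"""


def evaluate_case(response: Dict, expected_hashes: Set[str]) -> Dict:
    """Compare returned commit hashes against the golden answer for one case."""
    returned: List[str] = [m["commit_hash"] for m in response.get("matches", [])]
    # Any returned hash that is not in the golden answer counts as a false positive.
    false_positives = [h for h in returned if h not in expected_hashes]
    return {
        "hit": any(h in expected_hashes for h in returned),
        "false_positives": false_positives,
    }


# No API, ChromaDB, or LLM call is made: the input is just a stored JSON string.
result = evaluate_case(json.loads(SAVED_RESPONSE), {"aaa111"})
print(result)
```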

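📄 Changes

  • Add golden case fixtures based on real commits
  • Add logic to compute the Recall@K, Precision@K, MRR, no-match accuracy, and false positive metrics
  • Add aggregation of distractor_commit_hashes for checking hard negatives
  • Add a separate count of false positives at or above the confidence threshold
  • Add the --fail-on-false-positive, --confidence-threshold, --fail-on-failure, and --json CLI options
  • Document how to run the evaluation and what each metric means
  • Add regression tests for the evaluation logic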
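
For readers unfamiliar with the ranking metrics listed above, here is a minimal sketch using their textbook definitions. The function names are hypothetical, and the implementation in matching_evaluation.py may differ, for example in how Precision@K treats responses with fewer than K results.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], expected: Set[str], k: int) -> float:
    """Share of expected commits that appear in the top-k results."""
    if not expected:
        return 0.0
    return len(set(ranked[:k]) & expected) / len(expected)


def precision_at_k(ranked: Sequence[str], expected: Set[str], k: int) -> float:
    """Share of the top-k slots filled by expected commits (divides by k)."""
    if k <= 0:
        return 0.0
    return len(set(ranked[:k]) & expected) / k


def reciprocal_rank(ranked: Sequence[str], expected: Set[str]) -> float:
    """1 / rank of the first expected commit, or 0.0 if none is returned."""
    for rank, commit_hash in enumerate(ranked, start=1):
        if commit_hash in expected:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(cases: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, expected set) pairs."""
    cases = list(cases)
    if not cases:
        return 0.0
    return sum(reciprocal_rank(r, e) for r, e in cases) / len(cases)


# Made-up hashes for illustration only.
ranked = ["c1", "c2", "c3"]
expected = {"c2", "c9"}
print(recall_at_k(ranked, expected, 2), precision_at_k(ranked, expected, 2))
print(mean_reciprocal_rank([(ranked, expected), (["x1"], {"y1"})]))
```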

📌 Related issue

🔌 API changes (if applicable)

  • No API changes
  • Only the offline evaluation script and fixtures are added

✅ Verification

  • uv run pytest tests/test_commit_matching_evaluation.py -q
  • uv run ruff check app/domains/commit/services/matching_evaluation.py scripts/evaluate_commit_matching.py tests/test_commit_matching_evaluation.py
  • pre-commit: ruff, ruff-format, trim trailing whitespace, fix end of files, check for added large files

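💬 Other notes

  • The evaluation input is /api/commit/match response JSON, not logs.
  • No API calls, ChromaDB lookups, or Gemini/LLM requests are made.
  • If you save responses from real production data and pass them to this script, you can compare quality before and after changes to the scoring formula or threshold.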
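
One way the CLI flags from this PR could gate such a before/after comparison is sketched below. Only the flag names come from the PR; the default threshold of 0.5, the helper names, the exit-code behavior, and the `commit_hash`/`confidence` fields are assumptions for illustration.

```python
import argparse
from typing import Dict, List, Set


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Evaluate saved match responses (sketch)")
    # The 0.5 default is an assumption, not the script's real default.
    parser.add_argument("--confidence-threshold", type=float, default=0.5)
    parser.add_argument("--fail-on-false-positive", action="store_true")
    parser.add_argument("--fail-on-failure", action="store_true")
    parser.add_argument("--json", action="store_true")
    return parser


def high_confidence_false_positives(
    matches: List[Dict], expected: Set[str], threshold: float
) -> List[Dict]:
    """Matches outside the golden answer whose confidence is at or above the threshold."""
    return [
        m for m in matches
        if m["commit_hash"] not in expected and m["confidence"] >= threshold
    ]


def exit_code(args: argparse.Namespace, failed_cases: int, false_positives: int) -> int:
    """Return 1 when an enabled fail-on flag is triggered, otherwise 0."""
    if args.fail_on_failure and failed_cases > 0:
        return 1
    if args.fail_on_false_positive and false_positives > 0:
        return 1
    return 0


# Made-up matches; parse_args is given an explicit list so this runs anywhere.
args = build_parser().parse_args(["--confidence-threshold", "0.8", "--fail-on-false-positive"])
matches = [
    {"commit_hash": "aaa111", "confidence": 0.95},
    {"commit_hash": "bbb222", "confidence": 0.85},
    {"commit_hash": "ccc333", "confidence": 0.30},
]
fps = high_confidence_false_positives(matches, {"aaa111"}, args.confidence_threshold)
print(len(fps), exit_code(args, failed_cases=0, false_positives=len(fps)))
```

Running the same saved responses with two different threshold values and comparing the counts gives a simple regression signal without calling the API again.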

Add golden cases based on real commit hashes and an evaluator for saved /api/commit/match responses, so that threshold and hard negative regressions can be compared.

Constraint: Keep the existing branch/commit conventions while recording verification context in Lore trailers

Rejected: Calling the LLM or the matching API directly at evaluation time | excluded because reproducible offline response evaluation takes priority

Confidence: high

Scope-risk: narrow

Directive: Use saved /api/commit/match response JSON, not production logs, as the evaluation input

Tested: uv run pytest tests/test_commit_matching_evaluation.py -q; uv run ruff check app/domains/commit/services/matching_evaluation.py scripts/evaluate_commit_matching.py tests/test_commit_matching_evaluation.py

Not-tested: End-to-end matching quality against real production ChromaDB responses
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Warning

Review limit reached

@wantkdd, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 12 minutes and 51 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1ef1da61-0c87-4faa-812f-6417ccab1100

📥 Commits

Reviewing files that changed from the base of the PR and between bf510d2 and 074b661.

📒 Files selected for processing (5)
  • app/domains/commit/services/matching_evaluation.py
  • docs/commit_matching_evaluation.md
  • scripts/evaluate_commit_matching.py
  • tests/fixtures/commit_matching_golden_cases.json
  • tests/test_commit_matching_evaluation.py

@wantkdd wantkdd self-assigned this Jun 1, 2026
@wantkdd wantkdd requested a review from Yujin1219 June 1, 2026 11:19
@wantkdd wantkdd added the 🚀 Feat (feature implementation and changes) label Jun 1, 2026
@wantkdd wantkdd merged commit 3930965 into develop Jun 1, 2026
2 checks passed

Labels

🚀 Feat (feature implementation and changes)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feat] Add golden dataset and evaluation script for commit matching evaluation

1 participant