[codex] Add geobench visibility benchmark #72
Draft
dotoricode wants to merge 1 commit into
Problem
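There was no baseline file for repeatedly measuring Tink's visibility in LLM answers, that is, for checking whether a change to the README or the npm description actually shifted its mention rate, recommendation rank, or citations. Keeping the product spec and the execution procedure in the repo, following the geobench spec/runbook pattern, lets future positioning changes be compared before and after against the same baseline.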
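To make the three measurements concrete, here is a minimal TypeScript sketch of scoring a single LLM answer for mention, rank, and citation. It assumes a harness spec with aliases, a competitor set, and cited domains, as this PR describes, but the field names (`aliases`, `competitors`, `citedDomains`) and the helpers `scoreAnswer`, `firstIndex`, `hostOf`, and `matchesDomain` are hypothetical; they are not the actual schema of geobench/tink-harness.yaml or geobench's own scoring logic. Rank here is simply the order of first mention among Tink and its competitors.

```typescript
// Hypothetical subset of a harness spec; the real keys in
// geobench/tink-harness.yaml may be named or nested differently.
interface HarnessSpec {
  aliases: string[]; // names that count as a mention of the product
  competitors: { name: string; aliases: string[] }[];
  citedDomains: string[]; // domains whose links count as a citation
}

interface AnswerScore {
  mentioned: boolean;
  rank: number | null; // 1-based order of first mention among product + competitors
  cited: boolean;
}

// Earliest case-insensitive position of any alias, or -1 if none appears.
// Naive substring match: a short alias can also match inside longer words.
function firstIndex(text: string, aliases: string[]): number {
  const lower = text.toLowerCase();
  let best = -1;
  for (const alias of aliases) {
    const i = lower.indexOf(alias.toLowerCase());
    if (i !== -1 && (best === -1 || i < best)) best = i;
  }
  return best;
}

// Lower-cased hostname of an absolute URL, or "" if it cannot be parsed.
function hostOf(url: string): string {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^/:?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : "";
}

// The exact domain or any subdomain of it counts as a match.
function matchesDomain(host: string, domain: string): boolean {
  const d = domain.toLowerCase();
  return host === d || host.slice(-(d.length + 1)) === "." + d;
}

function scoreAnswer(answer: string, citationUrls: string[], spec: HarnessSpec): AnswerScore {
  const own = firstIndex(answer, spec.aliases);
  // First-mention positions of the product and every competitor that appears.
  const positions = [own]
    .concat(spec.competitors.map((c) => firstIndex(answer, c.aliases)))
    .filter((i) => i !== -1)
    .sort((a, b) => a - b);
  const cited = citationUrls.some((url) =>
    spec.citedDomains.some((d) => matchesDomain(hostOf(url), d)),
  );
  return {
    mentioned: own !== -1,
    rank: own === -1 ? null : positions.indexOf(own) + 1,
    cited,
  };
}
```

Only the boolean and numeric score leaves this function, so per-answer results can be aggregated later without keeping the answer text around.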
Solution
- Added geobench/tink-harness.yaml, which defines Tink's aliases, competitor set, cited domains, target audience, and an enriched profile built from public sources.
- Added docs/geobench.md, which documents the estimate/profile/bench/dashboard workflow and the publication boundary.
- Updated package.json and the package contents test so that the spec and runbook are also included in the npm tarball.
- Recorded the change under Unreleased.
Verification
HOME=/private/tmp/tink-test-home npm_config_cache=/private/tmp/tink-npm-cache npm test
git diff --check
Notes
This PR adds only the measurement inputs and the runbook. It does not run an actual provider benchmark, create API keys, or store raw provider answers. The runbook explicitly states the boundary: only aggregate metrics are published, while raw answers, secrets, and private logs are not.
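The "aggregate metrics only" boundary can be illustrated with a minimal TypeScript sketch, assuming each answer has already been reduced to a mention flag, an optional rank, and a citation flag. The names `AnswerScore`, `AggregateMetrics`, `aggregate`, and `compareRuns` are hypothetical and not part of geobench or Tink; the point is that the object meant for publication holds only counts and rates, never answer text.

```typescript
// Hypothetical per-answer score; raw answer text is deliberately not part of it.
interface AnswerScore {
  mentioned: boolean;
  rank: number | null; // null when the product is not mentioned
  cited: boolean;
}

// The only shape intended for publication: counts and rates.
interface AggregateMetrics {
  runs: number;
  mentionRate: number; // share of answers that mention the product
  meanRank: number | null; // mean rank over answers that mention it (lower is better)
  citationRate: number; // share of answers citing one of the configured domains
}

function aggregate(scores: AnswerScore[]): AggregateMetrics {
  const n = scores.length;
  const rate = (k: number) => (n === 0 ? 0 : k / n);
  const ranks = scores
    .map((s) => s.rank)
    .filter((r): r is number => r !== null);
  return {
    runs: n,
    mentionRate: rate(scores.filter((s) => s.mentioned).length),
    meanRank: ranks.length === 0 ? null : ranks.reduce((a, b) => a + b, 0) / ranks.length,
    citationRate: rate(scores.filter((s) => s.cited).length),
  };
}

// After-minus-before deltas for comparing two runs around a positioning change.
function compareRuns(before: AggregateMetrics, after: AggregateMetrics) {
  return {
    mentionRate: after.mentionRate - before.mentionRate,
    citationRate: after.citationRate - before.citationRate,
    meanRank:
      before.meanRank === null || after.meanRank === null
        ? null
        : after.meanRank - before.meanRank,
  };
}
```

Comparing `aggregate` output for runs taken before and after a README or npm description change gives the before/after view described under Problem; whether a given delta is meaningful still depends on how many runs each side has.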