Skip to content

feat: support kilo provider for judging#384

Merged
olearycrew merged 1 commit into
pinchbench:mainfrom
shssoichiro:kilo-judge-provider
May 14, 2026
Merged

feat: support kilo provider for judging#384
olearycrew merged 1 commit into
pinchbench:mainfrom
shssoichiro:kilo-judge-provider

Conversation

@shssoichiro
Copy link
Copy Markdown
Contributor

Given that Kilo is supported as a benchmark provider in OpenClaw, and this software is from the Kilo team, it made sense to support Kilo as a judging provider. Requires KILO_API_KEY to be set in similar fashion to the other judging providers.

@shssoichiro shssoichiro force-pushed the kilo-judge-provider branch from 9d3c26a to 6e5eb5e Compare May 13, 2026 11:41
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 13, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

This is a well-implemented addition. The _judge_via_kilo function follows the exact same pattern as _judge_via_openrouter — consistent with the existing codebase. The routing dispatch correctly inserts the kilo/ check before the other providers, and the new test file provides solid coverage (missing key, correct request shape, and dispatch isolation). Nothing to flag.

Files Reviewed (4 files)
  • README.md
  • scripts/benchmark.py
  • scripts/lib_agent.py
  • tests/test_lib_agent_judge.py

Fix these issues in Kilo Cloud

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 13, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

This is a clean, well-structured addition. The _judge_via_kilo implementation follows the exact same pattern as the existing _judge_via_openrouter and _judge_via_openai providers — consistent error handling, KILO_API_KEY guard, model prefix stripping, and delegation to the shared _judge_via_openai_compat helper. The new test file covers the key cases (missing key, correct URL/payload, dispatch isolation) thoroughly.

Files Reviewed (4 files)
  • README.md
  • scripts/benchmark.py
  • scripts/lib_agent.py
  • tests/test_lib_agent_judge.py

Reviewed by claude-sonnet-4.6 · 119,399 tokens

@olearycrew olearycrew merged commit 52ef238 into pinchbench:main May 14, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants