Commit e80c7ae

feat: add generic /v1/completions client and token debug visualization (#428)
* feat: add generic /v1/completions client and token debug visualization

  Add a generic FireworksV1CompletionsClient that handles local tokenization via HuggingFace transformers and calls the /v1/completions endpoint with token-in/token-out. Tool-call parsing is pluggable via a callback rather than hardcoded to any specific domain.

  Also add TokenDebugView component for the evaluation dashboard with:
  - Text view: readable colored text with mask overlay (prompt vs completion)
  - Episode view: per-token visualization with mask or logprob coloring
  - Turns view: per-turn breakdown of prompt/completion tokens
  - Token ID chips with hover tooltips showing detokenized text and logprob
  - Smooth gradient logprob coloring (green=high confidence, red=low)

  Made-with: Cursor

* fix: address CI failures and review comments

  - Rewrite tests to match the generic client API (remove references to old domain-specific methods like _parse_tool_call_with_optional_fallback)
  - Fix TokenDebugSection guard to also check extra?.full_episode
  - Fix zero reward styled as red/negative; it now uses neutral gray
  - Fix tools=[] vs None: explicit empty list no longer falls back to default_tools

  Made-with: Cursor

* fix: pass thinking kwargs in build_assistant_turn_token_ids, handle missing fullEpisode

  - build_assistant_turn_token_ids now passes _thinking_kwargs() to apply_chat_template, consistent with _build_prompt_token_ids and build_tool_response_suffix_token_ids
  - View selector no longer silently renders nothing when fullEpisode is null and user selected text/episode view; it falls back to turns view or shows a placeholder message

  Made-with: Cursor

* fix: guard against None after Mapping extraction, prevent request_params overriding core fields

  - Add second None check in _normalize_token_id_sequence after extracting from a Mapping, since .get() can return None even when the key exists
  - Move request_params spread before explicit keys (model, prompt, temperature, max_tokens, logprobs) so they cannot be silently overridden by stray entries in request_params

  Made-with: Cursor
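
The last two fixes in the message (explicit `tools=[]` versus `None`, and spreading `request_params` before the core keys) can be illustrated with a minimal sketch. The client source is not shown on this page, so the function names `resolve_tools` and `build_payload`, the `DEFAULT_TOOLS` value, and the exact payload shape are all assumptions made for illustration only.

```python
from typing import Any, Dict, List, Mapping, Optional, Sequence

# Hypothetical default tool list; the real default_tools is not shown here.
DEFAULT_TOOLS: List[Dict[str, Any]] = [
    {"type": "function", "function": {"name": "default_tool"}}
]


def resolve_tools(tools: Optional[Sequence[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    # Only None falls back to the defaults; an explicit [] means "no tools".
    # Writing `tools or DEFAULT_TOOLS` would wrongly treat [] like None.
    return list(DEFAULT_TOOLS) if tools is None else list(tools)


def build_payload(
    model: str,
    prompt_token_ids: Sequence[int],
    temperature: float,
    max_tokens: int,
    logprobs: bool,
    request_params: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    extra = dict(request_params or {})
    # Spread the extras first: in a dict literal, later keys win, so the
    # explicit core fields below cannot be overridden by stray entries.
    return {
        **extra,
        "model": model,
        "prompt": list(prompt_token_ids),
        "temperature": temperature,
        "max_tokens": max_tokens,
        "logprobs": logprobs,
    }
```

With this ordering, a stray `"model"` entry in `request_params` is discarded while unrelated extras such as `"top_p"` still pass through.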
1 parent 4aca272 commit e80c7ae

5 files changed, +1256 -1 lines changed

eval_protocol/integrations/__init__.py

Lines changed: 13 additions & 1 deletion
@@ -3,5 +3,17 @@
 from .openeval import adapt
 from .trl import create_trl_adapter
 from .openai_rft import build_python_grader_from_evaluation_test
+from .fireworks_v1_completions_client import (
+    FireworksV1CompletionsClient,
+    ParsedToolCall,
+    to_openai_tool_calls,
+)
 
-__all__ = ["adapt", "create_trl_adapter", "build_python_grader_from_evaluation_test"]
+__all__ = [
+    "adapt",
+    "create_trl_adapter",
+    "build_python_grader_from_evaluation_test",
+    "FireworksV1CompletionsClient",
+    "ParsedToolCall",
+    "to_openai_tool_calls",
+]
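
The commit message says tool-call parsing is pluggable via a callback, and the diff exports `ParsedToolCall` and `to_openai_tool_calls`. Their definitions are not shown in this file, so the sketch below deliberately uses different, hypothetical names (`SketchToolCall`, `parse_json_tool_call`, `sketch_to_openai_tool_calls`) and an assumed `<tool_call>...</tool_call>` text convention. It only illustrates the general pattern of a parser callback whose result is converted into the OpenAI-style `tool_calls` shape; it is not the library's API.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SketchToolCall:
    # Hypothetical stand-in; the real ParsedToolCall fields are not shown.
    name: str
    arguments: Dict[str, Any]


# A parser callback maps raw completion text to a tool call, or None.
ToolCallParser = Callable[[str], Optional[SketchToolCall]]


def parse_json_tool_call(text: str) -> Optional[SketchToolCall]:
    # Assumed convention: the model wraps a JSON object in <tool_call> tags.
    match = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or "name" not in payload:
        return None
    return SketchToolCall(
        name=str(payload["name"]),
        arguments=payload.get("arguments") or {},
    )


def sketch_to_openai_tool_calls(call: Optional[SketchToolCall]) -> List[Dict[str, Any]]:
    # OpenAI-style tool calls carry the arguments as a JSON-encoded string.
    if call is None:
        return []
    return [
        {
            "id": "call_0",  # placeholder id chosen for this sketch
            "type": "function",
            "function": {"name": call.name, "arguments": json.dumps(call.arguments)},
        }
    ]
```

Because the parser is just a callable, a caller could swap in a domain-specific parser without touching the code that builds the request or the response.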
