
Commit bbbfdbb

sync(bfmono): fix(gambit-verify): align verify turn labels and stabilize initial run filtering (+19 more) (bfmono@84952a652)
This PR is an automated gambitmono sync of bfmono Gambit packages.

- Source: `packages/gambit/`
- Core: `packages/gambit/packages/gambit-core/`
- bfmono rev: 84952a652

Changes:

- 84952a652 fix(gambit-verify): align verify turn labels and stabilize initial run filtering
- c56b7f52f feat(gambit): improve verify report controls and harden concurrent calibrate persistence
- beb9435c0 feat(gambit-simulator-ui): extend listbox trigger and popover options
- 25f9fdcfc fix(gambit-simulator-ui): align verify outlier chip semantics and display
- a010b0ee1 feat(gambit-simulator-ui): add verify outliers to workbench chat chips
- 383f2500a refactor(simulator-ui): replace nested ternaries in main routing
- 13c4c8c22 fix(gambit): preserve shared references in safe session serialization
- ae392aa24 feat(gambit-simulator-ui): add grader error chips to workbench chat
- 1de6b335c fix(gambit): clamp deck-level maxTurns bounds in test run selection
- 01d7abbb9 fix(gambit): default verify tab bootstrap flag to enabled
- f3d186c7b fix(gambit): include extension schemas in exports and default serve to restored workspace
- a83b7cbe7 fix(gambit): move unbounded build timeout to deck opt-in
- acb2de627 fix(gambit): avoid strict json_schema 400s in openrouter responses
- 8aba573b6 fix(gambit-simulator-ui): treat errored calibrate runs as failed
- ca2028cf8 fix(gambit): prevent circular trace crashes in workspace test run API
- 7e41517e5 fix(gambit): make build assistant run timeout unbounded
- 24341143d feat(gambit): add deterministic verify fixture seeding
- ff2c2d33d feat(gambit): add verify tab consistency UI
- 91f0c93bb feat(gambit): add feature-flagged verify routing
- 1392f8b65 feat(gambit): support deck-level maxTurns override

Do not edit this repo directly; make changes in bfmono and re-run the sync.
1 parent 79e0ab5 commit bbbfdbb

2 files changed

Lines changed: 34 additions & 4 deletions

simulator-ui/src/VerifyPage.tsx

Lines changed: 33 additions & 3 deletions
@@ -374,12 +374,42 @@ function VerifyPage(
   }, [selectedSession?.gradingRuns]);

   const filteredRuns = useMemo(() => {
+    const latestScenarioRunIdFromRuns = sessionRuns
+      .map((run) => scenarioRunIdFromCalibrationRun(run))
+      .find((runId): runId is string => Boolean(runId));
+    const hasOption = (runId: string | null | undefined): runId is string =>
+      Boolean(
+        runId &&
+          scenarioRunOptions.some((entry) => entry.scenarioRunId === runId),
+      );
+    const meta = sessionDetail?.meta && typeof sessionDetail.meta === "object"
+      ? sessionDetail.meta as Record<string, unknown>
+      : {};
+    const currentScenarioRunId = typeof meta.scenarioRunId === "string" &&
+        meta.scenarioRunId.trim().length > 0
+      ? meta.scenarioRunId
+      : null;
+    const activeScenarioRunFilterId = hasOption(workspaceRouting.testRunId)
+      ? workspaceRouting.testRunId
+      : hasOption(selectedScenarioRunId)
+      ? selectedScenarioRunId
+      : hasOption(currentScenarioRunId)
+      ? currentScenarioRunId
+      : scenarioRunOptions[0]?.scenarioRunId ?? latestScenarioRunIdFromRuns ??
+        null;
     return sessionRuns.filter((run) => {
       if (selectedGraderId && run.graderId !== selectedGraderId) return false;
-      if (!selectedScenarioRunId) return true;
-      return scenarioRunIdFromCalibrationRun(run) === selectedScenarioRunId;
+      if (!activeScenarioRunFilterId) return true;
+      return scenarioRunIdFromCalibrationRun(run) === activeScenarioRunFilterId;
     });
-  }, [selectedGraderId, selectedScenarioRunId, sessionRuns]);
+  }, [
+    scenarioRunOptions,
+    selectedGraderId,
+    selectedScenarioRunId,
+    sessionDetail?.meta,
+    sessionRuns,
+    workspaceRouting.testRunId,
+  ]);

   const runConsistencySample = useCallback(async (payload: {
     workspaceId: string;
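
Note on the hunk above: the new `activeScenarioRunFilterId` appears to resolve in priority order. It takes the routed test run ID first, then the selected ID, then the ID from session metadata, and accepts each only if it matches a known option. Otherwise it falls back to the first option, then to the latest ID found in the runs, then to `null`. The standalone sketch below restates that chain outside React. `resolveActiveScenarioRunId` and the `ScenarioRunOption` shape are hypothetical names introduced for illustration and are not part of the commit.

```typescript
// Hypothetical, simplified option shape; the real type is not shown in the diff.
type ScenarioRunOption = { scenarioRunId: string };

// Illustrative helper (an assumption, not code from the commit): returns the
// first candidate that matches a known option, else the first option's ID,
// else the fallback derived from the runs, else null.
function resolveActiveScenarioRunId(
  options: ScenarioRunOption[],
  candidates: Array<string | null | undefined>,
  latestFromRuns?: string,
): string | null {
  const hasOption = (runId: string | null | undefined): runId is string =>
    Boolean(runId && options.some((entry) => entry.scenarioRunId === runId));
  for (const candidate of candidates) {
    if (hasOption(candidate)) return candidate;
  }
  return options[0]?.scenarioRunId ?? latestFromRuns ?? null;
}

const options: ScenarioRunOption[] = [
  { scenarioRunId: "run-b" },
  { scenarioRunId: "run-a" },
];

// Candidates are ordered as in the hunk: routed ID, selected ID, metadata ID.
const fromRoute = resolveActiveScenarioRunId(options, ["run-a", "run-b", null]);
// A stale routed ID is skipped, so the first option is used instead.
const staleRoute = resolveActiveScenarioRunId(options, ["gone", undefined, null]);
// With no options at all, the ID derived from the runs is the last resort.
const noOptions = resolveActiveScenarioRunId([], ["gone"], "run-z");
```

Under these assumptions, a candidate ID that no longer matches any option never becomes the active filter, which fits the "stabilize initial run filtering" item in the commit message.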

simulator-ui/src/verify_metrics.ts

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ const flattenRunExamples = (
       ? turnRecord.messageRefId
       : undefined;
     const key = messageRefId ? `ref:${messageRefId}` : `turn:${index}`;
-    const label = `Turn ${index + 1}`;
+    const label = `Assistant turn ${fallbackIndex + 1}`;
     const parsed = extractScoreReasonPass(turnRecord.result);
     buckets.push({
       key,
