
Commit 3ae7577

sync(bfmono): feat(gambit): add tool-call-aware grader schemas and root-deck guards (+19 more) (bfmono@70ec3b942)
This PR is an automated gambitmono sync of bfmono Gambit packages.

- Source: `packages/gambit/`
- Core: `packages/gambit-core/`
- bfmono rev: 70ec3b942

Changes:

- 70ec3b942 feat(gambit): add tool-call-aware grader schemas and root-deck guards
- cae381f00 feat(gambit): align scaffolds with product command and hourglass policies
- 5faa48b35 feat(gambit): move bot policy to folder and enforce policy summarizer flow
- 9a36c4a7e fix(gambit): align env loading with init and block .gambit env writes
- dbe7c54ca feat(gambit-bot): add file actions and scenario deck structure
- 855784d6b docs(gambit): add public permissions guide and API jsdoc
- 8f0ca0a85 feat(gambit): trace effective permission layers at runtime
- 90b4b5071 feat(gambit-core): add phase-1 permission contract primitives
- df9280f6a fix(gambit): restore build-bot deck path compatibility
- daca46555 feat(simulator-ui): wire build, test, and grade to workspace sessions
- e404a17d7 feat(gambit): add workspace-backed serve and bot sandbox flow
- 5f4fa86b9 feat(gambit): scaffold workspace defaults in init
- cf9b23778 feat(gambit-core): add schema guards and model param passthrough
- d0e5a9617 [gambit] move chat message into transcript so it scrolls
- 5c6125d99 feat(simulator-ui): open workbench drawer by default
- 7c9cd05f8 feat(simulator): gate chat accordion by env flag
- a2599068e feat(simulator-ui): add build chat history loading
- 9911dae22 feat(simulator-ui): add workbench chat drawer accordion
- 8cab8ec1f feat(simulator-ui): dock calibrate drawer and sync updates
- d41ba101d Add AAR for phase 3.1.5 deck format build tab

Do not edit this repo directly; make changes in bfmono and re-run the sync.
1 parent bab24a2 commit 3ae7577

27 files changed

Lines changed: 559 additions & 343 deletions


docs/external/guides/authoring.md

Lines changed: 5 additions & 0 deletions
@@ -124,6 +124,11 @@ deno run -A packages/gambit/scripts/migrate-schema-terms.ts <repo-root>
 invalid JSON or schema-violating output blocks the run with a clear error.
 - `graderDecks` describe calibration decks that score transcripts/artifacts. The
 simulator Calibrate page will run these decks against stored runs.
+- For graders that inspect assistant tool usage, set
+`contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts"` so
+`session.messages[*].tool_calls` is available in the grader input.
+- For conversation-level tool-call grading (single score for the whole run), use
+`contextSchema = "gambit://schemas/graders/contexts/conversation_tools.zod.ts"`.
 - Configure `acceptsUserTurns` alongside these references:
 - Markdown roots default to `true`; TypeScript decks default to `false`
 everywhere. Set it to `false` for any workflow deck that should never accept
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+import { z } from "zod";
+
+const graderToolCallSchema = z.object({
+  id: z.string().optional(),
+  type: z.string().optional(),
+  function: z.object({
+    name: z.string(),
+    arguments: z.string().optional(),
+  }),
+});
+
+export const graderConversationMessageWithToolsSchema = z.object({
+  role: z.string(),
+  content: z.any().optional(),
+  name: z.string().optional(),
+  tool_calls: z.array(graderToolCallSchema).optional(),
+});
+
+export const graderConversationWithToolsSchema = z.object({
+  messages: z.array(graderConversationMessageWithToolsSchema).optional(),
+  meta: z.record(z.any()).optional(),
+  notes: z.object({ text: z.string().optional() }).optional(),
+});
+
+export default z.object({
+  session: graderConversationWithToolsSchema,
+});
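A conversation-level grader receives only `session`, so a tool-usage check typically walks the whole transcript. The sketch below shows that walk under stated assumptions: the interfaces are hand-written mirrors of the zod shapes above, and both they and the `assistantToolCallNames` helper are hypothetical illustrations, not part of gambit-core.

```typescript
// Hypothetical plain-TypeScript mirrors of the conversation_tools zod shapes;
// these interfaces are illustrative and are not exported by gambit-core.
interface GraderToolCall {
  id?: string;
  type?: string;
  function: { name: string; arguments?: string };
}

interface GraderMessage {
  role: string;
  content?: unknown;
  name?: string;
  tool_calls?: GraderToolCall[];
}

interface ConversationToolsContext {
  session: {
    messages?: GraderMessage[];
    meta?: Record<string, unknown>;
    notes?: { text?: string };
  };
}

// Hypothetical helper: collect assistant tool-call names in transcript order.
function assistantToolCallNames(ctx: ConversationToolsContext): string[] {
  const names: string[] = [];
  for (const message of ctx.session.messages ?? []) {
    if (message.role !== "assistant") continue;
    for (const call of message.tool_calls ?? []) {
      names.push(call.function.name);
    }
  }
  return names;
}

const example: ConversationToolsContext = {
  session: {
    messages: [
      { role: "user", content: "Build me an FAQ bot" },
      {
        role: "assistant",
        tool_calls: [
          { function: { name: "bot_write", arguments: '{"path":"PROMPT.md"}' } },
        ],
      },
    ],
  },
};

console.log(assistantToolCallNames(example).join(",")); // bot_write
```

Hand-written interfaces keep the sketch free of a `zod` dependency; in a deck that already imports the schema, `z.infer<typeof graderConversationWithToolsSchema>` would derive a matching type instead of duplicating it.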
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+export { default } from "./conversation_tools.ts";
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+export { default } from "./turn_tools.ts";
+export {
+  graderConversationWithToolsSchema,
+  graderMessageWithToolsSchema,
+} from "./turn_tools.ts";
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+export { default } from "./turn_tools.ts";
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+import { z } from "zod";
+
+const graderToolCallSchema = z.object({
+  id: z.string().optional(),
+  type: z.string().optional(),
+  function: z.object({
+    name: z.string(),
+    arguments: z.string().optional(),
+  }),
+});
+
+export const graderMessageWithToolsSchema = z.object({
+  role: z.string(),
+  content: z.any().optional(),
+  name: z.string().optional(),
+  tool_calls: z.array(graderToolCallSchema).optional(),
+});
+
+export const graderConversationWithToolsSchema = z.object({
+  messages: z.array(graderMessageWithToolsSchema).optional(),
+  meta: z.record(z.any()).optional(),
+  notes: z.object({ text: z.string().optional() }).optional(),
+});
+
+export default z.object({
+  session: graderConversationWithToolsSchema,
+  messageToGrade: graderMessageWithToolsSchema,
+});
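The turn-level schema adds `messageToGrade` next to `session`, so a turn grader can score one assistant message while still reading the surrounding transcript. Because `function.arguments` is an optional string (JSON-encoded in the test fixtures), a grader has to parse it defensively. A minimal sketch, assuming hand-written interfaces in place of the zod types; `parseToolArguments` and `callsInGradedTurn` are hypothetical helpers, not gambit-core APIs.

```typescript
// Hypothetical mirrors of the turn_tools zod shapes; not exported by gambit-core.
interface GraderToolCall {
  id?: string;
  type?: string;
  function: { name: string; arguments?: string };
}

interface GraderMessage {
  role: string;
  content?: unknown;
  name?: string;
  tool_calls?: GraderToolCall[];
}

interface TurnToolsContext {
  session: { messages?: GraderMessage[] }; // meta/notes omitted for brevity
  messageToGrade: GraderMessage;
}

// Parse the JSON-encoded arguments string; undefined when missing, malformed,
// or not a JSON object.
function parseToolArguments(
  call: GraderToolCall,
): Record<string, unknown> | undefined {
  if (call.function.arguments === undefined) return undefined;
  try {
    const value: unknown = JSON.parse(call.function.arguments);
    return typeof value === "object" && value !== null && !Array.isArray(value)
      ? (value as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

// Return only the calls to `toolName` made in the graded turn.
function callsInGradedTurn(
  ctx: TurnToolsContext,
  toolName: string,
): GraderToolCall[] {
  return (ctx.messageToGrade.tool_calls ?? []).filter(
    (call) => call.function.name === toolName,
  );
}

const ctx: TurnToolsContext = {
  session: { messages: [] },
  messageToGrade: {
    role: "assistant",
    tool_calls: [
      { function: { name: "bot_write", arguments: '{"path":"PROMPT.md"}' } },
      { function: { name: "bot_write" } },
    ],
  },
};

const writes = callsInGradedTurn(ctx, "bot_write");
console.log(
  writes.length,
  parseToolArguments(writes[0])?.path,
  parseToolArguments(writes[1]),
); // 2 PROMPT.md undefined
```

Returning `undefined` instead of throwing keeps a single malformed tool call from crashing the grader; whether that should count against the turn is a policy decision left to the grader prompt.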
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+export { default } from "./turn_tools.ts";

packages/gambit-core/src/markdown.test.ts

Lines changed: 95 additions & 0 deletions
@@ -101,6 +101,101 @@ Schema deck.
   assertEquals(parsed, { status: 200 });
 });
 
+Deno.test("markdown deck resolves tool-call-aware grader context schema", async () => {
+  const dir = await Deno.makeTempDir();
+
+  const deckPath = await writeTempDeck(
+    dir,
+    "turn-tools-schema.deck.md",
+    `+++
+label = "turn-tools-schema"
+contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts"
++++
+
+Schema deck.
+`,
+  );
+
+  const deck = await loadMarkdownDeck(deckPath);
+
+  assert(deck.contextSchema, "expected context schema to resolve");
+  const parsed = deck.contextSchema.parse({
+    session: {
+      messages: [
+        {
+          role: "assistant",
+          tool_calls: [
+            {
+              function: {
+                name: "bot_write",
+                arguments: '{"path":"PROMPT.md"}',
+              },
+            },
+          ],
+        },
+      ],
+    },
+    messageToGrade: {
+      role: "assistant",
+      tool_calls: [
+        {
+          function: {
+            name: "bot_write",
+          },
+        },
+      ],
+    },
+  });
+
+  assertEquals(parsed.messageToGrade.role, "assistant");
+  assertEquals(
+    parsed.session.messages?.[0].tool_calls?.[0].function.name,
+    "bot_write",
+  );
+});
+
+Deno.test("markdown deck resolves conversation-level tool-call grader context schema", async () => {
+  const dir = await Deno.makeTempDir();
+
+  const deckPath = await writeTempDeck(
+    dir,
+    "conversation-tools-schema.deck.md",
+    `+++
+label = "conversation-tools-schema"
+contextSchema = "gambit://schemas/graders/contexts/conversation_tools.zod.ts"
++++
+
+Schema deck.
+`,
+  );
+
+  const deck = await loadMarkdownDeck(deckPath);
+
+  assert(deck.contextSchema, "expected context schema to resolve");
+  const parsed = deck.contextSchema.parse({
+    session: {
+      messages: [
+        {
+          role: "assistant",
+          tool_calls: [
+            {
+              function: {
+                name: "bot_write",
+                arguments: '{"path":"faq-bot/PROMPT.md"}',
+              },
+            },
+          ],
+        },
+      ],
+    },
+  });
+
+  assertEquals(
+    parsed.session.messages?.[0].tool_calls?.[0].function.name,
+    "bot_write",
+  );
+});
+
 Deno.test("markdown deck warns on legacy schema URIs", async () => {
   const dir = await Deno.makeTempDir();
   const deckPath = await writeTempDeck(

src/decks/gambit-bot/PROMPT.md

Lines changed: 15 additions & 20 deletions
@@ -55,30 +55,25 @@ label = "Deck format policy guard (turn) LLM"
 path = "./graders/deck_format_policy_llm/PROMPT.md"
 description = "LLM guard for policy-compliant deck editing behavior."
 
-[[scenarios]]
-label = "Recipe selection on-ramp tester"
-path = "./scenarios/recipe_selection/PROMPT.md"
-description = "Synthetic user that asks Gambit Bot to build a recipe selection chatbot."
-
-[[scenarios]]
-label = "Recipe selection (no skip)"
-path = "./scenarios/recipe_selection_no_skip/PROMPT.md"
-description = "Synthetic user that completes the question flow without skipping to building."
+[[graders]]
+label = "First deck location guard (turn)"
+path = "./graders/first_deck_root_prompt_guard/PROMPT.md"
+description = "Checks that the first created deck is root PROMPT.md (not a subfolder PROMPT.md)."
 
-[[scenarios]]
-label = "Build tab demo prompt"
-path = "./scenarios/build_tab_demo/PROMPT.md"
-description = "Synthetic user prompt for the build tab demo."
+[[graders]]
+label = "First deck location guard (tools)"
+path = "./graders/first_deck_root_prompt_guard_tools/PROMPT.md"
+description = "Checks first created deck location using tool-call-aware grading context."
 
-[[scenarios]]
-label = "NUX from scratch demo prompt"
-path = "./scenarios/nux_from_scratch_demo/PROMPT.md"
-description = "Synthetic user prompt for the NUX from-scratch build demo."
+[[graders]]
+label = "First deck location guard (tools, conversation)"
+path = "./graders/first_deck_root_prompt_guard_tools_conversation/PROMPT.md"
+description = "Conversation-level check of first created deck location with tool-call-aware context."
 
 [[scenarios]]
-label = "Investor FAQ regression"
-path = "./scenarios/investor_faq_regression/PROMPT.md"
-description = "Replays the investor FAQ build flow that previously produced a non-v1.0 deck format."
+label = "FAQ bot build flow"
+path = "./scenarios/faq_bot_build_flow/PROMPT.md"
+description = "Synthetic user flow that builds an FAQ bot, checks policy alignment, and requests a root-level deck move."
 +++
 
 You are GambitBot, an AI assistant designed to help people build other AI
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
++++
+label = "First deck location guard (turn)"
+description = "Deterministic guard that checks whether the first created deck is root PROMPT.md."
+contextSchema = "gambit://schemas/graders/contexts/turn_tools.zod.ts"
+responseSchema = "gambit://schemas/graders/grader_output.zod.ts"
+execute = "./first_deck_root_prompt_guard.deck.ts"
++++
+
+Compute grader that enforces first deck location policy.
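This grader's `execute` entry points at `./first_deck_root_prompt_guard.deck.ts`, which this commit view does not show. The policy it names can be sketched as a pure function over the transcript: find the first `bot_write` call that writes a `PROMPT.md`, then pass only if that path is the root one. This is a hypothetical illustration, not the shipped implementation. The `bot_write` tool name and `path` argument come from the test fixtures in this commit; the `GuardResult` shape is invented because the real `grader_output.zod.ts` schema is not shown; and treating "no deck created" as a pass is an assumption.

```typescript
// Hypothetical sketch; the real first_deck_root_prompt_guard.deck.ts is not shown.
interface ToolCall {
  function: { name: string; arguments?: string };
}

interface Message {
  role: string;
  tool_calls?: ToolCall[];
}

// Hypothetical result shape; the actual grader_output.zod.ts schema may differ.
interface GuardResult {
  pass: boolean;
  reason: string;
}

// Read the `path` argument from a JSON-encoded arguments string, if present.
function writtenPath(call: ToolCall): string | undefined {
  try {
    const args: unknown = JSON.parse(call.function.arguments ?? "{}");
    if (typeof args === "object" && args !== null && "path" in args) {
      const path = (args as { path: unknown }).path;
      return typeof path === "string" ? path : undefined;
    }
  } catch {
    // Malformed arguments are treated as "no path".
  }
  return undefined;
}

function firstDeckLocationGuard(messages: Message[]): GuardResult {
  for (const message of messages) {
    if (message.role !== "assistant") continue;
    for (const call of message.tool_calls ?? []) {
      if (call.function.name !== "bot_write") continue;
      const path = writtenPath(call);
      // Assumption: any write to a file named PROMPT.md creates a deck.
      if (path === undefined || !/(^|\/)PROMPT\.md$/.test(path)) continue;
      const normalized = path.replace(/^\.\//, "");
      return normalized === "PROMPT.md"
        ? { pass: true, reason: "First deck written to root PROMPT.md." }
        : { pass: false, reason: `First deck written to ${path}, not root PROMPT.md.` };
    }
  }
  // Assumption: a run that never creates a deck does not violate the policy.
  return { pass: true, reason: "No deck was created in this run." };
}

const rootFirst = firstDeckLocationGuard([
  { role: "assistant", tool_calls: [{ function: { name: "bot_write", arguments: '{"path":"PROMPT.md"}' } }] },
]);
const nestedFirst = firstDeckLocationGuard([
  { role: "assistant", tool_calls: [{ function: { name: "bot_write", arguments: '{"path":"faq-bot/PROMPT.md"}' } }] },
]);
console.log(rootFirst.pass, nestedFirst.pass); // true false
```

Only the first deck write decides the outcome, which matches the "first created deck" wording in the grader descriptions; a later move to the root (as the FAQ bot scenario requests) would not rescue a run whose first deck landed in a subfolder.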
