feat(tool,runner): async tools — pause/resume, typed tool.Func, pre-1.0 API cleanups by weeco · Pull Request #147 · redpanda-data/ai-sdk-go

weeco · 2026-06-09T20:10:05Z

Async tools for the agent runtime: tools (and interceptors) can now pause an invocation — for external jobs, human approval, or user input — and the runner can resume it later, across process boundaries. Ships with the typed tool.Func path and a set of pre-1.0 API cleanups.

Builds on #142 (sealed Part).

Why

Previously every tool had to return its final result inside one Execute call. That ruled out deployments, batch jobs, human approval gates and MCP elicitation: anything that takes minutes or needs a human in the middle. The SDK had a wire format for "paused" but no mechanism to produce, persist, or resume a pause.

The API

Sync tool (the 90% path)

type WeatherInput struct {
	City string `json:"city" jsonschema:"City name"`
}

type WeatherOutput struct {
	TempC float64 `json:"temp_c"`
}

var weather = tool.Must(tool.Func(
	tool.Spec{Name: "get_weather", Description: "Current weather for a city."},
	func(ctx context.Context, in WeatherInput) (tool.Result[WeatherOutput], error) {
		return tool.Done(WeatherOutput{TempC: 21.5}), nil
	},
))

Schema is inferred from WeatherInput. No JSON plumbing, no Definition() boilerplate.

Async tool (external job)

var deploy = tool.Must(tool.Func(
	tool.Spec{
		Name:        "deploy",
		Description: "Trigger a production deployment.",
		Async:       tool.AsyncExternalResult(), // model is told not to re-call while pending
	},
	func(ctx context.Context, in DeployInput) (tool.Result[DeployStatus], error) {
		jobID := startDeployment(in)

		return tool.Pending(
			DeployStatus{ID: jobID, Status: "queued"}, // placeholder the model sees meanwhile
			tool.WithCorrelationID(jobID),
			tool.WithAwaitTimeout(30*time.Minute),
		), nil
	},
))

The invocation ends with FinishReasonPaused; the pending call is persisted on the session (session.PendingToolCall, kvstore included) and surfaced on InvocationEndEvent.PendingCalls.

Resuming (e.g. from a webhook handler)

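stream, err := r.Resume(ctx, actorID, sessionID,
	runner.Resumption{CallID: callID, Output: finalResultJSON})
if err != nil {
	// Eager failures surface here, e.g. ErrResumeConflict,
	// ErrPendingCallNotFound, ErrInvalidResumePayload.
	return err
}

// The session is already updated and saved at this point; ranging over the
// stream runs the agent continuation (the model reacts to the result).
for evt, err := range stream {
	if err != nil {
		return err
	}
	_ = evt // forward to the client, log, etc.
}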

Resume mutates eagerly. By the time it returns, the session is updated and saved, so dropping the stream skips only the model continuation. It is idempotent for at-least-once delivery: duplicate submissions with the same payload are acknowledged, while different payloads conflict (Stripe-style). Runner.Progress records non-terminal updates; Runner.Cancel aborts a pending call.

Human-in-the-loop approval (interceptor)

func (g *Gate) InterceptToolExecution(
	ctx context.Context, info *agent.ToolCallInfo, next agent.ToolExecutionNext,
) (tool.Execution, error) {
	if info.Resume == nil { // first entry: pause instead of executing
		return tool.Execution{
			Output: json.RawMessage(`{"status":"awaiting_approval"}`),
			Await:  &tool.Await{Reason: tool.AwaitReasonApproval, Message: "Approve " + info.Req.Name + "?"},
		}, nil
	}

	decision := info.Resume
	info.Resume = nil // consume the decision; the tool runs fresh below

	if decision.Error != "" {
		return tool.Execution{}, errors.New(decision.Error) // denied → model-visible tool error
	}

	return next(ctx, info) // approved → tool executes exactly once
}

Re-entry goes through the interceptor chain, so denial means the tool never runs. Chained pauses work: approve → tool itself returns Pending → resume again with the external result (covered by an end-to-end test).

Breaking changes (pre-1.0, intentional)

tool.Tool is now Name()/Description()/InputSchema() + Execute(ctx, Call) (Execution, error); errors returned from Execute become model-visible tool errors.
tool.Registry is a concrete struct; Execute/ExecuteAll removed (use Run(...).Response()); NewRegistry() is variadic. MCP accepts a narrow mcp.ToolRegistry interface.
Runner.Resume/Cancel return (iter.Seq2[agent.Event, error], error); Progress returns (agent.ToolProgressEvent, error).
agent.ToolInterceptor returns tool.Execution; ToolCallInfo gained Resume.
Removed: Call.Args, Await.ExpiresAt (use Timeout), Execution/Action/Config.Metadata (write-only), Spec.OutputSchema (advisory-only), tool.PendingReentry (the typed path never re-runs the function; implement Tool directly for re-entry logic), coalescing fields. Await.Resume may be left empty (defaults from Reason).
kvstore persists PendingToolCalls/ResumeReceipts (new proto fields 4/5, backward compatible).

Known gaps (deliberate, follow-ups)

Crash recovery re-executes incomplete tools and cannot re-bind pauses produced during recovery (warned loudly at runtime).
A2A/Vercel adapters surface pauses via InvocationEndEvent.PendingCalls but don't yet map ToolProgressEvent/ToolArtifactEvent to their wire formats.
Session locks are process-local; distributed deployments need an external advisory lock (documented on session.Store).

…pause/resume

Tools (and interceptors) can pause an invocation (for external jobs, human approval, or user input) and resume it later, across process boundaries.

- tool.Tool is now Name/Description/InputSchema + Execute(ctx, Call) returning Execution{Output, Await, Actions}. The typed path is tool.Func[In, Out] with schema inference; results are built via Done/Pending/NeedInput.
- Await{Reason, Resume, ...} describes a pause; the closed Reason/Resume table is validated by the registry. Pauses persist as typed session.PendingToolCall records (kvstore included: new proto map fields, opaque JSON values carrying their own schema_version).
- Runner gains Resume/Progress/Cancel with hash-based resume receipts: duplicate submissions are acknowledged, conflicting ones rejected (ErrResumeConflict).
- Re-entry routes through the agent's tool interceptor chain (LLMAgent.ExecuteToolResume), so an approval interceptor consumes the decision before the tool runs; denied approvals never execute the tool. funcTool never re-runs the typed function on re-entry. Chained pauses (approve → external work) keep the call resumable.
- llmagent persists tool response parts in request order while streaming events in completion order.
- Examples migrated to the new API; task build:examples wired into CI.

Resume previously returned a lazy iter.Seq2 — calling it and dropping the return value compiled cleanly and did nothing. Now validation, authorization, receipts, mutation, and the session save all happen before Resume returns; the stream replays mutation-phase events and then runs the agent continuation (re-acquiring the session lock). All-rejected batches fail eagerly. Progress returns its single ToolProgressEvent directly; Cancel reports ResumeOperationCancel to the ResumeAuthorizer.

Export SpecProvider{ToolSpec() Spec} and Unwrapper{Unwrap() Tool}; SpecOf follows Unwrap chains so decorators preserve the wrapped tool's async hints. AsyncSpec enforcement moves from funcTool into the registry, covering every Tool implementation. agenttool declares AsyncHandoff.
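SpecProvider and Unwrapper are described with the method sets ToolSpec() Spec and Unwrap() Tool. The sketch below shows how a lookup in the spirit of SpecOf could walk decorators until it finds a spec, with a depth bound so a self-referential wrapper cannot loop forever. Spec and Tool are trimmed to hypothetical minimal shapes. The depth limit of 32 and the Async string field are arbitrary illustrative choices, not taken from the SDK.

```go
package main

import "fmt"

// Spec and Tool are hypothetical, trimmed-down stand-ins for tool.Spec and
// tool.Tool; SpecProvider and Unwrapper follow the method sets named above.
type Spec struct {
	Name  string
	Async string // illustrative async hint, e.g. "external_result"
}

type Tool interface{ Name() string }

type SpecProvider interface{ ToolSpec() Spec }

type Unwrapper interface{ Unwrap() Tool }

// maxUnwrapDepth bounds the walk; the value is an arbitrary choice.
const maxUnwrapDepth = 32

// SpecOf returns the first Spec found while following Unwrap chains.
func SpecOf(t Tool) (Spec, bool) {
	for depth := 0; t != nil && depth < maxUnwrapDepth; depth++ {
		if sp, ok := t.(SpecProvider); ok {
			return sp.ToolSpec(), true
		}
		u, ok := t.(Unwrapper)
		if !ok {
			return Spec{}, false
		}
		t = u.Unwrap()
	}
	return Spec{}, false
}

type baseTool struct{ spec Spec }

func (b baseTool) Name() string   { return b.spec.Name }
func (b baseTool) ToolSpec() Spec { return b.spec }

// loggingTool is a decorator that knows nothing about async hints itself.
type loggingTool struct{ inner Tool }

func (l loggingTool) Name() string { return l.inner.Name() }
func (l loggingTool) Unwrap() Tool { return l.inner }

// selfLoop unwraps to itself, exercising the depth bound.
type selfLoop struct{}

func (s *selfLoop) Name() string { return "loop" }
func (s *selfLoop) Unwrap() Tool { return s }

func main() {
	base := baseTool{spec: Spec{Name: "deploy", Async: "external_result"}}
	spec, ok := SpecOf(loggingTool{inner: loggingTool{inner: base}})
	fmt.Println(spec.Name, spec.Async, ok)

	_, ok = SpecOf(&selfLoop{})
	fmt.Println("cycle found spec:", ok)
}
```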

Execution.Actions was write-only: builtins emitted ActionArtifact but nothing read it. llmagent now yields a ToolArtifactEvent per artifact action; adapters and applications own persistence.

- runner.Result → runner.Resumption (collided with tool.Result[T]).
- tool.Registry is a concrete struct; Execute/ExecuteAll deleted (they passed a zero InvocationInfo and swallowed pauses); NewRegistry is variadic; MCP accepts a narrow ToolRegistry interface.
- Removed Call.Args, Await.ExpiresAt (use Timeout), write-only metadata bags, and unimplemented coalescing fields. Await.Resume may be left empty: Normalize defaults it from Reason; interceptor-created pauses are normalized and validated too.
- Run's userID is now used: message-resume goes through the ResumeAuthorizer. ToolType constants; deterministic PendingCalls ordering.
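"Deterministic PendingCalls ordering" matters in Go because map iteration order is randomized, so surfacing pending calls straight from a map would give a different order on every run. A minimal sketch, assuming pending calls are held in a map keyed by call ID. The PR does not state the sort key; sorting by call ID here is an assumption, and PendingToolCall is a trimmed hypothetical stand-in.

```go
package main

import (
	"fmt"
	"sort"
)

// PendingToolCall is a hypothetical, trimmed-down stand-in for
// session.PendingToolCall.
type PendingToolCall struct {
	CallID   string
	ToolName string
}

// sortedPending converts the map into a slice with a stable order. Without
// the sort, ranging over the map would yield a different order between runs.
func sortedPending(byID map[string]PendingToolCall) []PendingToolCall {
	ids := make([]string, 0, len(byID))
	for id := range byID {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	out := make([]PendingToolCall, 0, len(ids))
	for _, id := range ids {
		out = append(out, byID[id])
	}
	return out
}

func main() {
	pending := map[string]PendingToolCall{
		"call_b": {CallID: "call_b", ToolName: "deploy"},
		"call_a": {CallID: "call_a", ToolName: "approve"},
	}
	for _, p := range sortedPending(pending) {
		fmt.Println(p.CallID, p.ToolName)
	}
}
```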

- Validate JSON at trust boundaries (tool output, resume output, progress payloads; new ErrInvalidResumePayload). Async-hint separator normalized; stale tool/doc.go rewritten; crash recovery warns loudly when it strands a pause.
- fakellm works under the agent loop: ThenRespondText wires streaming, the echo fallback carries its message, conversation keys are stable across turns; llmagent sets Request.Metadata[session_id].
- Review findings: agenttool re-entry resolves with the resume payload instead of re-running the child agent; xid session IDs; SpecOf is cycle-bounded; a2a empty-reasoning filter restored; shared SummarizePendingCall projection; Await metadata cloned into pending state.
- API trims for a leaner surface: PendingReentry and Spec.OutputSchema removed. Resume on an agent without re-entry support fails eagerly (ErrReentryNotSupported) instead of silently recording the payload; tool.Definition treats a provided Spec as the source of truth.

weeco · 2026-06-10T14:45:04Z

@claude Review this, specifically whether API and usage for users is good

claude · 2026-06-10T14:45:24Z

Claude finished @weeco's task in 4m 1s

API & Usage Review — async tools / typed `tool.Func`

I read the core surfaces (tool/{tool,func,result,spec,await,call,execution,registry}.go), the runner resume path (runner/resume.go, config.go), the interceptor/pause surfaces (agent/interceptors.go, pause_events.go), and the examples. Overall this is a strong, well-documented design — the doc comments are genuinely excellent and the layering (control-flow Await vs. model-visible Output vs. side-effect Actions) is principled. Below is my assessment of the public API ergonomics, organized by how much friction a user is likely to hit.

What's very good

The 90% path is clean. tool.Func(Spec{...}, fn) → tool.Done(v) with inferred schema is exactly the right default; no JSON plumbing. The examples (tools.go) read well.
Helper/spec pairing is intuitive. Pending ↔ AsyncExternalResult, NeedInput ↔ AsyncUserInput. The With* option sets compose predictably.
Method-signature consistency. Run / Resume / Progress / Cancel all lead with (ctx, userID, sessionID, …). Easy to learn one, know the rest.
Idempotent resume (Stripe-style) with ErrResumeConflict / ResumeAcknowledgedEvent and a metadata-excluded hash is the right contract for at-least-once webhook delivery, and it's documented at the call site.
Eager-mutation semantics of Resume are clearly spelled out ("dropping the stream skips only the model continuation"). The drain idiom in the doc is helpful.
Error-vs-structured-failure (return error → model-visible tool error; return structured Output with nil error → non-error payload) is documented right on the Tool interface. Good.

Friction points worth addressing before freeze

1. The interceptor re-entry "consume" protocol is a footgun. agent/interceptors.go:199 and the approval pattern require the author to manually set info.Resume = nil on re-entry before calling next. If they forget, funcTool.Execute sees call.Resume != nil and short-circuits, returning the approval decision JSON as the tool's output instead of ever running the tool (tool/func.go:101-107). That's a silent correctness bug, not a loud failure, and the only thing standing between a user and it is a doc comment. Consider either:

a tiny helper, e.g. decision, ok := info.TakeResume() that returns the payload and clears the field atomically; or
having the runtime clear info.Resume for the interceptor that pauses, so the contract is "read it, don't manage it."
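The first option is small enough to sketch. TakeResume is the hypothetical name suggested above, and the types are minimal stand-ins rather than the SDK's. The helper reads and clears the field in one call, so an interceptor that uses it cannot forward a stale decision to next. It is not safe for concurrent use; it assumes a single goroutine owns info for the duration of the chain.

```go
package main

import "fmt"

// ResumePayload and ToolCallInfo are hypothetical stand-ins; only the fields
// the helper touches are modeled.
type ResumePayload struct {
	Output string
	Error  string
}

type ToolCallInfo struct {
	Name   string
	Resume *ResumePayload
}

// TakeResume returns the re-entry payload (if any) and clears it in the same
// call, so nothing downstream can mistake the decision for the tool's own
// resume input.
func (info *ToolCallInfo) TakeResume() (*ResumePayload, bool) {
	p := info.Resume
	info.Resume = nil
	return p, p != nil
}

func main() {
	info := &ToolCallInfo{Name: "deploy", Resume: &ResumePayload{Output: `{"approved":true}`}}
	decision, ok := info.TakeResume()
	fmt.Println(ok, decision.Output, info.Resume == nil)

	_, ok = info.TakeResume() // second call: nothing left to take
	fmt.Println(ok)
}
```

With this helper, the gate's re-entry branch becomes `if decision, ok := info.TakeResume(); ok { ... }`, and forgetting to clear the field is no longer possible.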


2. No runnable example of the marquee feature. The PR's headline is async pause/resume + human-in-the-loop approval, but examples/agent_interceptors/tool_approval_hook.go still uses the old synchronous bufio stdin + errors.New("denied") path — it never returns a tool.Await and never calls Runner.Resume. There is no example anywhere that exercises Pending → InvocationEndEvent.PendingCalls → Runner.Resume(...). For a feature this central to adoption, a small end-to-end example (deploy-job or approval-gate, resumed from a second goroutine/handler) would do more for usability than any amount of doc text.

3. "Resume" is heavily overloaded. Across the public surface the word means six different things: Await.Resume (a ResumeMode), Call.Resume/ToolCallInfo.Resume (a *ResumePayload), Runner.Resume (method), Resumption (runner input), and ResumePayload (re-entry input). Each is individually defensible, but a newcomer has to hold all six apart. At minimum I'd reconsider Await.Resume ResumeMode — naming it Await.ContinueAs or Await.Via would remove the worst collision (a field literally named Resume that is not a resume payload, sitting next to fields that are).

4. The typed story stops at the placeholder. Pending[T](v T) types the placeholder, but the actual resumed result arrives as Resumption.Output json.RawMessage — hand-marshaled and unchecked against T. Given how much the rest of the API leans on generics, a typed resume helper (e.g. runner.ResultOf(callID, outVal) that marshals) would close the loop and prevent shape drift between the placeholder and the real result.
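A minimal sketch of the suggested typed helper, assuming a Resumption with the CallID and Output fields used in the Resume example. ResultOf, the local Resumption type and the json tags on DeployStatus are all hypothetical. One limitation is worth stating: a free generic function only guarantees that Output is marshaled from some Go value. Tying T to the tool's declared Out type would need a typed handle returned when the tool is defined, which this sketch does not attempt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resumption is a hypothetical stand-in mirroring the fields used in the
// Resume example (CallID, Output).
type Resumption struct {
	CallID string
	Output json.RawMessage
}

// DeployStatus reuses the placeholder type from the deploy example; the json
// tags are assumed for illustration.
type DeployStatus struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// ResultOf (hypothetical) marshals a typed final result so callers stop
// hand-building JSON that can drift from the placeholder's shape.
func ResultOf[T any](callID string, v T) (Resumption, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return Resumption{}, fmt.Errorf("marshal result for %s: %w", callID, err)
	}
	return Resumption{CallID: callID, Output: b}, nil
}

func main() {
	res, err := ResultOf("call-1", DeployStatus{ID: "job-42", Status: "succeeded"})
	if err != nil {
		panic(err)
	}
	fmt.Println(res.CallID, string(res.Output))
}
```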

5. Spec.Async must exactly equal the returned Await. validateAwaitAgainstSpec (registry.go:391) rejects any reason/resume mismatch. This is fine for the static case, but a tool that wants to pause for different reasons depending on input can't — it must pick one. That's a reasonable v1 constraint, but it's only discoverable at runtime as an ErrAwaitInvalid. Worth a one-line note in the AsyncSpec doc that the declared pair is a hard contract, not a hint.

Minor / nits

RequireInputResponse.Status (tool/builtin/require_input.go:40) is explicitly retained "for callers that grep for the old reconciler status string." In a PR whose theme is pre-1.0 cleanup, keeping a field alive for string-grepping consumers is the kind of debt this PR is otherwise removing. Consider dropping it now.
Empty-resume semantics — Resumption{} with no Output/Error resolving successfully against LastOutput (resume.go:471-476) is a surprising-but-documented default. Fine; just make sure it's in the package doc, not only the struct comment.
executeReentry builds tool.InvocationInfo{SessionID: …} only (resume.go:459) — Turn/AgentName/InvocationID are empty on re-entry. Tools that log those will see blanks. Probably acceptable, but tools shouldn't assume those are populated; a doc note on InvocationInfo would help.
Must panics at init — correct call, but worth one line in Func's doc reminding users that a bad inferred schema (unsupported In type) turns into an init-time panic, since schema inference is now implicit.

Verdict

The API is in good shape and the typed tool.Func path is a clear win. The two things I'd genuinely gate on before freezing are (1) the manual info.Resume = nil footgun and (2) shipping at least one runnable async-resume example — both directly affect whether users land the feature correctly on the first try. The "Resume" overloading (#3) and typed-resume gap (#4) are worth a deliberate decision now since they're hard to change post-1.0.

I did not run the build/tests (review-only). Happy to draft the resume example or the TakeResume() helper if you want it implemented.


weeco force-pushed the refactor/async-tools branch from eefb172 to 3bb7ea3 June 9, 2026 20:16

weeco added 5 commits June 9, 2026 22:48

feat(agent): surface tool artifact actions as ToolArtifactEvent

1292b0c


weeco force-pushed the refactor/async-tools branch 2 times, most recently from 0e2ebd3 to 9c07425 June 9, 2026 20:57

weeco force-pushed the refactor/async-tools branch from 9c07425 to aaf4796 June 9, 2026 21:03

weeco wants to merge 6 commits into main from refactor/async-tools