feat(tool,runner): async tools — pause/resume, typed tool.Func, pre-1.0 API cleanups #147

Draft
weeco wants to merge 6 commits into main from refactor/async-tools
@weeco weeco commented Jun 9, 2026

Contributor

Async tools for the agent runtime: tools (and interceptors) can now pause an invocation — for external jobs, human approval, or user input — and the runner can resume it later, across process boundaries. Ships with the typed tool.Func path and a set of pre-1.0 API cleanups.

Builds on #142 (sealed Part).

Why

Previously every tool had to return its final result inside one Execute call. That ruled out deployments, batch jobs, human approval gates, and MCP elicitation: anything that takes minutes or needs a human in the middle. The SDK had a wire format for "paused" but no mechanism to produce, persist, or resume a pause.

The API

Sync tool (the 90% path)

type WeatherInput struct {
	City string `json:"city" jsonschema:"City name"`
}

type WeatherOutput struct {
	TempC float64 `json:"temp_c"`
}

var weather = tool.Must(tool.Func(
	tool.Spec{Name: "get_weather", Description: "Current weather for a city."},
	func(ctx context.Context, in WeatherInput) (tool.Result[WeatherOutput], error) {
		return tool.Done(WeatherOutput{TempC: 21.5}), nil
	},
))

Schema is inferred from WeatherInput. No JSON plumbing, no Definition() boilerplate.

Async tool (external job)

var deploy = tool.Must(tool.Func(
	tool.Spec{
		Name:        "deploy",
		Description: "Trigger a production deployment.",
		Async:       tool.AsyncExternalResult(), // model is told not to re-call while pending
	},
	func(ctx context.Context, in DeployInput) (tool.Result[DeployStatus], error) {
		jobID := startDeployment(in)

		return tool.Pending(
			DeployStatus{ID: jobID, Status: "queued"}, // placeholder the model sees meanwhile
			tool.WithCorrelationID(jobID),
			tool.WithAwaitTimeout(30*time.Minute),
		), nil
	},
))

The invocation ends with FinishReasonPaused; the pending call is persisted on the session (session.PendingToolCall, kvstore included) and surfaced on InvocationEndEvent.PendingCalls.

Resuming (e.g. from a webhook handler)

stream, err := r.Resume(ctx, actorID, sessionID,
	runner.Resumption{CallID: callID, Output: finalResultJSON})
if err != nil {
	// eager: ErrResumeConflict, ErrPendingCallNotFound, ErrInvalidResumePayload
}

for evt, err := range stream { // agent continuation (model reacts to the result)
	...
}

Resume mutates eagerly — by the time it returns, the session is updated and saved; dropping the stream skips only the model continuation. It is idempotent for at-least-once delivery: duplicate submissions with the same payload are acknowledged, different payloads conflict (Stripe-style). Runner.Progress records non-terminal updates; Runner.Cancel aborts.
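
The duplicate-versus-conflict rule can be illustrated with a minimal, self-contained sketch. This is not the runner's implementation: ReceiptStore, Submit, and ErrConflict are hypothetical stand-ins, and hashing the raw payload bytes is an assumption (the PR only says receipts are hash-based).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// ErrConflict is a stand-in for the PR's ErrResumeConflict.
var ErrConflict = errors.New("resume conflict: call already resolved with a different payload")

// ReceiptStore is a hypothetical in-memory receipt table keyed by call ID.
type ReceiptStore struct {
	mu       sync.Mutex
	receipts map[string]string // callID -> hex SHA-256 of the accepted payload
}

func NewReceiptStore() *ReceiptStore {
	return &ReceiptStore{receipts: make(map[string]string)}
}

// Submit records the first payload seen for callID. It reports duplicate=true
// when the same payload is delivered again, and ErrConflict when a different
// payload arrives for a call that already has a receipt.
func (s *ReceiptStore) Submit(callID string, payload []byte) (duplicate bool, err error) {
	sum := sha256.Sum256(payload)
	digest := hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()

	prev, seen := s.receipts[callID]
	switch {
	case !seen:
		s.receipts[callID] = digest
		return false, nil
	case prev == digest:
		return true, nil // redelivery of the same result: acknowledge
	default:
		return false, ErrConflict
	}
}

func main() {
	store := NewReceiptStore()
	ok := []byte(`{"status":"succeeded"}`)

	fmt.Println(store.Submit("call-1", ok))                            // first delivery
	fmt.Println(store.Submit("call-1", ok))                            // webhook retry
	fmt.Println(store.Submit("call-1", []byte(`{"status":"failed"}`))) // conflicting payload
}
```

Note that byte-level hashing treats two JSON documents that differ only in whitespace or key order as different payloads; whether the real receipts canonicalize the payload first is not stated in this PR.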

Human-in-the-loop approval (interceptor)

func (g *Gate) InterceptToolExecution(
	ctx context.Context, info *agent.ToolCallInfo, next agent.ToolExecutionNext,
) (tool.Execution, error) {
	if info.Resume == nil { // first entry: pause instead of executing
		return tool.Execution{
			Output: json.RawMessage(`{"status":"awaiting_approval"}`),
			Await:  &tool.Await{Reason: tool.AwaitReasonApproval, Message: "Approve " + info.Req.Name + "?"},
		}, nil
	}

	decision := info.Resume
	info.Resume = nil // consume the decision; the tool runs fresh below

	if decision.Error != "" {
		return tool.Execution{}, errors.New(decision.Error) // denied → model-visible tool error
	}

	return next(ctx, info) // approved → tool executes exactly once
}

Re-entry goes through the interceptor chain, so denial means the tool never runs. Chained pauses work: approve → tool itself returns Pending → resume again with the external result (covered by an end-to-end test).

Breaking changes (pre-1.0, intentional)

  • tool.Tool is now Name()/Description()/InputSchema() + Execute(ctx, Call) (Execution, error); errors returned from Execute become model-visible tool errors.
  • tool.Registry is a concrete struct; Execute/ExecuteAll removed (use Run(...).Response()); NewRegistry() is variadic. MCP accepts a narrow mcp.ToolRegistry interface.
  • Runner.Resume/Cancel return (iter.Seq2[agent.Event, error], error); Progress returns (agent.ToolProgressEvent, error).
  • agent.ToolInterceptor returns tool.Execution; ToolCallInfo gained Resume.
  • Removed: Call.Args, Await.ExpiresAt (use Timeout), Execution/Action/Config.Metadata (write-only), Spec.OutputSchema (advisory-only), tool.PendingReentry (the typed path never re-runs the function; implement Tool directly for re-entry logic), coalescing fields. Await.Resume may be left empty (defaults from Reason).
  • kvstore persists PendingToolCalls/ResumeReceipts (new proto fields 4/5, backward compatible).

Known gaps (deliberate, follow-ups)

  • Crash recovery re-executes incomplete tools and cannot re-bind pauses produced during recovery (warned loudly at runtime).
  • A2A/Vercel adapters surface pauses via InvocationEndEvent.PendingCalls but don't yet map ToolProgressEvent/ToolArtifactEvent to their wire formats.
  • Session locks are process-local; distributed deployments need an external advisory lock (documented on session.Store).

weeco force-pushed the refactor/async-tools branch from eefb172 to 3bb7ea3 on June 9, 2026 20:16
weeco added 5 commits June 9, 2026 22:48
…pause/resume

Tools (and interceptors) can pause an invocation — for external jobs,
human approval, or user input — and resume it later, across process
boundaries.

- tool.Tool is now Name/Description/InputSchema + Execute(ctx, Call)
  returning Execution{Output, Await, Actions}. The typed path is
  tool.Func[In, Out] with schema inference; results built via
  Done/Pending/NeedInput.
- Await{Reason, Resume, ...} describes a pause; the closed Reason/Resume
  table is validated by the registry. Pauses persist as typed
  session.PendingToolCall records (kvstore included: new proto map
  fields, opaque JSON values carrying their own schema_version).
- Runner gains Resume/Progress/Cancel with hash-based resume receipts:
  duplicate submissions are acknowledged, conflicting ones rejected
  (ErrResumeConflict).
- Re-entry routes through the agent's tool interceptor chain
  (LLMAgent.ExecuteToolResume), so an approval interceptor consumes the
  decision before the tool runs — denied approvals never execute the
  tool. funcTool never re-runs the typed function on re-entry. Chained
  pauses (approve → external work) keep the call resumable.
- llmagent persists tool response parts in request order while
  streaming events in completion order.
- Examples migrated to the new API; task build:examples wired into CI.

Resume previously returned a lazy iter.Seq2: calling it and dropping
the return value compiled cleanly and did nothing. Now validation,
authorization, receipts, mutation, and the session save all happen
before Resume returns; the stream replays mutation-phase events and
then runs the agent continuation (re-acquiring the session lock).
All-rejected batches fail eagerly. Progress returns its single
ToolProgressEvent directly; Cancel reports ResumeOperationCancel to the
ResumeAuthorizer.

Export SpecProvider{ToolSpec() Spec} and Unwrapper{Unwrap() Tool};
SpecOf follows Unwrap chains so decorators preserve the wrapped tool's
async hints. AsyncSpec enforcement moves from funcTool into the
registry, covering every Tool implementation. agenttool declares
AsyncHandoff.

Execution.Actions was write-only: builtins emitted ActionArtifact but
nothing read it. llmagent now yields a ToolArtifactEvent per artifact
action; adapters and applications own persistence.

- runner.Result → runner.Resumption (collided with tool.Result[T]).
- tool.Registry is a concrete struct; Execute/ExecuteAll deleted (they
  passed a zero InvocationInfo and swallowed pauses); NewRegistry is
  variadic; MCP accepts a narrow ToolRegistry interface.
- Removed Call.Args, Await.ExpiresAt (use Timeout), write-only metadata
  bags, and unimplemented coalescing fields. Await.Resume may be left
  empty — Normalize defaults it from Reason; interceptor-created pauses
  are normalized and validated too.
- Run's userID is used: message-resume goes through the
  ResumeAuthorizer. ToolType constants; deterministic PendingCalls
  ordering.
weeco force-pushed the refactor/async-tools branch 2 times, most recently from 0e2ebd3 to 9c07425 on June 9, 2026 20:57
- Validate JSON at trust boundaries (tool output, resume output,
  progress payloads; new ErrInvalidResumePayload). Async-hint separator
  normalized; stale tool/doc.go rewritten; crash recovery warns loudly
  when it strands a pause.
- fakellm works under the agent loop: ThenRespondText wires streaming,
  the echo fallback carries its message, conversation keys are stable
  across turns; llmagent sets Request.Metadata[session_id].
- Review findings: agenttool re-entry resolves with the resume payload
  instead of re-running the child agent; xid session IDs; SpecOf is
  cycle-bounded; a2a empty-reasoning filter restored; shared
  SummarizePendingCall projection; Await metadata cloned into pending
  state.
- API trims for a leaner surface: PendingReentry and Spec.OutputSchema
  removed. Resume on an agent without re-entry support fails eagerly
  (ErrReentryNotSupported) instead of silently recording the payload;
  tool.Definition treats a provided Spec as the source of truth.
weeco force-pushed the refactor/async-tools branch from 9c07425 to aaf4796 on June 9, 2026 21:03

weeco commented Jun 10, 2026

Contributor Author

@claude Review this, specifically whether API and usage for users is good

claude Bot commented Jun 10, 2026


Claude finished @weeco's task in 4m 1s


API & Usage Review — async tools / typed tool.Func

I read the core surfaces (tool/{tool,func,result,spec,await,call,execution,registry}.go), the runner resume path (runner/resume.go, config.go), the interceptor/pause surfaces (agent/interceptors.go, pause_events.go), and the examples. Overall this is a strong, well-documented design — the doc comments are genuinely excellent and the layering (control-flow Await vs. model-visible Output vs. side-effect Actions) is principled. Below is my assessment of the public API ergonomics, organized by how much friction a user is likely to hit.


What's very good

  • The 90% path is clean. tool.Func(Spec{...}, fn) → tool.Done(v) with inferred schema is exactly the right default; no JSON plumbing. The examples (tools.go) read well.
  • Helper/spec pairing is intuitive. Pending pairs with AsyncExternalResult, NeedInput with AsyncUserInput. The With* option sets compose predictably.
  • Method-signature consistency. Run / Resume / Progress / Cancel all lead with (ctx, userID, sessionID, …). Easy to learn one, know the rest.
  • Idempotent resume (Stripe-style) with ErrResumeConflict / ResumeAcknowledgedEvent and a metadata-excluded hash is the right contract for at-least-once webhook delivery, and it's documented at the call site.
  • Eager-mutation semantics of Resume are clearly spelled out ("dropping the stream skips only the model continuation"). The drain idiom in the doc is helpful.
  • Error-vs-structured-failure (return error → model-visible tool error; return structured Output with nil error → non-error payload) is documented right on the Tool interface. Good.

Friction points worth addressing before freeze

1. The interceptor re-entry "consume" protocol is a footgun. agent/interceptors.go:199 and the approval pattern require the author to manually set info.Resume = nil on re-entry before calling next. If they forget, funcTool.Execute sees call.Resume != nil and short-circuits, returning the approval decision JSON as the tool's output instead of ever running the tool (tool/func.go:101-107). That's a silent correctness bug, not a loud failure, and the only thing standing between a user and it is a doc comment. Consider either:

  • a tiny helper, e.g. decision, ok := info.TakeResume() that returns the payload and clears the field atomically; or
  • having the runtime clear info.Resume for the interceptor that pauses, so the contract is "read it, don't manage it."
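
As a rough illustration of the first option, here is a self-contained sketch of a TakeResume helper. ToolCallInfo and ResumePayload below are simplified stand-ins with assumed fields, not the SDK's types, and "atomically" is read here as "read and clear in one call", not as a concurrency guarantee.

```go
package main

import "fmt"

// ResumePayload and ToolCallInfo are simplified stand-ins for illustration;
// the field sets are assumptions, not the SDK's definitions.
type ResumePayload struct {
	Output string
	Error  string
}

type ToolCallInfo struct {
	Name   string
	Resume *ResumePayload
}

// TakeResume returns the pending resume payload, if any, and clears the field
// in the same call, so a downstream tool cannot mistake the interceptor's
// decision for its own resume payload.
func (i *ToolCallInfo) TakeResume() (*ResumePayload, bool) {
	p := i.Resume
	i.Resume = nil
	return p, p != nil
}

func main() {
	info := &ToolCallInfo{
		Name:   "deploy",
		Resume: &ResumePayload{Output: `{"approved":true}`},
	}

	if decision, ok := info.TakeResume(); ok {
		fmt.Println("decision:", decision.Output)
	}
	fmt.Println("resume cleared:", info.Resume == nil)
}
```

With a helper of this shape, an approval interceptor would branch on the returned ok flag instead of inspecting and resetting the field by hand, which removes the step that is easy to forget.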

2. No runnable example of the marquee feature. The PR's headline is async pause/resume + human-in-the-loop approval, but examples/agent_interceptors/tool_approval_hook.go still uses the old synchronous bufio stdin + errors.New("denied") path: it never returns a tool.Await and never calls Runner.Resume. There is no example anywhere that exercises Pending → InvocationEndEvent.PendingCalls → Runner.Resume(...). For a feature this central to adoption, a small end-to-end example (deploy-job or approval-gate, resumed from a second goroutine/handler) would do more for usability than any amount of doc text.

3. "Resume" is heavily overloaded. Across the public surface the word means six different things: Await.Resume (a ResumeMode), Call.Resume/ToolCallInfo.Resume (a *ResumePayload), Runner.Resume (method), Resumption (runner input), and ResumePayload (re-entry input). Each is individually defensible, but a newcomer has to hold all six apart. At minimum I'd reconsider Await.Resume ResumeMode — naming it Await.ContinueAs or Await.Via would remove the worst collision (a field literally named Resume that is not a resume payload, sitting next to fields that are).

4. The typed story stops at the placeholder. Pending[T](v T) types the placeholder, but the actual resumed result arrives as Resumption.Output json.RawMessage — hand-marshaled and unchecked against T. Given how much the rest of the API leans on generics, a typed resume helper (e.g. runner.ResultOf(callID, outVal) that marshals) would close the loop and prevent shape drift between the placeholder and the real result.
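
A typed helper along these lines might look like the sketch below. It is self-contained on purpose: Resumption mirrors only the CallID and Output fields shown in the PR description, the json tags on DeployStatus are assumed, and ResultOf is the hypothetical name floated above, not an existing function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resumption mirrors only the two fields used in the PR description.
type Resumption struct {
	CallID string
	Output json.RawMessage
}

// DeployStatus follows the placeholder type from the async example;
// the json tags are an assumption.
type DeployStatus struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// ResultOf marshals a typed value into a Resumption, so the final result is
// built from the same Go type as the placeholder instead of hand-written JSON.
func ResultOf[T any](callID string, out T) (Resumption, error) {
	raw, err := json.Marshal(out)
	if err != nil {
		return Resumption{}, fmt.Errorf("marshal resume output for %s: %w", callID, err)
	}
	return Resumption{CallID: callID, Output: raw}, nil
}

func main() {
	res, err := ResultOf("call-1", DeployStatus{ID: "job-42", Status: "succeeded"})
	if err != nil {
		panic(err)
	}
	fmt.Println(res.CallID, string(res.Output))
}
```

This only guarantees that the caller serializes a value of type T; tying T to the tool's declared output type at compile time would need additional plumbing that this sketch does not attempt.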

5. Spec.Async must exactly equal the returned Await. validateAwaitAgainstSpec (registry.go:391) rejects any reason/resume mismatch. This is fine for the static case, but a tool that wants to pause for different reasons depending on input can't — it must pick one. That's a reasonable v1 constraint, but it's only discoverable at runtime as an ErrAwaitInvalid. Worth a one-line note in the AsyncSpec doc that the declared pair is a hard contract, not a hint.
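
To make the "hard contract" concrete, the check can be pictured roughly as follows. This is a guess at the shape of the rule, not the code at registry.go:391: the types are reduced to two string fields, the reason/resume values are made-up placeholders, and skipping the check when either side is nil is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAwaitInvalid is a stand-in for the sentinel error named above.
var ErrAwaitInvalid = errors.New("await invalid")

// AsyncSpec and Await are reduced to the two fields under discussion.
// The string values used in main are hypothetical, not the SDK's constants.
type AsyncSpec struct {
	Reason string
	Resume string
}

type Await struct {
	Reason string
	Resume string
}

// validateAwaitAgainstSpec rejects a pause whose reason/resume pair differs
// from the pair the tool declared up front.
func validateAwaitAgainstSpec(spec *AsyncSpec, aw *Await) error {
	if spec == nil || aw == nil {
		return nil // nothing declared or nothing paused: no pair to compare
	}
	if aw.Reason != spec.Reason || aw.Resume != spec.Resume {
		return fmt.Errorf("%w: declared %s/%s, returned %s/%s",
			ErrAwaitInvalid, spec.Reason, spec.Resume, aw.Reason, aw.Resume)
	}
	return nil
}

func main() {
	declared := &AsyncSpec{Reason: "external_result", Resume: "external"}

	// Matches the declaration: accepted.
	fmt.Println(validateAwaitAgainstSpec(declared, &Await{Reason: "external_result", Resume: "external"}))
	// Same tool pausing for a different reason: rejected at runtime.
	fmt.Println(validateAwaitAgainstSpec(declared, &Await{Reason: "approval", Resume: "human"}))
}
```

The second call shows the case described above: a tool that wants to pause for an input-dependent reason only finds out when the pause is produced.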


Minor / nits

  • RequireInputResponse.Status (tool/builtin/require_input.go:40) is explicitly retained "for callers that grep for the old reconciler status string." In a PR whose theme is pre-1.0 cleanup, keeping a field alive for string-grepping consumers is the kind of debt this PR is otherwise removing. Consider dropping it now.
  • Empty-resume semantics: Resumption{} with no Output/Error resolving successfully against LastOutput (resume.go:471-476) is a surprising-but-documented default. Fine; just make sure it's in the package doc, not only the struct comment.
  • executeReentry builds tool.InvocationInfo{SessionID: …} only (resume.go:459) — Turn/AgentName/InvocationID are empty on re-entry. Tools that log those will see blanks. Probably acceptable, but tools shouldn't assume those are populated; a doc note on InvocationInfo would help.
  • Must panics at init — correct call, but worth one line in Func's doc reminding users that a bad inferred schema (unsupported In type) turns into an init-time panic, since schema inference is now implicit.
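
For readers unfamiliar with the Must idiom in the last nit, the sketch below shows why a construction error becomes an init-time panic. It is generic Go, not the SDK: newTool is a hypothetical fallible constructor standing in for tool.Func, and this Must is a plain generic helper whose signature is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Tool is a placeholder for whatever the constructor builds.
type Tool struct{ Name string }

// newTool is a hypothetical fallible constructor. It fails on an empty name,
// the way implicit schema inference might fail on an unsupported input type.
func newTool(name string) (Tool, error) {
	if name == "" {
		return Tool{}, errors.New("tool: empty name")
	}
	return Tool{Name: name}, nil
}

// Must unwraps a (value, error) pair and panics if the error is non-nil.
func Must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// Package-level use: if newTool failed here, the panic would fire during
// package initialization, before main starts.
var weather = Must(newTool("get_weather"))

func main() {
	fmt.Println(weather.Name)
}
```

Because package-level variables are initialized before main runs, a failing constructor crashes the program at startup, which is why a doc note on Func would help.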

Verdict

The API is in good shape and the typed tool.Func path is a clear win. The two things I'd genuinely gate on before freezing are (1) the manual info.Resume = nil footgun and (2) shipping at least one runnable async-resume example — both directly affect whether users land the feature correctly on the first try. The "Resume" overloading (#3) and typed-resume gap (#4) are worth a deliberate decision now since they're hard to change post-1.0.

I did not run the build/tests (review-only). Happy to draft the resume example or the TakeResume() helper if you want it implemented.
