MCP: add agent-safe workflow tools and failure triage #158

## Summary

Make gflow's MCP surface safer and more useful for coding agents and experiment assistants.

## Motivation

gflow already exposes MCP tools, which is a meaningful differentiator from traditional schedulers. The next step is to design agent-safe workflows so agents can inspect, explain, and propose scheduler actions without accidentally mutating shared infrastructure.

## Proposed scope

- Add dry-run/preview-oriented MCP tools for job submission and updates.
- Provide compact failure-triage tools that return the recent log excerpt, exit status, runtime, GPU assignment, and retry hints.
- Expose queue pressure and GPU availability summaries suitable for agent planning.
- Mark destructive operations clearly and document a caller-side confirmation convention that agents must follow before invoking them.
- Add examples for Codex, Claude Code, Cursor, and OpenCode that prefer read-before-write behavior.
- Consider structured tool responses optimized for LLM consumption rather than raw CLI text.
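As a rough illustration of the triage and read/destructive-separation items, here is a minimal sketch. It assumes tools are tagged with the `readOnlyHint` / `destructiveHint` keys from the MCP tool-annotations spec. The tool names (`gflow_job_triage`, `gflow_job_cancel`, etc.), the `triage` helper, and the retry heuristics are all hypothetical and are not existing gflow APIs. The point is only the shape: a compact dict an LLM can read directly instead of raw CLI text.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical tool catalogue. The annotation keys follow the MCP
# tool-annotations spec; the tool names are illustrative only.
TOOLS = {
    "gflow_job_triage": {"readOnlyHint": True, "destructiveHint": False},
    "gflow_queue_summary": {"readOnlyHint": True, "destructiveHint": False},
    "gflow_submit_preview": {"readOnlyHint": True, "destructiveHint": False},
    "gflow_job_cancel": {"readOnlyHint": False, "destructiveHint": True},
}


def read_only_tools() -> list[str]:
    """Tools an agent may call freely during read-before-write planning."""
    return sorted(name for name, ann in TOOLS.items() if ann["readOnlyHint"])


@dataclass
class TriageReport:
    job_id: int
    state: str
    exit_code: Optional[int]
    runtime_s: Optional[float]
    gpus: list[int]
    log_tail: list[str]
    retry_hint: Optional[str]


def retry_hint(exit_code: Optional[int], log_tail: list[str]) -> Optional[str]:
    """Heuristic hints (assumed, not gflow behavior) for common failure modes."""
    if exit_code is None or exit_code == 0:
        return None  # still running or succeeded: nothing to retry
    text = "\n".join(log_tail).lower()
    if "out of memory" in text:
        return "CUDA OOM in log: reduce batch size or request more GPU memory before retrying."
    if exit_code == 137:
        return "Exit 137 usually means SIGKILL (e.g. host OOM or a time limit); check memory and limits."
    return "Non-zero exit: inspect log_tail before resubmitting."


def triage(job_id: int, state: str, exit_code: Optional[int],
           runtime_s: Optional[float], gpus: list[int],
           log_lines: list[str], tail: int = 5) -> dict:
    """Build a compact, structured failure summary for one job."""
    log_tail = log_lines[-tail:]
    report = TriageReport(job_id, state, exit_code, runtime_s, gpus,
                          log_tail, retry_hint(exit_code, log_tail))
    return asdict(report)
```

A response such as `triage(42, "Failed", 1, 12.5, [0], log_lines)` gives the agent one object covering exit status, runtime, GPU assignment, and a hint, so it does not need to chain several CLI calls.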

## Acceptance criteria

- An agent can explain why a job is queued or failed without manually running several shell commands.
- Agents can preview intended submissions before creating jobs.
- Destructive operations are documented and semantically separated from read-only tools.
- Existing CLI behavior remains the source of truth; MCP wraps scheduler semantics rather than inventing a separate workflow.
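One possible way to satisfy the preview and confirmation criteria is sketched below, under stated assumptions: a read-only `preview_submission` returns what would happen plus a confirmation token derived from the job spec, and the mutating `submit` refuses to run unless the caller echoes a token for the identical spec. Both function names, the spec fields, and the token scheme are hypothetical. The actual submission is injected as `run` so the MCP layer can delegate to the existing CLI rather than reimplementing scheduler semantics.

```python
import hashlib
import json
from typing import Callable, Optional


def _fingerprint(spec: dict) -> str:
    """Stable token for a spec; canonical JSON makes key order irrelevant."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def preview_submission(spec: dict, free_gpus: int) -> dict:
    """Read-only: describe what submitting `spec` would do; never creates a job."""
    problems = []
    if not spec.get("command"):
        problems.append("missing command")
    gpus = spec.get("gpus", 0)
    if gpus < 0:
        problems.append("gpus must be >= 0")
    token: Optional[str] = None if problems else _fingerprint(spec)
    return {
        "would_submit": spec,
        "would_queue": gpus > free_gpus,  # not enough free GPUs right now
        "problems": problems,
        "confirm_token": token,
    }


def submit(spec: dict, confirm_token: Optional[str],
           run: Callable[[dict], int]) -> int:
    """Mutating: proceeds only if the token matches a preview of this exact spec."""
    if confirm_token is None or confirm_token != _fingerprint(spec):
        raise PermissionError("spec changed or was never previewed; call preview_submission first")
    return run(spec)  # delegate to the real scheduler path (e.g. the CLI)
```

Tying the token to the spec means an agent cannot preview one job and then quietly submit a different one; any edit forces a fresh preview.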