MCP: add agent-safe workflow tools and failure triage #158

@AndPuQing

Summary

Make gflow's MCP surface safer and more useful for coding agents and experiment assistants.

Motivation

gflow already exposes MCP tools, which is a meaningful differentiator from traditional schedulers. The next step is to design agent-safe workflows so agents can inspect, explain, and propose scheduler actions without accidentally mutating shared infrastructure.

Proposed scope

  • Add dry-run/preview-oriented MCP tools for job submission and updates.
  • Provide compact failure triage tools: recent log excerpt, exit status, runtime, GPU assignment, retry hints.
  • Expose queue pressure and GPU availability summaries suitable for agent planning.
  • Mark destructive operations clearly, and document the confirmation conventions that callers are expected to follow.
  • Add examples for Codex, Claude Code, Cursor, and OpenCode that prefer read-before-write behavior.
  • Consider structured tool responses optimized for LLM consumption rather than raw CLI text.
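
To make the dry-run and destructive-operation items above concrete, here is a minimal sketch in Python. It is not gflow's actual MCP surface: the tool names (`preview_submit`, `cancel_job`), the argument keys, and the `confirm` convention are all hypothetical. The `readOnlyHint` and `destructiveHint` keys mirror the advisory tool annotations in the MCP specification; clients are not required to honor them, so the handler also checks confirmation itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical registry entry pairing a handler with its safety class."""
    name: str
    handler: Callable[[dict], dict]
    read_only: bool
    destructive: bool = False

    def annotations(self) -> dict:
        # Key names follow MCP tool annotations; they are hints, not enforcement.
        return {"readOnlyHint": self.read_only, "destructiveHint": self.destructive}


def preview_submit(args: dict) -> dict:
    """Validate a submission and echo what would be created, without creating it."""
    command = str(args.get("command", "")).strip()
    gpus = int(args.get("gpus", 0))
    problems = []
    if not command:
        problems.append("command is empty")
    if gpus < 0:
        problems.append("gpus must be non-negative")
    return {
        "dry_run": True,
        "ok": not problems,
        "would_submit": {"command": command, "gpus": gpus},
        "problems": problems,
    }


def cancel_job(args: dict) -> dict:
    """Destructive tool: refuses to act unless the caller passes confirm=True."""
    if not args.get("confirm"):
        return {
            "ok": False,
            "error": "confirmation_required",
            "hint": "show the job id to the user, then call again with confirm=true",
        }
    # Placeholder: a real server would ask the scheduler to cancel the job here.
    return {"ok": True, "cancelled": args["job_id"]}


TOOLS = {
    spec.name: spec
    for spec in (
        ToolSpec("preview_submit", preview_submit, read_only=True),
        ToolSpec("cancel_job", cancel_job, read_only=False, destructive=True),
    )
}


def list_tools() -> list:
    """Return the tool listing an agent would see, with safety hints attached."""
    return [{"name": s.name, "annotations": s.annotations()} for s in TOOLS.values()]
```

Keeping the preview as its own read-only tool, rather than a `dry_run` flag on the mutating tool, is one possible design; it lets a client allow-list previews without ever exposing the write path.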

Acceptance criteria

  • An agent can explain why a job is queued or has failed without manually running several shell commands.
  • Agents can preview intended submissions before creating jobs.
  • Destructive operations are documented and semantically separated from read-only tools.
  • Existing CLI behavior remains the source of truth; MCP wraps scheduler semantics rather than inventing a separate workflow.
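
The first criterion could be met by a single triage call that returns a compact, structured answer. The sketch below assumes a hypothetical `JobSnapshot` record; its field names and the `reason` codes are illustrative and not taken from gflow. The only outside fact it relies on is the common shell convention that an exit status above 128 means the process was terminated by signal `status - 128`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JobSnapshot:
    """Hypothetical view of one job, as a scheduler might report it."""
    job_id: int
    state: str  # "queued", "running", "failed", or "finished"
    requested_gpus: int = 0
    free_gpus: int = 0
    exit_code: Optional[int] = None
    runtime_seconds: Optional[float] = None
    gpu_ids: Tuple[int, ...] = ()
    log_tail: Tuple[str, ...] = ()


def retry_hint(exit_code: Optional[int]) -> str:
    """Turn an exit status into a short, cautious hint for the agent."""
    if exit_code is None:
        return "exit status unavailable"
    if exit_code > 128:
        signal_number = exit_code - 128  # shell convention: 128 + signal number
        if signal_number == 9:
            return "killed by SIGKILL (often out-of-memory); reduce memory use before retrying"
        return f"terminated by signal {signal_number}"
    return "non-zero exit; inspect the log excerpt before retrying"


def explain_job(job: JobSnapshot, max_log_lines: int = 5) -> dict:
    """Answer 'why is this job queued or failed?' in one structured response."""
    report = {"job_id": job.job_id, "state": job.state}
    if job.state == "queued":
        if job.requested_gpus > job.free_gpus:
            report["reason"] = "insufficient_gpus"
            report["detail"] = f"needs {job.requested_gpus} GPU(s), {job.free_gpus} free"
        else:
            report["reason"] = "unknown"  # e.g. dependencies or limits not modeled here
    elif job.state == "failed":
        excerpt = job.log_tail[-max_log_lines:] if max_log_lines > 0 else ()
        report.update(
            exit_code=job.exit_code,
            runtime_seconds=job.runtime_seconds,
            gpu_ids=list(job.gpu_ids),
            log_excerpt=list(excerpt),
            retry_hint=retry_hint(job.exit_code),
        )
    return report
```

In keeping with the last criterion, a real implementation would populate the snapshot from the same data the CLI reports rather than computing scheduler state on its own.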

Labels

api (REST API and server), client (Client library), priority: medium (Medium priority issue), type: feature (New feature or enhancement request)
