Shared-hardware (GPU) coordination protocol between agents over A2A #893

@jaylfc

Description

Need (Jay 2026-06-14)

@taos and @taOSmd share hardware (linstation, a Fedora host with a 12 GB RTX 3060). They must coordinate over A2A: post when claiming or using the GPU, post when it is free again, and CHECK before use. This surfaced when a FLUX image-gen run by @taos hit an out-of-memory (OOM) error because @taOSmd had ~9.4 GB of Ollama models loaded on the same GPU, despite an earlier free signal.

Interim protocol (agents follow now via the A2A integration channel)

GPU LEASE for a shared node:

  1. CHECK before loading: scan the channel for an open [GPU CLAIM] on that node AND run nvidia-smi. Load only if the GPU is both free and unclaimed.
  2. CLAIM: post [GPU CLAIM] node=<host> holder=@you vram=~Xgb reason=... eta=... before loading.
  3. RELEASE: post [GPU RELEASE] node=<host> holder=@you immediately when done.
  4. REQUEST when blocked: post [GPU REQUEST] node=<host> need=~Xgb; the holder either releases or negotiates a split or a time window.
  5. Courtesy: do not hold the GPU while idle; use keep-alive auto-unload.
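
A minimal sketch of how an agent could automate steps 1 to 3, assuming the channel history is available as a list of text lines in posting order. Everything here is illustrative rather than an existing API: the names `format_claim`, `parse_message`, `open_claims`, `parse_smi_memory`, and `may_load` are hypothetical, field values are assumed to contain no spaces, and "free" is interpreted as "the requested VRAM fits in what nvidia-smi reports as unused".

```python
import re
import subprocess

# Hypothetical pattern for the bracketed tags used in the interim protocol.
MSG_RE = re.compile(r"\[GPU (CLAIM|RELEASE|REQUEST)\]\s*(.*)")


def format_claim(node, holder, vram_gb, reason, eta):
    """Build a step-2 claim line in the format described above."""
    return (f"[GPU CLAIM] node={node} holder={holder} "
            f"vram=~{vram_gb}gb reason={reason} eta={eta}")


def parse_message(line):
    """Return (kind, fields) for a protocol line, or None for other chatter.

    Assumption: values contain no spaces; tokens without '=' are ignored.
    """
    m = MSG_RE.search(line)
    if not m:
        return None
    fields = dict(tok.split("=", 1) for tok in m.group(2).split() if "=" in tok)
    return m.group(1), fields


def open_claims(lines, node):
    """Replay the channel in order; a RELEASE closes that holder's CLAIM."""
    claims = {}
    for line in lines:
        parsed = parse_message(line)
        if parsed is None:
            continue
        kind, fields = parsed
        if fields.get("node") != node:
            continue
        holder = fields.get("holder")
        if kind == "CLAIM":
            claims[holder] = fields
        elif kind == "RELEASE":
            claims.pop(holder, None)
    return claims


def parse_smi_memory(output):
    """Parse the first GPU's 'used, total' MiB pair from nvidia-smi CSV output."""
    used, total = output.strip().splitlines()[0].split(",")
    return int(used), int(total)


def query_gpu_memory():
    """Run nvidia-smi on the local node and return (used_mib, total_mib)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_memory(out)


def may_load(lines, node, need_mib, used_mib, total_mib):
    """Step 1: load only if nobody holds a claim AND the request fits."""
    return not open_claims(lines, node) and used_mib + need_mib <= total_mib
```

An agent would call `query_gpu_memory()` on the shared node, pass the result to `may_load(...)` together with the channel history, and post `format_claim(...)` only when it returns `True`. Checking both sources matters because either one alone can be stale: the incident above happened with models still resident after a free signal.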

Future (productize)

Fold this into the cluster scheduler/ClusterManager as a real resource lease: per-device VRAM accounting, a lease registry, admission control (a backend won't load if it cannot fit alongside the current leases), and auto-eviction driven by keep-alive. The A2A text protocol is the stop-gap. Relates to #890/#892 (worker contract) and dynamic resource scheduling.
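
To make the lease idea concrete, here is a rough sketch of per-device VRAM accounting with admission control and keep-alive eviction. It is not the ClusterManager design: `DeviceLeases`, `Lease`, and all method names are hypothetical, sizes are in MiB, and time is passed in explicitly (as seconds) so the behaviour is easy to test.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str
    vram_mib: int
    keep_alive_s: float   # idle time after which the lease may be evicted
    last_used: float      # timestamp of the most recent use, in seconds


class DeviceLeases:
    """Hypothetical lease registry for a single GPU."""

    def __init__(self, total_mib):
        self.total_mib = total_mib
        self.leases = {}  # holder -> Lease

    def used_mib(self, excluding=None):
        """Sum of leased VRAM, optionally ignoring one holder's lease."""
        return sum(l.vram_mib for h, l in self.leases.items() if h != excluding)

    def evict_idle(self, now):
        """Drop leases idle for longer than their keep-alive; return holders."""
        idle = [h for h, l in self.leases.items()
                if now - l.last_used > l.keep_alive_s]
        for holder in idle:
            del self.leases[holder]
        return idle

    def acquire(self, holder, vram_mib, keep_alive_s, now):
        """Admission control: grant only if the request fits beside others."""
        self.evict_idle(now)
        # A holder re-acquiring replaces its own lease, so do not double count.
        if self.used_mib(excluding=holder) + vram_mib > self.total_mib:
            return False
        self.leases[holder] = Lease(holder, vram_mib, keep_alive_s, now)
        return True

    def touch(self, holder, now):
        """Record activity so an in-use lease is not evicted."""
        if holder in self.leases:
            self.leases[holder].last_used = now

    def release(self, holder):
        """Explicit release, the scheduler-side analogue of [GPU RELEASE]."""
        self.leases.pop(holder, None)
```

With a 12288 MiB device, a 9600 MiB lease blocks a later 8000 MiB request until the first holder releases or its keep-alive expires, which is the failure mode from the incident turned into a refused admission rather than an OOM. A real version would also need to reconcile leases against actual device usage, since processes outside the registry can still allocate VRAM.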

Acceptance

Both agents check, claim, and release over A2A and never silently co-load beyond the GPU's VRAM capacity; a scheduler-enforced version then automates this.

Metadata

Assignees: No one assigned
Status: Todo
Milestone: No milestone