
Universal control plane: live queue + full-trace observability/audit that every interaction flows through #896

@jaylfc

Vision (Jay 2026-06-14)

One control plane where you can SEE and CONTROL everything taOS does. Every human and agent interaction goes through it -- chat turns, agent runs, tool calls, image gen, memory ops, model requests. It is the observability + audit + governance spine, with live queue control on top.

Must-haves

  • Live queue view: see the scheduler queue in real time (what is queued / running / draining, depth, per-resource).
  • On-the-fly control: redirect a request to a different resource/backend mid-flight; throttle the queue (insert a delay between tasks); pause/resume; cancel.
  • Full trace per task: the messages, logs, tool calls, WHICH USER requested it, what model + provider, timing, and the agent's reasoning/decision context (not just prompts/tokens/latency).
  • Modern, organised UI: clean live list; double-click a task for the full drill-down (timeline, tool-call tree, inputs/outputs, model/provider/user, cost, errors).
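
As a rough illustration of the live view and on-the-fly controls listed above, here is a minimal in-memory sketch. Everything in it (`ControlledQueue`, `Task`, `snapshot`, `redirect`, `delay_s`) is a hypothetical name for discussion, not the existing scheduler API, and it only redirects tasks that are still queued; true mid-flight redirection of a running task would need backend support that this sketch does not model.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    CANCELLED = "cancelled"


@dataclass
class Task:
    task_id: str
    user: str      # who requested the task (per-user attribution)
    resource: str  # backend the task is currently routed to
    state: TaskState = TaskState.QUEUED


@dataclass
class ControlledQueue:
    """Hypothetical queue for illustration; not the real tinyagentos scheduler."""

    delay_s: float = 0.0  # throttle: delay inserted before each dispatch
    paused: bool = False  # pause/resume switch
    _pending: deque[Task] = field(default_factory=deque)

    def submit(self, task: Task) -> None:
        self._pending.append(task)

    def snapshot(self) -> dict[str, int]:
        """Live view: number of queued tasks per resource."""
        depth: dict[str, int] = {}
        for task in self._pending:
            depth[task.resource] = depth.get(task.resource, 0) + 1
        return depth

    def redirect(self, task_id: str, resource: str) -> bool:
        """Re-route a queued task to another resource; False if not found."""
        for task in self._pending:
            if task.task_id == task_id:
                task.resource = resource
                return True
        return False

    def cancel(self, task_id: str) -> bool:
        """Drop a queued task; False if not found."""
        for task in self._pending:
            if task.task_id == task_id:
                task.state = TaskState.CANCELLED
                self._pending.remove(task)  # safe: we return immediately
                return True
        return False

    def dispatch_next(self) -> Task | None:
        """Hand out the next task, honouring pause and throttle."""
        if self.paused or not self._pending:
            return None
        if self.delay_s > 0:
            time.sleep(self.delay_s)
        task = self._pending.popleft()
        task.state = TaskState.RUNNING
        return task


if __name__ == "__main__":
    q = ControlledQueue()
    q.submit(Task("t1", user="jay", resource="gpu0"))
    q.submit(Task("t2", user="jay", resource="gpu0"))
    q.redirect("t2", "cpu")
    print(q.snapshot())
```

A read-only UI would only need `snapshot()`; the mutating calls (`redirect`, `cancel`, `paused`, `delay_s`) correspond to the later control-surface phase.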

Build on what exists (do NOT rebuild)

  • tinyagentos/scheduler/history_store.py -- the queue's dispatch history (already feeds the Activity app).
  • tinyagentos/scheduler/scheduler.py -- the dispatcher (Phase 1). Queue control is the unbuilt Phase 2; see #894, "Scheduler Phase 2: VRAM-accounted admission + queue + eviction (the GPU arbiter both agents use)".
  • tinyagentos/trace_store.py -- trace storage.
  • tinyagentos/otel/receiver.py + emitter.py + judge.py -- the OTLP push/receive layer the agent-audit design specified.
  • desktop/src/apps/ActivityApp.tsx -- the existing Activity UI to evolve into this control plane.

The mandate (universal coverage)

A single instrumentation point that every interaction passes through, so there is no blind path: chat (taos_agent), agent runs (deployed frameworks via the adapters), tool calls (skill_exec), image gen (scheduler -> backends), memory ops, and model/provider requests (LiteLLM). This universal coverage is what unlocks the audit value the agent community wants: catching 'silent success' (the agent reasons wrongly but the run looks green), answering 'why did the agent do that?', per-user attribution, PII redaction, and audit trails that stay inside the self-hosted perimeter.
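
One possible shape for that single instrumentation point is a context manager that every code path wraps its work in. The sketch below is an assumption for discussion only: `interaction`, `TraceRecord`, `redact`, and the in-memory `TRACES` list are hypothetical names, and a real build would presumably write to the existing trace store and OTLP layer rather than a list. The email-only redaction is a stand-in for a fuller PII pass.

```python
from __future__ import annotations

import re
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator

# Hypothetical in-memory sink, standing in for real trace storage.
TRACES: list[TraceRecord] = []

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str) -> str:
    """Mask email addresses before anything is stored."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


@dataclass
class TraceRecord:
    kind: str  # e.g. "chat", "agent_run", "tool_call", "image_gen", "memory_op"
    user: str  # who requested the interaction
    model: str | None = None
    provider: str | None = None
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list[dict[str, Any]] = field(default_factory=list)
    status: str = "running"
    duration_s: float = 0.0

    def log(self, name: str, detail: str = "") -> None:
        """Append a timestamped event (message, tool call, decision context)."""
        self.events.append(
            {"t": time.time(), "name": name, "detail": redact(detail)}
        )


@contextmanager
def interaction(
    kind: str,
    user: str,
    *,
    model: str | None = None,
    provider: str | None = None,
) -> Iterator[TraceRecord]:
    """Wrap one interaction so it always leaves a trace, even on failure."""
    rec = TraceRecord(kind=kind, user=user, model=model, provider=provider)
    TRACES.append(rec)  # registered up front, so running work is visible
    start = time.perf_counter()
    try:
        yield rec
        rec.status = "ok"
    except Exception as exc:
        rec.status = "error"
        rec.log("exception", repr(exc))
        raise
    finally:
        rec.duration_s = time.perf_counter() - start


if __name__ == "__main__":
    with interaction("tool_call", user="jay") as rec:
        rec.log("decision", "emailing alice@example.com about the report")
    print(rec.status, rec.events[0]["detail"])
```

Logging the agent's decision context through `rec.log` alongside the status is what would let a reviewer spot a 'silent success': the record can read `ok` while the logged reasoning shows the wrong choice.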

Control surface (new, Phase 2)

Tie-ins

Process

Umbrella + vision capture. Brainstorm -> spec with @taOSmd -> phased build (read-only live view first, then drill-down, then control surface). Do not block the storybook demo.
