Epic: Configurable backup & restore for agent data volumes #1208

@vybe

Summary

Trinity has no backup for agent data volumes. All agent-generated binary/large-file data lives on per-agent named Docker volumes — agent-{name}-workspace (mounted at /home/developer), plus agent-{name}-public (FILES-001) and agent-{name}-shared (shared folders) — and the gitignored content/ directory inside the workspace volume. Git sync covers source code only; binaries are deliberately excluded (src/backend/services/git_service.py) and there is no Git LFS. The only backup tool, scripts/deploy/backup-database.sh, copies trinity.db and nothing else.

Net effect: host disk loss = total, unrecoverable loss of every agent's accumulated data. This epic develops the strategy for — and delivers — a configurable backup & restore capability for agent volumes, shipped as an entitled enterprise feature.
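For illustration, the volume naming described in the summary and one common way to archive a named Docker volume can be sketched as follows. This is a minimal sketch, not the chosen mechanism (that is a Phase 0 question): `AgentVolumes`, `tar_volume_command`, the `alpine:3` helper image, and the `/data` and `/backup` mount points are all hypothetical names introduced here. The function only builds the `docker run` argument list and executes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentVolumes:
    """Named Docker volumes for one agent, following the naming in the summary."""
    agent: str

    def names(self) -> list[str]:
        # workspace is mounted at /home/developer; public and shared are the
        # FILES-001 and shared-folder volumes.
        return [f"agent-{self.agent}-{suffix}"
                for suffix in ("workspace", "public", "shared")]


def tar_volume_command(volume: str, dest_dir: str,
                       image: str = "alpine:3") -> list[str]:
    """Build a `docker run` argv that tars one volume into dest_dir.

    Assumption: a throwaway helper container mounts the volume read-only at
    /data and bind-mounts dest_dir (an absolute host path) at /backup.
    Nothing is executed here; the caller decides how and when to run it.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",
        "-v", f"{dest_dir}:/backup",
        image,
        "tar", "czf", f"/backup/{volume}.tar.gz", "-C", "/data", ".",
    ]
```

Archiving a volume this way while the agent is writing to it gives no consistency guarantee on its own, which is why consistency is listed as a separate acceptance criterion.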

Context

Surfaced while reviewing distributed/binary storage for agents. The data itself is durable against container restart and recreation (the volume is re-attached to the new container) and even survives reset-to-main-preserve-state (the recovery path runs git reset --hard origin/main with no git clean, so gitignored files are left on disk). The gap is not ephemerality — it is the complete absence of any backup / snapshot / restore / portability story. The sharpest risk is single-host disk failure.

Positioned under the Enterprise epic because operators running fleets need configurable, governed backup: OSS ships the minimal primitives; enterprise adds policy, scheduling, retention, off-host destinations, and restore UX.

Design must respect sovereign infrastructure (Principle #6, docs/planning/TARGET_ARCHITECTURE.md): work on a single commodity server by default (local-disk target), with off-host / object-store destinations as opt-in configuration — never a hard cloud dependency.
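One possible shape for a policy record that honors this constraint is sketched below: local disk is always the default target and an object store is added only when explicitly configured. Every name here (`BackupPolicy`, `ObjectStoreDestination`, the field names, the default path and cron expression) is hypothetical and for illustration only; the real schema is for the Phase 0 strategy doc to decide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectStoreDestination:
    """Opt-in off-host target for any S3-compatible endpoint (hypothetical shape)."""
    endpoint_url: str
    bucket: str
    prefix: str = ""


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record; field names and defaults are illustrative."""
    schedule_cron: str = "0 3 * * *"          # daily at 03:00
    retention_count: int = 7                  # keep the newest N backups
    volume_suffixes: tuple[str, ...] = ("workspace", "public", "shared")
    local_dir: str = "/var/backups/trinity"   # default: disk on the same host
    off_host: Optional[ObjectStoreDestination] = None  # never required

    def destinations(self) -> list[str]:
        """Return target URIs; the local target is always present."""
        targets = [f"file://{self.local_dir}"]
        if self.off_host is not None:
            o = self.off_host
            targets.append(f"s3://{o.bucket}/{o.prefix}".rstrip("/"))
        return targets
```

With the defaults, `BackupPolicy().destinations()` yields only the local `file://` target, so a single commodity server works with no cloud configuration at all.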

Strategy questions to resolve (Phase 0)

Acceptance Criteria (epic — tracked via child issues)

  • Phase 0 strategy doc landed in docs/planning/ covering mechanism, destinations, consistency, retention, restore, and the relationship to #1169 (Agent runtime data volumes: declared data paths with snapshot/restore and portable export).
  • Configurable backup policy (schedule, retention, included volumes/paths, destination) — persisted and admin-editable.
  • At least one off-host destination (object-store / S3-compatible), with local disk as default and no mandatory cloud dependency.
  • Backups encrypted at rest (AES-256-GCM via src/backend/services/credential_encryption.py).
  • Consistent backup of a running agent (documented quiesce or crash-consistency guarantee).
  • Restore path for per-agent and full-fleet, validated by a round-trip test (backup → wipe → restore → data intact).
  • Shipped behind the enterprise entitlement seam (#847, Spike: enterprise edition architecture, private module strategy for compliance features (SSO, SCIM, SIEM)): entitlement_service.register_module() + requires_entitlement() gating; OSS-only builds degrade gracefully.
  • Observability: backup success/failure surfaced (operator queue / monitoring); last-backup timestamp per agent.
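The round-trip criterion above (backup → wipe → restore → data intact) can be sketched against a plain directory standing in for a volume. This is a minimal sketch under stated assumptions: it uses a local gzip tar archive and SHA-256 digests to compare contents, it does not cover encryption or a running agent, and all function names are hypothetical.

```python
import hashlib
import shutil
import tarfile
import tempfile
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(root: Path, archive: Path) -> None:
    """Archive the whole tree; paths inside the archive are relative to root."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(root, arcname=".")


def wipe(root: Path) -> None:
    """Simulate data loss by emptying the directory."""
    shutil.rmtree(root)
    root.mkdir()


def restore(archive: Path, root: Path) -> None:
    """Unpack the archive back into root."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(root)


def round_trip_ok() -> bool:
    """Run backup -> wipe -> restore and compare digests before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "workspace"
        (root / "content").mkdir(parents=True)
        (root / "content" / "model.bin").write_bytes(bytes(range(256)) * 4)
        (root / "notes.txt").write_text("hello")
        archive = Path(tmp) / "workspace.tar.gz"

        before = digest_tree(root)
        backup(root, archive)
        wipe(root)
        if digest_tree(root):      # the wipe must leave no files behind
            return False
        restore(archive, root)
        return digest_tree(root) == before
```

Comparing per-file digests rather than file counts or sizes catches silent corruption, which is the failure mode a restore test most needs to rule out.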

Technical Notes

Metadata

Assignees

No one assigned

    Labels

      • complexity-high: Complexity: high (board points 13)
      • priority-p2: Important
      • status-incubating: Idea under consideration (pre-Todo, not yet greenlit for development)
      • theme-infrastructure: Theme: Infrastructure
      • type-epic: Parent epic issue (groups child sub-issues)