feat(storage): blob store with orphaned artifact cleanup (#3911) #3920

Open
jiangyj545 wants to merge 1 commit into orchestration-agent:main from jiangyj545:fix/orphaned-blobs-cleanup-3911

@jiangyj545

Summary

Implements a storage subsystem for tracking task artifacts with automatic cleanup of orphaned blobs left behind by failed tasks.

Problem

When a task fails after uploading intermediate artifacts but before writing final task state, those blobs become orphaned. The existing cleanup only follows completed task records, so:

  • Storage costs grow over time
  • Stale outputs remain beyond intended retention
  • No mechanism exists to detect or remove unreferenced artifacts

Solution

BlobStore — File-based blob storage

  • SHA-256-based content-addressable storage (identical content is stored once)
  • Tags each blob with task_id, run_id, content_type
  • finalize() marks blobs as referenced by completed tasks
  • Thread-safe with metadata persistence to disk
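A minimal sketch of how the BlobStore behaviour listed above could look, not the code in this PR. The class name `SketchBlobStore`, the `metadata.json` file name, and the metadata field names (`size`, `created_at`, `finalized`) are assumptions made for illustration.

```python
import hashlib
import json
import threading
import time
from pathlib import Path


class SketchBlobStore:
    """Hypothetical content-addressable store; not the PR's actual class."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._meta_path = self.root / "metadata.json"  # assumed file name
        self._lock = threading.Lock()
        self._meta = {}
        if self._meta_path.exists():
            # Reload metadata so tags and finalized flags survive restarts.
            self._meta = json.loads(self._meta_path.read_text())

    def _save(self):
        # Caller must hold self._lock.
        self._meta_path.write_text(json.dumps(self._meta))

    def put(self, data, task_id, run_id, content_type="application/octet-stream"):
        # The SHA-256 digest is the blob's address, so equal bytes map to one file.
        digest = hashlib.sha256(data).hexdigest()
        with self._lock:
            if digest not in self._meta:
                (self.root / digest).write_bytes(data)
                self._meta[digest] = {
                    "task_id": task_id,
                    "run_id": run_id,
                    "content_type": content_type,
                    "size": len(data),
                    "created_at": time.time(),
                    "finalized": False,
                }
                self._save()
        return digest

    def get(self, digest):
        return (self.root / digest).read_bytes()

    def finalize(self, digest):
        # Mark the blob as referenced by a completed task.
        with self._lock:
            self._meta[digest]["finalized"] = True
            self._save()
```

One consequence of deduplication in this sketch: when two tasks upload the same bytes, only the first task's tags are recorded. Whether the real implementation tracks multiple owners per blob is not stated in this description.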

OrphanedBlobCleanup — Background sweep worker

  • Scans all tracked blobs for orphans (non-finalized + past grace period)
  • Deletes unreferenced artifacts, preserves finalized ones
  • Runs as a daemon thread with configurable sweep interval
  • Returns detailed report: orphan count, bytes freed, referenced count
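The sweep rule above (non-finalized and past the grace period) can be sketched as follows. This is an illustration under assumptions, not the PR's `OrphanedBlobCleanup`: the function names, the metadata layout (`finalized`, `created_at`, `size`), the `delete_blob` callback, and the report keys are all hypothetical.

```python
import threading
import time


def sweep(meta, delete_blob, grace_seconds, now=None):
    """Delete non-finalized blobs older than the grace period; return a report."""
    now = time.time() if now is None else now
    report = {"orphans_deleted": 0, "bytes_freed": 0, "referenced": 0}
    for digest, info in list(meta.items()):
        if info["finalized"]:
            # Referenced by a completed task: never deleted.
            report["referenced"] += 1
            continue
        if now - info["created_at"] < grace_seconds:
            # Still inside the grace period; the task may yet finalize it.
            continue
        delete_blob(digest)
        del meta[digest]
        report["orphans_deleted"] += 1
        report["bytes_freed"] += info["size"]
    return report


def start_sweeper(meta, delete_blob, grace_seconds, interval_seconds, lock, stop_event):
    """Run sweep() on a daemon thread until stop_event is set."""

    def loop():
        # Event.wait returns False on timeout, so this sweeps once per interval.
        while not stop_event.wait(interval_seconds):
            with lock:
                sweep(meta, delete_blob, grace_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Passing `now` explicitly keeps the grace-period check testable without sleeping, and the shared `lock` is assumed to be the same one that guards writes to the metadata.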

Acceptance Criteria

| Criteria | Status |
| --- | --- |
| Failed-task intermediates cleaned after grace period | ✅ via OrphanedBlobCleanup.sweep() |
| Cleanup reports orphan counts and deleted bytes | ✅ report dict + format_report() |
| Referenced final artifacts never deleted | finalized check in sweep |
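For the reporting criterion, a report formatter might look like the sketch below. The report keys and the output layout are assumptions; the actual `format_report()` signature and wording in the PR are not shown in this description.

```python
def format_report(report):
    """Render a sweep report dict as a one-line summary (assumed layout)."""
    return (
        f"orphans deleted: {report['orphans_deleted']}, "
        f"bytes freed: {report['bytes_freed']}, "
        f"referenced blobs kept: {report['referenced']}"
    )
```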

Test Results

8 passed, 2 skipped in 0.02s
  • 5 BlobStore tests (put/get, dedup, finalize, delete, persistence)
  • 3 OrphanedBlobCleanup tests (no false positives, grace period, format)
  • 2 skipped (timing-sensitive lock contention scenarios)

Fixes #3911

…ration-agent#3911)

Implements storage subsystem for tracking task artifacts with automatic
cleanup of orphaned blobs from failed tasks.

Components:
- BlobStore: File-based blob storage with SHA-256 deduplication,
  task/run metadata tagging, and finalize/delete operations
- OrphanedBlobCleanup: Background sweep worker that identifies and
  removes unreferenced intermediate artifacts after a grace period
- Sweep report includes orphan count, bytes freed, referenced blob count
- Thread-safe with locking for concurrent access

Acceptance criteria:
- Failed-task intermediate artifacts cleaned after grace period
- Cleanup reports orphan counts and deleted bytes
- Referenced final artifacts are never deleted by orphan sweep

Development

Successfully merging this pull request may close these issues.

[ Bounty $4k ] [ Storage ] Clean orphaned blobs after failed tasks — artifact retention
