This application implements a robust backup and persistence system for ChromaDB that ensures data is preserved across Replit restarts and deployments. The system uses Replit's Object Storage for cloud persistence while implementing efficient disk space management for the local filesystem.
-
Automatic Backups
- Automatic backup after document upload if enough time has passed (1 hour interval)
- Backup rotation limiting historical backups to 24
- Persistence across Replit restarts and deployments
-
Disk Space Optimization
- Skip local backups during restore when disk space is constrained
- Cleanup utilities for both local backups and object storage history
- Efficient storage management to prevent filling quotas
-
Monitoring and Recovery
- Automatic recovery from cloud storage on startup
- Detailed logging of all backup/restore operations
- Separate utilities for manual backup and restore
The system automatically:
- Restores from object storage on application startup
- Backs up after document uploads (if interval passed)
- Rotates backups to prevent excessive storage usage
# Manually trigger backup to object storage
python utils/object_storage.py backup
# Restore from object storage (with local backup)
python utils/object_storage.py restore
# Restore without creating local backup (for disk-constrained environments)
python utils/object_storage.py restore --skip-backup# Clean up local backup directories (keep most recent)
python utils/clean_local_backups.py
# Keep more than 1 recent backup
python utils/clean_local_backups.py --keep 3
# Legacy method (not recommended)
python utils/delete_backup_history.py --force
# NEW RECOMMENDED METHOD: Clean up history with more control
python utils/simple_cleanup.py --dry-run # Preview what would be deleted
python utils/simple_cleanup.py --force # Delete old backups, keep 5 most recent
python utils/simple_cleanup.py --force --keep 10 # Keep more recent backups
python utils/simple_cleanup.py --force --limit 50 # Delete in small batches
# Monitor cleanup progress and storage usage
python utils/monitor_cleanup.py- Local ChromaDB:
/home/runner/data/chromadb/ - Local Backups:
/home/runner/data/chromadb_local_backup_* - Object Storage:
chromadb/prefix in Replit Object Storage - History Backups:
chromadb/history/YYYYMMDD_HHMMSS/in Object Storage
The system maintains up to 24 historical backups in object storage and automatically cleans up older backups during the backup process.
Each ChromaDB backup is approximately 152MB. The system uses strategies to manage disk space:
- Skip creating local backups during restore operations with
--skip-backup - Clean local backups with
clean_local_backups.py - Limit historical backups to 24 through automatic rotation
-
Disk Quota Issues:
- Run
python utils/clean_local_backups.pyto remove old local backups - Use
--skip-backupwhen restoring to avoid creating additional local backups
- Run
-
Data Not Persisting Across Restarts:
- Check if Object Storage is enabled for your Replit
- Verify
main.pyinitializesVectorStorewhich triggers the restore
-
Excessive Storage Usage:
- Run
python utils/simple_cleanup.py --forceto clear old backups (recommended) - For more control, use options like
--keep-recentand--limit - Check if backup rotation is working properly in the logs
- Run
utils/object_storage.py: Primary backup/restore functionalityutils/clean_local_backups.py: Local backup cleanup utilityutils/simple_cleanup.py: Modern storage history cleanup (recommended)utils/monitor_cleanup.py: Track cleanup progress and storage usageutils/STORAGE_CLEANUP_README.md: Detailed cleanup documentationservices/vector_store.py: Integrates backup system with ChromaDB