Problem
~81% of bucket objects are version-related; ~67% are empty (one R2 object per edit, metadata only, no body).
.da-versions at org root ({Org}/.da-versions/{FileID}/) is a single huge prefix: slow to list and doesn't scale.
Two concepts are mixed: (1) real version snapshots (contentLength > 0, explicit "Save version" or Restore Point), (2) audit-only entries (empty objects created on every PUT for "Collab Parse" and similar). The latter inflate the object count without adding real versions.
Plan (condensed)
1. Labelled versions only as R2 objects
Remove "Collab Parse" version: stop creating the automatic first-save snapshot and the empty version objects on every PUT. Only create version objects for explicit labelled versions (Save version, future preview/publish) or a Restore Point.
New path: {Org}/{Repo}/.da-versions/{FileID}/{VersionUUID}.{ext} — move under repo so listing is per-repo, not org-wide.
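To make the two layouts concrete, here is a minimal sketch of how the keys could be built. The function names are hypothetical and not taken from the codebase; only the key shapes come from the plan above.

```python
def version_key(org: str, repo: str, file_id: str, version_id: str, ext: str) -> str:
    """New repo-scoped snapshot key: {Org}/{Repo}/.da-versions/{FileID}/{VersionUUID}.{ext}."""
    return f"{org}/{repo}/.da-versions/{file_id}/{version_id}.{ext}"


def repo_versions_prefix(org: str, repo: str) -> str:
    """Prefix to list when enumerating versions for one repo only."""
    return f"{org}/{repo}/.da-versions/"


def legacy_version_prefix(org: str, file_id: str) -> str:
    """Legacy org-root prefix: {Org}/.da-versions/{FileID}/."""
    return f"{org}/.da-versions/{file_id}/"
```

Because every new key starts with the repo prefix, a listing for one repo never has to walk the versions of other repos in the same org.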
2. Single audit file per file (read-before-write dedupe)
Path: {Org}/{Repo}/.da-versions/{FileID}/audit.txt
Format: one line per entry (tab-separated): timestamp \t users \t path \t versionLabel \t versionId
path: stored without repo prefix (e.g. /surf-copy.html) so the file is readable.
versionLabel: human-readable name when entry is a labelled version (e.g. "v1", "Restore Point"); empty for edits.
versionId: snapshot id without extension when entry is a version (e.g. UUID); empty for edits.
Backward compat: 3-column (path only) and 4-column (path + versionId) lines are still parsed.
Write: On every versionable PUT, append or update audit.txt. Read-before-write with 30 min window: if last line is same user, within 30 min, and both last and new entries are edits (no version), overwrite that line with new timestamp; else append. Labelled version entries always append and are never replaced — they "interrupt" the dedup window (e.g. edit at 12:23, version at 12:25, edit at 12:40 → three entries). No empty version objects.
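The format and the write rule above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: timestamps are assumed to be epoch milliseconds, users is treated as an opaque string, "overwrite that line" is read as replacing the whole last entry with the new one, and the names AuditEntry, parse_line, format_line and record are hypothetical.

```python
from dataclasses import dataclass
from typing import List

WINDOW_MS = 30 * 60 * 1000  # 30 min dedup window from the plan


@dataclass
class AuditEntry:
    timestamp: int           # assumption: epoch milliseconds
    users: str               # assumption: opaque serialized user list
    path: str                # stored without repo prefix, e.g. /surf-copy.html
    version_label: str = ""  # empty for plain edits
    version_id: str = ""     # empty for plain edits

    @property
    def is_version(self) -> bool:
        return bool(self.version_label or self.version_id)


def parse_line(line: str) -> AuditEntry:
    """Parse a 5-column line, or a legacy 3- or 4-column line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) == 3:   # timestamp, users, path
        return AuditEntry(int(cols[0]), cols[1], cols[2])
    if len(cols) == 4:   # timestamp, users, path, versionId
        return AuditEntry(int(cols[0]), cols[1], cols[2], "", cols[3])
    if len(cols) == 5:   # timestamp, users, path, versionLabel, versionId
        return AuditEntry(int(cols[0]), cols[1], cols[2], cols[3], cols[4])
    raise ValueError(f"unexpected column count: {len(cols)}")


def format_line(entry: AuditEntry) -> str:
    """Always write the 5-column form."""
    return "\t".join([str(entry.timestamp), entry.users, entry.path,
                      entry.version_label, entry.version_id])


def record(entries: List[AuditEntry], new: AuditEntry) -> List[AuditEntry]:
    """Return the log after applying the read-before-write dedup rule."""
    if entries:
        last = entries[-1]
        same_user = last.users == new.users
        # Assumption: the window boundary is inclusive.
        in_window = 0 <= new.timestamp - last.timestamp <= WINDOW_MS
        both_edits = not last.is_version and not new.is_version
        if same_user and in_window and both_edits:
            return entries[:-1] + [new]  # collapse into the last line
    return entries + [new]               # versions and everything else append
```

With this rule, the example from the plan (edit at 12:23, version at 12:25, edit at 12:40, same user) yields three entries, while the two edits alone would collapse into one.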
3. API behaviour during migration (progressive rollout)
Env: VERSIONS_AUDIT_FILE_ORGS (comma-separated org slugs). For those orgs the version list comes from audit.txt; by default it is still merged with org/.da-versions/{id}/ until skip-legacy is enabled.
Env: VERSIONS_AUDIT_SKIP_LEGACY_ORGS (for orgs also in VERSIONS_AUDIT_FILE_ORGS): stop reading org/.da-versions. Orgs not in VERSIONS_AUDIT_FILE_ORGS list only org/.da-versions/{fileId}/ (no audit.txt, no repo/.da-versions/{fileId}/).
GET: Try new key first, then legacy key.
PUT/POST: New writes only to new structure (snapshots + audit.txt). No new writes under org/.da-versions.
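The rollout rules above can be read as a small decision table. The sketch below is one possible reading, not the worker's actual code: list_sources, get_version, the "audit"/"legacy" labels and the fetch callback are hypothetical, and only the two environment variable names and the key shapes come from the plan.

```python
from typing import Callable, List, Mapping, Optional, Set


def parse_org_list(value: Optional[str]) -> Set[str]:
    """Split a comma-separated env value into a set of org slugs."""
    return {slug.strip() for slug in (value or "").split(",") if slug.strip()}


def list_sources(org: str, env: Mapping[str, str]) -> List[str]:
    """Which sources feed the version list for this org."""
    audit_orgs = parse_org_list(env.get("VERSIONS_AUDIT_FILE_ORGS"))
    skip_legacy = parse_org_list(env.get("VERSIONS_AUDIT_SKIP_LEGACY_ORGS"))
    if org not in audit_orgs:
        return ["legacy"]            # only org/.da-versions/{fileId}/
    if org in skip_legacy:
        return ["audit"]             # audit.txt only, legacy no longer read
    return ["audit", "legacy"]       # default: merge both


def get_version(org: str, repo: str, file_id: str, name: str,
                fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """GET: try the new repo-scoped key first, then the legacy org-root key."""
    body = fetch(f"{org}/{repo}/.da-versions/{file_id}/{name}")
    if body is None:
        body = fetch(f"{org}/.da-versions/{file_id}/{name}")
    return body
```

Note that an org listed only in VERSIONS_AUDIT_SKIP_LEGACY_ORGS still resolves to the legacy source, matching the "for orgs also in VERSIONS_AUDIT_FILE_ORGS" condition.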
4. Migration
Scripts (in scripts/): (1) Analyse — list version folders, count empty vs non-empty; (2) Migrate — copy snapshots to org/repo/.da-versions/fileId/, build audit.txt from empty-object metadata using the same 5-column format (path without repo, versionId without extension), same dedup rule (30 min window; version entries do not collapse), merge with any existing audit.txt in new path (hybrid case); (3) Validate — compare list/GET old vs new for a sample path.
Dual-read: Keep supporting both old and new paths until migration is complete; then remove legacy fallback.
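As a rough illustration of the Analyse step and of the key rewrite in the Migrate step, the sketch below works on plain (key, size) pairs rather than a real R2 listing. The function names are hypothetical, and the repo is passed in explicitly on the assumption that the migration script can determine it for each file.

```python
from collections import Counter
from typing import Iterable, Tuple


def analyse(objects: Iterable[Tuple[str, int]]) -> Counter:
    """Count real snapshots (size > 0) vs audit-only entries (empty objects)."""
    counts: Counter = Counter()
    for _key, size in objects:
        counts["snapshot" if size > 0 else "audit_only"] += 1
    return counts


def migrated_key(legacy_key: str, repo: str) -> str:
    """Map {org}/.da-versions/{fileId}/{name} to {org}/{repo}/.da-versions/{fileId}/{name}."""
    org, marker, rest = legacy_key.partition("/.da-versions/")
    if not marker:
        raise ValueError(f"not a legacy version key: {legacy_key}")
    return f"{org}/{repo}/.da-versions/{rest}"
```

Only objects counted as snapshots would be copied to the new key; the empty ones would instead be turned into audit.txt lines.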
5. Benefits
Far fewer objects: no per-edit empty version files; one audit.txt per file with collapsed entries.
Faster listing: .da-versions scoped per repo, not one giant org prefix.
Clear separation: real versions (snapshots) vs audit log (single file, deduped, human-readable labels in file).