Context
Users were reporting offline sync issues — files sitting in backlog for days, "Fair-use limit reached" 429 popups, and "Omi servers are busy" banners persisting even when the backend was healthy. Investigation today traced the issues to a combination of fair-use enforcement, Cloud Run autoscaling behavior, and shared storage_executor saturation between sync and playback workloads.
This issue documents the changes made today and the remaining work.
Diagnosis (what we found in logs/metrics)
Fair-use 429s — HTTP 429 fires from is_hard_restricted() in backend/utils/fair_use.py:437 when a user is in stage='restrict'. In the last 24h, ~14 users each received 1,000+ 429s. Free-tier and paid users are currently in the same bucket; there is no paid-user bypass yet.
Cloud Run autoscaler underscaling for background work — backend-sync was running 5–10 instances against a 25-instance ceiling. The autoscaler scales on HTTP concurrency and CPU, but the sync v2 pipeline returns 202 in ~0.2s and does the heavy, I/O-bound work in a background task. Result: low CPU and low HTTP concurrency, so no scale-up, while the per-instance worker pools were pegged.
Load-balancer concentration — One instance was absorbing 84% of HTTP requests. The LB routes to the least-busy instance by HTTP concurrency, and our fast 202 path keeps that concurrency low. The background work spawned by those requests stayed pinned to the same instance.
storage_executor saturation — The 96-worker pool was at 96–100% util with queue depth up to 67. Shared between sync pipeline GCS work and playback flows (audio_merge, precache). 48h logs showed 21,236 audio_merge events — 77% speculative warming (process_conversation auto-precache + /precache endpoint), 23% on-demand playback (/urls).
App-side stale backendBusy state — SyncRateLimiter persisted both rateLimit and backendBusy cooldowns to SharedPreferences, so the "Omi servers are busy" banner survived app restarts even after the backend recovered (verified: 0 stale-guard fires in the last 7 days, yet some users still saw the banner).
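Little's law (average work in flight = arrival rate × time in system) makes the autoscaler blind spot concrete. The sketch below uses the ~0.2s 202 response time and the baseline 297s p95 background total from this issue; the upload rate is a hypothetical placeholder, and p95 stands in for the mean only to show the order of magnitude.

```python
# Little's law: average number in flight L = arrival rate (lambda) * time in system (W).
HTTP_LATENCY_S = 0.2          # sync v2 answers with 202 in ~0.2 s
BACKGROUND_LATENCY_S = 297.0  # baseline p95 of sync_v2 bg total_ms, in seconds


def in_flight(arrival_rate_per_s: float, latency_s: float) -> float:
    """Average number of concurrent items for a steady arrival rate."""
    return arrival_rate_per_s * latency_s


def hidden_work_ratio(http_latency_s: float, background_latency_s: float) -> float:
    """Background jobs in flight per in-flight HTTP request.

    The arrival rate cancels out: both counts scale with the same upload rate.
    """
    return background_latency_s / http_latency_s


if __name__ == "__main__":
    rate = 1.0  # hypothetical: one sync upload per second across the service
    print(f"HTTP requests in flight:   {in_flight(rate, HTTP_LATENCY_S):.1f}")
    print(f"Background jobs in flight: {in_flight(rate, BACKGROUND_LATENCY_S):.1f}")
    print(f"Ratio: {hidden_work_ratio(HTTP_LATENCY_S, BACKGROUND_LATENCY_S):.0f}x")
```

Whatever the upload rate, the concurrency the autoscaler can see is roughly 1/1,500 of the work actually running in this model, and because that work is I/O-bound the CPU signal stays low as well.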
Changes shipped today
Cloud Run service config (backend-sync, no code change)
| Setting | Before | After |
|---|---|---|
| containerConcurrency | 12 | 6 |
| minScale | 1 | 10 |
| maxScale | 25 | 25 (unchanged) |
| CPU | 2 vCPU | 2 vCPU (unchanged) |
| Memory | 8 GiB | 8 GiB (unchanged) |
Rationale: lower concurrency forces the LB to spread HTTP across instances; minScale=10 keeps the fleet warm at the peak we already observed (peak was 10, never approached the 25 ceiling). Result: HTTP routing went from one instance at 84% → 8.9–11.6% per instance across 10 instances.
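The config change trades per-instance concurrency for more instances. A quick sketch of the resulting HTTP request slots (instances × containerConcurrency), using only the values from the table; the RunConfig type and its methods are illustrative, not part of any Cloud Run API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Illustrative subset of the backend-sync service settings."""
    container_concurrency: int
    min_scale: int
    max_scale: int

    def floor_slots(self) -> int:
        """Concurrent HTTP request slots with only the warm (minScale) instances."""
        return self.min_scale * self.container_concurrency

    def ceiling_slots(self) -> int:
        """Concurrent HTTP request slots at the maxScale instance ceiling."""
        return self.max_scale * self.container_concurrency


BEFORE = RunConfig(container_concurrency=12, min_scale=1, max_scale=25)
AFTER = RunConfig(container_concurrency=6, min_scale=10, max_scale=25)

if __name__ == "__main__":
    for name, cfg in (("before", BEFORE), ("after", AFTER)):
        print(f"{name}: floor={cfg.floor_slots()} ceiling={cfg.ceiling_slots()}")
```

The floor rises from 12 to 60 slots while the ceiling halves from 300 to 150. With each 202 holding a slot for only ~0.2s and the observed peak at 10 instances, the lower ceiling looks acceptable; maxScale is the knob to revisit if HTTP concurrency ever becomes the bottleneck.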
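Backend code
_PRECACHE_FILE_SEM 4 → 2 (backend/utils/other/storage.py:29). Halves precache's per-process footprint on the shared storage pool. Speculative cache warming takes slightly longer; on-demand /urls playback is unaffected.
backend: raise storage_executor pool 96 → 128 #7529 — storage_executor max_workers 96 → 128 (backend/utils/executors.py:64). Sized to absorb the observed p95 queue depth of ~30 on top of the existing 96 workers (96 + 32 = 128). Memory is at 13% per instance, leaving plenty of headroom. CPU p99 bursts dropped from 84% to 63%.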
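Both backend knobs act on the same shared pool: storage_executor sets how many blocking GCS jobs can run at once, and _PRECACHE_FILE_SEM caps how many of those threads precache may hold per process. A minimal sketch, assuming storage_executor is a concurrent.futures.ThreadPoolExecutor and the precache cap is an asyncio semaphore acquired around each file job; blocking_gcs_read, precache_all and PRECACHE_LIMIT are hypothetical names, and the real code in executors.py and storage.py may be structured differently.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Shared pool for blocking GCS work (sync pipeline and playback flows alike).
# 128 mirrors the new storage_executor max_workers value.
storage_executor = ThreadPoolExecutor(max_workers=128, thread_name_prefix="storage")

# Mirrors _PRECACHE_FILE_SEM after this change (previously 4).
PRECACHE_LIMIT = 2


def blocking_gcs_read(path: str) -> str:
    """Hypothetical stand-in for a blocking GCS download or audio merge."""
    time.sleep(0.02)
    return path


async def precache_all(paths: list[str], limit: int = PRECACHE_LIMIT) -> tuple[list[str], int]:
    """Precache every path while occupying at most `limit` storage_executor threads.

    The semaphore is acquired before the job is handed to the pool, so waiting
    precache jobs queue in the event loop instead of parking idle pool threads.
    Returns the results (in input order) and the peak number of precache jobs
    that were inside the pool at the same time.
    """
    sem = asyncio.Semaphore(limit)
    loop = asyncio.get_running_loop()
    in_pool = 0
    peak = 0

    async def one(path: str) -> str:
        nonlocal in_pool, peak
        async with sem:
            in_pool += 1
            peak = max(peak, in_pool)
            try:
                return await loop.run_in_executor(storage_executor, blocking_gcs_read, path)
            finally:
                in_pool -= 1

    results = await asyncio.gather(*(one(p) for p in paths))
    return list(results), peak


if __name__ == "__main__":
    files = [f"audio/{i}.wav" for i in range(10)]
    done, peak = asyncio.run(precache_all(files))
    print(f"{len(done)} files precached; peak precache jobs in pool: {peak}")
```

Where the cap is enforced matters: if it were taken inside the worker function instead, precache jobs waiting for the cap would still hold storage_executor threads and the sync path would see no relief.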
App code (mobile)
app(sync): keep backendBusy cooldown in-memory only #7527 — SyncRateLimiter keeps backendBusy cooldowns in-memory only; only rateLimit (server-side fair-use) is persisted. Constructor clears any pre-existing persisted backendBusy entry so users upgrading from older versions get unstuck immediately. Ships in the next mobile release.
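The app code is Dart, but the persistence rule is small enough to model in Python. A minimal sketch, assuming a plain dict in place of SharedPreferences and hypothetical key and method names (the real sync_rate_limiter.dart API is not shown here): server-driven rateLimit cooldowns are written through to storage, backendBusy cooldowns live only in memory, and the constructor deletes any backendBusy entry left behind by an older app version.

```python
import time
from typing import Callable, MutableMapping, Optional

RATE_LIMIT_KEY = "sync_cooldown_rateLimit"      # hypothetical persisted key
BACKEND_BUSY_KEY = "sync_cooldown_backendBusy"  # hypothetical legacy key


class SyncRateLimiter:
    """Python model of the cooldown rules; not the Dart implementation."""

    def __init__(self, prefs: MutableMapping[str, float],
                 clock: Callable[[], float] = time.time) -> None:
        self._prefs = prefs  # stands in for SharedPreferences
        self._clock = clock
        self._backend_busy_until: Optional[float] = None  # memory only
        # Older versions persisted backendBusy; drop it so upgraded users are unstuck.
        self._prefs.pop(BACKEND_BUSY_KEY, None)

    def on_rate_limited(self, cooldown_s: float) -> None:
        """Server-side fair-use 429: persist so the cooldown survives restarts."""
        self._prefs[RATE_LIMIT_KEY] = self._clock() + cooldown_s

    def on_backend_busy(self, cooldown_s: float) -> None:
        """Transient overload: keep in memory only, so a restart clears it."""
        self._backend_busy_until = self._clock() + cooldown_s

    def can_sync(self) -> bool:
        now = self._clock()
        rate_until = self._prefs.get(RATE_LIMIT_KEY)
        if rate_until is not None and now < rate_until:
            return False
        if self._backend_busy_until is not None and now < self._backend_busy_until:
            return False
        return True
```

Constructing a new SyncRateLimiter over the same prefs simulates an app restart: a pending fair-use cooldown is still enforced, while a backendBusy cooldown is gone.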
Measured impact (rev 00617 vs baseline, 25-min windows)
| Metric | Baseline (start of day) | Today (rev 617) | Δ |
|---|---|---|---|
| Instances active | 5–9 (max 10) | 10 (floor) | floor raised |
| HTTP distribution, top instance | 84% | 11.6% | spread |
| sync_v2 bg complete decode_ms p95 | 252s | 68s | −73% |
| sync_v2 bg complete total_ms p95 | 297s | 89s | −70% |
| Storage pool at 100% util | 92% of warnings | 61% of warnings | less peak saturation |
| Storage pool queue p50 | 19 | 8 | −58% |
| Memory p99 per instance | 96–100% | 13% | down |
| CPU p99 per instance | 94% | 63% | down |
| 5xx errors | 0 | 0 | unchanged |
| Fair-use 429s | ~15% of /v2/sync-local-files | unchanged | not addressed |
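The percentage deltas in the table are relative changes, (today − baseline) / baseline. A short check using the table's own numbers; pct_change and ROWS are illustrative names.

```python
def pct_change(baseline: float, today: float) -> float:
    """Relative change in percent; negative means the metric went down."""
    return (today - baseline) / baseline * 100.0


# (baseline, today) pairs copied from the impact table.
ROWS = {
    "sync_v2 bg complete decode_ms p95 (s)": (252, 68),
    "sync_v2 bg complete total_ms p95 (s)": (297, 89),
    "Storage pool queue p50": (19, 8),
}

if __name__ == "__main__":
    for name, (before, after) in ROWS.items():
        print(f"{name}: {pct_change(before, after):+.0f}%")
```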
Known issues / next steps (not done today)
Fair-use 429s for paid users — paid-user bypass for is_hard_restricted() not yet shipped. ~14 heavy users still locked out. Should be a small backend PR (early-return False if is_paid_plan(subscription)).
No Retry-After header on 429s — backend's 429 has no Retry-After, so the app falls back to a 30-min default cooldown instead of a server-driven value. Backend change.
Per-device cooldown — SyncRateLimiter cooldown is per-install via SharedPreferences, not synced across a user's iPhone/iPad/desktop. Multi-device users hit 429 on each device separately.
Precache is still on storage_executor — 128 workers buys us time, but the architectural fix is to move precache to an async queue (Cloud Tasks / Pub/Sub) so it doesn't compete with the sync hot path. Estimated ~1–2 days of work.
Sync v2 pipeline doesn't propagate private_cloud_sync_enabled to conversations — process_segment calls CreateConversation(...) without it, so offline-synced conversations have audio_files = [] and aren't playable. Likely a bug; fixing it would also dump precache load onto sync (which is why we should ship the queue solution first).
Audio merge cache hit rate — currently logged at DEBUG level (invisible in prod). Worth bumping to INFO and measuring; if it's low, we're doing repeat merge work needlessly.
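The two small backend follow-ups (paid-user bypass and a Retry-After header) can be sketched together with the client fallback. This assumes hypothetical shapes throughout: a FairUseState with a stage and a restricted_until time, plan strings checked by a placeholder is_paid_plan(), and a plain (status, headers) return; the real is_hard_restricted() in fair_use.py takes whatever arguments it actually takes. Retry-After is sent as delay-seconds, and the client also accepts the HTTP-date form and otherwise keeps the app's current 30-minute default.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional, Tuple

DEFAULT_COOLDOWN_S = 30 * 60  # the app's current fallback when no Retry-After is sent
PAID_PLANS = {"pro", "unlimited"}  # hypothetical plan identifiers


@dataclass
class FairUseState:
    stage: str                            # "restrict" triggers the hard block today
    restricted_until: Optional[datetime]  # hypothetical: when the restriction lifts


def is_paid_plan(plan: str) -> bool:
    """Hypothetical helper; the real check depends on the subscription model."""
    return plan in PAID_PLANS


def is_hard_restricted(state: FairUseState, plan: str) -> bool:
    """Sketch of the check with the proposed early return for paid users."""
    if is_paid_plan(plan):
        return False
    return state.stage == "restrict"


def fair_use_response(state: FairUseState, plan: str,
                      now: datetime) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return None to let the request through, else (429, headers)."""
    if not is_hard_restricted(state, plan):
        return None
    headers: Dict[str, str] = {}
    if state.restricted_until is not None:
        # Retry-After as delay-seconds, rounded down and never negative.
        seconds = max(0, int((state.restricted_until - now).total_seconds()))
        headers["Retry-After"] = str(seconds)
    return 429, headers


def client_cooldown_s(headers: Dict[str, str], now: datetime) -> float:
    """Client side: honor Retry-After (seconds or HTTP-date), else fall back."""
    value = headers.get("Retry-After")
    if value is None:
        return float(DEFAULT_COOLDOWN_S)
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return float(DEFAULT_COOLDOWN_S)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A server-driven value lets the app wait exactly as long as the restriction lasts instead of a flat 30 minutes, which matters most for the heavy users hitting 1,000+ 429s a day.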
Quick reference — files touched today
backend/utils/other/storage.py — _PRECACHE_FILE_SEM 4 → 2 (line 29; halves precache's per-process footprint on the shared storage pool; speculative cache warming is slightly slower, on-demand /urls playback is unaffected)
backend/utils/executors.py — storage_executor max_workers 96 → 128
app/lib/services/wals/sync_rate_limiter.dart — backendBusy persistence
Cloud Run config changes are not in source; they live on the service spec (gcloud run services describe backend-sync --region=us-central1).

Posted by Caleb (AI agent) on behalf of Mohsin