Problem Statement
Google-connected users can silently stop receiving Google-to-Compass updates when Google Watches expire, go missing, become stale, or are only partially registered. This is especially visible in local development and self-hosted setups because those environments may not run the scheduled maintenance that staging and production rely on. Even in higher environments, a missed maintenance run, temporary outage, or very short watch expiration window can leave Compass unable to receive public watch notifications until someone manually repairs the sync state.
From the user's perspective, Google Calendar appears connected, but new Google-side changes stop appearing in Compass. The user should not need to understand watch expiration, Cloudflare tunnels, cron, or maintenance endpoints to keep Google sync working.
Solution
Add self-healing Google Watch repair. Compass should detect when a Google-connected user's watch state is broken, recreate the necessary Google Watches, and catch up missed Google-side changes. The same repair behavior should be shared by scheduled maintenance and the user sync health/start path so dev, self-host, staging, and production recover through one consistent mechanism.
The intended behavior is: check regularly, repair rarely. Healthy checks should read only Compass-owned state and should not call Google. Google requests should happen only when watches are expiring soon, expired, missing, stale, incomplete, or otherwise not trustworthy.
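To make the "check regularly" half concrete, the following is a minimal sketch of a watch state classifier that reads only locally stored records. Every name here (`WatchRecord`, `InspectInput`, `inspectWatchState`, `refreshWindowMs`) is hypothetical and not taken from the Compass codebase, and detection of stale records is left out because the criteria for staleness are not defined in this document.

```typescript
// Hypothetical shapes for illustration; not the actual Compass schema.
type WatchRecord = {
  resourceId: string; // assumed convention: "calendarlist" or a calendar id
  expiresAt: number; // epoch milliseconds
};

type WatchHealth =
  | "healthy"
  | "expiring_soon"
  | "expired"
  | "missing"
  | "incomplete"
  | "not_applicable";

type InspectInput = {
  isGoogleConnected: boolean;
  expectedResourceIds: string[]; // calendarlist plus every calendar that should be watched
  watches: WatchRecord[]; // Compass-owned records; no Google call is made here
  now: number; // epoch milliseconds
  refreshWindowMs: number; // how long before expiration a refresh should start
};

function inspectWatchState(input: InspectInput): WatchHealth {
  const { isGoogleConnected, expectedResourceIds, watches, now, refreshWindowMs } = input;
  if (!isGoogleConnected || expectedResourceIds.length === 0) return "not_applicable";

  // Index stored watches by the resource they cover.
  const byResource = new Map<string, WatchRecord>();
  for (const watch of watches) byResource.set(watch.resourceId, watch);

  const present = expectedResourceIds.filter((id) => byResource.has(id));
  if (present.length === 0) return "missing";
  if (present.length < expectedResourceIds.length) return "incomplete";

  // The watch set is only as healthy as its earliest expiration.
  const earliest = Math.min(...present.map((id) => byResource.get(id)!.expiresAt));
  if (earliest <= now) return "expired";
  if (earliest - now <= refreshWindowMs) return "expiring_soon";
  return "healthy";
}

// Only these states justify a Google request.
function needsGoogleCall(health: WatchHealth): boolean {
  return health !== "healthy" && health !== "not_applicable";
}
```

Keeping the classifier a pure function of stored state means it can run on every health check without touching Google quota, and the caller decides separately whether the result warrants a refresh or a repair.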
When usable sync tokens exist, watch repair should recreate watches and then run incremental import to catch up missed Google-side changes. When sync tokens are missing or invalid, Compass should fall back to the existing full Google sync repair path. If Google access has been revoked, Compass should keep the existing revocation behavior: prune Google-origin data and notify the user.
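The three outcomes in the paragraph above can be sketched as a small planning function. This is an illustration only: `RepairContext`, `RepairStep`, and `planRepair` are hypothetical names, and the sketch assumes the existing full Google sync repair path takes care of watch creation itself, which this document does not confirm.

```typescript
// Hypothetical inputs; how Compass actually detects these states is not specified here.
type RepairContext = {
  accessRevoked: boolean;
  syncTokens: "usable" | "missing" | "invalid";
};

type RepairStep =
  | "recreate_watches"
  | "incremental_import"
  | "full_sync_repair"
  | "prune_google_origin_data"
  | "notify_user";

function planRepair(ctx: RepairContext): RepairStep[] {
  // Revoked access is not a watch problem: keep the existing revocation behavior.
  if (ctx.accessRevoked) return ["prune_google_origin_data", "notify_user"];

  // Usable tokens allow a cheap catch-up after the watches are back.
  if (ctx.syncTokens === "usable") return ["recreate_watches", "incremental_import"];

  // Missing or invalid tokens: defer to the existing full repair path
  // (assumed here to handle watch creation on its own).
  return ["full_sync_repair"];
}
```

Returning a list of steps rather than executing them keeps the decision testable without any Google client, and gives logs a stable vocabulary for which path was taken.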
User Stories
- As a Google-connected user, I want Compass to keep receiving Google Calendar updates, so that my Compass calendar stays current without manual repair.
- As a Google-connected user, I want expired Google Watches to be recreated automatically, so that sync does not silently stop after a watch expiration.
- As a Google-connected user, I want missing Google Watches to be recreated automatically, so that partial watch setup does not leave some calendars out of sync.
- As a Google-connected user, I want stale Google Watch records to be cleaned up safely, so that Compass does not trust dead webhook state.
- As a Google-connected user, I want Compass to catch up changes that happened while notifications were unavailable, so that no Google-side edits are lost during a watch outage.
- As a Google-connected user, I want repair to fall back to full Google sync repair when incremental sync tokens cannot be used, so that broken sync metadata can still recover.
- As a Google-connected user, I want Google access revocation to be handled clearly, so that revoked access does not look like a temporary watch repair problem.
- As a local developer, I want opening Compass to self-heal broken Google Watches, so that I do not have to manually call maintenance endpoints after a short watch expiration.
- As a self-hosted operator, I want Compass to recover when scheduled maintenance is not configured yet, so that Google sync is less fragile during setup.
- As a production operator, I want scheduled maintenance to repair broken watch state, so that missed or failed refresh windows do not require manual intervention.
- As a production operator, I want health/start checks to be throttled, so that browser refreshes or multiple tabs do not repeatedly call Google.
- As a production operator, I want watch repair to be idempotent, so that running the same repair path twice does not create duplicate active watches.
- As a production operator, I want repair logs to distinguish healthy checks, watch refresh, watch repair, incremental catch-up, full repair, and revoked access, so that sync incidents are easier to diagnose.
- As a maintainer, I want scheduled maintenance and user-triggered health/start repair to share one implementation, so that behavior stays consistent across environments.
- As a maintainer, I want watch state inspection to be testable independently, so that edge cases around healthy, expiring, expired, missing, and incomplete watches are easy to verify.
- As a maintainer, I want watch repair decisions to use the existing domain language, so that future contributors understand the difference between refresh, repair, import, and public watch notifications.
- As a maintainer, I want Google API calls to happen only when repair or refresh is actually needed, so that watch health checks do not create unnecessary cost or quota pressure.
- As a maintainer, I want documentation to explain the self-healing behavior and remaining self-hosting requirements, so that local and self-hosted Google sync expectations stay accurate.
Implementation Decisions
- Use the existing domain term Repair for user-facing recovery behavior.
- Use Google Watch repair as the narrower term for recreating missing, expired, stale, or incomplete Google Watches.
- Treat refresh and repair as different operations: refresh renews a still-valid watch before expiration; repair rebuilds watch state that is already broken or incomplete.
- Introduce or extract a watch state inspection module with a small interface that can answer whether a user's Google Watches are healthy, expiring soon, expired, missing, incomplete, or not applicable.
- Introduce or extract a shared Google Watch repair coordinator that owns the end-to-end repair flow: inspect state, respect lock/cooldown, clean stale records, recreate expected watches, and trigger catch-up sync.
- Keep scheduled maintenance as the primary proactive caller for watch refresh and repair.
- Add the user sync health/start path as a defensive caller so local dev, self-host, and missed-maintenance cases can recover when an active Google-connected user opens or uses Compass.
- Recreate both the calendarlist watch and all expected event watches when repair determines the user's watch set is incomplete or expired.
- After recreating watches, run incremental import when existing sync tokens are usable so Compass catches up Google-side changes missed while notifications were unavailable.
- Fall back to the existing full Google sync repair path when sync tokens are missing, invalid, or otherwise cannot support incremental catch-up.
- Preserve existing Google access revoked behavior rather than treating revoked credentials as a repairable watch problem.
- Add per-user repair locking to avoid concurrent repair attempts within a running backend process.
- Add a short repair cooldown persisted in Compass-owned state so repeated health checks, browser refreshes, multiple tabs, or backend restarts do not spam Google.
- Keep healthy checks cheap: they should read Compass state and should not call Google when watches are already healthy.
- Keep watch creation idempotent: a repeated repair attempt must not create duplicate active watches.
- Update the Google sync health definition: a user is healthy when the expected sync tokens exist and, if public watch notifications are configured, the expected active watches also exist.
- Update self-hosting and Google sync docs to explain that Compass can self-heal broken watch state, while public HTTPS webhook configuration is still required for continuous Google-to-Compass notifications.
- Consider increasing the normal watch expiration default away from the short development-friendly value, or documenting clearly that short expirations are intended only for development and testing.
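The per-user lock and persisted cooldown from the decisions above could be combined into a single gate that callers consult before attempting a repair. The sketch below is one possible shape, not the planned implementation: `CooldownStore`, `InMemoryCooldownStore`, and `RepairGate` are hypothetical names, and the store is synchronous and in-memory only to keep the example short. A real store would persist to Compass-owned state and would likely be asynchronous.

```typescript
// Hypothetical persistence interface for the last repair attempt per user.
interface CooldownStore {
  getLastAttempt(userId: string): number | undefined;
  setLastAttempt(userId: string, at: number): void;
}

// In-memory stand-in; a persisted store is what survives backend restarts.
class InMemoryCooldownStore implements CooldownStore {
  private readonly attempts = new Map<string, number>();

  getLastAttempt(userId: string): number | undefined {
    return this.attempts.get(userId);
  }

  setLastAttempt(userId: string, at: number): void {
    this.attempts.set(userId, at);
  }
}

type GateDecision = "run" | "locked" | "cooling_down";

class RepairGate {
  // In-process lock: users with a repair currently running.
  private readonly inFlight = new Set<string>();
  private readonly store: CooldownStore;
  private readonly cooldownMs: number;

  constructor(store: CooldownStore, cooldownMs: number) {
    this.store = store;
    this.cooldownMs = cooldownMs;
  }

  tryAcquire(userId: string, now: number): GateDecision {
    if (this.inFlight.has(userId)) return "locked";

    const last = this.store.getLastAttempt(userId);
    if (last !== undefined && now - last < this.cooldownMs) return "cooling_down";

    // Record the attempt before any Google call so a crash mid-repair still cools down.
    this.inFlight.add(userId);
    this.store.setLastAttempt(userId, now);
    return "run";
  }

  release(userId: string): void {
    this.inFlight.delete(userId);
  }
}
```

A caller would run the repair only when `tryAcquire` returns `"run"` and call `release` when the attempt finishes, whether it succeeded or failed. The lock covers concurrent requests inside one backend process, while the stored timestamp covers repeated checks from browser refreshes, multiple tabs, and restarts.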
Testing Decisions
- Tests should focus on externally visible behavior: which watches are considered healthy, which repair actions are taken, whether Google calls are avoided when no repair is needed, and whether missed changes are caught up after repair.
- Watch state inspection should have focused tests for healthy watches, expiring watches, expired watches, missing calendarlist watches, missing event watches, incomplete watch sets, missing sync tokens, and non-Google-connected users.
- The shared repair coordinator should have tests for recreating watches, avoiding duplicate active watches, cleaning stale records, respecting lock/cooldown, running incremental catch-up, falling back to full repair, and handling revoked Google access.
- Scheduled maintenance tests should verify that expiring valid watches still refresh, while expired or missing watches for active Google-connected users repair instead of only pruning.
- Sync health/start tests should verify that broken watch state can trigger background repair, while repeated checks inside the cooldown do not repeatedly call Google.
- Existing Google sync service tests, watch maintenance tests, sync controller tests, and user metadata/health tests are the closest prior art for this behavior.
- Add tests around logging/result summaries where practical, especially to make sure repair, refresh, ignored healthy checks, and revoked access are distinguishable.
- Use mocked Google Calendar clients for unit/integration tests; live Google webhook verification can remain an acceptance/manual runbook concern.
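As an illustration of the mocked-client approach, the sketch below uses a hand-rolled counting fake to check two of the behaviors listed above: a healthy check makes no Google calls, and a repeated repair does not create additional watches. All names (`WatchClient`, `CountingWatchClient`, `repairIfNeeded`) are hypothetical, the client is synchronous for brevity even though real Google Calendar calls are asynchronous, and stopping old channels on the Google side is omitted.

```typescript
// Hypothetical minimal surface of a Google Calendar client.
interface WatchClient {
  createWatch(resourceId: string): void;
}

// Fake that records every watch creation instead of calling Google.
class CountingWatchClient implements WatchClient {
  readonly created: string[] = [];

  createWatch(resourceId: string): void {
    this.created.push(resourceId);
  }
}

type StoredWatch = { resourceId: string; expiresAt: number };

// Returns the stored watch set after the check. When every expected watch is
// still live, the input is returned untouched and the client is never called.
// Otherwise the whole expected set is recreated, replacing old records.
function repairIfNeeded(
  client: WatchClient,
  expected: string[],
  stored: StoredWatch[],
  now: number,
  ttlMs: number,
): StoredWatch[] {
  const live = new Set(
    stored.filter((watch) => watch.expiresAt > now).map((watch) => watch.resourceId),
  );
  if (expected.every((id) => live.has(id))) return stored;

  return expected.map((id) => {
    client.createWatch(id);
    return { resourceId: id, expiresAt: now + ttlMs };
  });
}

// Example scenario: an expired calendarlist watch and a missing event watch.
const fakeClient = new CountingWatchClient();
const expectedIds = ["calendarlist", "cal-1"];
const afterRepair = repairIfNeeded(
  fakeClient,
  expectedIds,
  [{ resourceId: "calendarlist", expiresAt: 500 }],
  1000,
  10000,
);
// A second pass over the repaired state should find nothing to do.
const afterSecondPass = repairIfNeeded(fakeClient, expectedIds, afterRepair, 2000, 10000);
```

Asserting on `fakeClient.created` after each pass is enough to verify both the "no call when healthy" and the idempotency expectations without a live webhook.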
Out of Scope
- Building a new Google Calendar connection UI.
- Adding calendar selection controls.
- Changing the public watch notification endpoint contract.
- Replacing the existing Google import or event propagation model.
- Guaranteeing continuous Google sync without a public HTTPS webhook URL.
- Replacing staging/production cron or Cloud Function infrastructure.
- Solving all Google API quota management beyond avoiding unnecessary repair spam.
- Changing how Google access revocation prunes Google-origin data.
Further Notes
This PRD came from debugging a local Google-to-Compass notification outage where the Cloudflare tunnel and notification endpoint were healthy, but local Google Watches had expired after the short configured expiration window. The current maintenance behavior can refresh watches before they expire, but once watches have expired it prunes them instead of recreating them. The desired behavior is for expired or missing watch state on an active Google-connected user to be repaired without manual intervention.
The key design principle is: check regularly, repair rarely. Compass should be able to inspect watch health often, but Google API calls should only happen when watch state actually needs refresh or repair.