Add Google Calendar sync support #418
The c885bdb message_type work covered store/api.go and the vector backends, but internal/query/{sqlite,duckdb}.go were untouched — buildSearchQueryParts and the DuckDB Search fallback dropped q.MessageTypes, and MergeFilterIntoQuery never mapped MessageFilter.MessageType. So `msgvault search --mode=fts` (which routes through internal/query) silently ignored message_type scoping for every non-email type (sms, whatsapp, and soon calendar_event). Add the m.message_type IN (...) clause to both engines and carry MessageFilter.MessageType into MergeFilterIntoQuery, mirroring the store/api.go clause. dbtest.MessageOpts gains an optional MessageType. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
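The clause described above can be sketched as a small builder. This is an illustration only: `messageTypeClause` is a hypothetical helper name, and the `?` placeholder style is an assumption (the real code may go through a dialect layer); only the `m.message_type IN (...)` shape comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// messageTypeClause is a hypothetical helper, not msgvault's actual code.
// It returns a parameterized "m.message_type IN (...)" predicate and its
// bind arguments, or an empty clause when no types were requested.
func messageTypeClause(types []string) (string, []any) {
	if len(types) == 0 {
		return "", nil // no scoping requested: add no predicate
	}
	placeholders := make([]string, len(types))
	args := make([]any, len(types))
	for i, t := range types {
		placeholders[i] = "?" // assumed placeholder style
		args[i] = t
	}
	return "m.message_type IN (" + strings.Join(placeholders, ", ") + ")", args
}

func main() {
	clause, args := messageTypeClause([]string{"sms", "calendar_event"})
	fmt.Println(clause, args)
}
```

Binding the types as arguments rather than interpolating them keeps the predicate safe for user-supplied values.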
Two store-code additions calendar sync depends on, with no schema change:
- SetMessageMetadata(id, sql.NullString): writes the messages.metadata column (JSON/JSONB), which the hot upsertMessageSQL path never touches. Non-email importers that carry structured per-message metadata (calendar events: end/all_day/status/recurrence) call it right after UpsertMessage. Uses dialect.JSONBindExpr() for the PG ?::JSONB cast; an invalid NullString clears the column.
- GetSourcesByTypeAndAccount(type, accountEmail): enumerates the sources of one OAuth account by filtering source_type + sync_config.account_email in Go (dialect-portable). Config-driven sources decouple the per-source identifier (natural calendar key) from the token key (account_email).
Round-trip + clear + account-scoping tests; dual-dialect (SQLite + PG).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
The Gmail token bucket hardcodes capacity=250; reusing it for the Calendar API (600 req/min/user) would permit a ~25x burst. Add a capacity-explicit constructor so the Calendar client builds a correctly sized bucket (capacity=10, refill=8 tok/s ≈ 80% of the 10 req/s/user ceiling). Calendar operations (calendarList.list, events.list, events.get) join the shared Operation enum at cost 1 so internal/gcal can reuse this limiter and its adaptive Throttle/RecoverRate backoff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
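To make the burst arithmetic concrete, here is a minimal token-bucket sketch with an explicit capacity. It is not the project's RateLimiter: the type and function names are hypothetical, the clock is injected as plain seconds to keep the example deterministic, and the refill rate given to the 250-capacity bucket is an arbitrary placeholder (the commit states only its capacity).

```go
package main

import "fmt"

// bucket is a hypothetical, single-goroutine token bucket (no locking).
type bucket struct {
	capacity     float64 // maximum burst size
	refillPerSec float64 // tokens added per second
	tokens       float64
	lastSec      float64 // time of the last refill, in seconds
}

// newBucket starts full, so the first burst equals the capacity.
func newBucket(capacity, refillPerSec, nowSec float64) *bucket {
	return &bucket{capacity: capacity, refillPerSec: refillPerSec, tokens: capacity, lastSec: nowSec}
}

// take refills for the elapsed time, then spends cost tokens if available.
func (b *bucket) take(cost, nowSec float64) bool {
	if elapsed := nowSec - b.lastSec; elapsed > 0 {
		b.tokens += elapsed * b.refillPerSec
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.lastSec = nowSec
	}
	if b.tokens < cost {
		return false
	}
	b.tokens -= cost
	return true
}

// burst counts how many cost-1 requests pass back-to-back at nowSec.
func burst(b *bucket, nowSec float64) int {
	n := 0
	for b.take(1, nowSec) {
		n++
	}
	return n
}

func main() {
	gmailSized := newBucket(250, 8, 0) // capacity from the commit; refill is a placeholder
	calendarSized := newBucket(10, 8, 0)
	fmt.Println(burst(gmailSized, 0), burst(calendarSized, 0)) // 250 vs 10: the ~25x gap
	fmt.Println(burst(calendarSized, 1))                       // one second later: 8 tokens refilled
}
```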
New internal/gcal package mirroring internal/gmail: a hand-rolled net/http client over the Calendar API, an API interface (CalendarReader + EventReader) with an in-memory MockAPI, and unexported wire types mapped to exported domain types (Calendar, Event, Attendee, Person, EventDateTime, EventsPage).
- Reuses internal/gmail's token-bucket RateLimiter via the shared Operation enum; the default limiter is sized for the Calendar per-user budget (capacity=10, refill=8 tok/s).
- request() retries network/429/quota-403/5xx with full-jitter backoff; surfaces *GoneError on 410 (stale syncToken) and *NotFoundError on 404; permission-403/401/other-4xx are terminal.
- ListEvents drops timeMin/timeMax when a syncToken is set (API requires this), and forwards singleEvents/showDeleted. NextSyncToken arrives only on the final page.
- MockAPI owns pagination so tests describe pages as plain event slices; supports incremental deltas, 410 injection, and call counting.
httptest client tests cover pagination, sync-token-on-final-page, 410/404, retry vs terminal 403, incremental param exclusion, and ctx cancellation. Mapping covers timed/all-day/recurring/cancelled events.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
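The retry policy in request() can be sketched roughly as below. This is an assumption-laden illustration, not the package's code: `classify`, `fullJitter`, and the `outcome` constants are hypothetical names, and quota-403 is recognized here by the error reasons `rateLimitExceeded` and `userRateLimitExceeded`, which is one plausible way to separate quota errors from permission errors.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

type outcome int

const (
	outcomeOK outcome = iota
	outcomeRetry
	outcomeGone     // 410: stale syncToken, caller should full-resync
	outcomeNotFound // 404
	outcomeTerminal // permission-403, 401, other 4xx
)

// classify maps an HTTP status (plus the API error reason for 403s)
// to the action a caller would take. Hypothetical helper.
func classify(status int, reason string) outcome {
	switch {
	case status >= 200 && status < 300:
		return outcomeOK
	case status == http.StatusTooManyRequests, status >= 500:
		return outcomeRetry
	case status == http.StatusForbidden:
		if reason == "rateLimitExceeded" || reason == "userRateLimitExceeded" {
			return outcomeRetry // quota-403
		}
		return outcomeTerminal // permission-403
	case status == http.StatusGone:
		return outcomeGone
	case status == http.StatusNotFound:
		return outcomeNotFound
	default:
		return outcomeTerminal
	}
}

// fullJitter returns a random delay in [0, min(maxDelay, base*2^attempt)].
// Assumes base > 0 and maxDelay well below the int64 limit.
func fullJitter(attempt int, base, maxDelay time.Duration) time.Duration {
	ceiling := base
	for i := 0; i < attempt && ceiling < maxDelay; i++ {
		ceiling *= 2
	}
	if ceiling > maxDelay {
		ceiling = maxDelay
	}
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	fmt.Println(classify(http.StatusGone, "") == outcomeGone)
	fmt.Println(fullJitter(2, 100*time.Millisecond, time.Second))
}
```

Full jitter draws the whole delay at random rather than adding noise to a fixed exponential step, which spreads out retries from clients that were throttled at the same moment.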
- oauth.ScopeCalendarReadonly / ScopesCalendar / ScopesGmailCalendar. Re-consent REPLACES scopes (ApprovalForce, no include_granted_scopes), so the bundle must carry Gmail + Calendar together to avoid silently dropping Gmail access for an existing account.
- Extract a generic promptScopeEscalation(requiredScopes, headline, bodyLines, cancelHint) from the deletion-hardwired version; keep a thin promptDeletionScopeEscalation wrapper. Both existing deletion call sites migrate with no behavior change. Calendar opt-in reuses the generic helper with ScopesGmailCalendar.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
New internal/calsync mirrors internal/sync for Calendar. Events persist as messages rows (message_type=calendar_event) through the canonical write path plus SetMessageMetadata; calendars are sources keyed on a natural per-calendar identifier (accountEmail/calendarId), with the token key decoupled into sync_config.account_email.
- Full: enumerate calendars (access-role filtered, default owner+writer), per-calendar StartSync → paginated events.list (singleEvents=false, showDeleted=true) → ingest → checkpoint per page; capture NextSyncToken only on the final page; CompleteSync; RecomputeConversationStats; resumable from a checkpointed pageToken.
- Incremental: per existing source, list from the stored syncToken; advance the cursor even on per-item errors; 410 self-heals into a full resync (ErrSyncTokenExpired).
- Persist: organizer/attendees via the email-keyed participant path (dedupes with Gmail contacts); is_from_me = organizer is the account; metadata JSON carries end/all_day/status/recurrence/links; raw event JSON preserved verbatim (raw_format=gcal_json). Attendee emails reach FTS via to_addr only (never body_text) to avoid double-encoding.
- Cancellations are RETAINED, never soft-deleted: an existing row flips metadata.status to cancelled (preserving all other fields); a never-seen cancellation inserts a tombstone.
- Source identifier key: standalone/master = event.id; recurring instance/exception = recurringEventId|originalStartTime.
Also adds store.GetMessageMetadata (read counterpart of SetMessageMetadata) and preserves raw event bytes in gcal.Event.Raw.
Behavioral tests: full persist + dedup, idempotency, access-role filter, cancellation-retain, 410 resync, recurrence master/occurrence, embed enqueue.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
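The source identifier rule can be illustrated with a few lines of Go. The `event` struct and `sourceKey` function are hypothetical stand-ins for the real types, and the original start time is assumed to already be a string; only the key layout (event.id versus recurringEventId|originalStartTime) comes from the commit message.

```go
package main

import "fmt"

// event holds just the fields the key depends on (hypothetical shape).
type event struct {
	ID               string
	RecurringEventID string // empty for standalone events and series masters
	OriginalStart    string // original start of an instance/exception, assumed pre-formatted
}

// sourceKey derives a stable per-event identifier: standalone events and
// series masters use their own id, while instances and exceptions are
// keyed on the series id plus their original start time.
func sourceKey(e event) string {
	if e.RecurringEventID != "" {
		return e.RecurringEventID + "|" + e.OriginalStart
	}
	return e.ID
}

func main() {
	master := event{ID: "abc123"}
	moved := event{ID: "abc123_20240105T100000Z", RecurringEventID: "abc123", OriginalStart: "2024-01-05T10:00:00Z"}
	fmt.Println(sourceKey(master))
	fmt.Println(sourceKey(moved))
}
```

Keying an exception on its original start rather than its current start keeps the identifier stable when the occurrence is moved again.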
The aggregate path applies no message_type predicate, so calendar_event rows (and their attendee message_recipients) would leak into the email Senders/Recipients/Domains/Time aggregates while the stats header gates them out — desyncing per-view counts from the header. Exclude calendar_event rows from the messages COPY and gate the recipients/ labels/attachments junction exports to non-calendar message_ids. Bump cacheSchemaVersion 5→6 to force a rebuild of caches that already exported leaked rows. Regression test builds the Parquet cache over an email + a calendar event and asserts the event and its attendee are absent while the email and its recipient remain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
Each [[gcal]] entry configures a calendar sync target (email = OAuth account/token key, optional oauth_app, calendar filter, cron schedule). Adds GetGCalSource (lookup by name or email), ScheduledGCalSources (for the daemon), and applyGCalDefaults (name defaults to email). Mirrors the synctech_sms config helpers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
…uling
- add-calendar <email>: authorizes Gmail+Calendar (bundled re-consent so Gmail isn't dropped), then enumerates and registers the account's calendars (calendarList.list doubles as a live scope smoke test).
- sync-calendar <name|email>: resolves the account from a [[gcal]] config entry or a bare email; first run (or --full) full-syncs and registers calendars, later runs are incremental via syncToken. Opens vector features so events embed. --after/--before bound full sync only.
- buildCalendarClient keys the OAuth token on the account email (never a calendar source identifier), reauths with the combined Gmail+Calendar scope, and sizes the Calendar limiter (capacity=10, refill=8).
- runConfiguredGCalSync is shared by the CLI and the daemon; serve.go schedules [[gcal]] sources through the generic scheduler.AddJob path (single Store, vf.Enqueuer), mirroring synctech.
- calsync.RegisterCalendars enumerates + creates source rows without syncing events (used by add-calendar).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
README: add-calendar/sync-calendar commands, a Google Calendar section, and a scheduled [[gcal]] daemon example. configuration.md: [[gcal]] source reference. cli-reference.md: add-calendar and sync-calendar entries. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
- Add internal/calsync real-client integration test: drives the REAL gcal.Client over an httptest server against byte-realistic Calendar API v3 JSON (verified against the Events resource + sync guide), through the real calsync pipeline into a real store. Covers a dozen event shapes (timed, all-day, tentative, unicode, recurring master + moved exception + cancelled occurrence, organizer-not-me) with field-by-field comparison, then an incremental create+cancel cycle. Exercises the full production path; only the Google TCP endpoint + OAuth token are swapped.
- Export gcal.WithBaseURL / WithHTTPClient (custom endpoint / proxy / test server) and drop the redundant unexported variants.
- Convert assertion-heavy tests to the testify bound-helper pattern so `make testify-helper-check` passes (also fixes the pre-existing api_search_test.go violation).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
Five defects found by an adversarial multi-agent review, all with regression tests:
- HIGH: a --limit full sync on a single-page calendar captured the final nextSyncToken and advanced the incremental cursor past the un-ingested events, so the next incremental sync never saw them (silent data loss). A limited run is now a preview that never advances the cursor.
- MEDIUM: re-syncing an event that lost its organizer or all attendees left stale from/to message_recipients rows (the writes were guarded by non-empty checks, skipping the DELETE) while FTS to_addr was cleared, desyncing the two. ReplaceMessageRecipients is now unconditional so an empty set clears stale rows.
- MEDIUM: incremental sync advanced the cursor even when an event failed to persist; the Calendar syncToken never re-delivers unchanged events, so the failure was permanent. The cursor now stays put (run fails) so the next sync re-delivers and retries, mirroring the full-sync path.
- LOW: full-sync resume reused the prior run's id, bypassing StartSync's writer lock and letting concurrent runs clobber one sync_run row. Resume now reads the checkpoint then always calls StartSync (which supersedes under lock).
- LOW: resume reset checkpoint counters to zero, under-reporting a resumed run's stats; prior counters are now seeded forward.
The OAuth, SQL/dialect, API-client, and daemon-concurrency review dimensions found no genuine defects.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
The function header still described the old always-advance-cursor behavior; update it to match the new fail-and-retry semantics (cursor is not advanced when any event failed to persist). No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
… headless add-calendar
A live-deployment audit (against the running shapiroserver2 msgvault container) surfaced three faults that make Calendar bootstrap unsafe on the headless server, where one Gmail-only token drives a scheduled 4am Gmail sync.
1. promptScopeEscalation deleted the existing token BEFORE re-authorizing. On a headless host the browser re-auth can never complete, so the delete-first ordering left the account with no token and would break the live Gmail sync irrecoverably. Authorize already overwrites the token atomically only after a validated grant, so the up-front DeleteToken was pure downside and has been removed. The old token now survives any cancelled/failed re-auth. This also hardens the deletion scope-escalation flow, which shares the helper. The now-unused *oauth.Manager parameter is dropped from both helpers and their call sites.
2. add-calendar --headless was broken/misleading: it was ignored on the scope-escalation path (forced a browser) and even on the no-token path ran a localhost-callback flow that cannot complete on a true headless box. It now mirrors add-account --headless: prints copy-the-token instructions (PrintCalendarHeadlessInstructions) and stops, without a browser or touching the existing token. Once a dual-scope token is copied in, re-running it registers calendars headlessly.
3. serve now warns when an enabled [[gcal]] source has no schedule; such a source is never daemon-synced, so its freshness drifts stale and the hourly monitor eventually alarms RED.
Adds regression tests proving the token survives a failed escalation and that headless add-calendar prints instructions without deleting the token.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
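The ordering fix in item 1 amounts to "authorize first, overwrite only on success". A minimal sketch under stated assumptions: `tokenStore`, `reauthorize`, and `failingAuthorize` are hypothetical names standing in for the real OAuth manager and flow, and the in-memory map replaces the on-disk token file.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenStore is a hypothetical in-memory stand-in for persisted tokens.
type tokenStore map[string]string

var errNoBrowser = errors.New("no browser available")

// failingAuthorize simulates a re-auth that cannot complete on a headless host.
func failingAuthorize() (string, error) { return "", errNoBrowser }

// reauthorize runs the (injected) authorize step first and overwrites the
// stored token only after it succeeds. There is deliberately no delete
// beforehand, so a failed or cancelled flow leaves the old token in place.
func reauthorize(store tokenStore, account string, authorize func() (string, error)) error {
	newToken, err := authorize()
	if err != nil {
		return fmt.Errorf("re-authorization failed, existing token kept: %w", err)
	}
	store[account] = newToken
	return nil
}

func main() {
	store := tokenStore{"user@example.com": "gmail-only-token"}
	err := reauthorize(store, "user@example.com", failingAuthorize)
	fmt.Println(err != nil, store["user@example.com"])
}
```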
A calendar event that lists the same person twice (a duplicate attendee, or two address forms that resolve to one participant) produced duplicate (message_id, participant_id, recipient_type) rows, tripping the UNIQUE constraint and aborting the entire calendar's sync. This surfaced on real data: two large calendars failed full sync with "replace to recipients: UNIQUE constraint failed". ReplaceMessageRecipients now collapses duplicate participant IDs within a set before inserting (first display name wins). The table can hold only one row per (message_id, participant_id, recipient_type), so this is always correct and hardens every importer that writes recipients (Gmail, calendar, iMessage, Google Voice, Synctech SMS, Messenger), not just calendar. Regression tests at both layers: a store unit test for duplicate participant IDs, and a calsync end-to-end full sync of an event with a duplicate attendee. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
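The collapse step can be sketched as a first-wins de-duplication on participant ID. The `recipient` struct and `dedupeRecipients` name are hypothetical; the real ReplaceMessageRecipients signature is not shown in this commit message.

```go
package main

import "fmt"

// recipient is a hypothetical row shape for one (participant, display name) pair.
type recipient struct {
	ParticipantID int64
	DisplayName   string
}

// dedupeRecipients keeps the first occurrence of each participant ID and
// preserves input order, so the first display name wins.
func dedupeRecipients(in []recipient) []recipient {
	seen := make(map[int64]struct{}, len(in))
	out := make([]recipient, 0, len(in))
	for _, r := range in {
		if _, dup := seen[r.ParticipantID]; dup {
			continue // would violate UNIQUE(message_id, participant_id, recipient_type)
		}
		seen[r.ParticipantID] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	rows := dedupeRecipients([]recipient{
		{ParticipantID: 7, DisplayName: "Ada Lovelace"},
		{ParticipantID: 9, DisplayName: "Grace Hopper"},
		{ParticipantID: 7, DisplayName: "ada@example.com"},
	})
	fmt.Println(rows)
}
```

Because the set is de-duplicated per recipient type before the insert, the same participant can still appear once as a sender and once as a recipient.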
…ries title from master
Two faults found by an adversarial audit of the calendar ingest path:
1. A time-bounded full sync (--after/--before) advanced the incremental sync cursor over only that window. Future incremental syncs carry no time bounds, so out-of-window events would never be archived (silent data loss). The cursor-suppression guard that already covered --limit now also covers TimeMin/TimeMax; a bounded run ingests its window but does not establish an incremental baseline.
2. A recurring series' conversation title was overwritten by each per-instance exception's edited summary (last-writer-wins), so the navigation label flapped across syncs. Only the series master (or a standalone event) now sets the conversation title; an exception keeps its edited summary on its own message row.
Regression tests for both, exercised through the real Full sync path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
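Taken together with the earlier --limit and persist-failure fixes, the cursor rule reduces to a single predicate. The sketch below is one way to express it; `runOptions` and `shouldAdvanceCursor` are hypothetical names, and representing the time bounds as strings is an assumption.

```go
package main

import "fmt"

// runOptions captures the knobs that make a sync partial (hypothetical shape).
type runOptions struct {
	Limit   int    // 0 means unlimited; >0 makes the run a preview
	TimeMin string // empty means unbounded (--after)
	TimeMax string // empty means unbounded (--before)
}

// shouldAdvanceCursor reports whether a run may store the new syncToken.
// Only a complete, unbounded run with zero persist failures establishes
// (or moves) the incremental baseline.
func shouldAdvanceCursor(opts runOptions, failedEvents int) bool {
	if opts.Limit > 0 {
		return false // limited run: un-ingested events would be skipped forever
	}
	if opts.TimeMin != "" || opts.TimeMax != "" {
		return false // bounded run: out-of-window events were never seen
	}
	return failedEvents == 0 // the syncToken never re-delivers unchanged events
}

func main() {
	fmt.Println(shouldAdvanceCursor(runOptions{}, 0))
	fmt.Println(shouldAdvanceCursor(runOptions{Limit: 50}, 0))
	fmt.Println(shouldAdvanceCursor(runOptions{TimeMin: "2024-01-01T00:00:00Z"}, 0))
	fmt.Println(shouldAdvanceCursor(runOptions{}, 1))
}
```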
The CLI reference and configuration already covered the calendar commands, but the published docs site had no dedicated guide for the feature. Add one and wire it into the navigation and feature surfaces:
- docs/usage/calendar.md: end-to-end guide (authorize, sync, what gets archived, finding events, scheduled [[gcal]] sync, headless server setup, privacy)
- nav: list "Google Calendar" under CLI Usage
- index: add a Calendar Sync feature card and mention calendar in the supported-sources line
- searching: document --message-type filtering (calendar_event, sms, ...)
- remote-deployment: tip pointing NAS users at the headless calendar flow
- README: add the feature to the intro line and feature list
- check_markdown_sources allowlist: register the new page so its frontmatter is validated like its siblings
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT