
Add Google Calendar sync support #418

Open
danshapiro wants to merge 22 commits into kenn-io:main from danshapiro:codex/gcal-docs-pr-readiness


@danshapiro

Contributor

Summary

  • add read-only Google Calendar sync with CLI commands, daemon scheduling, and [[gcal]] config
  • store calendar events as searchable messages with message_type filtering and scoped vector refresh support
  • update README, setup, OAuth, CLI, config, search, and changelog docs for PR readiness

Testing

  • timeout 45m env GOMAXPROCS=2 GOFLAGS=-p=1 make test
  • timeout 10m make docs-check

danshapiro and others added 22 commits June 24, 2026 23:59
The c885bdb message_type work covered store/api.go and the vector
backends, but internal/query/{sqlite,duckdb}.go were untouched —
buildSearchQueryParts and the DuckDB Search fallback dropped
q.MessageTypes, and MergeFilterIntoQuery never mapped
MessageFilter.MessageType. So `msgvault search --mode=fts` (which routes
through internal/query) silently ignored message_type scoping for every
non-email type (sms, whatsapp, and soon calendar_event).

Add the m.message_type IN (...) clause to both engines and carry
MessageFilter.MessageType into MergeFilterIntoQuery, mirroring the
store/api.go clause. dbtest.MessageOpts gains an optional MessageType.
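A minimal sketch of the predicate both query engines need, assuming `?` placeholders as SQLite uses (a PostgreSQL path would rebind them). The helper name `messageTypeClause` is hypothetical and is not the code in internal/query; it only shows that an empty filter must add no clause while a non-empty one binds each type as an argument.

```go
package main

import (
	"fmt"
	"strings"
)

// messageTypeClause (hypothetical) builds an "m.message_type IN (...)"
// predicate and its bind arguments. An empty filter yields no clause, so
// unscoped searches behave exactly as before.
func messageTypeClause(types []string) (string, []any) {
	if len(types) == 0 {
		return "", nil
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(types)), ", ")
	args := make([]any, len(types))
	for i, t := range types {
		args[i] = t
	}
	return "m.message_type IN (" + placeholders + ")", args
}

func main() {
	clause, args := messageTypeClause([]string{"calendar_event", "sms"})
	fmt.Println(clause, args) // m.message_type IN (?, ?) [calendar_event sms]
}
```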

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
Two store-code additions that calendar sync depends on, neither requiring a schema change:

- SetMessageMetadata(id, sql.NullString): writes the messages.metadata
  column (JSON/JSONB), which the hot upsertMessageSQL path never touches.
  Non-email importers that carry structured per-message metadata (calendar
  events: end/all_day/status/recurrence) call it right after UpsertMessage.
  Uses dialect.JSONBindExpr() for the PG ?::JSONB cast; an invalid
  NullString clears the column.
- GetSourcesByTypeAndAccount(type, accountEmail): enumerates the sources of
  one OAuth account by filtering source_type + sync_config.account_email in
  Go (dialect-portable). Config-driven sources decouple the per-source
  identifier (natural calendar key) from the token key (account_email).

Round-trip + clear + account-scoping tests; dual-dialect (SQLite + PG).
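The "filter in Go" approach for GetSourcesByTypeAndAccount can be sketched as below, assuming sync_config is stored as JSON text with an `account_email` key. The `source` struct, the function name and the `"gcal"` source type are hypothetical stand-ins; decoding in Go avoids SQLite's `json_extract` versus PostgreSQL's `->>` operator differences.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// source is a hypothetical, trimmed-down sources row.
type source struct {
	ID         int64
	SourceType string
	SyncConfig string // sync_config JSON as stored
}

// sourcesByTypeAndAccount filters by type, then decodes sync_config in Go so
// the same logic works on both SQL dialects.
func sourcesByTypeAndAccount(all []source, sourceType, accountEmail string) ([]source, error) {
	var out []source
	for _, s := range all {
		if s.SourceType != sourceType || s.SyncConfig == "" {
			continue
		}
		var cfg struct {
			AccountEmail string `json:"account_email"`
		}
		if err := json.Unmarshal([]byte(s.SyncConfig), &cfg); err != nil {
			return nil, fmt.Errorf("source %d: decode sync_config: %w", s.ID, err)
		}
		if cfg.AccountEmail == accountEmail {
			out = append(out, s)
		}
	}
	return out, nil
}

func main() {
	rows := []source{
		{1, "gcal", `{"account_email":"me@example.com"}`},
		{2, "gcal", `{"account_email":"other@example.com"}`},
		{3, "gmail", `{"account_email":"me@example.com"}`},
	}
	got, err := sourcesByTypeAndAccount(rows, "gcal", "me@example.com")
	fmt.Println(len(got), err) // 1 <nil>
}
```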

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
The Gmail token bucket hardcodes capacity=250; reusing it for the
Calendar API (600 req/min/user) would permit a ~25x burst. Add a
capacity-explicit constructor so the Calendar client builds a correctly
sized bucket (capacity=10, refill=8 tok/s ≈ 80% of the 10 req/s/user
ceiling). Calendar operations (calendarList.list, events.list,
events.get) join the shared Operation enum at cost 1 so internal/gcal
can reuse this limiter and its adaptive Throttle/RecoverRate backoff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
New internal/gcal package mirroring internal/gmail: a hand-rolled
net/http client over the Calendar API, an API interface (CalendarReader
+ EventReader) with an in-memory MockAPI, and unexported wire types
mapped to exported domain types (Calendar, Event, Attendee, Person,
EventDateTime, EventsPage).

- Reuses internal/gmail's token-bucket RateLimiter via the shared
  Operation enum; the default limiter is sized for the Calendar per-user
  budget (capacity=10, refill=8 tok/s).
- request() retries network/429/quota-403/5xx with full-jitter backoff;
  surfaces *GoneError on 410 (stale syncToken) and *NotFoundError on 404;
  permission-403/401/other-4xx are terminal.
- ListEvents drops timeMin/timeMax when a syncToken is set (API requires
  this), and forwards singleEvents/showDeleted. NextSyncToken arrives
  only on the final page.
- MockAPI owns pagination so tests describe pages as plain event slices;
  supports incremental deltas, 410 injection, and call counting.

httptest client tests cover pagination, sync-token-on-final-page,
410/404, retry vs terminal 403, incremental param exclusion, and ctx
cancellation. Mapping covers timed/all-day/recurring/cancelled events.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
- oauth.ScopeCalendarReadonly / ScopesCalendar / ScopesGmailCalendar.
  Re-consent REPLACES scopes (ApprovalForce, no include_granted_scopes),
  so the bundle must carry Gmail + Calendar together to avoid silently
  dropping Gmail access for an existing account.
- Extract a generic promptScopeEscalation(requiredScopes, headline,
  bodyLines, cancelHint) from the deletion-hardwired version; keep a thin
  promptDeletionScopeEscalation wrapper. Both existing deletion call
  sites migrate with no behavior change. Calendar opt-in reuses the
  generic helper with ScopesGmailCalendar.
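Why the bundle matters can be shown with plain set logic. The two scope URLs are Google's real read-only scopes, but which Gmail scope msgvault actually requests is an assumption here, and `bundle`/`missing` are hypothetical helpers. Because a forced re-consent replaces the grant with exactly what was requested, a Calendar-only request loses Gmail, while the bundled request keeps it.

```go
package main

import (
	"fmt"
	"slices"
)

// Real Google scope URLs; the Gmail one msgvault uses is assumed.
const (
	scopeGmailReadonly    = "https://www.googleapis.com/auth/gmail.readonly"
	scopeCalendarReadonly = "https://www.googleapis.com/auth/calendar.readonly"
)

// bundle returns the de-duplicated union of scope sets in first-seen order.
func bundle(sets ...[]string) []string {
	var out []string
	for _, set := range sets {
		for _, s := range set {
			if !slices.Contains(out, s) {
				out = append(out, s)
			}
		}
	}
	return out
}

// missing reports which required scopes a grant lacks.
func missing(required, granted []string) []string {
	var out []string
	for _, s := range required {
		if !slices.Contains(granted, s) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	gmail := []string{scopeGmailReadonly}
	calendar := []string{scopeCalendarReadonly}
	// After a forced re-consent, the new grant equals the requested scopes.
	fmt.Println("lost with calendar-only request:", missing(gmail, calendar))
	fmt.Println("lost with bundled request:", missing(gmail, bundle(gmail, calendar)))
}
```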

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
New internal/calsync mirrors internal/sync for Calendar. Events persist
as messages rows (message_type=calendar_event) through the canonical
write path plus SetMessageMetadata; calendars are sources keyed on a
natural per-calendar identifier (accountEmail/calendarId), with the
token key decoupled into sync_config.account_email.

- Full: enumerate calendars (access-role filtered, default owner+writer),
  per-calendar StartSync → paginated events.list (singleEvents=false,
  showDeleted=true) → ingest → checkpoint per page; capture NextSyncToken
  only on the final page; CompleteSync; RecomputeConversationStats;
  resumable from a checkpointed pageToken.
- Incremental: per existing source, list from the stored syncToken;
  advance the cursor even on per-item errors; 410 self-heals into a full
  resync (ErrSyncTokenExpired).
- Persist: organizer/attendees via the email-keyed participant path
  (dedupes with Gmail contacts); is_from_me = organizer is the account;
  metadata JSON carries end/all_day/status/recurrence/links; raw event
  JSON preserved verbatim (raw_format=gcal_json). Attendee emails reach
  FTS via to_addr only (never body_text) to avoid double-encoding.
- Cancellations are RETAINED, never soft-deleted: an existing row flips
  metadata.status to cancelled (preserving all other fields); a
  never-seen cancellation inserts a tombstone.
- Source identifier key: standalone/master = event.id; recurring
  instance/exception = recurringEventId|originalStartTime.
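The two identifier rules above can be sketched as follows. The struct and function names are hypothetical; the fields model the Calendar API's `id`, `recurringEventId` and `originalStartTime` (a `dateTime` for timed events, a `date` for all-day ones), and the sketch uses the raw string without any normalization msgvault may or may not apply. Since an instance's originalStartTime does not change when that instance is moved, the key should stay stable across edits.

```go
package main

import "fmt"

// calEvent carries only the fields key derivation needs (hypothetical names).
type calEvent struct {
	ID                    string
	RecurringEventID      string
	OriginalStartDateTime string // timed instances
	OriginalStartDate     string // all-day instances
}

// sourceMessageID keys standalone events and series masters on id, and
// recurring instances/exceptions on recurringEventId|originalStartTime.
func sourceMessageID(e calEvent) string {
	if e.RecurringEventID == "" {
		return e.ID
	}
	start := e.OriginalStartDateTime
	if start == "" {
		start = e.OriginalStartDate
	}
	return e.RecurringEventID + "|" + start
}

// calendarSourceID is the natural per-calendar source identifier; the OAuth
// token is looked up separately via sync_config.account_email.
func calendarSourceID(accountEmail, calendarID string) string {
	return accountEmail + "/" + calendarID
}

func main() {
	fmt.Println(calendarSourceID("me@example.com", "primary"))
	fmt.Println(sourceMessageID(calEvent{ID: "abc123"}))
	fmt.Println(sourceMessageID(calEvent{
		ID:                    "abc123_20260302T090000Z",
		RecurringEventID:      "abc123",
		OriginalStartDateTime: "2026-03-02T09:00:00Z",
	}))
}
```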

Also adds store.GetMessageMetadata (read counterpart of
SetMessageMetadata) and preserves raw event bytes in gcal.Event.Raw.

Behavioral tests: full persist + dedup, idempotency, access-role filter,
cancellation-retain, 410 resync, recurrence master/occurrence, embed
enqueue.
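The cancellation-retain rule (flip `metadata.status`, keep every other field, or write a tombstone for a never-seen event) can be sketched as a JSON merge. `markCancelled` is a hypothetical helper and the metadata schema is assumed to be a flat JSON object; decoding into `map[string]json.RawMessage` leaves untouched fields byte-for-byte as they were (modulo key order).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// markCancelled sets "status" to "cancelled" while preserving all other
// fields. Empty or null input (a never-seen event) yields a tombstone.
func markCancelled(existing []byte) ([]byte, error) {
	var meta map[string]json.RawMessage
	if len(existing) > 0 {
		if err := json.Unmarshal(existing, &meta); err != nil {
			return nil, fmt.Errorf("decode metadata: %w", err)
		}
	}
	if meta == nil { // no input, or JSON null
		meta = map[string]json.RawMessage{}
	}
	meta["status"] = json.RawMessage(`"cancelled"`)
	return json.Marshal(meta) // map keys are emitted in sorted order
}

func main() {
	existing := []byte(`{"end":"2026-03-02T10:00:00Z","all_day":false,"status":"confirmed"}`)
	updated, err := markCancelled(existing)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(updated))
	tombstone, _ := markCancelled(nil)
	fmt.Println(string(tombstone))
}
```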

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
The aggregate path applies no message_type predicate, so calendar_event
rows (and their attendee message_recipients) would leak into the email
Senders/Recipients/Domains/Time aggregates while the stats header gates
them out — desyncing per-view counts from the header. Exclude
calendar_event rows from the messages COPY and gate the recipients/
labels/attachments junction exports to non-calendar message_ids. Bump
cacheSchemaVersion 5→6 to force a rebuild of caches that already exported
leaked rows.

Regression test builds the Parquet cache over an email + a calendar
event and asserts the event and its attendee are absent while the email
and its recipient remain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
Each [[gcal]] entry configures a calendar sync target (email = OAuth
account/token key, optional oauth_app, calendar filter, cron schedule).
Adds GetGCalSource (lookup by name or email), ScheduledGCalSources (for
the daemon), and applyGCalDefaults (name defaults to email). Mirrors the
synctech_sms config helpers.
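A rough sketch of the three helpers over an in-memory slice of entries. Only `email` and `oauth_app` are named in the commit; the `Calendars`, `Schedule` and `Enabled` field names are assumptions, and these lowercase functions are hypothetical stand-ins for GetGCalSource, ScheduledGCalSources and applyGCalDefaults rather than their real signatures.

```go
package main

import "fmt"

// GCalSource is a hypothetical in-memory form of one [[gcal]] entry.
type GCalSource struct {
	Name      string
	Email     string // OAuth account and token key
	OAuthApp  string
	Calendars []string // calendar filter
	Schedule  string   // cron expression; empty means manual runs only
	Enabled   bool
}

// applyGCalDefaults fills Name from Email when it is unset.
func applyGCalDefaults(srcs []GCalSource) {
	for i := range srcs {
		if srcs[i].Name == "" {
			srcs[i].Name = srcs[i].Email
		}
	}
}

// getGCalSource resolves a CLI argument against either name or email.
func getGCalSource(srcs []GCalSource, key string) (*GCalSource, bool) {
	for i := range srcs {
		if srcs[i].Name == key || srcs[i].Email == key {
			return &srcs[i], true
		}
	}
	return nil, false
}

// scheduledGCalSources returns the entries a daemon would register.
func scheduledGCalSources(srcs []GCalSource) []GCalSource {
	var out []GCalSource
	for _, s := range srcs {
		if s.Enabled && s.Schedule != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	srcs := []GCalSource{
		{Email: "me@example.com", Schedule: "0 5 * * *", Enabled: true},
		{Name: "work", Email: "me@work.example", Enabled: true},
	}
	applyGCalDefaults(srcs)
	if s, ok := getGCalSource(srcs, "work"); ok {
		fmt.Println(s.Email)
	}
	fmt.Println(len(scheduledGCalSources(srcs)))
}
```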

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
…uling

- add-calendar <email>: authorizes Gmail+Calendar (bundled re-consent so
  Gmail isn't dropped), then enumerates and registers the account's
  calendars (calendarList.list doubles as a live scope smoke test).
- sync-calendar <name|email>: resolves the account from a [[gcal]] config
  entry or a bare email; first run (or --full) full-syncs and registers
  calendars, later runs are incremental via syncToken. Opens vector
  features so events embed. --after/--before bound full sync only.
- buildCalendarClient keys the OAuth token on the account email (never a
  calendar source identifier), reauths with the combined Gmail+Calendar
  scope, and sizes the Calendar limiter (capacity=10, refill=8).
- runConfiguredGCalSync is shared by the CLI and the daemon; serve.go
  schedules [[gcal]] sources through the generic scheduler.AddJob path
  (single Store, vf.Enqueuer), mirroring synctech.
- calsync.RegisterCalendars enumerates + creates source rows without
  syncing events (used by add-calendar).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
README: add-calendar/sync-calendar commands, a Google Calendar section,
and a scheduled [[gcal]] daemon example. configuration.md: [[gcal]]
source reference. cli-reference.md: add-calendar and sync-calendar
entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
- Add internal/calsync real-client integration test: drives the REAL
  gcal.Client over an httptest server against byte-realistic Calendar API
  v3 JSON (verified against the Events resource + sync guide), through
  the real calsync pipeline into a real store. Covers a dozen event
  shapes (timed, all-day, tentative, unicode, recurring master + moved
  exception + cancelled occurrence, organizer-not-me) with field-by-field
  comparison, then an incremental create+cancel cycle. Exercises the full
  production path; only the Google TCP endpoint + OAuth token are swapped.
- Export gcal.WithBaseURL / WithHTTPClient (custom endpoint / proxy /
  test server) and drop the redundant unexported variants.
- Convert assertion-heavy tests to the testify bound-helper pattern so
  `make testify-helper-check` passes (also fixes the pre-existing
  api_search_test.go violation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
Five defects found by an adversarial multi-agent review, all with
regression tests:

- HIGH: a --limit full sync on a single-page calendar captured the final
  nextSyncToken and advanced the incremental cursor past the un-ingested
  events, so the next incremental sync never saw them (silent data loss).
  A limited run is now a preview that never advances the cursor.
- MEDIUM: re-syncing an event that lost its organizer or all attendees
  left stale from/to message_recipients rows (the writes were guarded by
  non-empty checks, skipping the DELETE) while FTS to_addr was cleared —
  desyncing the two. ReplaceMessageRecipients is now unconditional so an
  empty set clears stale rows.
- MEDIUM: incremental sync advanced the cursor even when an event failed
  to persist; the Calendar syncToken never re-delivers unchanged events,
  so the failure was permanent. The cursor now stays put (run fails) so
  the next sync re-delivers and retries, mirroring the full-sync path.
- LOW: full-sync resume reused the prior run's id, bypassing StartSync's
  writer lock and letting concurrent runs clobber one sync_run row.
  Resume now reads the checkpoint then always StartSync (which supersedes
  under lock).
- LOW: resume reset checkpoint counters to zero, under-reporting a
  resumed run's stats; prior counters are now seeded forward.

The OAuth, SQL/dialect, API-client, and daemon-concurrency review
dimensions found no genuine defects.
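The HIGH and first MEDIUM fixes both come down to a single guard on cursor advancement. A minimal sketch, with a hypothetical `passResult` type: the cursor moves only when the final page delivered a token, the run was not a `--limit` preview, and no event failed to persist.

```go
package main

import "fmt"

// passResult summarizes one sync pass; field names are hypothetical.
type passResult struct {
	nextSyncToken string // only delivered with the final page
	limited       bool   // --limit preview run
	failed        int    // events that failed to persist
}

// shouldAdvanceCursor: a preview never moves the cursor, and neither does a
// pass with failures, because the syncToken would not re-deliver them.
func shouldAdvanceCursor(r passResult) bool {
	return r.nextSyncToken != "" && !r.limited && r.failed == 0
}

func main() {
	fmt.Println(shouldAdvanceCursor(passResult{nextSyncToken: "t"}))                // true
	fmt.Println(shouldAdvanceCursor(passResult{nextSyncToken: "t", limited: true})) // false
	fmt.Println(shouldAdvanceCursor(passResult{nextSyncToken: "t", failed: 2}))     // false
}
```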

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
The function header still described the old always-advance-cursor
behavior; update it to match the new fail-and-retry semantics (cursor is
not advanced when any event failed to persist). No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
… headless add-calendar

A live-deployment audit (against the running shapiroserver2 msgvault container)
surfaced three faults that make Calendar bootstrap unsafe on the headless server,
where one Gmail-only token drives a scheduled 4am Gmail sync.

1. promptScopeEscalation deleted the existing token BEFORE re-authorizing. On a
   headless host the browser re-auth can never complete, so the delete-first
   ordering left the account with no token and would break the live Gmail sync
   irrecoverably. Authorize already overwrites the token atomically only after a
   validated grant, so the up-front DeleteToken was pure downside — removed. The
   old token now survives any cancelled/failed re-auth. This also hardens the
   deletion scope-escalation flow, which shares the helper. The now-unused
   *oauth.Manager parameter is dropped from both helpers and their call sites.

2. add-calendar --headless was broken/misleading: it was ignored on the
   scope-escalation path (forced a browser) and even on the no-token path ran a
   localhost-callback flow that cannot complete on a true headless box. It now
   mirrors add-account --headless: prints copy-the-token instructions
   (PrintCalendarHeadlessInstructions) and stops, without a browser or touching
   the existing token. Once a dual-scope token is copied in, re-running it
   registers calendars headlessly.

3. serve now warns when an enabled [[gcal]] source has no schedule. Such a
   source is never synced by the daemon, so its data goes stale and the hourly
   monitor eventually alarms RED.

Adds regression tests proving the token survives a failed escalation and that
headless add-calendar prints instructions without deleting the token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
A calendar event that lists the same person twice (a duplicate attendee, or
two address forms that resolve to one participant) produced duplicate
(message_id, participant_id, recipient_type) rows, tripping the UNIQUE
constraint and aborting the entire calendar's sync. This surfaced on real
data: two large calendars failed full sync with "replace to recipients:
UNIQUE constraint failed".

ReplaceMessageRecipients now collapses duplicate participant IDs within a set
before inserting (first display name wins). The table can hold only one row
per (message_id, participant_id, recipient_type), so this is always correct
and hardens every importer that writes recipients (Gmail, calendar, iMessage,
Google Voice, Synctech SMS, Messenger), not just calendar.

Regression tests at both layers: a store unit test for duplicate participant
IDs, and a calsync end-to-end full sync of an event with a duplicate attendee.
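The first-wins collapse described above can be sketched as an order-preserving dedupe over one recipient set (one recipient_type). The `recipient` struct and function name are hypothetical; the point is that after this pass no two rows can share a participant ID, so the UNIQUE constraint cannot fire.

```go
package main

import "fmt"

// recipient is a hypothetical row about to be written for one recipient_type.
type recipient struct {
	ParticipantID int64
	DisplayName   string
}

// dedupeRecipients keeps the first occurrence of each participant ID, so its
// display name wins and later duplicates are dropped.
func dedupeRecipients(in []recipient) []recipient {
	seen := make(map[int64]struct{}, len(in))
	out := make([]recipient, 0, len(in))
	for _, r := range in {
		if _, dup := seen[r.ParticipantID]; dup {
			continue
		}
		seen[r.ParticipantID] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	in := []recipient{{10, "Ann"}, {11, "Bob"}, {10, "ann@example.com"}}
	fmt.Println(dedupeRecipients(in)) // [{10 Ann} {11 Bob}]
}
```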

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
…ries title from master

Two faults found by an adversarial audit of the calendar ingest path:

1. A time-bounded full sync (--after/--before) advanced the incremental
   sync cursor over only that window. Future incremental syncs carry no
   time bounds, so out-of-window events would never be archived — silent
   data loss. The cursor-suppression guard that already covered --limit now
   also covers TimeMin/TimeMax; a bounded run ingests its window but does
   not establish an incremental baseline.

2. A recurring series' conversation title was overwritten by each
   per-instance exception's edited summary (last-writer-wins), so the
   navigation label flapped across syncs. Only the series master (or a
   standalone event) now sets the conversation title; an exception keeps
   its edited summary on its own message row.

Regression tests for both, exercised through the real Full sync path.
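Fix 2 can be illustrated by replaying a series in sync order. The types and functions below are hypothetical; the rule they encode is that only an event without a recurringEventId (a standalone event or series master) may set the conversation title, so later exceptions no longer overwrite it.

```go
package main

import "fmt"

// instance is a hypothetical view of a synced event; recurringEventID is
// empty for standalone events and series masters.
type instance struct {
	recurringEventID string
	summary          string
}

// conversationTitleFor reports whether an event may set the title, and which.
func conversationTitleFor(e instance) (string, bool) {
	if e.recurringEventID != "" {
		return "", false // exception: edited summary stays on its own row
	}
	return e.summary, true
}

// finalTitle replays events in sync order and returns the resulting title.
func finalTitle(events []instance) string {
	title := ""
	for _, e := range events {
		if t, ok := conversationTitleFor(e); ok {
			title = t
		}
	}
	return title
}

func main() {
	series := []instance{
		{"", "Weekly sync"},
		{"master1", "Weekly sync (moved to Thu)"},
		{"master1", "Weekly sync (room change)"},
	}
	fmt.Println(finalTitle(series)) // Weekly sync
}
```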

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
The CLI reference and configuration already covered the calendar commands,
but the published docs site had no dedicated guide for the feature. Add one
and wire it into the navigation and feature surfaces:

- docs/usage/calendar.md: end-to-end guide (authorize, sync, what gets
  archived, finding events, scheduled [[gcal]] sync, headless server setup,
  privacy)
- nav: list "Google Calendar" under CLI Usage
- index: add a Calendar Sync feature card and mention calendar in the
  supported-sources line
- searching: document --message-type filtering (calendar_event, sms, ...)
- remote-deployment: tip pointing NAS users at the headless calendar flow
- README: add the feature to the intro line and feature list
- check_markdown_sources allowlist: register the new page so its frontmatter
  is validated like its siblings

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016FsDYFf2qzEubESGFd4uDT
@roborev-ci

roborev-ci Bot commented Jun 25, 2026


roborev: Combined Review (feefe2e)

Medium issues found; no Critical or High findings were reported.

Medium

  • cmd/msgvault/cmd/calendar.go:231
    sync-calendar --all-calendars does not add reader/subscribed calendars after the account already has any registered calendar. The command switches to Incremental() whenever existing is non-empty, and incremental sync only iterates stored sources, so the documented command in docs/usage/calendar.md has no effect after the first default sync.
    Fix: When selection flags can expand the calendar set, enumerate/register missing calendars or force a full sync; alternatively require and document --full.

  • docs/usage/calendar.md:103
    Docs and release notes advertise msgvault search ... --message-type calendar_event, but search only registers --limit, --offset, --json, --account, --collection, --mode, and --explain. Users following the docs will get an unknown flag instead of a filtered search.
    Fix: Add a --message-type search flag that populates search.Query.MessageTypes, or update all examples to use the implemented message_type:calendar_event query operator.

  • cmd/msgvault/cmd/build_cache.go:496
    The cache verifier still treats maxID > 0 as proof that message Parquet rows must exist, but calendar events are now excluded from the messages export. A database containing only calendar_event rows will produce no messages Parquet and build-cache will fail forever instead of recording an empty email analytics cache.
    Fix: Base the verification on exportable non-calendar live messages, or allow zero exported message rows when all messages are excluded calendar events.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 8m0s), codex_security (codex/security, done, 5m56s) | Total: 14m6s

