` and scheduler logs |
+| `email` | (required) | Google account that owns the token (the token key) |
+| `oauth_app` | — | Named Google OAuth app to use |
+| `calendars` | — | Specific calendar IDs to sync; empty syncs owned/writable calendars |
+| `schedule` | — | Cron expression used by `msgvault serve` |
+| `enabled` | `false` | Whether the source is daemon-scheduled |
+
### `[vector]`
Top-level toggle and backend marker for semantic/hybrid search. SQLite vector search requires a build with `sqlite_vec` support (default via `make build`). PostgreSQL vector search requires a build with the `pgvector` tag and a PostgreSQL `[data].database_url`. See [Vector Search](/usage/vector-search/) for prerequisites, initial embedding, and the full workflow.
diff --git a/docs/guides/oauth-setup.md b/docs/guides/oauth-setup.md
index 0c1aa708b..3bc3bcd95 100644
--- a/docs/guides/oauth-setup.md
+++ b/docs/guides/oauth-setup.md
@@ -3,7 +3,7 @@ title: OAuth Setup
description: Create OAuth credentials for Gmail (Google Cloud) or Microsoft 365 (Azure AD) and authorize msgvault.
---
-## Google (Gmail)
+## Google (Gmail and Calendar)
msgvault requires OAuth credentials to access the Gmail API. This section walks through the complete setup.
@@ -13,11 +13,12 @@ msgvault requires OAuth credentials to access the Gmail API. This section walks
2. Create a new project or select an existing one
3. Note your project ID
-### Step 2: Enable the Gmail API
+### Step 2: Enable Google APIs
1. Navigate to **APIs & Services > Library**
2. Search for "Gmail API"
3. Click **Enable**
+4. If you will also sync Google Calendar, search for "Google Calendar API" and click **Enable**
### Step 3: Configure OAuth Consent Screen
@@ -30,8 +31,9 @@ msgvault requires OAuth credentials to access the Gmail API. This section walks
4. Click **Save and Continue**
5. On the **Data Access** page, click **Add or Remove Scopes**
6. Add the scope: `https://www.googleapis.com/auth/gmail.modify`
-7. Save and continue through the remaining screens
-8. Under **Test users**, add all Gmail addresses you want to sync
+7. If you will sync Google Calendar too, add `https://www.googleapis.com/auth/calendar.readonly`
+8. Save and continue through the remaining screens
+9. Under **Test users**, add all Gmail addresses you want to sync
!!! note
The `gmail.modify` scope enables deletion features while sync operations remain read-only. When you first run `delete-staged`, msgvault will prompt you to upgrade to full `mail.google.com` access for batch deletion.
@@ -143,6 +145,7 @@ Workspace admins can avoid per-user browser OAuth by using a Google service acco
3. In the Google Admin Console, authorize the service account client ID for:
- `https://www.googleapis.com/auth/gmail.readonly`
- `https://www.googleapis.com/auth/gmail.modify`
+ - `https://www.googleapis.com/auth/calendar.readonly` if you will sync Google Calendar
- `https://mail.google.com/` if you will run `delete-staged --permanent`
4. Store the key with owner-only permissions, for example `chmod 600 /path/to/workspace-service-account.json`.
@@ -169,6 +172,8 @@ msgvault add-account teammate@acme.com --oauth-app acme
Service account mode validates the delegated Gmail profile and registers the source, but it does not create per-user token files. Do not combine service-account accounts with `--headless` or `--force`; delegated tokens are minted on demand.
+For Google Calendar with a service account, enable the Google Calendar API and authorize the `calendar.readonly` scope above. Then configure a `[[gcal]]` source and run `msgvault sync-calendar user@domain.com --oauth-app acme` (or let `msgvault serve` run the schedule). No browser token is created.
+
### Headless Server Setup
When running msgvault on a headless server (SSH, VPS, Docker), there is no browser available for OAuth. Google's device code flow does not support Gmail scopes, so you must authorize on a machine with a browser and copy the token to your server.
diff --git a/docs/guides/remote-deployment.md b/docs/guides/remote-deployment.md
index 296a1a26b..9176637e1 100644
--- a/docs/guides/remote-deployment.md
+++ b/docs/guides/remote-deployment.md
@@ -205,6 +205,13 @@ docker logs msgvault
curl -H "X-API-Key: YOUR_API_KEY" http://remote-host:8080/api/v1/scheduler/status
```
+!!! tip "Archive Google Calendar too"
+ The same headless-token workflow archives Google Calendar. Authorize with
+ `msgvault add-calendar you@gmail.com` on a machine with a browser, copy the
+ token to the server (it now carries Gmail + Calendar), then add a `[[gcal]]`
+ entry with a cron `schedule` so the daemon syncs it. See
+ [Google Calendar](/usage/calendar/).
+
After setup, your data directory contains:
```
diff --git a/docs/index.md b/docs/index.md
index f6317a2c7..077cc0b6f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -18,9 +18,9 @@ search, and local AI workflows.
-Supports Gmail sync, IMAP, Microsoft 365, PST import, MBOX import, Apple Mail
-import, and chat/text import (WhatsApp, iMessage, Google Voice, Facebook
-Messenger, and SMS Backup & Restore).
+Supports Gmail sync, Google Calendar sync, IMAP, Microsoft 365, PST import, MBOX
+import, Apple Mail import, and chat/text import (WhatsApp, iMessage, Google Voice,
+Facebook Messenger, and SMS Backup & Restore).
Read the [Introduction](/introduction/) to learn more about why this project
was created.
@@ -70,6 +70,10 @@ disk that you own and control.
Full Email Backup
Downloads complete messages from Gmail or any IMAP server, including raw MIME, labels, metadata, and every attachment. Every PDF, photo, spreadsheet, and document you've ever received or sent is extracted and stored locally, deduplicated by content hash.
+
+ Calendar Sync
+ Archive Google Calendar alongside email. Events — including organizers, attendees, recurring series, and cancellations — become searchable by keyword and by meaning, and their participants dedupe with your email contacts. Read-only and incremental.
+
Lightning-Fast TUI
Explore hundreds of thousands of messages with instant aggregation and drill-down. Powered by DuckDB over Parquet, hundreds of times faster than SQL JOINs, in a small footprint.
diff --git a/docs/scripts/check_markdown_sources.py b/docs/scripts/check_markdown_sources.py
index 67b924d46..8e4902a52 100755
--- a/docs/scripts/check_markdown_sources.py
+++ b/docs/scripts/check_markdown_sources.py
@@ -40,6 +40,7 @@
"setup.md",
"troubleshooting.md",
"usage/analytics.md",
+ "usage/calendar.md",
"usage/chat.md",
"usage/deduplication.md",
"usage/deletion.md",
diff --git a/docs/setup.md b/docs/setup.md
index dc13cf967..c0a520909 100644
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -59,7 +59,7 @@ msgvault --help
## Configure OAuth
-Create a Google Cloud project, enable the Gmail API, and download your `client_secret.json`. See the full [OAuth Setup Guide](/guides/oauth-setup/).
+Create a Google Cloud project, enable the Gmail API, and download your `client_secret.json`. If you plan to archive Google Calendar, enable the Google Calendar API too. See the full [OAuth Setup Guide](/guides/oauth-setup/).
### Where to put config.toml
@@ -270,3 +270,17 @@ msgvault stats
See [Searching](/usage/searching/) and [Interactive TUI](/usage/tui/) for more.
+
+## Optional: Sync Google Calendar
+
+To archive Calendar events alongside email, authorize Calendar access and run a
+calendar sync:
+
+```bash
+msgvault add-calendar you@gmail.com
+msgvault sync-calendar you@gmail.com
+```
+
+Calendar sync is read-only. Events become searchable with
+`--message-type calendar_event`; see [Google Calendar](/usage/calendar/) for the
+full workflow, scheduled sync, and headless-server setup.
diff --git a/docs/usage/calendar.md b/docs/usage/calendar.md
new file mode 100644
index 000000000..1481cbd1f
--- /dev/null
+++ b/docs/usage/calendar.md
@@ -0,0 +1,196 @@
+---
+title: Google Calendar
+description: Archive Google Calendar events alongside your email, with full-text and semantic search over meetings, organizers, and attendees.
+---
+
+msgvault can archive your Google Calendar events into the same local database as
+your email. Events become searchable by keyword (and semantically, when vector
+search is enabled), and their organizers and attendees join the same contact
+graph as the people you email, so `alice@example.com` on a meeting resolves to
+the same contact as in the messages you exchanged with her.
+
+Calendar sync is **read-only**: msgvault never creates, edits, or deletes
+anything on your Google Calendar.
+
+## Prerequisites
+
+- An OAuth client already configured for Gmail (see [OAuth Setup](/guides/oauth-setup/)).
+ Calendar reuses the same `client_secret.json`.
+- The **Google Calendar API** enabled on that OAuth project. In the
+ [Google Cloud Console](https://console.cloud.google.com/), go to
+ **APIs & Services > Library**, search for "Google Calendar API", and click
+ **Enable**.
+
+## Authorize and register calendars
+
+```bash
+msgvault add-calendar you@gmail.com
+```
+
+This grants read-only Calendar access (`calendar.readonly`) and registers your
+calendars for sync.
+
+!!! warning "Keep both Gmail and Calendar checked"
+ If the account already has a Gmail token, re-consent **replaces** the granted
+ scopes, so msgvault re-requests Gmail **and** Calendar together. On Google's
+ consent screen, keep **both** checked — unchecking Gmail would drop Gmail
+ access for that account.
+
+By default only calendars you own or can write to are registered. Add
+`--all-calendars` to also include subscribed and holiday calendars (those you can
+only read).
+
+| Flag | Description |
+|---|---|
+| `--all-calendars` | Include reader/freeBusyReader (subscribed, holiday) calendars |
+| `--min-access-role` | Minimum access role: `owner`, `writer`, or `reader` |
+| `--calendars` | Comma-separated calendar IDs to register |
+| `--oauth-app` | Named OAuth app to use |
+| `--headless` | Print headless-server setup instructions instead of opening a browser |
+
+## Sync events
+
+```bash
+# First run does a full sync and registers calendars; later runs are incremental.
+msgvault sync-calendar you@gmail.com
+
+# Force a full re-sync
+msgvault sync-calendar you@gmail.com --full
+
+# Include subscribed and holiday calendars
+msgvault sync-calendar you@gmail.com --all-calendars
+
+# Bound a full sync to a date range (full sync only)
+msgvault sync-calendar you@gmail.com --full --after 2020-01-01 --before 2024-12-31
+```
+
+The first run (or `--full`) enumerates and registers calendars and downloads
+events. Subsequent runs are incremental, using the Calendar `syncToken` to fetch
+only what changed. Interrupted full syncs resume from a checkpoint; pass
+`--noresume` to start over.
+
+| Flag | Description |
+|---|---|
+| `--full` | Force a full sync (ignore stored sync tokens) |
+| `--limit` | Max events per calendar (0 = unlimited) |
+| `--after` / `--before` | Bound a full sync to a date range (`YYYY-MM-DD`); full sync only |
+| `--calendar` | Restrict to specific calendar IDs |
+| `--all-calendars` | Include reader/freeBusyReader calendars |
+| `--min-access-role` | Minimum access role: `owner`, `writer`, or `reader` |
+| `--oauth-app` | Named OAuth app to use |
+| `--noresume` | Do not resume an interrupted full sync |
+
+The first argument can be an account email or the `name` of a `[[gcal]]` entry in
+`config.toml` (see [Scheduled sync](#scheduled-sync-daemon) below).
+
+## What gets archived
+
+Each event is stored as a searchable record with `message_type = calendar_event`:
+
+- The **organizer** becomes the `from` participant and **attendees** become `to`
+ participants, so they dedupe with your email contacts.
+- The **subject** is the event summary; the searchable body includes the title,
+ time range, location, description, and attendee names.
+- **Recurring events** are grouped into one conversation titled by the series;
+ individually edited occurrences keep their own details.
+- **Cancelled events are kept**, marked cancelled rather than deleted, so your
+ archive preserves that a meeting once existed.
+- The full original event record is retained for fidelity.
+
+## Find events
+
+Calendar events are searchable like any other message. Restrict a search to
+events with `--message-type calendar_event`:
+
+```bash
+# Keyword search across event summaries, locations, descriptions, and attendees
+msgvault search "standup" --message-type calendar_event
+
+# Everything on a calendar within a date range
+msgvault search "after:2024-01-01 before:2024-04-01" --message-type calendar_event
+```
+
+When [vector search](/usage/vector-search/) is enabled, events are embedded
+during sync and can be found semantically with `--mode vector` or
+`--mode hybrid`.
+
+## Scheduled sync (daemon)
+
+Run calendar sync automatically with `msgvault serve` by adding a `[[gcal]]`
+entry to `config.toml`:
+
+```toml
+[[gcal]]
+email = "you@gmail.com"
+schedule = "0 */6 * * *" # every 6 hours (5-field cron)
+enabled = true
+```
+
+The first scheduled run full-syncs and registers calendars; later runs are
+incremental. See [Configuration](/configuration/#google-calendar-sources) for
+every field.
+
+!!! note
+ An `enabled` `[[gcal]]` entry with no `schedule` is never synced by the
+ daemon. Set a cron `schedule` so the archive does not go stale.
+
+## Headless server setup
+
+A headless server can't complete Google's browser consent, and the OAuth device
+flow doesn't support Calendar scopes. Authorize on a machine with a browser,
+then copy the token to the server. If the server already has a token for the
+account, copy that token to the browser machine first so re-consent preserves
+Drive or other previously granted Google scopes.
+
+1. **If a token already exists on the server**, copy it to the browser machine:
+ ```bash
+ mkdir -p ~/.msgvault/tokens
+ scp user@server:~/.msgvault/tokens/you@gmail.com.json ~/.msgvault/tokens/
+ ```
+
+2. **On a machine with a browser**, using the **same `client_secret.json`** as
+ the server:
+ ```bash
+ msgvault add-calendar you@gmail.com
+ ```
+ Keep all existing permissions plus Calendar checked on the consent screen.
+
+3. **Copy the token back to the server**, replacing the existing one. It now
+ carries Calendar plus the existing Google permissions, so current sync jobs
+ keep working:
+ ```bash
+ ssh user@server mkdir -p ~/.msgvault/tokens
+ scp ~/.msgvault/tokens/you@gmail.com.json user@server:~/.msgvault/tokens/
+ ```
+
+4. **On the server**, register the calendars (no browser needed) and sync:
+ ```bash
+ msgvault add-calendar you@gmail.com
+ msgvault sync-calendar you@gmail.com
+ ```
+
+Run `msgvault add-calendar you@gmail.com --headless` on the server to print these
+steps at any time.
+
+## Google Workspace service accounts
+
+Workspace admins using domain-wide delegation do not need per-user browser
+tokens for Calendar. Enable the Google Calendar API, authorize the service
+account client ID for `https://www.googleapis.com/auth/calendar.readonly`, and
+configure `[oauth].service_account_key` or `[oauth.apps.].service_account_key`
+as described in [OAuth Setup](/guides/oauth-setup/#google-workspace-service-accounts).
+
+Then sync the account directly or add a scheduled `[[gcal]]` entry:
+
+```bash
+msgvault sync-calendar user@domain.com --oauth-app acme
+```
+
+The first sync registers matching calendars and stores their sync cursors.
+
+## Privacy
+
+Calendar sync is read-only and runs only when you invoke it (or on the schedule
+you configure). OAuth tokens are stored under your msgvault home directory with
+owner-only permissions and are never written into `config.toml`, logs, or
+exported data.
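Reviewer sketch for the calendar selection rules documented in `docs/usage/calendar.md` (`--calendars`, `--all-calendars`, `--min-access-role`). It is modeled on `includeCalendar` in `internal/calsync/calsync.go` from this PR, but it is a standalone approximation: the `calendar` and `selection` types, the `rank` helper, and the sample calendar IDs are illustrative assumptions, and the deleted-calendar check is omitted.

```go
package main

import (
	"fmt"
	"slices"
)

// calendar is a hypothetical stand-in for the fields the filter reads.
type calendar struct {
	ID         string
	AccessRole string
}

// selection is a hypothetical stand-in for the CLI selection flags.
type selection struct {
	IDs     []string // explicit calendar IDs; when non-empty, nothing else is consulted
	All     bool     // include every calendar regardless of access role
	MinRole string   // minimum access role; "" falls back to "writer"
}

// rank orders access roles; freeBusyReader and unknown roles rank lowest.
func rank(role string) int {
	switch role {
	case "owner":
		return 3
	case "writer":
		return 2
	case "reader":
		return 1
	default:
		return 0
	}
}

// include applies the rules in precedence order: explicit IDs, then All,
// then the minimum access role.
func (s selection) include(c calendar) bool {
	if len(s.IDs) > 0 {
		return slices.Contains(s.IDs, c.ID)
	}
	if s.All {
		return true
	}
	minRole := s.MinRole
	if minRole == "" {
		minRole = "writer"
	}
	return rank(c.AccessRole) >= rank(minRole)
}

func main() {
	cals := []calendar{
		{ID: "you@gmail.com", AccessRole: "owner"},
		{ID: "team@example.test", AccessRole: "writer"},
		{ID: "holidays@example.test", AccessRole: "reader"},
	}
	defaults := selection{}
	everything := selection{All: true}
	for _, c := range cals {
		fmt.Println(c.ID, defaults.include(c), everything.include(c))
	}
}
```

Note that an explicit ID list takes precedence over `All` in this sketch, matching the order of checks in the Go code later in the diff.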
diff --git a/docs/usage/searching.md b/docs/usage/searching.md
index 682ccff9a..edca1df8c 100644
--- a/docs/usage/searching.md
+++ b/docs/usage/searching.md
@@ -84,6 +84,22 @@ The two flags are mutually exclusive. Collection filters work in full-text, vect
SQLite FTS ranking is weighted to better match PostgreSQL-backed search behavior, so subject/body weighting should feel more consistent across local tools. The rankers are still different; see [Search Ranking Across Backends](/architecture/search-ranking/).
+## Filtering by Message Type
+
+Archives can hold more than email — Google Calendar events, text messages, and
+call logs all live in the same database. Restrict a search to one kind with
+`--message-type`:
+
+```bash
+# Only Google Calendar events
+msgvault search "standup" --message-type calendar_event
+
+# Only SMS/MMS text messages
+msgvault search "dinner" --message-type sms
+```
+
+Values include `calendar_event`, `sms`, `mms`, and `synctech_sms_call`.
+
## JSON Output
Add `--json` for machine-readable output:
diff --git a/docs/zensical.toml b/docs/zensical.toml
index ad49847b0..fa391931f 100644
--- a/docs/zensical.toml
+++ b/docs/zensical.toml
@@ -24,6 +24,7 @@ nav = [
{"Vector Search" = "usage/vector-search.md"},
{"Importing Local Email" = "usage/importing.md"},
{"Text Messages" = "usage/text-messages.md"},
+ {"Google Calendar" = "usage/calendar.md"},
{"Exporting Data" = "usage/exporting.md"},
{"Analytics & Stats" = "usage/analytics.md"},
{"SQL Queries" = "usage/querying.md"},
diff --git a/internal/api/handlers.go b/internal/api/handlers.go
index c76ed2473..4df02ff66 100644
--- a/internal/api/handlers.go
+++ b/internal/api/handlers.go
@@ -616,6 +616,8 @@ func (s *Server) handleHybridSearch(
case errors.Is(err, vector.ErrEmbeddingTimeout):
writeError(w, http.StatusServiceUnavailable, "embedding_timeout",
"the embedding endpoint did not respond in time; retry, or raise [vector.embeddings].timeout")
+ case errors.Is(err, vector.ErrIndexScopeMismatch):
+ writeError(w, http.StatusBadRequest, "index_scope_mismatch", err.Error())
default:
s.logger.Error("hybrid search failed", "query", q, "mode", mode, "error", err)
writeError(w, http.StatusInternalServerError, "internal_error", "search failed")
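Reviewer sketch for the new `index_scope_mismatch` case above. It shows how `errors.Is` lets a handler map a wrapped sentinel error to a client-facing status. Everything here is an assumption for illustration: `errEmbeddingTimeout` and `errIndexScopeMismatch` are local stand-ins for the `vector` package sentinels, and `classify` is a hypothetical helper, not the handler in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Local stand-ins for the sentinel errors referenced in the handler.
var (
	errEmbeddingTimeout   = errors.New("embedding timeout")
	errIndexScopeMismatch = errors.New("index scope mismatch")
)

// classify maps an error to an HTTP status and a machine-readable code.
// errors.Is walks the wrap chain, so callers may add context with %w.
func classify(err error) (int, string) {
	switch {
	case errors.Is(err, errEmbeddingTimeout):
		return http.StatusServiceUnavailable, "embedding_timeout"
	case errors.Is(err, errIndexScopeMismatch):
		return http.StatusBadRequest, "index_scope_mismatch"
	default:
		return http.StatusInternalServerError, "internal_error"
	}
}

func main() {
	wrapped := fmt.Errorf("hybrid search: %w", errIndexScopeMismatch)
	status, code := classify(wrapped)
	fmt.Println(status, code)
}
```

Returning 400 rather than 500 for a scope mismatch signals that the request, not the server, needs to change, which is presumably why the case sits ahead of the `default` branch.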
diff --git a/internal/calsync/calsync.go b/internal/calsync/calsync.go
new file mode 100644
index 000000000..d86767fa6
--- /dev/null
+++ b/internal/calsync/calsync.go
@@ -0,0 +1,477 @@
+// Package calsync orchestrates read-only Google Calendar sync, mirroring
+// internal/sync for Gmail. It enumerates an account's calendars, full-syncs and
+// incrementally syncs their events via internal/gcal, and persists each event as
+// a messages row (message_type=calendar_event) through the canonical store write
+// path plus the messages.metadata helper. Calendars are sources keyed on a
+// natural per-calendar identifier, decoupled from the OAuth account/token key
+// which lives in sync_config.account_email.
+package calsync
+
+import (
+ "context"
+ "database/sql"
+ "encoding/json"
+ "fmt"
+ "log/slog"
+ "slices"
+ "strings"
+
+ "go.kenn.io/msgvault/internal/gcal"
+ "go.kenn.io/msgvault/internal/store"
+)
+
+const (
+ accessRoleOwner = "owner"
+ accessRoleWriter = "writer"
+ accessRoleReader = "reader"
+)
+
+const calendarFullCheckpointKind = "gcal_full_v1"
+
+// EmbedEnqueuer matches sync.EmbedEnqueuer: nil disables vector enqueue.
+type EmbedEnqueuer interface {
+ EnqueueMessages(ctx context.Context, messageIDs []int64) error
+}
+
+// Options configures a calendar sync run.
+type Options struct {
+ // AccountEmail is the OAuth account that owns the token; it keys token
+ // lookup and is stored in each source's sync_config. Never the source
+ // identifier.
+ AccountEmail string
+ // OAuthApp is the named OAuth app binding to persist on new sources (""=default).
+ OAuthApp string
+ // Calendars restricts sync to these calendar IDs (empty = access-role filter).
+ Calendars []string
+ // AllCalendars includes reader/freeBusyReader calendars (default: owner+writer).
+ AllCalendars bool
+ // MinAccessRole overrides the default minimum access role ("writer").
+ MinAccessRole string
+ // TimeMin/TimeMax bound a full sync (RFC3339). Full-sync only; ignored on
+ // incremental (the API rejects them with a syncToken).
+ TimeMin string
+ TimeMax string
+ // Limit caps events ingested per calendar (0 = unlimited).
+ Limit int
+ // NoResume forces a fresh full sync instead of resuming an interrupted one.
+ NoResume bool
+}
+
+// Result summarizes a sync run.
+type Result struct {
+ CalendarsSynced int
+ EventsAdded int
+ EventsCancelled int
+ InsertedIDs []int64
+}
+
+// Syncer runs calendar syncs against a gcal.API and a store.Store.
+type Syncer struct {
+ client gcal.API
+ store *store.Store
+ opts Options
+ logger *slog.Logger
+ enq EmbedEnqueuer
+}
+
+// New builds a Syncer.
+func New(client gcal.API, st *store.Store, opts Options) *Syncer {
+ return &Syncer{
+ client: client,
+ store: st,
+ opts: opts,
+ logger: slog.Default(),
+ }
+}
+
+// WithLogger sets the structured logger and returns the Syncer for chaining.
+func (s *Syncer) WithLogger(l *slog.Logger) *Syncer {
+ if l != nil {
+ s.logger = l
+ }
+ return s
+}
+
+// SetEmbedEnqueuer wires the optional vector-search enqueuer (nil = disabled).
+func (s *Syncer) SetEmbedEnqueuer(e EmbedEnqueuer) { s.enq = e }
+
+// Full enumerates calendars and runs a full sync of each. A per-calendar
+// failure is logged and does not abort the remaining calendars; the first such
+// error is returned after all calendars are attempted.
+func (s *Syncer) Full(ctx context.Context) (Result, error) {
+ if err := ValidateMinAccessRole(s.opts.MinAccessRole); err != nil {
+ return Result{}, err
+ }
+ cals, err := s.listCalendars(ctx)
+ if err != nil {
+ return Result{}, fmt.Errorf("enumerate calendars: %w", err)
+ }
+
+ var result Result
+ var firstErr error
+ for _, cal := range cals {
+ if !s.includeCalendar(cal) {
+ continue
+ }
+ if err := ctx.Err(); err != nil {
+ return result, err
+ }
+ if err := s.syncCalendarFull(ctx, cal, &result); err != nil {
+ s.logger.Error("calendar full sync failed", "calendar", cal.ID, "error", err)
+ if firstErr == nil {
+ firstErr = err
+ }
+ continue
+ }
+ result.CalendarsSynced++
+ }
+ s.enqueue(ctx, result.InsertedIDs)
+ return result, firstErr
+}
+
+// RegisterCalendars enumerates the account's calendars, applies the selection
+// filter, and creates/updates a source row per selected calendar WITHOUT syncing
+// events. Used by `add-calendar`, where the calendarList.list call also doubles
+// as a smoke test that the calendar scope was actually granted. Returns the
+// registered calendars.
+func (s *Syncer) RegisterCalendars(ctx context.Context) ([]gcal.Calendar, error) {
+ if err := ValidateMinAccessRole(s.opts.MinAccessRole); err != nil {
+ return nil, err
+ }
+ cals, err := s.listCalendars(ctx)
+ if err != nil {
+ return nil, fmt.Errorf("enumerate calendars: %w", err)
+ }
+ var registered []gcal.Calendar
+ for _, cal := range cals {
+ if !s.includeCalendar(cal) {
+ continue
+ }
+ src, err := s.store.GetOrCreateSource(gcal.SourceType, s.sourceIdentifier(cal))
+ if err != nil {
+ return nil, fmt.Errorf("get/create source for %s: %w", cal.ID, err)
+ }
+ if err := s.store.UpdateSourceSyncConfig(src.ID, s.sourceConfigJSON(cal)); err != nil {
+ return nil, fmt.Errorf("write sync config for %s: %w", cal.ID, err)
+ }
+ if s.opts.OAuthApp != "" {
+ if err := s.store.UpdateSourceOAuthApp(src.ID, sql.NullString{String: s.opts.OAuthApp, Valid: true}); err != nil {
+ return nil, fmt.Errorf("write oauth app for %s: %w", cal.ID, err)
+ }
+ }
+ registered = append(registered, cal)
+ }
+ return registered, nil
+}
+
+// listCalendars paginates calendarList.list.
+func (s *Syncer) listCalendars(ctx context.Context) ([]gcal.Calendar, error) {
+ var all []gcal.Calendar
+ pageToken := ""
+ for {
+ page, err := s.client.ListCalendars(ctx, pageToken)
+ if err != nil {
+ return nil, err
+ }
+ all = append(all, page.Items...)
+ if page.NextPageToken == "" {
+ break
+ }
+ pageToken = page.NextPageToken
+ }
+ return all, nil
+}
+
+// syncCalendarFull full-syncs one calendar, persisting events and advancing the
+// stored syncToken cursor only after the final page succeeds. Complete,
+// unbounded full syncs resume from a checkpointed pageToken unless NoResume is
+// set. Bounded/limited runs deliberately do not checkpoint page tokens because
+// replaying that token under a later unbounded request can skip or corrupt the
+// traversal.
+func (s *Syncer) syncCalendarFull(ctx context.Context, cal gcal.Calendar, result *Result) error {
+ src, err := s.store.GetOrCreateSource(gcal.SourceType, s.sourceIdentifier(cal))
+ if err != nil {
+ return fmt.Errorf("get/create source: %w", err)
+ }
+ if err := s.store.UpdateSourceSyncConfig(src.ID, s.sourceConfigJSON(cal)); err != nil {
+ return fmt.Errorf("write sync config: %w", err)
+ }
+ if s.opts.OAuthApp != "" {
+ if err := s.store.UpdateSourceOAuthApp(src.ID, sql.NullString{String: s.opts.OAuthApp, Valid: true}); err != nil {
+ return fmt.Errorf("write oauth app: %w", err)
+ }
+ }
+
+ // Resume an interrupted run from its checkpoint, then ALWAYS StartSync. We do
+ // NOT reuse the prior run's id: StartSync is the only path that takes the
+ // source row's writer lock and supersedes other 'running' runs, so going
+ // through it serializes concurrent/overlapping full syncs (a manual run racing
+ // the daemon) instead of two callers sharing — and clobbering — one sync_run
+ // row. The prior run's counters are carried forward so a resumed run's stats
+ // stay accurate (UpdateSyncCheckpoint overwrites counters absolutely).
+ var resumePageToken string
+ var priorProcessed, priorAdded, priorUpdated int64
+ resumeEligible := s.fullSyncResumeEligible()
+ if resumeEligible {
+ if active, _ := s.store.GetActiveSync(src.ID); active != nil && active.Status == store.SyncStatusRunning {
+ if pageToken, ok := decodeCalendarFullCheckpoint(active.CursorBefore); ok {
+ resumePageToken = pageToken
+ priorProcessed = active.MessagesProcessed
+ priorAdded = active.MessagesAdded
+ priorUpdated = active.MessagesUpdated
+ s.logger.Info("resuming interrupted calendar sync", "calendar", cal.ID, "page_token", resumePageToken)
+ } else {
+ s.logger.Info("ignoring legacy calendar sync checkpoint; restarting full sync", "calendar", cal.ID)
+ }
+ }
+ }
+ syncID, err := s.store.StartSync(src.ID, "full")
+ if err != nil {
+ return fmt.Errorf("start sync: %w", err)
+ }
+
+ cp := store.Checkpoint{
+ PageToken: resumePageToken,
+ MessagesProcessed: priorProcessed,
+ MessagesAdded: priorAdded,
+ MessagesUpdated: priorUpdated,
+ }
+ pageToken := resumePageToken
+ finalToken := ""
+ ingested := 0
+ limitHit := false
+
+ fail := func(e error) error {
+ _ = s.store.FailSync(syncID, e.Error())
+ return e
+ }
+
+ for {
+ if err := ctx.Err(); err != nil {
+ return fail(err)
+ }
+ page, err := s.client.ListEvents(ctx, cal.ID, gcal.EventsListParams{
+ SingleEvents: false,
+ ShowDeleted: true,
+ MaxResults: 2500,
+ TimeMin: s.opts.TimeMin,
+ TimeMax: s.opts.TimeMax,
+ PageToken: pageToken,
+ })
+ if err != nil {
+ return fail(fmt.Errorf("events.list: %w", err))
+ }
+
+ for i := range page.Items {
+ if s.opts.Limit > 0 && ingested >= s.opts.Limit {
+ limitHit = true
+ break
+ }
+ ev := page.Items[i]
+ added, cancelled, err := s.persistOne(src.ID, cal, ev, result)
+ if err != nil {
+ return fail(fmt.Errorf("persist event %s: %w", ev.ID, err))
+ }
+ ingested++
+ cp.MessagesProcessed++
+ if added {
+ cp.MessagesAdded++
+ }
+ if cancelled {
+ cp.MessagesUpdated++
+ }
+ }
+
+ if page.NextSyncToken != "" {
+ finalToken = page.NextSyncToken
+ }
+ pageToken = page.NextPageToken
+ cp.PageToken = pageToken
+ if resumeEligible {
+ checkpoint := cp
+ checkpoint.PageToken = encodeCalendarFullCheckpoint(pageToken)
+ if err := s.store.UpdateSyncCheckpoint(syncID, &checkpoint); err != nil {
+ return fail(fmt.Errorf("checkpoint: %w", err))
+ }
+ }
+ if pageToken == "" || limitHit {
+ break
+ }
+ }
+
+ // A partial traversal must NOT establish an incremental baseline: a --limit
+ // run deliberately skips events, and a --after/--before (TimeMin/TimeMax) run
+ // only sees a time window. If either run reaches the final page, which
+ // carries nextSyncToken (trivially so on a single-page calendar), the cursor
+ // would advance past un-ingested events, and the next incremental sync (which
+ // carries no time bounds) would never see them (silent data loss). Only a
+ // complete, unbounded traversal advances the cursor.
+ if limitHit || s.opts.TimeMin != "" || s.opts.TimeMax != "" {
+ finalToken = ""
+ }
+ if finalToken != "" {
+ if err := s.store.UpdateSourceSyncCursor(src.ID, finalToken); err != nil {
+ return fail(fmt.Errorf("update cursor: %w", err))
+ }
+ }
+ if err := s.store.CompleteSync(syncID, finalToken); err != nil {
+ return fail(fmt.Errorf("complete sync: %w", err))
+ }
+ if err := s.store.RecomputeConversationStats(src.ID); err != nil {
+ s.logger.Warn("recompute conversation stats failed", "calendar", cal.ID, "error", err)
+ }
+ _ = s.store.CheckpointWAL()
+ s.logger.Info("calendar full sync complete",
+ "calendar", cal.ID, "events_processed", cp.MessagesProcessed,
+ "events_added", cp.MessagesAdded, "events_cancelled", cp.MessagesUpdated)
+ return nil
+}
+
+func (s *Syncer) fullSyncResumeEligible() bool {
+ return !s.opts.NoResume &&
+ s.opts.Limit == 0 &&
+ s.opts.TimeMin == "" &&
+ s.opts.TimeMax == ""
+}
+
+type calendarFullCheckpoint struct {
+ Kind string `json:"kind"`
+ PageToken string `json:"page_token"`
+}
+
+func encodeCalendarFullCheckpoint(pageToken string) string {
+ b, err := json.Marshal(calendarFullCheckpoint{
+ Kind: calendarFullCheckpointKind,
+ PageToken: pageToken,
+ })
+ if err != nil {
+ return ""
+ }
+ return string(b)
+}
+
+func decodeCalendarFullCheckpoint(raw sql.NullString) (string, bool) {
+ if !raw.Valid || raw.String == "" {
+ return "", false
+ }
+ var cp calendarFullCheckpoint
+ if err := json.Unmarshal([]byte(raw.String), &cp); err != nil {
+ return "", false
+ }
+ if cp.Kind != calendarFullCheckpointKind {
+ return "", false
+ }
+ return cp.PageToken, true
+}
+
+// persistOne routes an event to ingest or cancellation handling and updates the
+// run result. Returns (added, cancelled).
+func (s *Syncer) persistOne(sourceID int64, cal gcal.Calendar, ev gcal.Event, result *Result) (bool, bool, error) {
+ if ev.IsCancelled() {
+ id, inserted, err := s.flagCancelled(sourceID, cal, ev)
+ if err != nil {
+ return false, false, err
+ }
+ result.EventsCancelled++
+ if id != 0 {
+ result.InsertedIDs = append(result.InsertedIDs, id)
+ }
+ return inserted, true, nil
+ }
+ id, err := s.ingestEvent(sourceID, cal, ev)
+ if err != nil {
+ return false, false, err
+ }
+ result.EventsAdded++
+ result.InsertedIDs = append(result.InsertedIDs, id)
+ return true, false, nil
+}
+
+// enqueue forwards ingested ids to the embed worker if vector search is enabled.
+func (s *Syncer) enqueue(ctx context.Context, ids []int64) {
+ if s.enq == nil || len(ids) == 0 {
+ return
+ }
+ if err := s.enq.EnqueueMessages(ctx, ids); err != nil {
+ s.logger.Warn("enqueue events for embedding failed", "count", len(ids), "error", err)
+ }
+}
+
+// includeCalendar applies the calendar selection filter.
+func (s *Syncer) includeCalendar(cal gcal.Calendar) bool {
+ if cal.Deleted {
+ return false
+ }
+ if len(s.opts.Calendars) > 0 {
+ return slices.Contains(s.opts.Calendars, cal.ID)
+ }
+ if s.opts.AllCalendars {
+ return true
+ }
+ return accessRoleRank(cal.AccessRole) >= accessRoleRank(s.minAccessRole())
+}
+
+func (s *Syncer) minAccessRole() string {
+ if s.opts.MinAccessRole != "" {
+ return s.opts.MinAccessRole
+ }
+ return accessRoleWriter
+}
+
+// ValidateMinAccessRole checks the user-supplied minimum role. Calendar access
+// roles below reader, such as freeBusyReader, may still appear on calendars but
+// are not accepted as minimum-selection flags because the CLI advertises
+// owner|writer|reader only.
+func ValidateMinAccessRole(role string) error {
+ if role == "" {
+ return nil
+ }
+ switch role {
+ case accessRoleOwner, accessRoleWriter, accessRoleReader:
+ return nil
+ default:
+ return fmt.Errorf("invalid min access role %q (expected owner|writer|reader)", role)
+ }
+}
+
+// accessRoleRank orders calendar access roles so a minimum can be enforced.
+func accessRoleRank(role string) int {
+ switch role {
+ case accessRoleOwner:
+ return 3
+ case accessRoleWriter:
+ return 2
+ case accessRoleReader:
+ return 1
+ default: // freeBusyReader, unknown
+ return 0
+ }
+}
+
+// sourceIdentifier is the natural per-calendar key, scoped by account to avoid
+// collisions when two accounts subscribe to the same shared calendar ID.
+func (s *Syncer) sourceIdentifier(cal gcal.Calendar) string {
+ return s.opts.AccountEmail + "/" + cal.ID
+}
+
+// sourceConfigJSON builds the sync_config payload. account_email is the token
+// key; Incremental rebuilds the gcal.Calendar from the remaining fields.
+func (s *Syncer) sourceConfigJSON(cal gcal.Calendar) string {
+ return buildSourceConfigJSON(sourceConfig{
+ AccountEmail: s.opts.AccountEmail,
+ CalendarID: cal.ID,
+ CalendarSummary: cal.Summary,
+ AccessRole: cal.AccessRole,
+ Primary: cal.Primary,
+ TimeZone: cal.TimeZone,
+ })
+}
+
+// emailDomain returns the lowercased domain part of an email, or "".
+func emailDomain(email string) string {
+ at := strings.LastIndex(email, "@")
+ if at < 0 || at == len(email)-1 {
+ return ""
+ }
+ return strings.ToLower(email[at+1:])
+}
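
Review note (illustrative only, not part of the patch): the selection order in `includeCalendar` can be sketched in isolation as below. The `options` and `calendar` types and the `rank`/`include` helpers are hypothetical stand-ins, not the package's own types; only the ordering of the checks is taken from the diff. One consequence worth noticing is that an explicit calendar list bypasses the access-role filter entirely.

```go
package main

import (
	"fmt"
	"slices"
)

// Role names mirror the values the CLI advertises (owner|writer|reader).
const (
	roleOwner  = "owner"
	roleWriter = "writer"
	roleReader = "reader"
)

// options is a hypothetical stand-in for the Syncer's selection options.
type options struct {
	Calendars     []string
	AllCalendars  bool
	MinAccessRole string
}

// calendar is a hypothetical, trimmed-down calendar record.
type calendar struct {
	ID         string
	AccessRole string
	Deleted    bool
}

// rank orders roles; anything unrecognised (for example freeBusyReader) ranks 0.
func rank(role string) int {
	switch role {
	case roleOwner:
		return 3
	case roleWriter:
		return 2
	case roleReader:
		return 1
	default:
		return 0
	}
}

// include applies the checks in the same order as includeCalendar in the diff:
// deleted, explicit list, all-calendars flag, then minimum role (default writer).
func include(opts options, cal calendar) bool {
	if cal.Deleted {
		return false
	}
	if len(opts.Calendars) > 0 {
		return slices.Contains(opts.Calendars, cal.ID)
	}
	if opts.AllCalendars {
		return true
	}
	minRole := opts.MinAccessRole
	if minRole == "" {
		minRole = roleWriter
	}
	return rank(cal.AccessRole) >= rank(minRole)
}

func main() {
	cals := []calendar{
		{ID: "primary", AccessRole: "owner"},
		{ID: "team", AccessRole: "writer"},
		{ID: "holidays", AccessRole: "reader"},
		{ID: "busy", AccessRole: "freeBusyReader"},
	}
	for _, c := range cals {
		fmt.Printf("%s default=%v\n", c.ID, include(options{}, c))
	}
}
```

With default options this keeps `primary` and `team` only, which matches what `TestFull_AccessRoleFilter` asserts.
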
diff --git a/internal/calsync/calsync_test.go b/internal/calsync/calsync_test.go
new file mode 100644
index 000000000..4509cd34a
--- /dev/null
+++ b/internal/calsync/calsync_test.go
@@ -0,0 +1,439 @@
+package calsync
+
+import (
+ "context"
+ "database/sql"
+ "encoding/json"
+ "log/slog"
+ "testing"
+ "time"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+
+ "go.kenn.io/msgvault/internal/gcal"
+ "go.kenn.io/msgvault/internal/store"
+ "go.kenn.io/msgvault/internal/testutil"
+)
+
+const testAccount = "alice@example.com"
+
+func quietLogger() *slog.Logger {
+ return slog.New(slog.DiscardHandler)
+}
+
+func newSyncer(t *testing.T, mock *gcal.MockAPI, opts Options) (*Syncer, *store.Store) {
+ t.Helper()
+ st := testutil.NewTestStore(t)
+ if opts.AccountEmail == "" {
+ opts.AccountEmail = testAccount
+ }
+ s := New(mock, st, opts).WithLogger(quietLogger())
+ return s, st
+}
+
+// --- read-back helpers (direct SQL through the real store) ---
+
+type msgRow struct {
+ id int64
+ convID int64
+ mtype string
+ subject sql.NullString
+ sentAt sql.NullTime
+ senderID sql.NullInt64
+ isFromMe bool
+ snippet sql.NullString
+ metadata sql.NullString
+ deletedFromSource sql.NullTime
+}
+
+func getMsg(t *testing.T, st *store.Store, sourceID int64, smid string) (msgRow, bool) {
+ t.Helper()
+ var m msgRow
+ err := st.DB().QueryRow(st.Rebind(`
+ SELECT id, conversation_id, message_type, subject, sent_at, sender_id,
+ is_from_me, snippet, metadata, deleted_from_source_at
+ FROM messages WHERE source_id = ? AND source_message_id = ?`), sourceID, smid).
+ Scan(&m.id, &m.convID, &m.mtype, &m.subject, &m.sentAt, &m.senderID,
+ &m.isFromMe, &m.snippet, &m.metadata, &m.deletedFromSource)
+ if err == sql.ErrNoRows {
+ return msgRow{}, false
+ }
+ require.NoError(t, err)
+ return m, true
+}
+
+func primarySource(t *testing.T, st *store.Store) *store.Source {
+ t.Helper()
+ src, err := st.GetSourceByIdentifier(testAccount + "/primary")
+ require.NoError(t, err)
+ return src
+}
+
+func countMessages(t *testing.T, st *store.Store, sourceID int64) int {
+ t.Helper()
+ var n int
+ require.NoError(t, st.DB().QueryRow(
+ st.Rebind(`SELECT COUNT(*) FROM messages WHERE source_id = ?`), sourceID).Scan(&n))
+ return n
+}
+
+func bodyText(t *testing.T, st *store.Store, msgID int64) string {
+ t.Helper()
+ var body sql.NullString
+ err := st.DB().QueryRow(
+ st.Rebind(`SELECT body_text FROM message_bodies WHERE message_id = ?`), msgID).Scan(&body)
+ if err == sql.ErrNoRows {
+ return ""
+ }
+ require.NoError(t, err)
+ return body.String
+}
+
+func rawFormat(t *testing.T, st *store.Store, msgID int64) string {
+ t.Helper()
+ var f string
+ require.NoError(t, st.DB().QueryRow(
+ st.Rebind(`SELECT raw_format FROM message_raw WHERE message_id = ?`), msgID).Scan(&f))
+ return f
+}
+
+func recipientEmails(t *testing.T, st *store.Store, msgID int64, typ string) []string {
+ t.Helper()
+ rows, err := st.DB().Query(st.Rebind(`
+ SELECT p.email_address FROM message_recipients mr
+ JOIN participants p ON p.id = mr.participant_id
+ WHERE mr.message_id = ? AND mr.recipient_type = ?
+ ORDER BY p.email_address`), msgID, typ)
+ require.NoError(t, err)
+ defer func() { _ = rows.Close() }()
+ var out []string
+ for rows.Next() {
+ var e sql.NullString
+ require.NoError(t, rows.Scan(&e))
+ out = append(out, e.String)
+ }
+ require.NoError(t, rows.Err())
+ return out
+}
+
+func conversationTitle(t *testing.T, st *store.Store, convID int64) string {
+ t.Helper()
+ var title sql.NullString
+ require.NoError(t, st.DB().QueryRow(
+ st.Rebind(`SELECT title FROM conversations WHERE id = ?`), convID).Scan(&title))
+ return title.String
+}
+
+func parseMeta(t *testing.T, m msgRow) map[string]any {
+ t.Helper()
+ require.True(t, m.metadata.Valid, "metadata should be set")
+ var out map[string]any
+ require.NoError(t, json.Unmarshal([]byte(m.metadata.String), &out))
+ return out
+}
+
+// timedEvent builds a representative timed event.
+func timedEvent(id, summary string, attendees ...gcal.Attendee) gcal.Event {
+ return gcal.Event{
+ ID: id,
+ Status: gcal.StatusConfirmed,
+ Summary: summary,
+ Location: "Room 1",
+ HTMLLink: "https://cal/" + id,
+ Organizer: gcal.Person{Email: testAccount, DisplayName: "Alice", Self: true},
+ Start: gcal.EventDateTime{DateTime: time.Date(2024, 5, 1, 16, 0, 0, 0, time.UTC), TimeZone: "UTC"},
+ End: gcal.EventDateTime{DateTime: time.Date(2024, 5, 1, 16, 30, 0, 0, time.UTC)},
+ Attendees: attendees,
+ }
+}
+
+func TestFull_PersistsEventsAsMessages(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", Summary: "Personal", AccessRole: "owner", Primary: true, TimeZone: "UTC"}}
+ ev := timedEvent("e1", "Sprint standup",
+ gcal.Attendee{Email: "bob@example.com", DisplayName: "Bob", ResponseStatus: "accepted"})
+ m.FullEvents["primary"] = [][]gcal.Event{{ev}}
+ m.FullSyncToken["primary"] = "TOKEN1"
+
+ s, st := newSyncer(t, m, Options{})
+
+ // Pre-existing email contact for Bob — the attendee must dedupe to it.
+ bobID, err := st.EnsureParticipant("bob@example.com", "Bob", "example.com")
+ require.NoError(err)
+
+ res, err := s.Full(context.Background())
+ require.NoError(err)
+ assert.Equal(1, res.CalendarsSynced)
+ assert.Equal(1, res.EventsAdded)
+
+ src := primarySource(t, st)
+ require.NotNil(src)
+ assert.Equal("TOKEN1", src.SyncCursor.String, "final sync token persisted as cursor")
+
+ row, ok := getMsg(t, st, src.ID, "e1")
+ require.True(ok, "event e1 should be persisted")
+ assert.Equal(gcal.MessageTypeCalendarEvent, row.mtype)
+ assert.Equal("Sprint standup", row.subject.String)
+ require.True(row.sentAt.Valid)
+ assert.Equal(time.Date(2024, 5, 1, 16, 0, 0, 0, time.UTC), row.sentAt.Time.UTC())
+ assert.True(row.isFromMe, "organizer is the account → is_from_me")
+ assert.False(row.deletedFromSource.Valid, "must not be soft-deleted")
+
+ meta := parseMeta(t, row)
+ assert.Equal("confirmed", meta["status"])
+ assert.Equal(false, meta["all_day"])
+ assert.Equal("primary", meta["calendar_id"])
+ assert.Equal(testAccount, meta["account_email"])
+ assert.NotEmpty(meta["end"], "interval end stored in metadata")
+
+ assert.Equal("gcal_json", rawFormat(t, st, row.id))
+
+ // Organizer is 'from'; attendee is 'to' and deduped with the email contact.
+ assert.Equal([]string{testAccount}, recipientEmails(t, st, row.id, "from"))
+ assert.Equal([]string{"bob@example.com"}, recipientEmails(t, st, row.id, "to"))
+ toID := recipientParticipantID(t, st, row.id, "to")
+ assert.Equal(bobID, toID, "attendee must reuse the existing email contact participant")
+
+ // Body carries the display name + summary but NOT the raw attendee email.
+ body := bodyText(t, st, row.id)
+ assert.Contains(body, "Sprint standup")
+ assert.Contains(body, "Bob")
+ assert.NotContains(body, "bob@example.com", "raw attendee email must not be in body_text")
+
+ // FTS to_addr indexes the raw attendee email (SQLite vtable).
+ if st.FTS5Available() && !st.IsPostgreSQL() {
+ var toAddr, ftsBody string
+ require.NoError(st.DB().QueryRow(
+ `SELECT to_addr, body FROM messages_fts WHERE message_id = ?`, row.id).Scan(&toAddr, &ftsBody))
+ assert.Contains(toAddr, "bob@example.com", "attendee email reaches FTS via to_addr")
+ assert.NotContains(ftsBody, "bob@example.com", "attendee email must not be double-encoded in FTS body")
+ }
+}
+
+func recipientParticipantID(t *testing.T, st *store.Store, msgID int64, typ string) int64 {
+ t.Helper()
+ var id int64
+ require.NoError(t, st.DB().QueryRow(st.Rebind(`
+ SELECT participant_id FROM message_recipients
+ WHERE message_id = ? AND recipient_type = ?`), msgID, typ).Scan(&id))
+ return id
+}
+
+func TestFull_IdempotentReRun(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{
+ {timedEvent("e1", "One"), timedEvent("e2", "Two")},
+ }
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+ first, _ := getMsg(t, st, src.ID, "e1")
+ assert.Equal(2, countMessages(t, st, src.ID))
+
+ // Re-run: no duplicate rows, stable ids.
+ _, err = s.Full(context.Background())
+ require.NoError(err)
+ assert.Equal(2, countMessages(t, st, src.ID), "re-run must not duplicate rows")
+ again, _ := getMsg(t, st, src.ID, "e1")
+ assert.Equal(first.id, again.id, "message id stable across re-sync")
+}
+
+func TestFull_AccessRoleFilter(t *testing.T) {
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{
+ {ID: "primary", AccessRole: "owner"},
+ {ID: "team", AccessRole: "writer"},
+ {ID: "holidays", AccessRole: "reader"},
+ {ID: "busy", AccessRole: "freeBusyReader"},
+ }
+ for _, id := range []string{"primary", "team", "holidays", "busy"} {
+ m.FullEvents[id] = [][]gcal.Event{{timedEvent("e-"+id, "Ev "+id)}}
+ m.FullSyncToken[id] = "T-" + id
+ }
+ s, st := newSyncer(t, m, Options{})
+
+ assert := assert.New(t)
+ require := require.New(t)
+ res, err := s.Full(context.Background())
+ require.NoError(err)
+ assert.Equal(2, res.CalendarsSynced, "default filter keeps owner+writer only")
+
+ _, err = st.GetSourceByIdentifier(testAccount + "/holidays")
+ require.ErrorIs(err, store.ErrSourceNotFound, "reader calendar must be skipped")
+ _, err = st.GetSourceByIdentifier(testAccount + "/team")
+ assert.NoError(err, "writer calendar must be synced")
+}
+
+func TestFull_InvalidMinAccessRoleRejected(t *testing.T) {
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ s, _ := newSyncer(t, m, Options{MinAccessRole: "wrtier"})
+
+ _, err := s.Full(context.Background())
+
+ require.Error(t, err)
+ assert.Contains(t, err.Error(), "invalid min access role")
+}
+
+func TestRegisterCalendars_InvalidMinAccessRoleRejected(t *testing.T) {
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ s, _ := newSyncer(t, m, Options{MinAccessRole: "wrtier"})
+
+ _, err := s.RegisterCalendars(context.Background())
+
+ require.Error(t, err)
+ assert.Contains(t, err.Error(), "invalid min access role")
+}
+
+func TestIncremental_CancellationRetainsRow(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("e1", "Lunch")}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+ before, ok := getMsg(t, st, src.ID, "e1")
+ require.True(ok)
+ assert.Equal("Lunch", before.subject.String)
+
+ // Incremental delta cancels e1 (delta carries only id + status, no summary).
+ m.IncEvents["T1"] = [][]gcal.Event{{{ID: "e1", Status: gcal.StatusCancelled}}}
+ m.IncNextToken["T1"] = "T2"
+
+ res, err := s.Incremental(context.Background())
+ require.NoError(err)
+ assert.Equal(1, res.EventsCancelled)
+
+ after, ok := getMsg(t, st, src.ID, "e1")
+ require.True(ok, "cancelled event must be RETAINED, not deleted")
+ assert.False(after.deletedFromSource.Valid, "deleted_from_source_at must stay NULL")
+ assert.Equal("Lunch", after.subject.String, "original subject preserved (not wiped by empty delta)")
+ assert.Equal("cancelled", parseMeta(t, after)["status"], "metadata.status flipped to cancelled")
+
+ src2 := primarySource(t, st)
+ assert.Equal("T2", src2.SyncCursor.String, "cursor advanced after incremental")
+}
+
+func TestIncremental_410TriggersFullResync(t *testing.T) {
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("e1", "First")}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ assert := assert.New(t)
+ require := require.New(t)
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+ require.Equal("T1", src.SyncCursor.String)
+
+ // The stored token is now stale (410), and a fresh full listing has a new event.
+ m.GoneTokens["T1"] = true
+ m.FullEvents["primary"] = [][]gcal.Event{{
+ timedEvent("e1", "First"),
+ timedEvent("e2", "Second"),
+ }}
+ m.FullSyncToken["primary"] = "T2"
+
+ res, err := s.Incremental(context.Background())
+ require.NoError(err, "410 should self-heal into a full resync, not error out")
+ assert.GreaterOrEqual(res.CalendarsSynced, 1)
+
+ assert.Equal(2, countMessages(t, st, src.ID), "full resync repopulated after 410")
+ src2 := primarySource(t, st)
+ assert.Equal("T2", src2.SyncCursor.String, "cursor replaced with the fresh token")
+}
+
+func TestFull_RecurringMasterAndCancelledOccurrence(t *testing.T) {
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+
+ master := gcal.Event{
+ ID: "r1",
+ Status: gcal.StatusConfirmed,
+ Summary: "Weekly sync",
+ Organizer: gcal.Person{Email: testAccount, Self: true},
+ Start: gcal.EventDateTime{DateTime: time.Date(2024, 5, 2, 10, 0, 0, 0, time.UTC)},
+ End: gcal.EventDateTime{DateTime: time.Date(2024, 5, 2, 10, 30, 0, 0, time.UTC)},
+ Recurrence: []string{"RRULE:FREQ=WEEKLY;BYDAY=TH"},
+ }
+ // A single cancelled occurrence: same series, a specific original start.
+ cancelledOcc := gcal.Event{
+ ID: "r1_20240509T100000Z",
+ Status: gcal.StatusCancelled,
+ RecurringEventID: "r1",
+ OriginalStartTime: gcal.EventDateTime{DateTime: time.Date(2024, 5, 9, 10, 0, 0, 0, time.UTC)},
+ }
+ m.FullEvents["primary"] = [][]gcal.Event{{master, cancelledOcc}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ assert := assert.New(t)
+ require := require.New(t)
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+
+ // Master stored under event.id with its recurrence in metadata.
+ masterRow, ok := getMsg(t, st, src.ID, "r1")
+ require.True(ok)
+ mMeta := parseMeta(t, masterRow)
+ rec, _ := mMeta["recurrence"].([]any)
+ require.Len(rec, 1)
+ assert.Equal("RRULE:FREQ=WEEKLY;BYDAY=TH", rec[0])
+ assert.Equal("confirmed", mMeta["status"], "master is not affected by the occurrence cancellation")
+
+ // Cancelled occurrence keyed recurringEventId|originalStartTime, flagged only itself.
+ occRow, ok := getMsg(t, st, src.ID, "r1|2024-05-09T10:00:00Z")
+ require.True(ok, "cancelled occurrence stored under its derived key")
+ assert.Equal("cancelled", parseMeta(t, occRow)["status"])
+ assert.NotEqual(masterRow.id, occRow.id, "occurrence is its own row")
+ require.True(occRow.sentAt.Valid)
+ assert.Equal(time.Date(2024, 5, 9, 10, 0, 0, 0, time.UTC), occRow.sentAt.Time.UTC(),
+ "occurrence sorts at its original start time")
+}
+
+type captureEnqueuer struct{ ids []int64 }
+
+func (c *captureEnqueuer) EnqueueMessages(_ context.Context, ids []int64) error {
+ c.ids = append(c.ids, ids...)
+ return nil
+}
+
+func TestFull_EnqueuesForEmbedding(t *testing.T) {
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{
+ timedEvent("e1", "One"), timedEvent("e2", "Two"),
+ }}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ enq := &captureEnqueuer{}
+ s.SetEmbedEnqueuer(enq)
+
+ _, err := s.Full(context.Background())
+ require.NoError(t, err)
+ assert.Len(t, enq.ids, 2, "both events queued for embedding")
+
+ src := primarySource(t, st)
+ e1, _ := getMsg(t, st, src.ID, "e1")
+ assert.Contains(t, enq.ids, e1.id)
+}
diff --git a/internal/calsync/errors.go b/internal/calsync/errors.go
new file mode 100644
index 000000000..eb9e24931
--- /dev/null
+++ b/internal/calsync/errors.go
@@ -0,0 +1,9 @@
+package calsync
+
+import "errors"
+
+// ErrSyncTokenExpired is the Calendar analogue of sync.ErrHistoryExpired: the
+// stored syncToken is no longer valid (HTTP 410) and a full re-sync is required
+// for that calendar. Incremental sync handles it internally (clear cursor + full
+// resync); the CLI surfaces it as guidance when a manual incremental run hits it.
+var ErrSyncTokenExpired = errors.New("calendar sync token expired - run a full sync")
diff --git a/internal/calsync/incremental.go b/internal/calsync/incremental.go
new file mode 100644
index 000000000..b3cf00ec7
--- /dev/null
+++ b/internal/calsync/incremental.go
@@ -0,0 +1,206 @@
+package calsync
+
+import (
+ "context"
+ "database/sql"
+ "encoding/json"
+ "errors"
+ "fmt"
+
+ "go.kenn.io/msgvault/internal/gcal"
+ "go.kenn.io/msgvault/internal/store"
+)
+
+// Incremental syncs each of the account's existing calendar sources from its
+// stored syncToken. A calendar with no token yet is full-synced. A 410 (expired
+// token) self-heals: the cursor is cleared and the calendar is full-resynced.
+// Per-calendar failures are logged and do not abort the others; the first is returned.
+func (s *Syncer) Incremental(ctx context.Context) (Result, error) {
+ if err := ValidateMinAccessRole(s.opts.MinAccessRole); err != nil {
+ return Result{}, err
+ }
+ sources, err := s.store.GetSourcesByTypeAndAccount(gcal.SourceType, s.opts.AccountEmail)
+ if err != nil {
+ return Result{}, fmt.Errorf("enumerate calendar sources: %w", err)
+ }
+
+ var result Result
+ var firstErr error
+ recordErr := func(err error) {
+ s.logger.Error("calendar incremental sync failed", "error", err)
+ if firstErr == nil {
+ firstErr = err
+ }
+ }
+
+ for _, src := range sources {
+ if err := ctx.Err(); err != nil {
+ return result, err
+ }
+ cfg := parseSourceConfig(src.SyncConfig)
+ if cfg.CalendarID == "" {
+ continue
+ }
+ cal := gcal.Calendar{
+ ID: cfg.CalendarID,
+ Summary: cfg.CalendarSummary,
+ AccessRole: cfg.AccessRole,
+ Primary: cfg.Primary,
+ TimeZone: cfg.TimeZone,
+ }
+ if !s.includeCalendar(cal) {
+ continue
+ }
+
+ // No token yet → full sync.
+ if !src.SyncCursor.Valid || src.SyncCursor.String == "" {
+ if err := s.syncCalendarFull(ctx, cal, &result); err != nil {
+ recordErr(err)
+ continue
+ }
+ result.CalendarsSynced++
+ continue
+ }
+
+ err := s.incrementalCalendar(ctx, src, cal, &result)
+ if errors.Is(err, ErrSyncTokenExpired) {
+ s.logger.Warn("calendar sync token expired; running full resync", "calendar", cal.ID)
+ _ = s.store.UpdateSourceSyncCursor(src.ID, "")
+ if ferr := s.syncCalendarFull(ctx, cal, &result); ferr != nil {
+ recordErr(ferr)
+ continue
+ }
+ result.CalendarsSynced++
+ continue
+ }
+ if err != nil {
+ recordErr(err)
+ continue
+ }
+ result.CalendarsSynced++
+ }
+
+ s.enqueue(ctx, result.InsertedIDs)
+ return result, firstErr
+}
+
+// incrementalCalendar runs a single calendar's incremental sync. It advances the
+// stored cursor only when every event in the delta, across all pages, persisted;
+// if any event failed (recorded to sync_run_items), it fails the run and leaves
+// the cursor untouched so the next sync re-delivers and retries the failed events
+// rather than losing them. A 410 surfaces as ErrSyncTokenExpired so the caller can self-heal.
+func (s *Syncer) incrementalCalendar(ctx context.Context, src *store.Source, cal gcal.Calendar, result *Result) error {
+ syncID, err := s.store.StartSync(src.ID, "incremental")
+ if err != nil {
+ return fmt.Errorf("start sync: %w", err)
+ }
+ fail := func(e error) error {
+ _ = s.store.FailSync(syncID, e.Error())
+ return e
+ }
+
+ token := src.SyncCursor.String
+ pageToken := ""
+ finalToken := ""
+ cp := store.Checkpoint{}
+
+ for {
+ if err := ctx.Err(); err != nil {
+ return fail(err)
+ }
+ page, err := s.client.ListEvents(ctx, cal.ID, gcal.EventsListParams{
+ SyncToken: token,
+ SingleEvents: false,
+ ShowDeleted: true,
+ MaxResults: 2500,
+ PageToken: pageToken,
+ })
+ if err != nil {
+ var gone *gcal.GoneError
+ if errors.As(err, &gone) {
+ _ = s.store.FailSync(syncID, ErrSyncTokenExpired.Error())
+ return ErrSyncTokenExpired
+ }
+ return fail(fmt.Errorf("events.list: %w", err))
+ }
+
+ for i := range page.Items {
+ ev := page.Items[i]
+ added, cancelled, perr := s.persistOne(src.ID, cal, ev, result)
+ if perr != nil {
+ cp.ErrorsCount++
+ s.recordItemError(syncID, ev.ID, perr)
+ continue
+ }
+ cp.MessagesProcessed++
+ if added {
+ cp.MessagesAdded++
+ }
+ if cancelled {
+ cp.MessagesUpdated++
+ }
+ }
+
+ if page.NextSyncToken != "" {
+ finalToken = page.NextSyncToken
+ }
+ pageToken = page.NextPageToken
+ cp.PageToken = pageToken
+ if err := s.store.UpdateSyncCheckpoint(syncID, &cp); err != nil {
+ return fail(fmt.Errorf("checkpoint: %w", err))
+ }
+ if pageToken == "" {
+ break
+ }
+ }
+
+ // If any event failed to persist, do NOT advance the cursor. The Calendar
+ // syncToken only re-delivers CHANGED events, so silently swallowing a failure
+ // would drop that event from the archive forever. Leaving the cursor
+ // unadvanced (and failing the run) means the next incremental sync re-delivers
+ // the same delta and retries the failed events, while the events that did
+ // persist re-upsert idempotently — mirroring the full-sync path's
+ // fail-and-resume rather than skip. A genuinely poison event then surfaces as
+ // a repeated, visible per-item error instead of silent data loss.
+ if cp.ErrorsCount > 0 {
+ return fail(fmt.Errorf("%d calendar event(s) failed to persist; sync token not advanced so they retry on the next sync", cp.ErrorsCount))
+ }
+
+ if finalToken != "" {
+ if err := s.store.UpdateSourceSyncCursor(src.ID, finalToken); err != nil {
+ return fail(fmt.Errorf("update cursor: %w", err))
+ }
+ }
+ if err := s.store.CompleteSync(syncID, finalToken); err != nil {
+ return fail(fmt.Errorf("complete sync: %w", err))
+ }
+ if err := s.store.RecomputeConversationStats(src.ID); err != nil {
+ s.logger.Warn("recompute conversation stats failed", "calendar", cal.ID, "error", err)
+ }
+ _ = s.store.CheckpointWAL()
+ s.logger.Info("calendar incremental sync complete",
+ "calendar", cal.ID, "events_processed", cp.MessagesProcessed,
+ "events_added", cp.MessagesAdded, "events_cancelled", cp.MessagesUpdated,
+ "errors", cp.ErrorsCount)
+ return nil
+}
+
+func (s *Syncer) recordItemError(syncID int64, eventID string, err error) {
+ _ = s.store.RecordSyncRunItem(store.SyncRunItem{
+ SyncRunID: syncID,
+ SourceMessageID: eventID,
+ Phase: "ingest",
+ Status: store.SyncRunItemStatusError,
+ ErrorKind: "calendar_ingest_error",
+ ErrorMessage: err.Error(),
+ })
+}
+
+// parseSourceConfig decodes a source's sync_config JSON into sourceConfig.
+func parseSourceConfig(cfg sql.NullString) sourceConfig {
+ var c sourceConfig
+ if cfg.Valid && cfg.String != "" {
+ _ = json.Unmarshal([]byte(cfg.String), &c)
+ }
+ return c
+}
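
Review note (illustrative only, not part of the patch): the cursor rule in `incrementalCalendar` reduces to a small decision over the pages of one delta. The sketch below assumes a hypothetical `page` summary type and a `nextCursor` helper; it models only the decision, not the store calls or checkpointing.

```go
package main

import "fmt"

// page is a hypothetical summary of one events.list page.
type page struct {
	failed        int    // events on this page that failed to persist
	nextSyncToken string // set only on the final page of a delta
}

// nextCursor returns the cursor to keep after a run and whether it advanced.
// Any failure on any page pins the cursor so the same delta is re-delivered.
func nextCursor(current string, pages []page) (string, bool) {
	errs := 0
	final := ""
	for _, p := range pages {
		errs += p.failed
		if p.nextSyncToken != "" {
			final = p.nextSyncToken
		}
	}
	if errs > 0 || final == "" {
		return current, false
	}
	return final, true
}

func main() {
	clean := []page{{failed: 0}, {failed: 0, nextSyncToken: "T2"}}
	dirty := []page{{failed: 1}, {failed: 0, nextSyncToken: "T2"}}

	c, advanced := nextCursor("T1", clean)
	fmt.Println(c, advanced)

	c, advanced = nextCursor("T1", dirty)
	fmt.Println(c, advanced)
}
```

Because events that did persist are upserted idempotently, replaying the whole delta after a pinned cursor costs extra work but should not create duplicate rows.
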
diff --git a/internal/calsync/persist.go b/internal/calsync/persist.go
new file mode 100644
index 000000000..a8d2c69ce
--- /dev/null
+++ b/internal/calsync/persist.go
@@ -0,0 +1,363 @@
+package calsync
+
+import (
+ "database/sql"
+ "encoding/json"
+ "fmt"
+ "strings"
+ "time"
+
+ "go.kenn.io/msgvault/internal/gcal"
+ "go.kenn.io/msgvault/internal/store"
+)
+
+// sourceConfig is the JSON persisted in sources.sync_config for a calendar.
+type sourceConfig struct {
+ AccountEmail string `json:"account_email"`
+ CalendarID string `json:"calendar_id"`
+ CalendarSummary string `json:"calendar_summary,omitempty"`
+ AccessRole string `json:"access_role,omitempty"`
+ Primary bool `json:"primary,omitempty"`
+ TimeZone string `json:"time_zone,omitempty"`
+}
+
+func buildSourceConfigJSON(c sourceConfig) string {
+ b, err := json.Marshal(c)
+ if err != nil {
+ return "{}"
+ }
+ return string(b)
+}
+
+// eventMetadata is the structured JSON stored in messages.metadata. It carries
+// the event facts that don't fit the messages columns (the interval end, all-day
+// flag, status, recurrence rules, series linkage, and source links).
+type eventMetadata struct {
+ Status string `json:"status,omitempty"`
+ AllDay bool `json:"all_day"`
+ Start string `json:"start,omitempty"`
+ End string `json:"end,omitempty"`
+ TimeZone string `json:"time_zone,omitempty"`
+ Recurrence []string `json:"recurrence,omitempty"`
+ RecurringEventID string `json:"recurring_event_id,omitempty"`
+ OriginalStartTime string `json:"original_start_time,omitempty"`
+ ICalUID string `json:"ical_uid,omitempty"`
+ Sequence int `json:"sequence,omitempty"`
+ HTMLLink string `json:"html_link,omitempty"`
+ HangoutLink string `json:"hangout_link,omitempty"`
+ Transparency string `json:"transparency,omitempty"`
+ Visibility string `json:"visibility,omitempty"`
+ EventType string `json:"event_type,omitempty"`
+ OrganizerEmail string `json:"organizer_email,omitempty"`
+ CalendarID string `json:"calendar_id,omitempty"`
+ AccountEmail string `json:"account_email,omitempty"`
+}
+
+// ingestEvent persists a non-cancelled event through the canonical write path
+// plus the metadata helper, and indexes it for FTS/embeddings. It is idempotent
+// via UpsertMessage's ON CONFLICT(source_id, source_message_id).
+func (s *Syncer) ingestEvent(sourceID int64, cal gcal.Calendar, ev gcal.Event) (int64, error) {
+ smid := deriveSourceMessageID(ev)
+
+ // Organizer → sender, resolved through the email-keyed participant path so
+ // calendar people dedupe with email contacts.
+ var senderID int64
+ if ev.Organizer.Email != "" {
+ id, err := s.store.EnsureParticipant(ev.Organizer.Email, ev.Organizer.DisplayName, emailDomain(ev.Organizer.Email))
+ if err != nil {
+ return 0, fmt.Errorf("organizer participant: %w", err)
+ }
+ senderID = id
+ }
+
+ // Attendees → 'to' recipients + FTS toAddrs.
+ var attendeeIDs []int64
+ var attendeeNames []string
+ var attendeeEmails []string
+ for _, a := range ev.Attendees {
+ if a.Email == "" {
+ continue
+ }
+ pid, err := s.store.EnsureParticipant(a.Email, a.DisplayName, emailDomain(a.Email))
+ if err != nil {
+ return 0, fmt.Errorf("attendee participant: %w", err)
+ }
+ attendeeIDs = append(attendeeIDs, pid)
+ attendeeNames = append(attendeeNames, a.DisplayName)
+ attendeeEmails = append(attendeeEmails, a.Email)
+ }
+
+ // Only the series master (or a standalone event) sets the conversation
+ // title. A per-instance exception keeps its edited summary on its own message
+ // row, but must not overwrite the shared series title — otherwise the
+ // conversation label flaps as the master and edited instances re-deliver
+ // across syncs. Passing "" preserves the existing title (EnsureConversation
+ // only overwrites with a non-empty title).
+ convTitle := ev.Summary
+ if ev.RecurringEventID != "" {
+ convTitle = ""
+ }
+ convID, err := s.store.EnsureConversationWithType(sourceID, conversationKey(ev), gcal.ConversationType, convTitle)
+ if err != nil {
+ return 0, fmt.Errorf("ensure conversation: %w", err)
+ }
+
+ body := serializeBody(ev)
+ subject := ev.Summary
+ fromMe := ev.Organizer.Self || (ev.Organizer.Email != "" && strings.EqualFold(ev.Organizer.Email, s.opts.AccountEmail))
+
+ msgID, err := s.store.UpsertMessage(&store.Message{
+ ConversationID: convID,
+ SourceID: sourceID,
+ SourceMessageID: smid,
+ MessageType: gcal.MessageTypeCalendarEvent,
+ SentAt: eventSentAt(ev),
+ SenderID: sql.NullInt64{Int64: senderID, Valid: senderID != 0},
+ IsFromMe: fromMe,
+ Subject: sql.NullString{String: subject, Valid: subject != ""},
+ Snippet: sql.NullString{String: snippet(body), Valid: body != ""},
+ SizeEstimate: int64(len(body)),
+ })
+ if err != nil {
+ return 0, fmt.Errorf("upsert message: %w", err)
+ }
+
+ metaJSON, err := json.Marshal(buildMetadata(ev, cal, s.opts.AccountEmail))
+ if err != nil {
+ return 0, fmt.Errorf("marshal metadata: %w", err)
+ }
+ if err := s.store.SetMessageMetadata(msgID, sql.NullString{String: string(metaJSON), Valid: true}); err != nil {
+ return 0, fmt.Errorf("set metadata: %w", err)
+ }
+
+ if body != "" {
+ if err := s.store.UpsertMessageBody(msgID, sql.NullString{String: body, Valid: true}, sql.NullString{}); err != nil {
+ return 0, fmt.Errorf("upsert body: %w", err)
+ }
+ }
+
+ raw := []byte(ev.Raw)
+ if len(raw) == 0 {
+ if raw, err = json.Marshal(ev); err != nil {
+ return 0, fmt.Errorf("marshal raw event: %w", err)
+ }
+ }
+ if err := s.store.UpsertMessageRawWithFormat(msgID, raw, gcal.RawFormat); err != nil {
+ return 0, fmt.Errorf("upsert raw: %w", err)
+ }
+
+ // Replace recipients UNCONDITIONALLY (even with empty sets) so re-syncing an
+ // event that lost its organizer or all attendees clears the stale rows.
+ // ReplaceMessageRecipients DELETEs the existing rows of that type first, then
+ // no-ops the insert on an empty slice — a guarded call would skip the DELETE
+ // and leave stale 'from'/'to' rows that desync from the (always-rewritten)
+ // FTS to_addr column.
+ var fromIDs []int64
+ var fromNames []string
+ if senderID != 0 {
+ fromIDs = []int64{senderID}
+ fromNames = []string{ev.Organizer.DisplayName}
+ }
+ if err := s.store.ReplaceMessageRecipients(msgID, "from", fromIDs, fromNames); err != nil {
+ return 0, fmt.Errorf("replace from recipient: %w", err)
+ }
+ if err := s.store.ReplaceMessageRecipients(msgID, "to", attendeeIDs, attendeeNames); err != nil {
+ return 0, fmt.Errorf("replace to recipients: %w", err)
+ }
+
+ // FTS: raw attendee emails go ONLY through the toAddrs column, never the
+ // body, so BM25/ts_rank doesn't double-count them and embeddings see only
+ // semantic prose.
+ if err := s.store.UpsertFTS(msgID, subject, body, ev.Organizer.Email, strings.Join(attendeeEmails, " "), ""); err != nil {
+ s.logger.Warn("upsert calendar event fts failed", "message_id", msgID, "event_id", smid, "error", err)
+ }
+
+ return msgID, nil
+}
+
+// flagCancelled retains a cancelled event rather than soft-deleting it. If the
+// row already exists, it flips metadata.status to "cancelled" while preserving
+// every other stored field (a cancellation delta usually arrives with empty
+// summary/start, so re-upserting would wipe the archived event). If the row was
+// never seen, it inserts a minimal tombstone whose metadata records the
+// cancellation. Returns (messageID, insertedNew).
+func (s *Syncer) flagCancelled(sourceID int64, cal gcal.Calendar, ev gcal.Event) (int64, bool, error) {
+ smid := deriveSourceMessageID(ev)
+ existing, err := s.store.MessageExistsBatch(sourceID, []string{smid})
+ if err != nil {
+ return 0, false, fmt.Errorf("lookup existing event: %w", err)
+ }
+ if id, ok := existing[smid]; ok {
+ merged, err := mergeStatusCancelled(s.store, id)
+ if err != nil {
+ return 0, false, err
+ }
+ if err := s.store.SetMessageMetadata(id, merged); err != nil {
+ return 0, false, fmt.Errorf("flag cancelled metadata: %w", err)
+ }
+ return id, false, nil
+ }
+ // Never-seen cancellation: record it as a tombstone via the normal path.
+ // ev.Status == "cancelled" flows into metadata.status.
+ id, err := s.ingestEvent(sourceID, cal, ev)
+ if err != nil {
+ return 0, false, err
+ }
+ return id, true, nil
+}
+
+// mergeStatusCancelled reads a message's existing metadata, sets status to
+// "cancelled", and returns the merged JSON, preserving all other keys.
+func mergeStatusCancelled(st *store.Store, messageID int64) (sql.NullString, error) {
+ existing, err := st.GetMessageMetadata(messageID)
+ if err != nil {
+ return sql.NullString{}, fmt.Errorf("read metadata: %w", err)
+ }
+ m := map[string]any{}
+ if existing.Valid && existing.String != "" {
+ if err := json.Unmarshal([]byte(existing.String), &m); err != nil {
+ // Corrupt metadata shouldn't block the cancellation flag; start from an empty map.
+ m = map[string]any{}
+ }
+ }
+ m["status"] = gcal.StatusCancelled
+ b, err := json.Marshal(m)
+ if err != nil {
+ return sql.NullString{}, fmt.Errorf("marshal merged metadata: %w", err)
+ }
+ return sql.NullString{String: string(b), Valid: true}, nil
+}
+
+// deriveSourceMessageID is the idempotency key: a standalone event or series
+// master uses event.id; a recurring instance/exception/cancellation uses
+// recurringEventId|originalStartTime so each occurrence upserts independently
+// and a single cancelled occurrence flags only its own row.
+func deriveSourceMessageID(ev gcal.Event) string {
+ if ev.RecurringEventID != "" {
+ if key := originalStartKey(ev.OriginalStartTime); key != "" {
+ return ev.RecurringEventID + "|" + key
+ }
+ }
+ return ev.ID
+}
+
+func originalStartKey(dt gcal.EventDateTime) string {
+ if dt.Date != "" {
+ return dt.Date
+ }
+ if !dt.DateTime.IsZero() {
+ return dt.DateTime.UTC().Format(time.RFC3339)
+ }
+ return ""
+}
+
+// conversationKey groups a recurring series under one conversation; standalone
+// events each get their own.
+func conversationKey(ev gcal.Event) string {
+ if ev.RecurringEventID != "" {
+ return "event:" + ev.RecurringEventID
+ }
+ return "event:" + ev.ID
+}
+
+// eventSentAt is the universal time axis: the event start, falling back to the
+// occurrence's original start (for cancellation tombstones that omit start).
+func eventSentAt(ev gcal.Event) sql.NullTime {
+ if t, ok := ev.Start.Instant(); ok {
+ return sql.NullTime{Time: t, Valid: true}
+ }
+ if t, ok := ev.OriginalStartTime.Instant(); ok {
+ return sql.NullTime{Time: t, Valid: true}
+ }
+ return sql.NullTime{}
+}
+
+// buildMetadata projects an event into the metadata payload.
+func buildMetadata(ev gcal.Event, cal gcal.Calendar, accountEmail string) eventMetadata {
+ return eventMetadata{
+ Status: ev.Status,
+ AllDay: ev.Start.IsAllDay(),
+ Start: dateTimeString(ev.Start),
+ End: dateTimeString(ev.End),
+ TimeZone: ev.Start.TimeZone,
+ Recurrence: ev.Recurrence,
+ RecurringEventID: ev.RecurringEventID,
+ OriginalStartTime: originalStartKey(ev.OriginalStartTime),
+ ICalUID: ev.ICalUID,
+ Sequence: ev.Sequence,
+ HTMLLink: ev.HTMLLink,
+ HangoutLink: ev.HangoutLink,
+ Transparency: ev.Transparency,
+ Visibility: ev.Visibility,
+ EventType: ev.EventType,
+ OrganizerEmail: ev.Organizer.Email,
+ CalendarID: cal.ID,
+ AccountEmail: accountEmail,
+ }
+}
+
+func dateTimeString(dt gcal.EventDateTime) string {
+ if dt.Date != "" {
+ return dt.Date
+ }
+ if !dt.DateTime.IsZero() {
+ return dt.DateTime.Format(time.RFC3339)
+ }
+ return ""
+}
+
+// serializeBody is the single body_text shared by FTS body and embeddings:
+// title, time range, location, description, and attendee DISPLAY NAMES. Raw
+// attendee email addresses are deliberately excluded (they reach FTS via the
+// toAddrs column only).
+func serializeBody(ev gcal.Event) string {
+ var b strings.Builder
+ writeLine := func(s string) {
+ if s != "" {
+ b.WriteString(s)
+ b.WriteString("\n")
+ }
+ }
+ writeLine(ev.Summary)
+ writeLine(whenLine(ev))
+ if ev.Location != "" {
+ writeLine("Location: " + ev.Location)
+ }
+ writeLine(ev.Description)
+
+ var names []string
+ for _, a := range ev.Attendees {
+ if a.DisplayName != "" {
+ names = append(names, a.DisplayName)
+ }
+ }
+ if len(names) > 0 {
+ writeLine("Attendees: " + strings.Join(names, ", "))
+ }
+ return strings.TrimSpace(b.String())
+}
+
+// whenLine renders a human/searchable time range.
+func whenLine(ev gcal.Event) string {
+ start, ok := ev.Start.Instant()
+ if !ok {
+ return ""
+ }
+ if ev.Start.IsAllDay() {
+ return "When: " + start.Format("2006-01-02") + " (all day)"
+ }
+ if end, ok := ev.End.Instant(); ok {
+ return "When: " + start.Format("2006-01-02 15:04") + " - " + end.Format("2006-01-02 15:04")
+ }
+ return "When: " + start.Format("2006-01-02 15:04")
+}
+
+// snippet is a short preview of the body: at most 200 bytes, never ending mid-rune.
+func snippet(body string) string {
+ const maxSnippetLength = 200
+ body = strings.TrimSpace(body)
+ if len(body) <= maxSnippetLength {
+ return body
+ }
+ return strings.ToValidUTF8(body[:maxSnippetLength], "")
+}
diff --git a/internal/calsync/realclient_integration_test.go b/internal/calsync/realclient_integration_test.go
new file mode 100644
index 000000000..42fdcab9c
--- /dev/null
+++ b/internal/calsync/realclient_integration_test.go
@@ -0,0 +1,305 @@
+package calsync
+
+// End-to-end integration test driving the REAL gcal.Client over a loopback
+// httptest.Server (not the in-memory mock) against byte-realistic Google
+// Calendar API v3 JSON, through the real calsync pipeline into a real store.
+// It exercises everything except the network hop to Google and the OAuth
+// token exchange (the parts that require interactive consent): HTTP client
+// unmarshal, pagination, the rate limiter, persist, metadata, FTS, recipients,
+// raw bytes, and the incremental create/cancel cycle.
+//
+// Fixtures follow the documented wire shapes (verified against
+// developers.google.com/workspace/calendar/api/v3/reference/events and
+// .../guides/sync): a cancelled exception instance carries only id +
+// recurringEventId + originalStartTime; a deleted standalone carries only id +
+// status=cancelled; nextSyncToken appears only on the final page; incremental
+// responses always include cancellations and never carry timeMin/timeMax.
+
+import (
+ "context"
+ "database/sql"
+ "encoding/json"
+ "net/http"
+ "net/http/httptest"
+ "strings"
+ "testing"
+ "time"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "golang.org/x/oauth2"
+
+ "go.kenn.io/msgvault/internal/gcal"
+ "go.kenn.io/msgvault/internal/store"
+ "go.kenn.io/msgvault/internal/testutil"
+)
+
+type calFakeServer struct {
+ calendarListJSON string
+ fullPages map[string]string // keyed by incoming pageToken ("" = first page)
+ incPages map[string]string // keyed by incoming syncToken
+}
+
+func (cs *calFakeServer) start(t *testing.T) *httptest.Server {
+ t.Helper()
+ return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ w.Header().Set("Content-Type", "application/json")
+ switch {
+ case r.URL.Path == "/users/me/calendarList":
+ _, _ = w.Write([]byte(cs.calendarListJSON))
+ case strings.HasSuffix(r.URL.Path, "/events"):
+ q := r.URL.Query()
+ assert.Equal(t, "false", q.Get("singleEvents"), "must request masters")
+ assert.Equal(t, "true", q.Get("showDeleted"), "must include cancellations")
+ if tok := q.Get("syncToken"); tok != "" {
+ assert.Empty(t, q.Get("timeMin"), "timeMin must not accompany syncToken")
+ assert.Empty(t, q.Get("timeMax"), "timeMax must not accompany syncToken")
+ body, ok := cs.incPages[tok]
+ if !assert.Truef(t, ok, "no incremental page seeded for syncToken %q", tok) {
+ return
+ }
+ _, _ = w.Write([]byte(body))
+ return
+ }
+ body, ok := cs.fullPages[q.Get("pageToken")]
+ if !assert.Truef(t, ok, "no full page seeded for pageToken %q", q.Get("pageToken")) {
+ return
+ }
+ _, _ = w.Write([]byte(body))
+ default:
+ assert.Failf(t, "unexpected request path", "%q", r.URL.Path)
+ }
+ }))
+}
+
+func realCalClient(t *testing.T, srv *httptest.Server) gcal.API {
+ t.Helper()
+ ts := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: "test-token"})
+ return gcal.NewClient(ts, gcal.WithBaseURL(srv.URL), gcal.WithHTTPClient(srv.Client()))
+}
+
+type persistedEvent struct {
+ id int64
+ mtype string
+ subject string
+ sentAt sql.NullTime
+ isFromMe bool
+ meta map[string]any
+ deleted bool
+ body string
+ to []string
+ rawFmt string
+}
+
+func loadEvent(t *testing.T, st *store.Store, sourceID int64, smid string) (persistedEvent, bool) {
+ t.Helper()
+ var p persistedEvent
+ var subj, metaStr sql.NullString
+ var deletedAt sql.NullTime
+ err := st.DB().QueryRow(st.Rebind(`
+ SELECT id, message_type, subject, sent_at, is_from_me, metadata, deleted_from_source_at
+ FROM messages WHERE source_id = ? AND source_message_id = ?`), sourceID, smid).
+ Scan(&p.id, &p.mtype, &subj, &p.sentAt, &p.isFromMe, &metaStr, &deletedAt)
+ if err == sql.ErrNoRows {
+ return persistedEvent{}, false
+ }
+ require.NoError(t, err)
+ p.subject = subj.String
+ p.deleted = deletedAt.Valid
+ if metaStr.Valid {
+ require.NoError(t, json.Unmarshal([]byte(metaStr.String), &p.meta), "metadata must be valid JSON")
+ }
+ var body sql.NullString
+ _ = st.DB().QueryRow(st.Rebind(`SELECT body_text FROM message_bodies WHERE message_id = ?`), p.id).Scan(&body)
+ p.body = body.String
+ _ = st.DB().QueryRow(st.Rebind(`SELECT raw_format FROM message_raw WHERE message_id = ?`), p.id).Scan(&p.rawFmt)
+ p.to = recipientEmails(t, st, p.id, "to")
+ return p, true
+}
+
+func TestRealClient_DozenEvents_FullThenIncremental(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ const account = "alice@example.com"
+
+ cs := &calFakeServer{
+ calendarListJSON: `{"items":[
+ {"id":"alice@example.com","summary":"Alice","timeZone":"America/Los_Angeles","accessRole":"owner","primary":true},
+ {"id":"holidays@group.v.calendar.google.com","summary":"US Holidays","accessRole":"reader"}
+ ]}`,
+ fullPages: map[string]string{},
+ incPages: map[string]string{},
+ }
+
+ cs.fullPages[""] = `{"items":[
+ {"id":"evt_timed","status":"confirmed","summary":"Sprint Standup","location":"Room 4",
+ "htmlLink":"https://www.google.com/calendar/event?eid=abc","hangoutLink":"https://meet.google.com/abc-defg-hij",
+ "iCalUID":"evt_timed@google.com","sequence":2,"transparency":"opaque","visibility":"default",
+ "organizer":{"email":"alice@example.com","displayName":"Alice Account","self":true},
+ "start":{"dateTime":"2024-05-01T09:00:00-07:00","timeZone":"America/Los_Angeles"},
+ "end":{"dateTime":"2024-05-01T09:30:00-07:00","timeZone":"America/Los_Angeles"},
+ "attendees":[
+ {"email":"alice@example.com","displayName":"Alice Account","responseStatus":"accepted","organizer":true,"self":true},
+ {"email":"bob@example.com","displayName":"Bob Builder","responseStatus":"accepted"},
+ {"email":"carol@example.com","displayName":"Carol Singer","responseStatus":"tentative"}
+ ]},
+ {"id":"evt_allday","status":"confirmed","summary":"Company Offsite",
+ "organizer":{"email":"alice@example.com","self":true},
+ "start":{"date":"2024-06-10"},"end":{"date":"2024-06-13"}},
+ {"id":"evt_tentative","status":"tentative","summary":"Maybe Lunch",
+ "organizer":{"email":"dave@example.com","displayName":"Dave External"},
+ "start":{"dateTime":"2024-05-03T12:00:00Z"},"end":{"dateTime":"2024-05-03T13:00:00Z"},
+ "attendees":[{"email":"alice@example.com","self":true,"responseStatus":"needsAction"}]},
+ {"id":"evt_unicode","status":"confirmed","summary":"Café résumé 会議 — plan",
+ "description":"Discuss the Ωmega rollout.","location":"Zürich",
+ "organizer":{"email":"alice@example.com","self":true},
+ "start":{"dateTime":"2024-05-04T15:00:00Z"},"end":{"dateTime":"2024-05-04T16:00:00Z"}}
+ ],"nextPageToken":"PAGE2"}`
+
+ cs.fullPages["PAGE2"] = `{"items":[
+ {"id":"evt_master","status":"confirmed","summary":"Weekly 1:1","location":"Office",
+ "organizer":{"email":"alice@example.com","self":true},
+ "start":{"dateTime":"2024-05-06T10:00:00Z"},"end":{"dateTime":"2024-05-06T10:30:00Z"},
+ "recurrence":["RRULE:FREQ=WEEKLY;BYDAY=MO","EXDATE;TZID=UTC:20240520T100000Z"],
+ "attendees":[{"email":"bob@example.com","displayName":"Bob Builder","responseStatus":"accepted"}]},
+ {"id":"evt_master_20240513T100000Z","status":"confirmed","summary":"Weekly 1:1 (moved)",
+ "organizer":{"email":"alice@example.com","self":true},
+ "recurringEventId":"evt_master",
+ "originalStartTime":{"dateTime":"2024-05-13T10:00:00Z"},
+ "start":{"dateTime":"2024-05-13T14:00:00Z"},"end":{"dateTime":"2024-05-13T14:30:00Z"}},
+ {"id":"evt_master_20240527T100000Z","status":"cancelled","recurringEventId":"evt_master",
+ "originalStartTime":{"dateTime":"2024-05-27T10:00:00Z"}},
+ {"id":"evt_solo","status":"confirmed","summary":"Focus block",
+ "organizer":{"email":"alice@example.com","self":true},
+ "start":{"dateTime":"2024-05-07T13:00:00Z"},"end":{"dateTime":"2024-05-07T15:00:00Z"}}
+ ],"nextSyncToken":"SYNC_AFTER_FULL"}`
+
+ srv := cs.start(t)
+ defer srv.Close()
+
+ st := testutil.NewTestStore(t)
+ // Scope to the primary calendar so the fake server's page keying is
+ // unambiguous; the access-role filter itself is covered by a mock test.
+ syncer := New(realCalClient(t, srv), st, Options{
+ AccountEmail: account,
+ Calendars: []string{account},
+ }).WithLogger(quietLogger())
+
+ res, err := syncer.Full(context.Background())
+ require.NoError(err)
+
+ src, err := st.GetSourceByIdentifier(account + "/" + account)
+ require.NoError(err)
+ require.NotNil(src)
+ assert.Equal("SYNC_AFTER_FULL", src.SyncCursor.String, "final sync token persisted as cursor")
+
+ type expect struct {
+ smid string
+ subject string
+ isFromMe bool
+ allDay any
+ status string
+ toEmails []string
+ }
+ cases := []expect{
+ // Google lists the organizer (Alice, self) among attendees too, so she is
+ // faithfully stored as both 'from' and a 'to' recipient — the full invite
+ // list is preserved rather than silently dropping self.
+ {"evt_timed", "Sprint Standup", true, false, "confirmed", []string{"alice@example.com", "bob@example.com", "carol@example.com"}},
+ {"evt_allday", "Company Offsite", true, true, "confirmed", nil},
+ {"evt_tentative", "Maybe Lunch", false, false, "tentative", nil},
+ {"evt_unicode", "Café résumé 会議 — plan", true, false, "confirmed", nil},
+ {"evt_master", "Weekly 1:1", true, false, "confirmed", []string{"bob@example.com"}},
+ {"evt_master|2024-05-13T10:00:00Z", "Weekly 1:1 (moved)", true, false, "confirmed", nil},
+ {"evt_master|2024-05-27T10:00:00Z", "", false, false, "cancelled", nil},
+ {"evt_solo", "Focus block", true, false, "confirmed", nil},
+ }
+
+ t.Logf("=== persisted calendar events (real client over loopback) ===")
+ t.Logf("%-36s %-24s %-7s %-7s %-9s", "source_message_id", "subject", "from_me", "all_day", "status")
+ for _, c := range cases {
+ p, ok := loadEvent(t, st, src.ID, c.smid)
+ require.Truef(ok, "event %s should be persisted", c.smid)
+ t.Logf("%-36s %-24s %-7v %-7v %-9s", c.smid, truncateRunes(p.subject, 24), p.isFromMe, p.meta["all_day"], p.meta["status"])
+
+ assert.Equalf(gcal.MessageTypeCalendarEvent, p.mtype, "%s message_type", c.smid)
+ if c.subject != "" {
+ assert.Equalf(c.subject, p.subject, "%s subject", c.smid)
+ }
+ assert.Equalf(c.isFromMe, p.isFromMe, "%s is_from_me", c.smid)
+ assert.Equalf(c.status, p.meta["status"], "%s metadata.status", c.smid)
+ assert.Equalf(c.allDay, p.meta["all_day"], "%s metadata.all_day", c.smid)
+ assert.Equalf("gcal_json", p.rawFmt, "%s raw_format", c.smid)
+ assert.Falsef(p.deleted, "%s must not be soft-deleted", c.smid)
+ if c.toEmails != nil {
+ assert.Equalf(c.toEmails, p.to, "%s to-recipients", c.smid)
+ for _, e := range c.toEmails {
+ assert.NotContainsf(p.body, e, "%s body must not contain raw attendee email", c.smid)
+ }
+ }
+ }
+
+ // Cancelled occurrence retained, sorts at its ORIGINAL start time.
+ cancelled, ok := loadEvent(t, st, src.ID, "evt_master|2024-05-27T10:00:00Z")
+ require.True(ok)
+ require.True(cancelled.sentAt.Valid)
+ assert.Equal(time.Date(2024, 5, 27, 10, 0, 0, 0, time.UTC), cancelled.sentAt.Time.UTC())
+
+ // All-day start normalized to midnight UTC.
+ allday, _ := loadEvent(t, st, src.ID, "evt_allday")
+ require.True(allday.sentAt.Valid)
+ assert.Equal(time.Date(2024, 6, 10, 0, 0, 0, 0, time.UTC), allday.sentAt.Time.UTC())
+
+ // Master keeps RRULE + EXDATE.
+ master, _ := loadEvent(t, st, src.ID, "evt_master")
+ rec, _ := master.meta["recurrence"].([]any)
+ assert.Len(rec, 2, "RRULE + EXDATE preserved in metadata")
+ timed, ok := loadEvent(t, st, src.ID, "evt_timed")
+ require.True(ok)
+ assert.Equal("https://meet.google.com/abc-defg-hij", timed.meta["hangout_link"],
+ "hangout link preserved in metadata")
+
+ assert.Equal(7, res.EventsAdded)
+ assert.Equal(1, res.EventsCancelled)
+
+ // ---- INCREMENTAL: create a new event + cancel an existing standalone ----
+ cs.incPages["SYNC_AFTER_FULL"] = `{"items":[
+ {"id":"evt_new","status":"confirmed","summary":"Newly created",
+ "organizer":{"email":"alice@example.com","self":true},
+ "start":{"dateTime":"2024-05-10T08:00:00Z"},"end":{"dateTime":"2024-05-10T08:30:00Z"},
+ "attendees":[{"email":"erin@example.com","displayName":"Erin Eve","responseStatus":"accepted"}]},
+ {"id":"evt_solo","status":"cancelled"}
+ ],"nextSyncToken":"SYNC_AFTER_INC"}`
+
+ res2, err := syncer.Incremental(context.Background())
+ require.NoError(err)
+ assert.Equal(1, res2.EventsAdded)
+ assert.Equal(1, res2.EventsCancelled)
+
+ newEv, ok := loadEvent(t, st, src.ID, "evt_new")
+ require.True(ok, "newly created event should be ingested by incremental")
+ assert.Equal("Newly created", newEv.subject)
+ assert.Equal([]string{"erin@example.com"}, newEv.to)
+
+ solo, ok := loadEvent(t, st, src.ID, "evt_solo")
+ require.True(ok, "cancelled event must be retained, not deleted")
+ assert.False(solo.deleted, "deleted_from_source_at must stay NULL")
+ assert.Equal("Focus block", solo.subject, "original subject preserved despite empty cancellation delta")
+ assert.Equal("cancelled", solo.meta["status"])
+
+ src2, _ := st.GetSourceByIdentifier(account + "/" + account)
+ assert.Equal("SYNC_AFTER_INC", src2.SyncCursor.String, "cursor advanced after incremental")
+
+ // A subsequent empty incremental is a clean no-op.
+ cs.incPages["SYNC_AFTER_INC"] = `{"items":[],"nextSyncToken":"SYNC_AFTER_INC"}`
+ _, err = syncer.Incremental(context.Background())
+ require.NoError(err)
+}
+
+func truncateRunes(s string, n int) string {
+ r := []rune(s)
+ if len(r) <= n {
+ return s
+ }
+ return string(r[:n])
+}
diff --git a/internal/calsync/review_fixes_test.go b/internal/calsync/review_fixes_test.go
new file mode 100644
index 000000000..5ebf791c9
--- /dev/null
+++ b/internal/calsync/review_fixes_test.go
@@ -0,0 +1,430 @@
+package calsync
+
+import (
+ "context"
+ "errors"
+ "testing"
+ "time"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+
+ "go.kenn.io/msgvault/internal/gcal"
+ "go.kenn.io/msgvault/internal/store"
+ "go.kenn.io/msgvault/internal/testutil"
+)
+
+// TestFull_LimitDoesNotAdvanceCursor is the regression for the silent-data-loss
+// bug where a --limit full sync on a single-page calendar captured the final
+// nextSyncToken and advanced the cursor past the un-ingested events, so the next
+// incremental sync would never see them.
+func TestFull_LimitDoesNotAdvanceCursor(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ // One page of 5 events (the common single-page case) with a terminal token.
+ var evs []gcal.Event
+ for i := range 5 {
+ evs = append(evs, timedEvent("ev"+string(rune('a'+i)), "Event"))
+ }
+ m.FullEvents["primary"] = [][]gcal.Event{evs}
+ m.FullSyncToken["primary"] = "TOKEN1"
+
+ s, st := newSyncer(t, m, Options{Limit: 2})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+
+ src := primarySource(t, st)
+ assert.Equal(2, countMessages(t, st, src.ID), "only --limit events ingested")
+ assert.Empty(src.SyncCursor.String, "a --limit run must NOT advance the incremental cursor")
+
+ // A later unlimited full sync re-traverses everything and DOES advance.
+ s2 := New(m, st, Options{AccountEmail: testAccount}).WithLogger(quietLogger())
+ _, err = s2.Full(context.Background())
+ require.NoError(err)
+ src = primarySource(t, st)
+ assert.Equal(5, countMessages(t, st, src.ID), "unlimited run ingests all events")
+ assert.Equal("TOKEN1", src.SyncCursor.String, "complete run advances the cursor")
+}
+
+// TestReingest_ClearsStaleRecipients is the regression for stale 'from'/'to'
+// rows surviving when an event loses its organizer or all attendees on re-sync.
+func TestReingest_ClearsStaleRecipients(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("e1", "Meeting",
+ gcal.Attendee{Email: "bob@example.com", DisplayName: "Bob"},
+ gcal.Attendee{Email: "carol@example.com", DisplayName: "Carol"})}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+ row, ok := getMsg(t, st, src.ID, "e1")
+ require.True(ok)
+ require.Equal([]string{"bob@example.com", "carol@example.com"}, recipientEmails(t, st, row.id, "to"))
+ require.Equal([]string{testAccount}, recipientEmails(t, st, row.id, "from"))
+
+ // Re-sync the same event id with NO organizer and NO attendees.
+ m.FullEvents["primary"] = [][]gcal.Event{{{
+ ID: "e1",
+ Status: gcal.StatusConfirmed,
+ Summary: "Meeting (now solo)",
+ Start: gcal.EventDateTime{DateTime: time.Date(2024, 5, 1, 16, 0, 0, 0, time.UTC)},
+ End: gcal.EventDateTime{DateTime: time.Date(2024, 5, 1, 16, 30, 0, 0, time.UTC)},
+ }}}
+ _, err = s.Full(context.Background())
+ require.NoError(err)
+
+ row2, ok := getMsg(t, st, src.ID, "e1")
+ require.True(ok)
+ assert.Equal(row.id, row2.id, "same message row updated")
+ assert.Empty(recipientEmails(t, st, row2.id, "to"), "stale attendee rows must be cleared")
+ assert.Empty(recipientEmails(t, st, row2.id, "from"), "stale organizer row must be cleared")
+}
+
+// TestFull_DuplicateAttendeesDoNotCollide is the regression for the production
+// UNIQUE-constraint crash: a calendar event that lists the same person twice (a
+// duplicate attendee entry, common on busy recurring work/personal calendars)
+// made the 'to' recipient insert collide on
+// (message_id, participant_id, recipient_type) and aborted the ENTIRE calendar's
+// sync. The duplicate must collapse to one 'to' row and the sync must succeed.
+func TestFull_DuplicateAttendeesDoNotCollide(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("dup1", "Standup",
+ gcal.Attendee{Email: "bob@example.com", DisplayName: "Bob"},
+ gcal.Attendee{Email: "bob@example.com", DisplayName: "Bob (again)"},
+ gcal.Attendee{Email: "carol@example.com", DisplayName: "Carol"})}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ _, err := s.Full(context.Background())
+ require.NoError(err, "an event with duplicate attendees must not abort the sync")
+
+ src := primarySource(t, st)
+ row, ok := getMsg(t, st, src.ID, "dup1")
+ require.True(ok)
+ assert.Equal([]string{"bob@example.com", "carol@example.com"}, recipientEmails(t, st, row.id, "to"),
+ "a duplicate attendee collapses to a single 'to' row")
+}
+
+// TestFull_BoundedSyncDoesNotAdvanceCursor is the regression (from the
+// adversarial audit) for silent data loss when a time-bounded full sync
+// (--after/--before) established an incremental baseline scoped to only that
+// window: future incremental syncs carry no time bounds, so out-of-window events
+// would never be archived. A bounded run must ingest its window but NOT advance
+// the cursor — exactly like a --limit run.
+func TestFull_BoundedSyncDoesNotAdvanceCursor(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{
+ timedEvent("a", "A"), timedEvent("b", "B"),
+ }}
+ m.FullSyncToken["primary"] = "TOKEN1"
+
+ s, st := newSyncer(t, m, Options{TimeMin: "2024-01-01T00:00:00Z"})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+
+ src := primarySource(t, st)
+ assert.Equal(2, countMessages(t, st, src.ID), "bounded full sync still ingests in-window events")
+ assert.Empty(src.SyncCursor.String, "a time-bounded run must NOT advance the incremental cursor")
+}
+
+func TestIncremental_AppliesAccessRoleSelection(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{
+ {ID: "primary", AccessRole: "owner"},
+ {ID: "holidays", AccessRole: "reader"},
+ }
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("p1", "Primary")}}
+ m.FullEvents["holidays"] = [][]gcal.Event{{timedEvent("h1", "Holiday")}}
+ m.FullSyncToken["primary"] = "P1"
+ m.FullSyncToken["holidays"] = "H1"
+
+ s, st := newSyncer(t, m, Options{AllCalendars: true})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+
+ primary, err := st.GetSourceByIdentifier(testAccount + "/primary")
+ require.NoError(err)
+ holidays, err := st.GetSourceByIdentifier(testAccount + "/holidays")
+ require.NoError(err)
+ require.Equal("P1", primary.SyncCursor.String)
+ require.Equal("H1", holidays.SyncCursor.String)
+
+ m.IncEvents["P1"] = [][]gcal.Event{{timedEvent("p2", "Primary delta")}}
+ m.IncNextToken["P1"] = "P2"
+ m.IncEvents["H1"] = [][]gcal.Event{{timedEvent("h2", "Holiday delta")}}
+ m.IncNextToken["H1"] = "H2"
+
+ defaultSelection := New(m, st, Options{AccountEmail: testAccount}).WithLogger(quietLogger())
+ res, err := defaultSelection.Incremental(context.Background())
+
+ require.NoError(err)
+ assert.Equal(1, res.CalendarsSynced, "default incremental sync should keep owner+writer only")
+ assert.Equal(2, countMessages(t, st, primary.ID), "owner calendar should receive its delta")
+ assert.Equal(1, countMessages(t, st, holidays.ID), "reader calendar should be skipped by default")
+
+ holidays, err = st.GetSourceByIdentifier(testAccount + "/holidays")
+ require.NoError(err)
+ assert.Equal("H1", holidays.SyncCursor.String, "skipped reader calendar cursor must not advance")
+}
+
+func TestFull_FTSFailureDoesNotAbortCalendarSync(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("e1", "FTS failure")}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ if !st.FTS5Available() || st.IsPostgreSQL() {
+ t.Skip("SQLite FTS5-specific regression")
+ }
+ _, err := st.DB().Exec("DROP TABLE messages_fts")
+ require.NoError(err, "break FTS table")
+
+ res, err := s.Full(context.Background())
+
+ require.NoError(err, "FTS indexing failure must not abort event persistence")
+ assert.Equal(1, res.EventsAdded)
+ src := primarySource(t, st)
+ _, ok := getMsg(t, st, src.ID, "e1")
+ assert.True(ok, "event row should still be persisted")
+ assert.Equal("T1", src.SyncCursor.String, "cursor should advance after durable event persistence")
+}
+
+func TestFull_BoundedInterruptedRunDoesNotWriteResumePageToken(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{
+ {timedEvent("e1", "One")},
+ {timedEvent("e2", "Two")},
+ }
+ m.FullSyncToken["primary"] = "T1"
+
+ st := testutil.NewTestStore(t)
+ failing := &listEventsRecorder{
+ MockAPI: m,
+ failOnCall: 2,
+ err: errors.New("boom"),
+ }
+ bounded := New(failing, st, Options{
+ AccountEmail: testAccount,
+ TimeMin: "2025-01-01T00:00:00Z",
+ }).WithLogger(quietLogger())
+
+ _, err := bounded.Full(context.Background())
+ require.Error(err)
+
+ src := primarySource(t, st)
+ run, err := st.GetLatestSync(src.ID)
+ require.NoError(err)
+ assert.False(run.CursorBefore.Valid && run.CursorBefore.String != "",
+ "bounded failed run must not leave a resumable page token")
+
+ recorder := &listEventsRecorder{MockAPI: m}
+ unbounded := New(recorder, st, Options{AccountEmail: testAccount}).WithLogger(quietLogger())
+ _, err = unbounded.Full(context.Background())
+ require.NoError(err)
+ require.NotEmpty(recorder.params)
+ assert.Empty(recorder.params[0].PageToken, "later unbounded sync must restart from the first page")
+}
+
+func TestFull_LegacyResumeCheckpointIgnored(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{
+ {timedEvent("e1", "One")},
+ {timedEvent("e2", "Two")},
+ }
+ m.FullSyncToken["primary"] = "T1"
+
+ st := testutil.NewTestStore(t)
+ src, err := st.GetOrCreateSource(gcal.SourceType, testAccount+"/primary")
+ require.NoError(err)
+ require.NoError(st.UpdateSourceSyncConfig(src.ID, `{"account_email":"`+testAccount+`","calendar_id":"primary"}`))
+ oldSyncID, err := st.StartSync(src.ID, "full")
+ require.NoError(err)
+ require.NoError(st.UpdateSyncCheckpoint(oldSyncID, &store.Checkpoint{
+ PageToken: "1", MessagesProcessed: 2, MessagesAdded: 2,
+ }))
+
+ recorder := &listEventsRecorder{MockAPI: m}
+ s := New(recorder, st, Options{AccountEmail: testAccount}).WithLogger(quietLogger())
+ _, err = s.Full(context.Background())
+ require.NoError(err)
+
+ require.NotEmpty(recorder.params)
+ assert.Empty(recorder.params[0].PageToken, "legacy raw page-token checkpoints are not resumable")
+}
+
+// TestFull_RecurringConversationTitleFromMasterNotException is the regression
+// (from the adversarial audit) for the series conversation title flapping: an
+// edited per-instance exception's summary used to overwrite the shared series
+// title (last-writer-wins). The exception keeps its own subject, but the series
+// conversation title must come from the master.
+func TestFull_RecurringConversationTitleFromMasterNotException(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+
+ master := gcal.Event{
+ ID: "r1",
+ Status: gcal.StatusConfirmed,
+ Summary: "Weekly sync",
+ Organizer: gcal.Person{Email: testAccount, Self: true},
+ Start: gcal.EventDateTime{DateTime: time.Date(2024, 5, 2, 10, 0, 0, 0, time.UTC)},
+ End: gcal.EventDateTime{DateTime: time.Date(2024, 5, 2, 10, 30, 0, 0, time.UTC)},
+ Recurrence: []string{"RRULE:FREQ=WEEKLY;BYDAY=TH"},
+ }
+ // Confirmed exception with an EDITED summary, delivered AFTER the master —
+ // the order that previously overwrote the series title.
+ exception := gcal.Event{
+ ID: "r1_20240509T100000Z",
+ Status: gcal.StatusConfirmed,
+ Summary: "Weekly sync — MOVED to Zoom",
+ Organizer: gcal.Person{Email: testAccount, Self: true},
+ RecurringEventID: "r1",
+ OriginalStartTime: gcal.EventDateTime{DateTime: time.Date(2024, 5, 9, 10, 0, 0, 0, time.UTC)},
+ Start: gcal.EventDateTime{DateTime: time.Date(2024, 5, 9, 15, 0, 0, 0, time.UTC)},
+ End: gcal.EventDateTime{DateTime: time.Date(2024, 5, 9, 15, 30, 0, 0, time.UTC)},
+ }
+ m.FullEvents["primary"] = [][]gcal.Event{{master, exception}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+
+ masterRow, ok := getMsg(t, st, src.ID, "r1")
+ require.True(ok)
+ excRow, ok := getMsg(t, st, src.ID, "r1|2024-05-09T10:00:00Z")
+ require.True(ok)
+ require.Equal(masterRow.convID, excRow.convID, "master and exception share one series conversation")
+
+ assert.Equal("Weekly sync — MOVED to Zoom", excRow.subject.String,
+ "the edited summary stays on the exception's own message row")
+ assert.Equal("Weekly sync", conversationTitle(t, st, masterRow.convID),
+ "the series conversation title comes from the master, not an edited instance")
+}
+
+// TestIncremental_PersistErrorDoesNotAdvanceCursor is the regression for the
+// incremental path silently dropping an event that fails to persist: the cursor
+// must stay put so the next sync re-delivers and retries it.
+func TestIncremental_PersistErrorDoesNotAdvanceCursor(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("e1", "First")}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+ _, err := s.Full(context.Background())
+ require.NoError(err)
+ src := primarySource(t, st)
+ require.Equal("T1", src.SyncCursor.String)
+
+ // Force a persist failure during the incremental: drop a table the ingest
+ // path writes to, so the new event's UpsertMessageRawWithFormat errors.
+ _, err = st.DB().Exec("DROP TABLE message_raw")
+ require.NoError(err)
+
+ m.IncEvents["T1"] = [][]gcal.Event{{timedEvent("e2", "Second")}}
+ m.IncNextToken["T1"] = "T2"
+
+ _, err = s.Incremental(context.Background())
+ require.Error(err, "a per-item persist failure must surface, not be swallowed")
+
+ src2 := primarySource(t, st)
+ assert.Equal("T1", src2.SyncCursor.String, "cursor must NOT advance past an event that failed to persist")
+}
+
+// TestFull_ResumeSupersedesAndSeedsCounters is the regression for the resume
+// path: it must go through StartSync (so a stale/concurrent running run is
+// superseded under the writer lock, not shared) and carry the prior run's
+// counters forward so a resumed run's stats are not reset to zero.
+func TestFull_ResumeSupersedesAndSeedsCounters(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+
+ m := gcal.NewMockAPI()
+ m.Calendars = []gcal.Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]gcal.Event{{timedEvent("e1", "One")}}
+ m.FullSyncToken["primary"] = "T1"
+
+ s, st := newSyncer(t, m, Options{})
+
+ // Simulate an interrupted prior run: a 'running' sync_run with 2 already
+ // processed, checkpointed to resume from the first page ("").
+ src, err := st.GetOrCreateSource(gcal.SourceType, testAccount+"/primary")
+ require.NoError(err)
+ require.NoError(st.UpdateSourceSyncConfig(src.ID, `{"account_email":"`+testAccount+`","calendar_id":"primary"}`))
+ oldSyncID, err := st.StartSync(src.ID, "full")
+ require.NoError(err)
+ require.NoError(st.UpdateSyncCheckpoint(oldSyncID, &store.Checkpoint{
+ PageToken: encodeCalendarFullCheckpoint(""), MessagesProcessed: 2, MessagesAdded: 2,
+ }))
+
+ _, err = s.Full(context.Background())
+ require.NoError(err)
+
+ // The old run was superseded (no longer 'running').
+ var oldStatus string
+ require.NoError(st.DB().QueryRow(
+ st.Rebind("SELECT status FROM sync_runs WHERE id = ?"), oldSyncID).Scan(&oldStatus))
+ assert.NotEqual(store.SyncStatusRunning, oldStatus, "resume must supersede the prior running run via StartSync")
+
+ // The completed run's counter includes the 2 seeded + 1 newly ingested.
+ var processed int64
+ require.NoError(st.DB().QueryRow(st.Rebind(
+ "SELECT messages_processed FROM sync_runs WHERE source_id = ? AND status = 'completed' ORDER BY id DESC LIMIT 1"),
+ src.ID).Scan(&processed))
+ assert.Equal(int64(3), processed, "resumed run seeds prior counters (2) + new (1)")
+}
+
+type listEventsRecorder struct {
+ *gcal.MockAPI
+
+ params []gcal.EventsListParams
+ failOnCall int
+ err error
+}
+
+func (r *listEventsRecorder) ListEvents(ctx context.Context, calendarID string, p gcal.EventsListParams) (*gcal.EventsPage, error) {
+ r.params = append(r.params, p)
+ if r.failOnCall > 0 && len(r.params) == r.failOnCall {
+ return nil, r.err
+ }
+ return r.MockAPI.ListEvents(ctx, calendarID, p)
+}
diff --git a/internal/config/config.go b/internal/config/config.go
index 9180e1520..5a9854ab8 100644
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -110,6 +110,7 @@ type Config struct {
Identity IdentityConfig `toml:"identity"`
Accounts []AccountSchedule `toml:"accounts"`
SynctechSMS SynctechSMSConfig `toml:"synctech_sms"`
+ GCal []GCalSource `toml:"gcal"`
// Computed paths (not from config file)
HomeDir string `toml:"-"`
@@ -273,6 +274,7 @@ func NewDefaultConfig() *Config {
},
Accounts: []AccountSchedule{},
SynctechSMS: SynctechSMSConfig{Sources: []SynctechSMSSource{}},
+ GCal: []GCalSource{},
}
cfg.Vector.ApplyDefaults()
return cfg
@@ -366,6 +368,7 @@ func Load(path, homeDir string) (*Config, error) {
// an explicit false in the file stays false.
cfg.Vector.ApplyDefaults()
cfg.applySynctechSMSDefaults()
+ cfg.applyGCalDefaults()
return cfg, nil
}
@@ -566,6 +569,50 @@ func (c *Config) GetAccountSchedule(email string) *AccountSchedule {
return nil
}
+// GCalSource is one configured Google Calendar sync target. Each entry is a
+// top-level [[gcal]] table.
+type GCalSource struct {
+ Name string `toml:"name"` // identifier passed to sync-calendar; defaults to Email
+ Email string `toml:"email"` // the OAuth account = token key
+ OAuthApp string `toml:"oauth_app"` // optional named OAuth app
+ Calendars []string `toml:"calendars"` // optional calendarId filter; empty = owner+writer
+ Schedule string `toml:"schedule"` // 5-field cron; empty = not daemon-scheduled
+ Enabled bool `toml:"enabled"`
+}
+
+// applyGCalDefaults normalizes [[gcal]] entries: a source with no name takes its
+// email as its name, so the `sync-calendar` command can resolve it by that address.
+func (c *Config) applyGCalDefaults() {
+ for i := range c.GCal {
+ if c.GCal[i].Name == "" {
+ c.GCal[i].Name = c.GCal[i].Email
+ }
+ }
+}
+
+// GetGCalSource returns the configured calendar source matching name or email
+// (case-insensitive), or nil.
+func (c *Config) GetGCalSource(name string) *GCalSource {
+ for _, src := range c.GCal {
+ if strings.EqualFold(src.Name, name) || strings.EqualFold(src.Email, name) {
+ cp := src
+ return &cp
+ }
+ }
+ return nil
+}
+
+// ScheduledGCalSources returns enabled calendar sources with a cron schedule.
+func (c *Config) ScheduledGCalSources() []GCalSource {
+ var out []GCalSource
+ for _, src := range c.GCal {
+ if src.Enabled && src.Schedule != "" {
+ out = append(out, src)
+ }
+ }
+ return out
+}
+
func (c *Config) GetSynctechSMSSource(name string) *SynctechSMSSource {
for _, src := range c.SynctechSMS.Sources {
if strings.EqualFold(src.Name, name) {
diff --git a/internal/config/config_gcal_test.go b/internal/config/config_gcal_test.go
new file mode 100644
index 000000000..6470e6f94
--- /dev/null
+++ b/internal/config/config_gcal_test.go
@@ -0,0 +1,58 @@
+package config
+
+import (
+ "os"
+ "path/filepath"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+)
+
+func TestLoadGCalSources(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ tmpDir := t.TempDir()
+ t.Setenv("MSGVAULT_HOME", tmpDir)
+
+ configContent := `
+[[gcal]]
+email = "dan@example.com"
+oauth_app = "work"
+calendars = ["primary", "team@group.calendar.google.com"]
+schedule = "0 */6 * * *"
+enabled = true
+
+[[gcal]]
+name = "secondary"
+email = "alt@example.com"
+enabled = false
+`
+ configPath := filepath.Join(tmpDir, "config.toml")
+ require.NoError(os.WriteFile(configPath, []byte(configContent), 0644))
+
+ cfg, err := Load(configPath, "")
+ require.NoError(err)
+ require.Len(cfg.GCal, 2)
+
+ // First entry: name defaults to email (applyGCalDefaults).
+ first := cfg.GCal[0]
+ assert.Equal("dan@example.com", first.Name, "name defaults to email")
+ assert.Equal("dan@example.com", first.Email)
+ assert.Equal("work", first.OAuthApp)
+ assert.Equal([]string{"primary", "team@group.calendar.google.com"}, first.Calendars)
+ assert.True(first.Enabled)
+
+ assert.Equal("secondary", cfg.GCal[1].Name, "explicit name preserved")
+
+ // Lookup by name and by email.
+ require.NotNil(cfg.GetGCalSource("dan@example.com"))
+ require.NotNil(cfg.GetGCalSource("secondary"))
+ assert.Equal("alt@example.com", cfg.GetGCalSource("secondary").Email)
+ assert.Nil(cfg.GetGCalSource("nobody"))
+
+ // Only the enabled+scheduled source is daemon-scheduled.
+ scheduled := cfg.ScheduledGCalSources()
+ require.Len(scheduled, 1)
+ assert.Equal("dan@example.com", scheduled[0].Email)
+}
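
Review note on `GetGCalSource`: it returns a pointer to a copy of the matching entry (`cp := src`), so callers cannot rename a configured source by accident. The copy is shallow, though, so the `Calendars` slice still shares its backing array with the config. The sketch below illustrates both effects. The `source` and `config` types and the `get` method are simplified local stand-ins written for this note, not the real `config` package.

```go
package main

import (
	"fmt"
	"strings"
)

// source is a hypothetical, trimmed-down mirror of GCalSource.
type source struct {
	Name      string
	Email     string
	Calendars []string
}

// config is a hypothetical stand-in holding only the [[gcal]] entries.
type config struct{ GCal []source }

// get mirrors the lookup shape of GetGCalSource: a case-insensitive match on
// name or email, returning a pointer to a copy of the matching entry.
func (c *config) get(name string) *source {
	for _, src := range c.GCal {
		if strings.EqualFold(src.Name, name) || strings.EqualFold(src.Email, name) {
			cp := src
			return &cp
		}
	}
	return nil
}

func main() {
	cfg := &config{GCal: []source{{
		Name:      "work",
		Email:     "dan@example.com",
		Calendars: []string{"primary"},
	}}}

	got := cfg.get("DAN@example.com") // matched by email, ignoring case
	got.Name = "renamed"              // changes only the copy
	got.Calendars[0] = "edited"       // writes through the shared backing array

	fmt.Println(cfg.GCal[0].Name)         // prints "work"
	fmt.Println(cfg.GCal[0].Calendars[0]) // prints "edited"
}
```

If callers are ever expected to mutate the returned filter list, cloning `Calendars` inside the lookup would be one way to make the copy fully independent; whether that is needed depends on how the sync command uses the result.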
diff --git a/internal/gcal/api.go b/internal/gcal/api.go
new file mode 100644
index 000000000..7fe151b97
--- /dev/null
+++ b/internal/gcal/api.go
@@ -0,0 +1,37 @@
+package gcal
+
+import "context"
+
+// CalendarReader enumerates an account's calendars.
+type CalendarReader interface {
+ // ListCalendars returns one page of the account's calendar list. Pass an
+ // empty pageToken for the first page; follow CalendarListPage.NextPageToken
+ // until it is empty.
+ ListCalendars(ctx context.Context, pageToken string) (*CalendarListPage, error)
+}
+
+// EventReader reads events from a single calendar.
+type EventReader interface {
+ // ListEvents returns one page of events for the given calendar. The traversal
+ // is driven by EventsListParams.PageToken; the final page carries
+ // EventsPage.NextSyncToken for the next incremental sync. A 410 (expired
+ // syncToken) surfaces as *GoneError.
+ ListEvents(ctx context.Context, calendarID string, params EventsListParams) (*EventsPage, error)
+ // GetEvent fetches a single event by id (used for tombstone reconciliation
+ // fallbacks). A missing event surfaces as *NotFoundError.
+ GetEvent(ctx context.Context, calendarID, eventID string) (*Event, error)
+}
+
+// API is the full read-only Calendar surface the sync layer depends on. The
+// concrete *Client and the in-memory *MockAPI both satisfy it.
+type API interface {
+ CalendarReader
+ EventReader
+ // Close releases client resources.
+ Close() error
+}
+
+var (
+ _ API = (*Client)(nil)
+ _ API = (*MockAPI)(nil)
+)
diff --git a/internal/gcal/client.go b/internal/gcal/client.go
new file mode 100644
index 000000000..6587e248b
--- /dev/null
+++ b/internal/gcal/client.go
@@ -0,0 +1,327 @@
+package gcal
+
+import (
+ "context"
+ "encoding/json"
+ "errors"
+ "fmt"
+ "io"
+ "log/slog"
+ "math/rand"
+ "net/http"
+ "net/url"
+ "strconv"
+ "time"
+
+ "golang.org/x/oauth2"
+
+ "go.kenn.io/msgvault/internal/gmail"
+)
+
+const (
+ defaultBaseURL = "https://www.googleapis.com/calendar/v3"
+ maxRetries = 12 // ~10 minutes of network outages, matching the Gmail client
+ maxBackoff = 600 // seconds
+ // defaultMaxResults is the events.list page size. 2500 is the API max,
+ // minimizing round-trips (and quota cost) on large calendars.
+ defaultMaxResults = 2500
+)
+
+// Client is the concrete read-only Calendar API client.
+type Client struct {
+ httpClient *http.Client
+ rateLimiter *gmail.RateLimiter
+ logger *slog.Logger
+ baseURL string
+}
+
+// ClientOption configures a Client.
+type ClientOption func(*Client)
+
+// WithLogger sets the structured logger.
+func WithLogger(logger *slog.Logger) ClientOption {
+ return func(c *Client) {
+ if logger != nil {
+ c.logger = logger
+ }
+ }
+}
+
+// WithRateLimiter overrides the default Calendar rate limiter.
+func WithRateLimiter(rl *gmail.RateLimiter) ClientOption {
+ return func(c *Client) {
+ if rl != nil {
+ c.rateLimiter = rl
+ }
+ }
+}
+
+// WithBaseURL overrides the API base URL. Useful for pointing the client at a
+// test server or a proxy/regional endpoint.
+func WithBaseURL(u string) ClientOption {
+ return func(c *Client) {
+ if u != "" {
+ c.baseURL = u
+ }
+ }
+}
+
+// WithHTTPClient overrides the HTTP client. The supplied client is used as-is,
+// so the caller is responsible for attaching credentials when needed (the test
+// server accepts any token).
+func WithHTTPClient(h *http.Client) ClientOption {
+ return func(c *Client) {
+ if h != nil {
+ c.httpClient = h
+ }
+ }
+}
+
+// NewClient creates a Calendar client. The token source is wrapped with
+// oauth2.NewClient so the bearer token auto-attaches and auto-refreshes; the
+// client never touches token internals. The default rate limiter is sized for
+// the Calendar API's per-user budget (600 req/min/user ≈ 10 req/s), not Gmail's.
+func NewClient(tokenSource oauth2.TokenSource, opts ...ClientOption) *Client {
+ c := &Client{
+ httpClient: oauth2.NewClient(context.Background(), tokenSource),
+ logger: slog.Default(),
+ baseURL: defaultBaseURL,
+ rateLimiter: gmail.NewRateLimiterWithCapacity(10, 8),
+ }
+ for _, opt := range opts {
+ opt(c)
+ }
+ if c.rateLimiter == nil {
+ c.rateLimiter = gmail.NewRateLimiterWithCapacity(10, 8)
+ }
+ return c
+}
+
+// Close releases resources held by the client.
+func (c *Client) Close() error { return nil }
+
+// NotFoundError indicates a 404 response (e.g. a deleted event on GetEvent).
+type NotFoundError struct{ Path string }
+
+func (e *NotFoundError) Error() string { return "not found: " + e.Path }
+
+// GoneError indicates an HTTP 410 — the Calendar analogue of Gmail's 404 on a
+// stale historyId. It means the supplied syncToken has expired and the caller
+// must clear the cursor and run a fresh full sync.
+type GoneError struct{ Path string }
+
+func (e *GoneError) Error() string { return "gone (410, sync token expired): " + e.Path }
+
+// request performs a rate-limited HTTP request with retry/backoff, mirroring the
+// Gmail client's loop. It retries network errors, 429, quota-403, and 5xx with
+// full-jitter exponential backoff; it does not retry permission-403, 401, 404,
+// 410, or other 4xx. The op selects the quota cost on the shared limiter.
+func (c *Client) request(ctx context.Context, op gmail.Operation, method, path string) ([]byte, error) {
+ if err := c.rateLimiter.Acquire(ctx, op); err != nil {
+ return nil, fmt.Errorf("rate limit: %w", err)
+ }
+
+ reqURL := c.baseURL + path
+
+ var lastErr error
+ for attempt := 0; attempt <= maxRetries; attempt++ {
+ if attempt > 0 {
+ backoff := c.calculateBackoff(attempt)
+ c.logger.Debug("retrying calendar request", "attempt", attempt, "backoff", backoff, "path", path)
+ select {
+ case <-ctx.Done():
+ return nil, ctx.Err()
+ case <-time.After(backoff):
+ }
+ }
+
+ req, err := http.NewRequestWithContext(ctx, method, reqURL, io.Reader(nil))
+ if err != nil {
+ return nil, fmt.Errorf("create request: %w", err)
+ }
+
+ resp, err := c.httpClient.Do(req)
+ if err != nil {
+ lastErr = fmt.Errorf("http request: %w", err)
+ continue // retry on network errors
+ }
+
+ respBody, err := io.ReadAll(resp.Body)
+ _ = resp.Body.Close()
+ if err != nil {
+ lastErr = fmt.Errorf("read response: %w", err)
+ continue
+ }
+
+ if resp.StatusCode >= 200 && resp.StatusCode < 300 {
+ return respBody, nil
+ }
+
+ switch resp.StatusCode {
+ case http.StatusTooManyRequests:
+ c.logger.Debug("calendar rate limited, backing off 30s", "path", path, "attempt", attempt)
+ c.rateLimiter.Throttle(30 * time.Second)
+ lastErr = errors.New("rate limited (429)")
+ continue
+
+ case http.StatusForbidden:
+ if isRateLimitError(respBody) {
+ c.logger.Debug("calendar quota exceeded, backing off 60s", "path", path, "attempt", attempt)
+ c.rateLimiter.Throttle(60 * time.Second)
+ lastErr = errors.New("quota exceeded (403)")
+ continue
+ }
+ return nil, fmt.Errorf("forbidden (403): %s", string(respBody))
+
+ case http.StatusInternalServerError, http.StatusBadGateway,
+ http.StatusServiceUnavailable, http.StatusGatewayTimeout:
+ lastErr = fmt.Errorf("server error (%d)", resp.StatusCode)
+ continue
+
+ case http.StatusUnauthorized:
+ return nil, errors.New("unauthorized (401): token may be invalid")
+
+ case http.StatusGone:
+ return nil, &GoneError{Path: path}
+
+ case http.StatusNotFound:
+ return nil, &NotFoundError{Path: path}
+
+ default:
+ return nil, fmt.Errorf("request failed (%d): %s", resp.StatusCode, string(respBody))
+ }
+ }
+
+ return nil, fmt.Errorf("max retries exceeded: %w", lastErr)
+}
+
+// calculateBackoff returns full-jitter exponential backoff for a retry attempt.
+func (c *Client) calculateBackoff(attempt int) time.Duration {
+ base := float64(uint(1) << uint(attempt))
+ if base > maxBackoff {
+ base = maxBackoff
+ }
+ jittered := rand.Float64() * base //nolint:gosec // retry spread, not security-sensitive
+ return time.Duration(jittered * float64(time.Second))
+}
+
+// isRateLimitError reports whether a 403 body is a quota/rate-limit error
+// (retryable) rather than a genuine permission error (terminal). Calendar uses
+// the same error envelope as Gmail: error.errors[].reason / error.status.
+func isRateLimitError(body []byte) bool {
+ var parsed struct {
+ Error struct {
+ Status string `json:"status"`
+ Errors []struct {
+ Reason string `json:"reason"`
+ Domain string `json:"domain"`
+ } `json:"errors"`
+ } `json:"error"`
+ }
+ if err := json.Unmarshal(body, &parsed); err != nil {
+ return false
+ }
+ if parsed.Error.Status == "RESOURCE_EXHAUSTED" {
+ return true
+ }
+ for _, e := range parsed.Error.Errors {
+ switch e.Reason {
+ case "rateLimitExceeded", "userRateLimitExceeded", "quotaExceeded", "RATE_LIMIT_EXCEEDED":
+ return true
+ }
+ if e.Domain == "usageLimits" {
+ return true
+ }
+ }
+ return false
+}
+
+// ListCalendars returns one page of the account's calendar list.
+func (c *Client) ListCalendars(ctx context.Context, pageToken string) (*CalendarListPage, error) {
+ v := url.Values{}
+ v.Set("maxResults", "250")
+ v.Set("showHidden", "true")
+ if pageToken != "" {
+ v.Set("pageToken", pageToken)
+ }
+ body, err := c.request(ctx, gmail.OpCalendarListList, http.MethodGet, "/users/me/calendarList?"+v.Encode())
+ if err != nil {
+ return nil, fmt.Errorf("calendarList.list: %w", err)
+ }
+
+ var wire wireCalendarList
+ if err := json.Unmarshal(body, &wire); err != nil {
+ return nil, fmt.Errorf("decode calendarList: %w", err)
+ }
+ page := &CalendarListPage{NextPageToken: wire.NextPageToken}
+ for i := range wire.Items {
+ page.Items = append(page.Items, wire.Items[i].toCalendar())
+ }
+ return page, nil
+}
+
+// ListEvents returns one page of events for a calendar. See EventsListParams.
+func (c *Client) ListEvents(ctx context.Context, calendarID string, p EventsListParams) (*EventsPage, error) {
+ v := url.Values{}
+ maxResults := p.MaxResults
+ if maxResults <= 0 {
+ maxResults = defaultMaxResults
+ }
+ v.Set("maxResults", strconv.Itoa(maxResults))
+ v.Set("singleEvents", strconv.FormatBool(p.SingleEvents))
+ if p.ShowDeleted {
+ v.Set("showDeleted", "true")
+ }
+ if p.SyncToken != "" {
+ // Incremental: syncToken is mutually exclusive with timeMin/timeMax.
+ v.Set("syncToken", p.SyncToken)
+ } else {
+ if p.TimeMin != "" {
+ v.Set("timeMin", p.TimeMin)
+ }
+ if p.TimeMax != "" {
+ v.Set("timeMax", p.TimeMax)
+ }
+ }
+ if p.PageToken != "" {
+ v.Set("pageToken", p.PageToken)
+ }
+
+ path := "/calendars/" + url.PathEscape(calendarID) + "/events?" + v.Encode()
+ body, err := c.request(ctx, gmail.OpEventsList, http.MethodGet, path)
+ if err != nil {
+ return nil, fmt.Errorf("events.list: %w", err)
+ }
+
+ var wire wireEvents
+ if err := json.Unmarshal(body, &wire); err != nil {
+ return nil, fmt.Errorf("decode events: %w", err)
+ }
+ page := &EventsPage{
+ NextPageToken: wire.NextPageToken,
+ NextSyncToken: wire.NextSyncToken,
+ TimeZone: wire.TimeZone,
+ }
+ for _, raw := range wire.Items {
+ ev, err := decodeEvent(raw)
+ if err != nil {
+ return nil, fmt.Errorf("decode event: %w", err)
+ }
+ page.Items = append(page.Items, ev)
+ }
+ return page, nil
+}
+
+// GetEvent fetches a single event by id.
+func (c *Client) GetEvent(ctx context.Context, calendarID, eventID string) (*Event, error) {
+ path := "/calendars/" + url.PathEscape(calendarID) + "/events/" + url.PathEscape(eventID)
+ body, err := c.request(ctx, gmail.OpEventsGet, http.MethodGet, path)
+ if err != nil {
+ return nil, fmt.Errorf("events.get: %w", err)
+ }
+ ev, err := decodeEvent(body)
+ if err != nil {
+ return nil, fmt.Errorf("decode event: %w", err)
+ }
+ return &ev, nil
+}
diff --git a/internal/gcal/client_test.go b/internal/gcal/client_test.go
new file mode 100644
index 000000000..86df62771
--- /dev/null
+++ b/internal/gcal/client_test.go
@@ -0,0 +1,264 @@
+package gcal
+
+import (
+ "context"
+ "net/http"
+ "net/http/httptest"
+ "sync/atomic"
+ "testing"
+ "time"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "golang.org/x/oauth2"
+
+ "go.kenn.io/msgvault/internal/gmail"
+)
+
+func testClient(t *testing.T, srv *httptest.Server) *Client {
+ t.Helper()
+ ts := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: "test-token"})
+ return NewClient(ts,
+ WithBaseURL(srv.URL),
+ WithHTTPClient(srv.Client()),
+ // A roomy limiter so multi-call tests don't block on the bucket.
+ WithRateLimiter(gmail.NewRateLimiterWithCapacity(100, 100)),
+ )
+}
+
+func TestClient_ListCalendars(t *testing.T) {
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ assert.Equal(t, "/users/me/calendarList", r.URL.Path)
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = w.Write([]byte(`{
+ "items": [
+ {"id":"primary","summary":"Personal","timeZone":"America/Los_Angeles","accessRole":"owner","primary":true},
+ {"id":"team@group.calendar.google.com","summary":"Team","summaryOverride":"My Team","accessRole":"writer"},
+ {"id":"holidays","summary":"Holidays","accessRole":"reader"}
+ ]
+ }`))
+ }))
+ defer srv.Close()
+
+ assert := assert.New(t)
+ require := require.New(t)
+ page, err := testClient(t, srv).ListCalendars(context.Background(), "")
+ require.NoError(err)
+ require.Len(page.Items, 3)
+ assert.Equal("primary", page.Items[0].ID)
+ assert.True(page.Items[0].Primary)
+ assert.Equal("owner", page.Items[0].AccessRole)
+ // summaryOverride wins over summary.
+ assert.Equal("My Team", page.Items[1].Summary)
+ assert.Equal("reader", page.Items[2].AccessRole)
+}
+
+func TestClient_ListEvents_PaginationAndSyncToken(t *testing.T) {
+ var calls atomic.Int32
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ calls.Add(1)
+ assert.Equal(t, "/calendars/primary/events", r.URL.Path)
+ // Full sync: no syncToken on the first page.
+ q := r.URL.Query()
+ assert.Equal(t, "false", q.Get("singleEvents"))
+ w.Header().Set("Content-Type", "application/json")
+ switch q.Get("pageToken") {
+ case "":
+ // First page: a timed event + an all-day event, with a nextPageToken
+ // and NO syncToken (sync token must only appear on the final page).
+ _, _ = w.Write([]byte(`{
+ "items": [
+ {"id":"e1","status":"confirmed","summary":"Standup","location":"Room 1",
+ "organizer":{"email":"alice@example.com","displayName":"Alice","self":true},
+ "start":{"dateTime":"2024-05-01T09:00:00-07:00","timeZone":"America/Los_Angeles"},
+ "end":{"dateTime":"2024-05-01T09:30:00-07:00"},
+ "attendees":[{"email":"bob@example.com","displayName":"Bob","responseStatus":"accepted"}]},
+ {"id":"e2","status":"confirmed","summary":"Vacation",
+ "start":{"date":"2024-06-10"},"end":{"date":"2024-06-15"}}
+ ],
+ "nextPageToken":"p2"
+ }`))
+ case "p2":
+ // Final page: a recurring master + a cancelled instance, with the
+ // terminal nextSyncToken.
+ _, _ = w.Write([]byte(`{
+ "items": [
+ {"id":"r1","status":"confirmed","summary":"Weekly sync",
+ "start":{"dateTime":"2024-05-02T10:00:00Z"},"end":{"dateTime":"2024-05-02T10:30:00Z"},
+ "recurrence":["RRULE:FREQ=WEEKLY;BYDAY=TH"]},
+ {"id":"r1_20240509T100000Z","status":"cancelled","recurringEventId":"r1",
+ "originalStartTime":{"dateTime":"2024-05-09T10:00:00Z"}}
+ ],
+ "nextSyncToken":"SYNC_TOKEN_FINAL"
+ }`))
+ default:
+ assert.Failf(t, "unexpected pageToken", "%q", q.Get("pageToken"))
+ }
+ }))
+ defer srv.Close()
+
+ c := testClient(t, srv)
+ ctx := context.Background()
+
+ assert := assert.New(t)
+ require := require.New(t)
+
+ p1, err := c.ListEvents(ctx, "primary", EventsListParams{SingleEvents: false, ShowDeleted: true, MaxResults: 2500})
+ require.NoError(err)
+ require.Len(p1.Items, 2)
+ assert.Empty(p1.NextSyncToken, "sync token must not appear before the final page")
+ assert.Equal("p2", p1.NextPageToken)
+
+ // Timed event mapping.
+ e1 := p1.Items[0]
+ assert.Equal("Standup", e1.Summary)
+ assert.Equal("alice@example.com", e1.Organizer.Email)
+ assert.True(e1.Organizer.Self)
+ require.Len(e1.Attendees, 1)
+ assert.Equal("bob@example.com", e1.Attendees[0].Email)
+ inst, ok := e1.Start.Instant()
+ require.True(ok)
+ assert.Equal(time.Date(2024, 5, 1, 16, 0, 0, 0, time.UTC), inst.UTC())
+ assert.False(e1.Start.IsAllDay())
+
+ // All-day event mapping.
+ e2 := p1.Items[1]
+ assert.True(e2.Start.IsAllDay())
+ assert.Equal("2024-06-10", e2.Start.Date)
+ allDay, ok := e2.Start.Instant()
+ require.True(ok)
+ assert.Equal(time.Date(2024, 6, 10, 0, 0, 0, 0, time.UTC), allDay)
+
+ // Final page carries the sync token.
+ p2, err := c.ListEvents(ctx, "primary", EventsListParams{
+ SingleEvents: false, ShowDeleted: true, MaxResults: 2500, PageToken: p1.NextPageToken,
+ })
+ require.NoError(err)
+ require.Len(p2.Items, 2)
+ assert.Equal("SYNC_TOKEN_FINAL", p2.NextSyncToken)
+ assert.Empty(p2.NextPageToken)
+
+ // Recurring master + cancelled instance mapping.
+ master := p2.Items[0]
+ require.Len(master.Recurrence, 1)
+ assert.Equal("RRULE:FREQ=WEEKLY;BYDAY=TH", master.Recurrence[0])
+ cancelled := p2.Items[1]
+ assert.True(cancelled.IsCancelled())
+ assert.Equal("r1", cancelled.RecurringEventID)
+ assert.Equal(time.Date(2024, 5, 9, 10, 0, 0, 0, time.UTC), cancelled.OriginalStartTime.DateTime.UTC())
+
+ assert.Equal(int32(2), calls.Load())
+}
+
+func TestClient_ListEvents_IncrementalParams(t *testing.T) {
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ q := r.URL.Query()
+ // Incremental: syncToken present, timeMin/timeMax MUST NOT be sent.
+ assert.Equal(t, "PRIOR_TOKEN", q.Get("syncToken"))
+ assert.Empty(t, q.Get("timeMin"), "timeMin must not accompany syncToken")
+ assert.Empty(t, q.Get("timeMax"), "timeMax must not accompany syncToken")
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = w.Write([]byte(`{"items":[],"nextSyncToken":"NEW_TOKEN"}`))
+ }))
+ defer srv.Close()
+
+ page, err := testClient(t, srv).ListEvents(context.Background(), "primary", EventsListParams{
+ SyncToken: "PRIOR_TOKEN", SingleEvents: false, ShowDeleted: true,
+ // These must be dropped because SyncToken is set.
+ TimeMin: "2024-01-01T00:00:00Z", TimeMax: "2024-12-31T00:00:00Z",
+ })
+ require.NoError(t, err)
+ assert.Equal(t, "NEW_TOKEN", page.NextSyncToken)
+}
+
+func TestClient_ListEvents_Gone410(t *testing.T) {
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+ w.WriteHeader(http.StatusGone)
+ _, _ = w.Write([]byte(`{"error":{"code":410,"message":"Sync token is no longer valid"}}`))
+ }))
+ defer srv.Close()
+
+ _, err := testClient(t, srv).ListEvents(context.Background(), "primary", EventsListParams{SyncToken: "STALE"})
+ var gone *GoneError
+ require.ErrorAs(t, err, &gone, "expected *GoneError, got %v", err)
+}
+
+func TestClient_GetEvent_NotFound404(t *testing.T) {
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+ w.WriteHeader(http.StatusNotFound)
+ _, _ = w.Write([]byte(`{"error":{"code":404,"message":"Not Found"}}`))
+ }))
+ defer srv.Close()
+
+ _, err := testClient(t, srv).GetEvent(context.Background(), "primary", "missing")
+ var nf *NotFoundError
+ require.ErrorAs(t, err, &nf, "expected *NotFoundError, got %v", err)
+}
+
+func TestClient_Retry429ThenSuccess(t *testing.T) {
+ var calls atomic.Int32
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+ if calls.Add(1) == 1 {
+ w.WriteHeader(http.StatusTooManyRequests)
+ _, _ = w.Write([]byte(`{"error":{"code":429}}`))
+ return
+ }
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = w.Write([]byte(`{"items":[],"nextSyncToken":"OK"}`))
+ }))
+ defer srv.Close()
+
+ page, err := testClient(t, srv).ListEvents(context.Background(), "primary", EventsListParams{})
+ require.NoError(t, err)
+ assert.Equal(t, "OK", page.NextSyncToken)
+ assert.GreaterOrEqual(t, calls.Load(), int32(2), "should have retried")
+}
+
+func TestClient_QuotaForbiddenRetries_PermissionForbiddenTerminal(t *testing.T) {
+ t.Run("quota 403 retries", func(t *testing.T) {
+ var calls atomic.Int32
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+ if calls.Add(1) == 1 {
+ w.WriteHeader(http.StatusForbidden)
+ _, _ = w.Write([]byte(`{"error":{"code":403,"errors":[{"reason":"rateLimitExceeded","domain":"usageLimits"}]}}`))
+ return
+ }
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = w.Write([]byte(`{"items":[],"nextSyncToken":"OK"}`))
+ }))
+ defer srv.Close()
+
+ _, err := testClient(t, srv).ListEvents(context.Background(), "primary", EventsListParams{})
+ require.NoError(t, err)
+ assert.GreaterOrEqual(t, calls.Load(), int32(2))
+ })
+
+ t.Run("permission 403 is terminal", func(t *testing.T) {
+ var calls atomic.Int32
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+ calls.Add(1)
+ w.WriteHeader(http.StatusForbidden)
+ _, _ = w.Write([]byte(`{"error":{"code":403,"errors":[{"reason":"insufficientPermissions"}]}}`))
+ }))
+ defer srv.Close()
+
+ _, err := testClient(t, srv).ListEvents(context.Background(), "primary", EventsListParams{})
+ require.Error(t, err)
+ assert.Equal(t, int32(1), calls.Load(), "permission error must not retry")
+ })
+}
+
+func TestClient_ContextCancelled(t *testing.T) {
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = w.Write([]byte(`{"items":[]}`))
+ }))
+ defer srv.Close()
+
+ ctx, cancel := context.WithCancel(context.Background())
+ cancel() // already cancelled
+
+ _, err := testClient(t, srv).ListCalendars(ctx, "")
+ require.Error(t, err)
+ assert.ErrorIs(t, err, context.Canceled)
+}
diff --git a/internal/gcal/mock.go b/internal/gcal/mock.go
new file mode 100644
index 000000000..08c137251
--- /dev/null
+++ b/internal/gcal/mock.go
@@ -0,0 +1,146 @@
+package gcal
+
+import (
+ "context"
+ "strconv"
+ "sync"
+)
+
+// MockAPI is an in-memory, deterministic fake of the Calendar API for tests.
+// It owns pagination so tests describe pages as plain event slices (no token
+// bookkeeping). It supports multi-page full sync, incremental deltas keyed by
+// the incoming syncToken, first-class 410 injection, error injection, and call
+// counting. Re-running a full sync (PageToken="") restarts pagination, so
+// idempotency tests work without resetting state.
+type MockAPI struct {
+ mu sync.Mutex
+
+ // Calendars is returned (single page) by ListCalendars.
+ Calendars []Calendar
+
+ // FullEvents[calendarID] is the ordered pages of events for a full sync
+ // (no syncToken). FullSyncToken[calendarID] is the NextSyncToken delivered
+ // on the final full page.
+ FullEvents map[string][][]Event
+ FullSyncToken map[string]string
+
+ // IncEvents[incomingSyncToken] is the ordered pages returned when a caller
+ // lists with that syncToken. IncNextToken[incomingSyncToken] is the
+ // NextSyncToken delivered on the final incremental page.
+ IncEvents map[string][][]Event
+ IncNextToken map[string]string
+
+ // GoneTokens[incomingSyncToken]=true makes ListEvents return *GoneError.
+ GoneTokens map[string]bool
+
+ // EventsByID[calendarID][eventID] backs GetEvent.
+ EventsByID map[string]map[string]Event
+
+ // Injectable errors (returned before any work).
+ ListCalendarsErr error
+ ListEventsErr error
+ GetEventErr error
+
+ // Call counters (read under the mutex via the accessor methods).
+ listCalendarsCalls int
+ listEventsCalls int
+ getEventCalls int
+}
+
+// NewMockAPI returns an empty mock with initialized maps.
+func NewMockAPI() *MockAPI {
+ return &MockAPI{
+ FullEvents: map[string][][]Event{},
+ FullSyncToken: map[string]string{},
+ IncEvents: map[string][][]Event{},
+ IncNextToken: map[string]string{},
+ GoneTokens: map[string]bool{},
+ EventsByID: map[string]map[string]Event{},
+ }
+}
+
+// ListCalendars returns all seeded calendars in a single page.
+func (m *MockAPI) ListCalendars(_ context.Context, _ string) (*CalendarListPage, error) {
+ m.mu.Lock()
+ defer m.mu.Unlock()
+ m.listCalendarsCalls++
+ if m.ListCalendarsErr != nil {
+ return nil, m.ListCalendarsErr
+ }
+ items := make([]Calendar, len(m.Calendars))
+ copy(items, m.Calendars)
+ return &CalendarListPage{Items: items}, nil
+}
+
+// ListEvents serves full or incremental pages depending on params.SyncToken.
+func (m *MockAPI) ListEvents(_ context.Context, calendarID string, p EventsListParams) (*EventsPage, error) {
+ m.mu.Lock()
+ defer m.mu.Unlock()
+ m.listEventsCalls++
+ if m.ListEventsErr != nil {
+ return nil, m.ListEventsErr
+ }
+
+ if p.SyncToken != "" {
+ if m.GoneTokens[p.SyncToken] {
+ return nil, &GoneError{Path: "events:" + calendarID}
+ }
+ return pageAt(m.IncEvents[p.SyncToken], pageIdx(p.PageToken), m.IncNextToken[p.SyncToken]), nil
+ }
+ return pageAt(m.FullEvents[calendarID], pageIdx(p.PageToken), m.FullSyncToken[calendarID]), nil
+}
+
+// GetEvent returns a seeded event by id, or *NotFoundError.
+func (m *MockAPI) GetEvent(_ context.Context, calendarID, eventID string) (*Event, error) {
+ m.mu.Lock()
+ defer m.mu.Unlock()
+ m.getEventCalls++
+ if m.GetEventErr != nil {
+ return nil, m.GetEventErr
+ }
+ if byID, ok := m.EventsByID[calendarID]; ok {
+ if ev, ok := byID[eventID]; ok {
+ return &ev, nil
+ }
+ }
+ return nil, &NotFoundError{Path: "events/" + eventID}
+}
+
+// Close is a no-op for the mock.
+func (m *MockAPI) Close() error { return nil }
+
+// ListCalendarsCalls, ListEventsCalls, and GetEventCalls return how many times each method was called.
+func (m *MockAPI) ListCalendarsCalls() int {
+ m.mu.Lock()
+ defer m.mu.Unlock()
+ return m.listCalendarsCalls
+}
+func (m *MockAPI) ListEventsCalls() int { m.mu.Lock(); defer m.mu.Unlock(); return m.listEventsCalls }
+func (m *MockAPI) GetEventCalls() int { m.mu.Lock(); defer m.mu.Unlock(); return m.getEventCalls }
+
+// pageIdx decodes the mock's pageToken (a stringified index; "" or an invalid/negative token decodes to 0).
+func pageIdx(token string) int {
+ if token == "" {
+ return 0
+ }
+ if n, err := strconv.Atoi(token); err == nil && n >= 0 {
+ return n
+ }
+ return 0
+}
+
+// pageAt returns the page at idx. The final page (or an empty/over-range
+// traversal) carries finalToken as NextSyncToken; earlier pages carry a
+// NextPageToken pointing at idx+1.
+func pageAt(pages [][]Event, idx int, finalToken string) *EventsPage {
+ if idx >= len(pages) {
+ return &EventsPage{NextSyncToken: finalToken}
+ }
+ page := &EventsPage{Items: pages[idx]}
+ if idx+1 < len(pages) {
+ page.NextPageToken = strconv.Itoa(idx + 1)
+ } else {
+ page.NextSyncToken = finalToken
+ }
+ return page
+}
diff --git a/internal/gcal/mock_test.go b/internal/gcal/mock_test.go
new file mode 100644
index 000000000..9a8bf8469
--- /dev/null
+++ b/internal/gcal/mock_test.go
@@ -0,0 +1,81 @@
+package gcal
+
+import (
+ "context"
+ "errors"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+)
+
+func TestMockAPI_FullPaginationDeliversSyncTokenOnFinalPage(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ m := NewMockAPI()
+ m.Calendars = []Calendar{{ID: "primary", AccessRole: "owner"}}
+ m.FullEvents["primary"] = [][]Event{
+ {{ID: "a"}, {ID: "b"}},
+ {{ID: "c"}},
+ }
+ m.FullSyncToken["primary"] = "TOK1"
+
+ ctx := context.Background()
+ p0, err := m.ListEvents(ctx, "primary", EventsListParams{})
+ require.NoError(err)
+ assert.Len(p0.Items, 2)
+ assert.Equal("1", p0.NextPageToken)
+ assert.Empty(p0.NextSyncToken, "no sync token before the final page")
+
+ p1, err := m.ListEvents(ctx, "primary", EventsListParams{PageToken: p0.NextPageToken})
+ require.NoError(err)
+ assert.Len(p1.Items, 1)
+ assert.Empty(p1.NextPageToken)
+ assert.Equal("TOK1", p1.NextSyncToken)
+
+ // Re-running the full sync (PageToken="") restarts pagination — needed for
+ // idempotency tests.
+ again, err := m.ListEvents(ctx, "primary", EventsListParams{})
+ require.NoError(err)
+ assert.Len(again.Items, 2)
+ assert.Equal(3, m.ListEventsCalls())
+}
+
+func TestMockAPI_IncrementalAndGone(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ m := NewMockAPI()
+ m.IncEvents["TOK1"] = [][]Event{{{ID: "x", Status: StatusCancelled, RecurringEventID: "r"}}}
+ m.IncNextToken["TOK1"] = "TOK2"
+ m.GoneTokens["STALE"] = true
+
+ ctx := context.Background()
+ page, err := m.ListEvents(ctx, "primary", EventsListParams{SyncToken: "TOK1"})
+ require.NoError(err)
+ require.Len(page.Items, 1)
+ assert.True(page.Items[0].IsCancelled())
+ assert.Equal("TOK2", page.NextSyncToken)
+
+ _, err = m.ListEvents(ctx, "primary", EventsListParams{SyncToken: "STALE"})
+ var gone *GoneError
+ assert.ErrorAs(err, &gone, "stale token should yield *GoneError")
+}
+
+func TestMockAPI_GetEventAndErrorInjection(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ m := NewMockAPI()
+ m.EventsByID["primary"] = map[string]Event{"e1": {ID: "e1", Summary: "Found"}}
+
+ ev, err := m.GetEvent(context.Background(), "primary", "e1")
+ require.NoError(err)
+ assert.Equal("Found", ev.Summary)
+
+ _, err = m.GetEvent(context.Background(), "primary", "nope")
+ var nf *NotFoundError
+ require.ErrorAs(err, &nf)
+
+ m.ListEventsErr = errors.New("boom")
+ _, err = m.ListEvents(context.Background(), "primary", EventsListParams{})
+ assert.ErrorContains(err, "boom")
+}
diff --git a/internal/gcal/models.go b/internal/gcal/models.go
new file mode 100644
index 000000000..5eda45d66
--- /dev/null
+++ b/internal/gcal/models.go
@@ -0,0 +1,161 @@
+// Package gcal is a read-only Google Calendar API v3 client, structured as a
+// close mirror of internal/gmail: a hand-rolled net/http client with a dedicated
+// rate limiter, an API interface plus in-memory mock, and unexported wire types
+// mapped to exported domain types. It deliberately reuses internal/gmail's
+// token-bucket RateLimiter (and its adaptive Throttle/RecoverRate backoff) via
+// the shared Operation enum and NewRateLimiterWithCapacity.
+package gcal
+
+import (
+ "encoding/json"
+ "time"
+)
+
+// Identity constants shared across the calendar sync code.
+const (
+ // SourceType is the sources.source_type value for calendar sources.
+ SourceType = "gcal"
+ // AdapterName labels this adapter in logs/metrics.
+ AdapterName = "gcal"
+ // MessageTypeCalendarEvent is the messages.message_type value for events.
+ MessageTypeCalendarEvent = "calendar_event"
+ // ConversationType is the conversations.conversation_type value.
+ ConversationType = "calendar"
+ // RawFormat is the message_raw.raw_format tag for the stored event JSON.
+ RawFormat = "gcal_json"
+)
+
+// Event status values returned by the API.
+const (
+ StatusConfirmed = "confirmed"
+ StatusTentative = "tentative"
+ StatusCancelled = "cancelled"
+)
+
+// Calendar is one entry from calendarList.list.
+type Calendar struct {
+ ID string `json:"id,omitempty"`
+ Summary string `json:"summary,omitempty"`
+ Description string `json:"description,omitempty"`
+ TimeZone string `json:"timeZone,omitempty"`
+ AccessRole string `json:"accessRole,omitempty"` // owner | writer | reader | freeBusyReader
+ Primary bool `json:"primary,omitempty"`
+ Deleted bool `json:"deleted,omitempty"`
+}
+
+// Person is an organizer/creator reference on an event.
+type Person struct {
+ Email string `json:"email,omitempty"`
+ DisplayName string `json:"displayName,omitempty"`
+ Self bool `json:"self,omitempty"`
+}
+
+// Attendee is one invitee on an event.
+type Attendee struct {
+ Email string `json:"email,omitempty"`
+ DisplayName string `json:"displayName,omitempty"`
+ ResponseStatus string `json:"responseStatus,omitempty"` // needsAction | declined | tentative | accepted
+ Organizer bool `json:"organizer,omitempty"`
+ Self bool `json:"self,omitempty"`
+ Resource bool `json:"resource,omitempty"`
+ Optional bool `json:"optional,omitempty"`
+}
+
+// EventDateTime is an event start/end. Exactly one of DateTime (timed) or Date
+// (all-day) is meaningful. DateTime carries an absolute instant (RFC3339 with
+// offset); Date is a calendar day "2006-01-02" with no instant.
+type EventDateTime struct {
+ DateTime time.Time `json:"dateTime,omitzero"`
+ Date string `json:"date,omitempty"`
+ TimeZone string `json:"timeZone,omitempty"`
+}
+
+// IsAllDay reports whether this is an all-day (date-only) value.
+func (e EventDateTime) IsAllDay() bool { return e.Date != "" }
+
+// IsZero reports whether neither a timed nor all-day value is present.
+func (e EventDateTime) IsZero() bool { return e.DateTime.IsZero() && e.Date == "" }
+
+// Instant returns a single sortable time for the value; ok is false if the
+// value is empty or unparseable. Timed events use their absolute DateTime; all-day
+// events are normalized to midnight UTC of their Date so they sort and partition
+// deterministically regardless of the running machine's timezone.
+func (e EventDateTime) Instant() (time.Time, bool) {
+ if !e.DateTime.IsZero() {
+ return e.DateTime, true
+ }
+ if e.Date != "" {
+ if t, err := time.Parse("2006-01-02", e.Date); err == nil {
+ return t.UTC(), true
+ }
+ }
+ return time.Time{}, false
+}
+
+// Event is a single calendar event (a master, a recurring instance, or an
+// exception). The full original JSON is preserved separately in message_raw.
+type Event struct {
+ ID string `json:"id,omitempty"`
+ Status string `json:"status,omitempty"`
+ HTMLLink string `json:"htmlLink,omitempty"`
+ HangoutLink string `json:"hangoutLink,omitempty"`
+ Created time.Time `json:"created,omitzero"`
+ Updated time.Time `json:"updated,omitzero"`
+ Summary string `json:"summary,omitempty"`
+ Description string `json:"description,omitempty"`
+ Location string `json:"location,omitempty"`
+ Creator Person `json:"creator,omitzero"`
+ Organizer Person `json:"organizer,omitzero"`
+ Start EventDateTime `json:"start,omitzero"`
+ End EventDateTime `json:"end,omitzero"`
+ Recurrence []string `json:"recurrence,omitempty"` // RRULE/RDATE/EXDATE lines (masters only)
+ RecurringEventID string `json:"recurringEventId,omitempty"` // set on instances/exceptions of a series
+ OriginalStartTime EventDateTime `json:"originalStartTime,omitzero"`
+ ICalUID string `json:"iCalUID,omitempty"`
+ Sequence int `json:"sequence,omitempty"`
+ Attendees []Attendee `json:"attendees,omitempty"`
+ Transparency string `json:"transparency,omitempty"`
+ Visibility string `json:"visibility,omitempty"`
+ EventType string `json:"eventType,omitempty"`
+
+ // Raw is the original API JSON for this event, preserved verbatim for
+ // archival fidelity (stored in message_raw). It is not re-serialized from
+ // the mapped fields, so fields msgvault does not model (conferenceData,
+ // extendedProperties, ...) survive in the archive.
+ Raw json.RawMessage `json:"-"`
+}
+
+// IsCancelled reports whether the event is a cancellation/tombstone.
+func (e Event) IsCancelled() bool { return e.Status == StatusCancelled }
+
+// EventsPage is one page of events.list, plus the pagination/sync tokens. The
+// API delivers NextSyncToken only on the final page of a list traversal.
+type EventsPage struct {
+ Items []Event `json:"items,omitempty"`
+ NextPageToken string `json:"nextPageToken,omitempty"`
+ NextSyncToken string `json:"nextSyncToken,omitempty"`
+ TimeZone string `json:"timeZone,omitempty"`
+}
+
+// CalendarListPage is one page of calendarList.list.
+type CalendarListPage struct {
+ Items []Calendar `json:"items,omitempty"`
+ NextPageToken string `json:"nextPageToken,omitempty"`
+}
+
+// EventsListParams configures a single events.list call.
+//
+// SyncToken and TimeMin/TimeMax are mutually exclusive: the Calendar API rejects
+// timeMin/timeMax (and q/orderBy/updatedMin) when syncToken is set, to keep the
+// client's incremental state consistent. SingleEvents and ShowDeleted must match
+// the values used for the full sync that minted the token; the client always
+// forwards them, and full sync uses SingleEvents=false (store masters).
+type EventsListParams struct {
+ SyncToken string
+ PageToken string
+ SingleEvents bool
+ ShowDeleted bool
+ MaxResults int
+ TimeMin string // RFC3339; full-sync only (ignored when SyncToken set)
+ TimeMax string // RFC3339; full-sync only (ignored when SyncToken set)
+}
diff --git a/internal/gcal/wire.go b/internal/gcal/wire.go
new file mode 100644
index 000000000..4419f7a5b
--- /dev/null
+++ b/internal/gcal/wire.go
@@ -0,0 +1,172 @@
+package gcal
+
+import (
+ "encoding/json"
+ "time"
+)
+
+// Unexported JSON wire types for the Calendar API v3, mapped to the exported
+// domain types. Each maps via a toX() method so the rest of the package never
+// sees raw JSON shapes.
+
+type wireCalendarListEntry struct {
+ ID string `json:"id"`
+ Summary string `json:"summary"`
+ SummaryOverride string `json:"summaryOverride"`
+ Description string `json:"description"`
+ TimeZone string `json:"timeZone"`
+ AccessRole string `json:"accessRole"`
+ Primary bool `json:"primary"`
+ Deleted bool `json:"deleted"`
+}
+
+func (w wireCalendarListEntry) toCalendar() Calendar {
+ summary := w.Summary
+ if w.SummaryOverride != "" {
+ // A user-set override (e.g. a renamed subscribed calendar) is the
+ // label the user actually sees; prefer it.
+ summary = w.SummaryOverride
+ }
+ return Calendar{
+ ID: w.ID,
+ Summary: summary,
+ Description: w.Description,
+ TimeZone: w.TimeZone,
+ AccessRole: w.AccessRole,
+ Primary: w.Primary,
+ Deleted: w.Deleted,
+ }
+}
+
+type wireCalendarList struct {
+ Items []wireCalendarListEntry `json:"items"`
+ NextPageToken string `json:"nextPageToken"`
+}
+
+type wirePerson struct {
+ Email string `json:"email"`
+ DisplayName string `json:"displayName"`
+ Self bool `json:"self"`
+}
+
+func (w *wirePerson) toPerson() Person {
+ if w == nil {
+ return Person{}
+ }
+ return Person{Email: w.Email, DisplayName: w.DisplayName, Self: w.Self}
+}
+
+type wireAttendee struct {
+ Email string `json:"email"`
+ DisplayName string `json:"displayName"`
+ ResponseStatus string `json:"responseStatus"`
+ Organizer bool `json:"organizer"`
+ Self bool `json:"self"`
+ Resource bool `json:"resource"`
+ Optional bool `json:"optional"`
+}
+
+type wireEventDateTime struct {
+ DateTime string `json:"dateTime"`
+ Date string `json:"date"`
+ TimeZone string `json:"timeZone"`
+}
+
+func (w *wireEventDateTime) toEventDateTime() EventDateTime {
+ if w == nil {
+ return EventDateTime{}
+ }
+ var dt time.Time
+ if w.DateTime != "" {
+ if t, err := time.Parse(time.RFC3339, w.DateTime); err == nil {
+ dt = t
+ }
+ }
+ return EventDateTime{DateTime: dt, Date: w.Date, TimeZone: w.TimeZone}
+}
+
+type wireEvent struct {
+ ID string `json:"id"`
+ Status string `json:"status"`
+ HTMLLink string `json:"htmlLink"`
+ HangoutLink string `json:"hangoutLink"`
+ Created string `json:"created"`
+ Updated string `json:"updated"`
+ Summary string `json:"summary"`
+ Description string `json:"description"`
+ Location string `json:"location"`
+ Creator *wirePerson `json:"creator"`
+ Organizer *wirePerson `json:"organizer"`
+ Start *wireEventDateTime `json:"start"`
+ End *wireEventDateTime `json:"end"`
+ Recurrence []string `json:"recurrence"`
+ RecurringEventID string `json:"recurringEventId"`
+ OriginalStartTime *wireEventDateTime `json:"originalStartTime"`
+ ICalUID string `json:"iCalUID"`
+ Sequence int `json:"sequence"`
+ Attendees []wireAttendee `json:"attendees"`
+ Transparency string `json:"transparency"`
+ Visibility string `json:"visibility"`
+ EventType string `json:"eventType"`
+}
+
+func parseRFC3339(s string) time.Time {
+ if s == "" {
+ return time.Time{}
+ }
+ if t, err := time.Parse(time.RFC3339, s); err == nil {
+ return t
+ }
+ return time.Time{}
+}
+
+func (w wireEvent) toEvent() Event {
+ ev := Event{
+ ID: w.ID,
+ Status: w.Status,
+ HTMLLink: w.HTMLLink,
+ HangoutLink: w.HangoutLink,
+ Created: parseRFC3339(w.Created),
+ Updated: parseRFC3339(w.Updated),
+ Summary: w.Summary,
+ Description: w.Description,
+ Location: w.Location,
+ Creator: w.Creator.toPerson(),
+ Organizer: w.Organizer.toPerson(),
+ Start: w.Start.toEventDateTime(),
+ End: w.End.toEventDateTime(),
+ Recurrence: w.Recurrence,
+ RecurringEventID: w.RecurringEventID,
+ OriginalStartTime: w.OriginalStartTime.toEventDateTime(),
+ ICalUID: w.ICalUID,
+ Sequence: w.Sequence,
+ Transparency: w.Transparency,
+ Visibility: w.Visibility,
+ EventType: w.EventType,
+ }
+ for _, a := range w.Attendees {
+ ev.Attendees = append(ev.Attendees, Attendee(a))
+ }
+ return ev
+}
+
+type wireEvents struct {
+ // Items are kept as raw JSON so each event's original bytes can be
+ // preserved verbatim in Event.Raw for the archive.
+ Items []json.RawMessage `json:"items"`
+ NextPageToken string `json:"nextPageToken"`
+ NextSyncToken string `json:"nextSyncToken"`
+ TimeZone string `json:"timeZone"`
+}
+
+// decodeEvent unmarshals one event's JSON into the domain Event, preserving the
+// original bytes in Event.Raw.
+func decodeEvent(raw []byte) (Event, error) {
+ var w wireEvent
+ if err := json.Unmarshal(raw, &w); err != nil {
+ return Event{}, err
+ }
+ ev := w.toEvent()
+ ev.Raw = append(json.RawMessage(nil), raw...)
+ return ev, nil
+}
diff --git a/internal/gmail/ratelimit.go b/internal/gmail/ratelimit.go
index 896aa1b09..f31277d8d 100644
--- a/internal/gmail/ratelimit.go
+++ b/internal/gmail/ratelimit.go
@@ -25,6 +25,17 @@ const (
OpMessagesDelete // 10 units
OpMessagesBatchDelete // 50 units
OpProfile // 1 unit
+
+ // OpCalendarListList and the other Calendar API operations live on this
+ // shared Operation enum (rather than a parallel one in internal/gcal) so
+ // the Calendar client can reuse this package's token-bucket limiter and
+ // adaptive Throttle/RecoverRate backoff via NewRateLimiterWithCapacity.
+ // Each costs 1 quota unit (Calendar bills per request, not per field
+ // projection), which falls through Cost()'s default branch. See
+ // internal/gcal.
+ OpCalendarListList // 1 unit
+ OpEventsList // 1 unit
+ OpEventsGet // 1 unit
)
// Cost returns the quota cost for an operation.
@@ -115,6 +126,40 @@ func newRateLimiter(clk Clock, qps float64) *RateLimiter {
}
}
+// NewRateLimiterWithCapacity creates a token-bucket limiter with an explicit
+// bucket capacity and refill rate (tokens/second), bypassing the Gmail-tuned
+// NewRateLimiter defaults (capacity=250, refill scaled from QPS). This lets a
+// Google API with a different per-user budget use a correctly-sized bucket
+// instead of inheriting Gmail's 250-token instant burst — e.g. the Calendar
+// API at 600 req/min/user is built with capacity=10, refillRate=8 (80% of the
+// 10 req/s sustained ceiling, leaving headroom for jitter and retries).
+// capacity is clamped to >=1 and refillRate to >= MinQPS.
+func NewRateLimiterWithCapacity(capacity int, refillRate float64) *RateLimiter {
+ return newRateLimiterWithCapacity(realClock{}, capacity, refillRate)
+}
+
+// newRateLimiterWithCapacity is the clock-injectable form for tests.
+func newRateLimiterWithCapacity(clk Clock, capacity int, refillRate float64) *RateLimiter {
+ if clk == nil {
+ panic("gmail: RateLimiter requires a non-nil Clock")
+ }
+ if capacity < 1 {
+ capacity = 1
+ }
+ if refillRate < MinQPS {
+ refillRate = MinQPS
+ }
+ capF := float64(capacity)
+ return &RateLimiter{
+ clock: clk,
+ tokens: capF,
+ capacity: capF,
+ refillRate: refillRate,
+ baseRefillRate: refillRate,
+ lastRefill: clk.Now(),
+ }
+}
+
// reserve attempts to acquire tokens for the operation. Returns 0 if tokens
// were acquired immediately, or the duration to wait before retrying.
func (r *RateLimiter) reserve(op Operation) time.Duration {
diff --git a/internal/gmail/ratelimit_test.go b/internal/gmail/ratelimit_test.go
index e02f07ac2..3cc90e2ba 100644
--- a/internal/gmail/ratelimit_test.go
+++ b/internal/gmail/ratelimit_test.go
@@ -174,6 +174,53 @@ func (f *rlFixture) acquireAsync(ctx context.Context, t *testing.T, op Operation
}
}
+func TestNewRateLimiterWithCapacity_BurstCapped(t *testing.T) {
+ assert := assert.New(t)
+ clk := newMockClock()
+ rl := newRateLimiterWithCapacity(clk, 10, 8)
+
+ // Bucket starts at the explicit capacity, NOT Gmail's DefaultCapacity (250).
+ assert.InDelta(10.0, rl.Available(), 1e-9, "initial capacity")
+
+ // Exactly 10 cost-1 ops drain the bucket without blocking.
+ for i := range 10 {
+ assert.True(rl.TryAcquire(OpEventsList), "op %d should acquire", i)
+ }
+ // The 11th must fail — proving the burst is capped at 10, not 250.
+ assert.False(rl.TryAcquire(OpEventsList), "11th op must be throttled (no 250-burst)")
+
+ // Refill is 8 tok/s: after 1s, ~8 tokens return.
+ clk.Advance(1 * time.Second)
+ assert.InDelta(8.0, rl.Available(), 1e-9, "after 1s at 8 tok/s")
+
+ // Saturates at capacity, never above.
+ clk.Advance(10 * time.Second)
+ assert.InDelta(10.0, rl.Available(), 1e-9, "saturated at capacity")
+}
+
+func TestNewRateLimiterWithCapacity_Clamps(t *testing.T) {
+ clk := newMockClock()
+ rl := newRateLimiterWithCapacity(clk, 0, 0.0)
+ assert.InDelta(t, 1.0, rl.Available(), 1e-9, "capacity clamps to >=1")
+
+ rl.mu.Lock()
+ refill := rl.refillRate
+ rl.mu.Unlock()
+ assert.InDelta(t, MinQPS, refill, 1e-9, "refillRate clamps to MinQPS")
+}
+
+func TestNewRateLimiterWithCapacity_NilClockPanics(t *testing.T) {
+ assert.Panics(t, func() {
+ newRateLimiterWithCapacity(nil, 10, 8)
+ }, "newRateLimiterWithCapacity(nil, ...) should panic")
+}
+
+func TestCalendarOperationCost(t *testing.T) {
+ for _, op := range []Operation{OpCalendarListList, OpEventsList, OpEventsGet} {
+ assert.Equal(t, 1, op.Cost(), "calendar op %d cost", op)
+ }
+}
+
func TestOperationCost(t *testing.T) {
tests := []struct {
op Operation
diff --git a/internal/mcp/handlers.go b/internal/mcp/handlers.go
index c9a747154..b14f71ee1 100644
--- a/internal/mcp/handlers.go
+++ b/internal/mcp/handlers.go
@@ -128,6 +128,8 @@ func translateVectorErr(err error) *mcp.CallToolResult {
"embedding_timeout: the embedding endpoint did not respond in time; " +
"retry, or raise [vector.embeddings].timeout in config",
)
+ case errors.Is(err, vector.ErrIndexScopeMismatch):
+ return mcp.NewToolResultError("index_scope_mismatch: " + err.Error())
}
return nil
}
@@ -520,17 +522,26 @@ func (h *handlers) findSimilarMessages(ctx context.Context, req mcp.CallToolRequ
return mcp.NewToolResultError(fmt.Sprintf("load seed vector: %v", err)), nil
}
- active, err := h.backend.ActiveGeneration(ctx)
+ active, err := vector.ResolveActiveForFingerprint(ctx, h.backend, h.vectorCfg.GenerationFingerprint())
if err != nil {
if r := translateVectorErr(err); r != nil {
return r, nil
}
return mcp.NewToolResultError(fmt.Sprintf("active generation: %v", err)), nil
}
+ if err := hybrid.ValidateBuildScope(h.vectorCfg.Embed.Scope.BuildScope(), filter); err != nil {
+ if r := translateVectorErr(err); r != nil {
+ return r, nil
+ }
+ return mcp.NewToolResultError(err.Error()), nil
+ }
// +1 so we can drop the seed itself from results without coming up short.
hits, err := h.backend.Search(ctx, active.ID, seed, limit+1, filter)
if err != nil {
+ if r := translateVectorErr(err); r != nil {
+ return r, nil
+ }
return mcp.NewToolResultError(fmt.Sprintf("search failed: %v", err)), nil
}
@@ -596,6 +607,9 @@ func (h *handlers) filterFromFindSimilarArgs(ctx context.Context, args map[strin
if srcID != nil {
f.SourceIDs = []int64{*srcID}
}
+ if messageType, _ := args["message_type"].(string); messageType != "" {
+ f.MessageTypes = vector.NewBuildScope([]string{messageType}).MessageTypes
+ }
if v, ok := args["has_attachment"].(bool); ok && v {
tr := true
diff --git a/internal/mcp/server.go b/internal/mcp/server.go
index 58a3c23ec..875cce0e8 100644
--- a/internal/mcp/server.go
+++ b/internal/mcp/server.go
@@ -365,6 +365,9 @@ func findSimilarMessagesTool() mcp.Tool {
),
withLimit("20"),
withAccount(),
+ mcp.WithString("message_type",
+ mcp.Description("Restrict results to one message type, such as email, sms, mms, fbmessenger, or calendar_event"),
+ ),
withAfter(),
withBefore(),
mcp.WithBoolean("has_attachment",
diff --git a/internal/mcp/server_test.go b/internal/mcp/server_test.go
index e11cc76e4..ec84048ee 100644
--- a/internal/mcp/server_test.go
+++ b/internal/mcp/server_test.go
@@ -1471,6 +1471,8 @@ type fakeBackend struct {
activeErr error
searchHits []vector.Hit
searchErr error
+ searchCalls int
+ lastFilter vector.Filter
building *vector.Generation
buildingErr error
stats map[vector.GenerationID]vector.Stats
@@ -1489,7 +1491,9 @@ func (f *fakeBackend) EmbeddedMessageCount(_ context.Context, _ vector.Generatio
func (f *fakeBackend) ActiveGeneration(_ context.Context) (vector.Generation, error) {
return f.active, f.activeErr
}
-func (f *fakeBackend) Search(_ context.Context, _ vector.GenerationID, _ []float32, _ int, _ vector.Filter) ([]vector.Hit, error) {
+func (f *fakeBackend) Search(_ context.Context, _ vector.GenerationID, _ []float32, _ int, filter vector.Filter) ([]vector.Hit, error) {
+ f.searchCalls++
+ f.lastFilter = filter
return f.searchHits, f.searchErr
}
func (f *fakeBackend) CreateGeneration(_ context.Context, _ string, _ int, _ string) (vector.GenerationID, error) {
@@ -1536,6 +1540,29 @@ type generationSummary struct {
State string `json:"state"`
}
+func testSimilarVectorConfig(messageTypes ...string) vector.Config {
+ return vector.Config{
+ Embeddings: vector.EmbeddingsConfig{
+ Model: "nomic-embed",
+ Dimension: 4,
+ MaxInputChars: 6000,
+ },
+ Embed: vector.EmbedConfig{
+ Scope: vector.EmbedScopeConfig{MessageTypes: messageTypes},
+ },
+ }
+}
+
+func testSimilarActiveGeneration(cfg vector.Config) vector.Generation {
+ return vector.Generation{
+ ID: 7,
+ Model: cfg.Embeddings.Model,
+ Dimension: cfg.Embeddings.Dimension,
+ Fingerprint: cfg.GenerationFingerprint(),
+ State: vector.GenerationActive,
+ }
+}
+
func TestFindSimilarMessages_VectorNotEnabled(t *testing.T) {
h := newTestHandlers(&querytest.MockEngine{})
@@ -1564,6 +1591,11 @@ func TestSearchMessagesTool_AdvertisesVectorModesOnlyWhenAvailable(t *testing.T)
assert.Contains(enabled.Description, "free-text", "vectorAvailable=true: tool description should call out the free-text requirement, got: %q", enabled.Description)
}
+func TestFindSimilarMessagesTool_AdvertisesMessageTypeFilter(t *testing.T) {
+ tool := findSimilarMessagesTool()
+ assertpkg.Contains(t, tool.InputSchema.Properties, "message_type")
+}
+
func TestFindSimilarMessages_MissingID(t *testing.T) {
h := &handlers{
engine: &querytest.MockEngine{},
@@ -1581,15 +1613,10 @@ func TestFindSimilarMessages_HappyPath(t *testing.T) {
for i := range seed {
seed[i] = float32(i)
}
+ cfg := testSimilarVectorConfig()
fb := &fakeBackend{
loadVec: seed,
- active: vector.Generation{
- ID: 7,
- Model: "nomic-embed",
- Dimension: 4,
- Fingerprint: "nomic-embed:4",
- State: vector.GenerationActive,
- },
+ active: testSimilarActiveGeneration(cfg),
searchHits: []vector.Hit{
{MessageID: 100, Score: 0.99, Rank: 1}, // seed — must be filtered out
{MessageID: 200, Score: 0.95, Rank: 2},
@@ -1604,7 +1631,7 @@ func TestFindSimilarMessages_HappyPath(t *testing.T) {
},
}
- h := &handlers{engine: eng, backend: fb}
+ h := &handlers{engine: eng, backend: fb, vectorCfg: cfg}
resp := runTool[similarResponse](t, "find_similar_messages", h.findSimilarMessages, map[string]any{
"message_id": float64(100),
@@ -1614,7 +1641,7 @@ func TestFindSimilarMessages_HappyPath(t *testing.T) {
assert.Equal(int64(100), resp.SeedMessageID, "seed_message_id")
assert.Equal(2, resp.Returned, "returned")
assert.Equal(int64(7), resp.Generation.ID, "generation.id")
- assert.Equal("nomic-embed:4", resp.Generation.Fingerprint, "generation.fingerprint")
+ assert.Equal(cfg.GenerationFingerprint(), resp.Generation.Fingerprint, "generation.fingerprint")
requirepkg.Len(t, resp.Messages, 2, "messages")
for _, m := range resp.Messages {
assert.NotEqual(int64(100), m.ID, "seed message 100 must not appear in results")
@@ -1623,6 +1650,89 @@ func TestFindSimilarMessages_HappyPath(t *testing.T) {
assert.Equal(int64(300), resp.Messages[1].ID, "Messages[1].ID")
}
+func TestFindSimilarMessages_RejectsStaleActiveGeneration(t *testing.T) {
+ cfg := vector.Config{
+ Embeddings: vector.EmbeddingsConfig{
+ Model: "nomic-embed",
+ Dimension: 4,
+ MaxInputChars: 6000,
+ },
+ }
+ fb := &fakeBackend{
+ loadVec: []float32{0, 1, 2, 3},
+ active: vector.Generation{
+ ID: 7,
+ Model: "old-model",
+ Dimension: 4,
+ Fingerprint: "old-model:4:p1-111111:c6000:e1",
+ State: vector.GenerationActive,
+ },
+ }
+ h := &handlers{engine: &querytest.MockEngine{}, backend: fb, vectorCfg: cfg}
+
+ r := runToolExpectError(t, "find_similar_messages", h.findSimilarMessages, map[string]any{
+ "message_id": float64(100),
+ })
+ txt := resultText(t, r)
+ assertpkg.Contains(t, txt, "index_stale", "expected stale-index error, got: %s", txt)
+ assertpkg.Equal(t, 0, fb.searchCalls, "backend search calls")
+}
+
+func TestFindSimilarMessages_ScopedIndexRequiresMatchingMessageTypeFilter(t *testing.T) {
+ seed := []float32{0, 1, 2, 3}
+ cfg := testSimilarVectorConfig("sms")
+ active := testSimilarActiveGeneration(cfg)
+
+ t.Run("rejects unscoped request", func(t *testing.T) {
+ fb := &fakeBackend{loadVec: seed, active: active}
+ h := &handlers{
+ engine: &querytest.MockEngine{},
+ backend: fb,
+ vectorCfg: cfg,
+ }
+
+ r := runToolExpectError(t, "find_similar_messages", h.findSimilarMessages, map[string]any{
+ "message_id": float64(100),
+ })
+ txt := resultText(t, r)
+ assertpkg.Contains(t, txt, "index_scope_mismatch", "expected scoped-index error, got: %s", txt)
+ assertpkg.Equal(t, 0, fb.searchCalls, "backend search calls")
+ })
+
+ t.Run("passes matching message type filter to backend", func(t *testing.T) {
+ fb := &fakeBackend{loadVec: seed, active: active}
+ h := &handlers{
+ engine: &querytest.MockEngine{},
+ backend: fb,
+ vectorCfg: cfg,
+ }
+
+ runTool[similarResponse](t, "find_similar_messages", h.findSimilarMessages, map[string]any{
+ "message_id": float64(100),
+ "message_type": " SMS ",
+ })
+ assertpkg.Equal(t, 1, fb.searchCalls, "backend search calls")
+ assertpkg.Equal(t, []string{"sms"}, fb.lastFilter.MessageTypes, "MessageTypes")
+ })
+
+ t.Run("rejects conflicting message type filter", func(t *testing.T) {
+ fb := &fakeBackend{loadVec: seed, active: active}
+ h := &handlers{
+ engine: &querytest.MockEngine{},
+ backend: fb,
+ vectorCfg: cfg,
+ }
+
+ r := runToolExpectError(t, "find_similar_messages", h.findSimilarMessages, map[string]any{
+ "message_id": float64(100),
+ "message_type": "email",
+ })
+ txt := resultText(t, r)
+ assertpkg.Contains(t, txt, "index_scope_mismatch", "expected scoped-index error, got: %s", txt)
+ assertpkg.Equal(t, 0, fb.searchCalls, "backend search calls")
+ })
+}
+
func TestFindSimilarMessages_NoActiveGeneration(t *testing.T) {
fb := &fakeBackend{
loadErr: vector.ErrNoActiveGeneration,
diff --git a/internal/oauth/oauth.go b/internal/oauth/oauth.go
index 2cb2c23db..62b17d785 100644
--- a/internal/oauth/oauth.go
+++ b/internal/oauth/oauth.go
@@ -38,7 +38,26 @@ var ScopesDeletion = []string{
"https://mail.google.com/",
}
+// ScopeCalendarReadonly is the read-only Calendar scope: it covers both
+// calendarList enumeration and event reads, so an archival tool needs nothing
+// finer-grained.
+const ScopeCalendarReadonly = "https://www.googleapis.com/auth/calendar.readonly"
+
+// ScopesCalendar is the opt-in scope set for calendar sync.
+var ScopesCalendar = []string{
+ ScopeCalendarReadonly,
+}
+
+// ScopesGmailCalendar bundles the normal Gmail scopes with Calendar for
+// re-consent. Re-authorizing uses ApprovalForce with no include_granted_scopes,
+// which REPLACES (not unions) the granted scope set — so an existing Gmail
+// account opting into calendar must re-consent with BOTH scope families or it
+// silently loses Gmail access. Always pass this bundle (never ScopesCalendar
+// alone) when escalating an account that already has Gmail.
+var ScopesGmailCalendar = append(append([]string{}, Scopes...), ScopesCalendar...)
+
const defaultProfileURL = "https://gmail.googleapis.com/gmail/v1/users/me/profile"
+const defaultCalendarProfileURL = "https://www.googleapis.com/calendar/v3/users/me/calendarList/primary"
// TokenMismatchError is returned when the authorized Google account
// does not match the expected email. Callers can inspect Expected
@@ -60,7 +79,7 @@ type Manager struct {
config *oauth2.Config
tokensDir string
logger *slog.Logger
- profileURL string // Gmail profile endpoint; overridden in tests
+ profileURL string // profile endpoint override for tests
// browserFlowFn overrides browserFlow in tests to avoid starting
// a real HTTP server and browser. When nil, the real browserFlow
@@ -149,6 +168,65 @@ func PrintHeadlessInstructions(email, tokensDir, oauthApp string) {
fmt.Println()
}
+// PrintCalendarHeadlessInstructions prints setup instructions for adding
+// Calendar access on a headless server. As with Gmail, Google's device flow
+// does not support Calendar scopes, so the operator must authorize on a machine
+// with a browser and copy the token file to the server. Calendar re-consent
+// REPLACES the granted scopes, so the operator must keep every existing
+// permission checked alongside Calendar or that access is dropped.
+// tokensDir should be the configured tokens directory (e.g., cfg.TokensDir()).
+func PrintCalendarHeadlessInstructions(email, tokensDir, oauthApp string) {
+ tokenFile := sanitizeEmail(email) + ".json"
+ tokenPath := filepath.Join(tokensDir, tokenFile)
+
+ addCmd := " msgvault add-calendar " + email
+ syncCmd := " msgvault sync-calendar " + email
+ if oauthApp != "" {
+ addCmd += " --oauth-app " + oauthApp
+ syncCmd += " --oauth-app " + oauthApp
+ }
+
+ fmt.Println()
+ fmt.Println("=== Headless Server Calendar Setup ===")
+ fmt.Println()
+ fmt.Println("A headless server cannot complete Google's browser consent, and the OAuth")
+ fmt.Println("device flow does not support Calendar scopes. Authorize on a machine with a")
+ fmt.Println("browser and copy the token to your server.")
+ fmt.Println()
+ fmt.Println("Step 0: If this account already has a token on the headless server, copy")
+ fmt.Println(" that existing token to the browser machine first. This lets")
+ fmt.Println(" add-calendar preserve Drive or other previously granted scopes:")
+ fmt.Println()
+ fmt.Printf(" mkdir -p %s\n", shellQuote(tokensDir))
+ fmt.Printf(" scp user@server:%s %s\n", shellQuote(tokenPath), shellQuote(tokenPath))
+ fmt.Println()
+ fmt.Println(" Skip this step only when no token exists yet.")
+ fmt.Println()
+ fmt.Println("Step 1: On a machine with a browser (using the SAME client_secret.json as the")
+ fmt.Println(" server), run:")
+ fmt.Println()
+ fmt.Println(addCmd)
+ fmt.Println()
+ fmt.Println(" On the consent screen, keep all existing permissions plus Calendar")
+ fmt.Println(" checked — re-consent REPLACES scopes, so unchecking an existing")
+ fmt.Println(" permission would drop that access.")
+ fmt.Println()
+ fmt.Println("Step 2: Copy the token file to your headless server, replacing the existing one:")
+ fmt.Println()
+ fmt.Printf(" ssh user@server mkdir -p %s\n", shellQuote(tokensDir))
+ fmt.Printf(" scp %s user@server:%s\n", shellQuote(tokenPath), shellQuote(tokenPath))
+ fmt.Println()
+ fmt.Println("Step 3: On the headless server, register the calendars (no browser needed)")
+ fmt.Println(" and sync:")
+ fmt.Println()
+ fmt.Println(addCmd)
+ fmt.Println(syncCmd)
+ fmt.Println()
+ fmt.Println("The copied token carries Calendar plus the existing Google permissions,")
+ fmt.Println("so current sync jobs keep working.")
+ fmt.Println()
+}
+
// sanitizeEmail sanitizes an email for use in a filename.
func sanitizeEmail(email string) string {
safe := strings.ReplaceAll(email, "/", "_")
@@ -308,21 +386,51 @@ func (m *Manager) browserFlow(
const resolveTimeout = 10 * time.Second
-// resolveTokenEmail calls the Gmail profile API to confirm that
+// resolveTokenEmail calls a Google profile endpoint to confirm that
// the token belongs to an account matching the expected email.
-// Returns the canonical (primary) Gmail address for the account,
+// Returns the canonical Google account email when available,
// which may differ from the input when the user supplies an alias
// or secondary login address. The token is never persisted by this
// function — the caller decides what to do on success or failure.
func (m *Manager) resolveTokenEmail(
ctx context.Context, email string, token *oauth2.Token,
) (string, error) {
- profileURL := m.profileURL
- if profileURL == "" {
- profileURL = defaultProfileURL
+ endpoint := tokenProfileEndpointForScopes(m.config.Scopes)
+ if m.profileURL != "" {
+ endpoint.url = m.profileURL
}
ts := m.config.TokenSource(ctx, token)
- return fetchTokenProfileEmail(ctx, ts, profileURL, email, tokenProfileErrorOAuth)
+ return fetchTokenProfileEmailFromEndpoint(ctx, ts, endpoint, email, tokenProfileErrorOAuth)
+}
+
+type tokenProfileEndpoint struct {
+ url string
+ serviceName string
+}
+
+func tokenProfileEndpointForScopes(scopes []string) tokenProfileEndpoint {
+ if slices.Contains(scopes, ScopeCalendarReadonly) && !hasGmailProfileScope(scopes) {
+ return tokenProfileEndpoint{
+ url: defaultCalendarProfileURL,
+ serviceName: "Calendar API",
+ }
+ }
+ return tokenProfileEndpoint{
+ url: defaultProfileURL,
+ serviceName: "Gmail API",
+ }
+}
+
+func hasGmailProfileScope(scopes []string) bool {
+ if slices.Contains(scopes, "https://mail.google.com/") {
+ return true
+ }
+ for _, scope := range Scopes {
+ if slices.Contains(scopes, scope) {
+ return true
+ }
+ }
+ return false
}
// sameGoogleAccount returns true if two email addresses belong to the
@@ -450,6 +558,16 @@ func (m *Manager) HasScope(email string, scope string) bool {
return slices.Contains(tf.Scopes, scope)
}
+// GrantedScopes returns a copy of the stored scope metadata for the account.
+// Legacy tokens or missing token files return nil.
+func (m *Manager) GrantedScopes(email string) []string {
+ tf, err := m.loadTokenFile(email)
+ if err != nil || len(tf.Scopes) == 0 {
+ return nil
+ }
+ return append([]string(nil), tf.Scopes...)
+}
+
// saveToken saves a token for the given email with the specified scopes.
func (m *Manager) saveToken(email string, token *oauth2.Token, scopes []string) error {
if err := fileutil.SecureMkdirAll(m.tokensDir, 0700); err != nil {
@@ -681,12 +799,25 @@ func fetchTokenProfileEmail(
profileURL string,
email string,
mode tokenProfileErrorMode,
+) (string, error) {
+ return fetchTokenProfileEmailFromEndpoint(ctx, ts, tokenProfileEndpoint{
+ url: profileURL,
+ serviceName: "gmail API",
+ }, email, mode)
+}
+
+func fetchTokenProfileEmailFromEndpoint(
+ ctx context.Context,
+ ts oauth2.TokenSource,
+ endpoint tokenProfileEndpoint,
+ email string,
+ mode tokenProfileErrorMode,
) (string, error) {
valCtx, cancel := context.WithTimeout(ctx, resolveTimeout)
defer cancel()
client := oauth2.NewClient(valCtx, ts)
- req, err := http.NewRequestWithContext(valCtx, http.MethodGet, profileURL, nil)
+ req, err := http.NewRequestWithContext(valCtx, http.MethodGet, endpoint.url, nil)
if err != nil {
return "", fmt.Errorf("create profile request: %w", err)
}
@@ -707,15 +838,17 @@ func fetchTokenProfileEmail(
if mode == tokenProfileErrorOAuth {
return "", fmt.Errorf(
"could not verify token belongs to %s: "+
- "Gmail API returned HTTP %d: %s "+
+ "%s returned HTTP %d: %s "+
"(re-run the command to try again)",
- email, resp.StatusCode, string(body))
+ email, endpoint.serviceName, resp.StatusCode, string(body))
}
- return "", fmt.Errorf("gmail API returned HTTP %d for %s: %s", resp.StatusCode, email, string(body))
+ return "", fmt.Errorf("%s returned HTTP %d for %s: %s", endpoint.serviceName, resp.StatusCode, email, string(body))
}
var profile struct {
EmailAddress string `json:"emailAddress"`
+ Email string `json:"email"`
+ ID string `json:"id"`
}
if err := json.NewDecoder(resp.Body).Decode(&profile); err != nil {
if mode == tokenProfileErrorOAuth {
@@ -727,11 +860,28 @@ func fetchTokenProfileEmail(
return "", fmt.Errorf("parse profile for %s: %w", email, err)
}
- if !sameGoogleAccount(email, profile.EmailAddress) {
- return "", &TokenMismatchError{Expected: email, Actual: profile.EmailAddress}
+ profileEmail := profile.EmailAddress
+ if profileEmail == "" {
+ profileEmail = profile.Email
+ }
+ if profileEmail == "" {
+ profileEmail = profile.ID
+ }
+ if profileEmail == "" {
+ if mode == tokenProfileErrorOAuth {
+ return "", fmt.Errorf(
+ "could not verify token belongs to %s: "+
+ "profile response did not include an email address "+
+ "(re-run the command to try again)", email)
+ }
+ return "", fmt.Errorf("parse profile for %s: response did not include an email address", email)
+ }
+
+ if !sameGoogleAccount(email, profileEmail) {
+ return "", &TokenMismatchError{Expected: email, Actual: profileEmail}
}
- return profile.EmailAddress, nil
+ return profileEmail, nil
}
// ValidateTokenEmail calls the Gmail profile API to confirm that the token
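Reviewer note on the oauth.go changes: the sketch below restates, outside the package, how the patched code appears to choose a validation endpoint from the configured scopes and then pick an account identifier from the profile response. It is a minimal illustration under stated assumptions, not the package's implementation. `pickService`, `profileEmail`, and `gmailScopes` are hypothetical names, and `gmailScopes` is a stand-in for the package-level `Scopes` slice, whose full contents are not visible in this diff.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"slices"
)

// Calendar scope string copied from the diff.
const scopeCalendarReadonly = "https://www.googleapis.com/auth/calendar.readonly"

// gmailScopes is a hypothetical stand-in for the package-level Scopes slice.
var gmailScopes = []string{"https://www.googleapis.com/auth/gmail.modify"}

// pickService mirrors the decision in tokenProfileEndpointForScopes: only a
// grant that carries Calendar and no Gmail scope validates through Calendar.
func pickService(scopes []string) string {
	hasGmail := slices.Contains(scopes, "https://mail.google.com/")
	for _, s := range gmailScopes {
		hasGmail = hasGmail || slices.Contains(scopes, s)
	}
	if slices.Contains(scopes, scopeCalendarReadonly) && !hasGmail {
		return "Calendar API"
	}
	return "Gmail API"
}

// profileEmail applies the same fallback order as the patched decoder:
// emailAddress first, then email, then id.
func profileEmail(body string) (string, error) {
	var p struct {
		EmailAddress string `json:"emailAddress"`
		Email        string `json:"email"`
		ID           string `json:"id"`
	}
	if err := json.Unmarshal([]byte(body), &p); err != nil {
		return "", err
	}
	for _, v := range []string{p.EmailAddress, p.Email, p.ID} {
		if v != "" {
			return v, nil
		}
	}
	return "", errors.New("profile response did not include an email address")
}

func main() {
	// Copy before appending so the bundle never aliases gmailScopes,
	// the same idiom the diff uses for ScopesGmailCalendar.
	combined := append(append([]string{}, gmailScopes...), scopeCalendarReadonly)

	fmt.Println(pickService([]string{scopeCalendarReadonly}))
	fmt.Println(pickService(combined))

	email, err := profileEmail(`{"id":"user@gmail.com"}`)
	fmt.Println(email, err)
}
```

The combined grant staying on the Gmail endpoint matches what TestTokenProfileEndpointForScopes asserts, so existing Gmail accounts keep their current validation path after opting into Calendar.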
diff --git a/internal/oauth/oauth_test.go b/internal/oauth/oauth_test.go
index 0c1d83241..32d466a5a 100644
--- a/internal/oauth/oauth_test.go
+++ b/internal/oauth/oauth_test.go
@@ -580,6 +580,38 @@ func TestAuthorize_SavesUnderOriginalIdentifier(t *testing.T) {
assert.Error(err, "token should NOT exist under canonical %q", canonicalEmail)
}
+func TestAuthorize_CalendarOnlyUsesCalendarProfileID(t *testing.T) {
+ require := requirepkg.New(t)
+ assert := assertpkg.New(t)
+
+ srv := httptest.NewServer(http.HandlerFunc(
+ func(w http.ResponseWriter, r *http.Request) {
+ assert.Equal("Bearer calendar-token", r.Header.Get("Authorization"), "Authorization")
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = fmt.Fprint(w, `{"id":"user@gmail.com"}`)
+ }))
+ defer srv.Close()
+
+ mgr := setupTestManager(t, ScopesCalendar)
+ mgr.profileURL = srv.URL
+ mgr.browserFlowFn = func(
+ _ context.Context, _ string, _ bool,
+ ) (*oauth2.Token, error) {
+ return &oauth2.Token{
+ AccessToken: "calendar-token",
+ TokenType: "Bearer",
+ Expiry: time.Now().Add(time.Hour),
+ }, nil
+ }
+
+ require.NoError(mgr.Authorize(context.Background(), "user@gmail.com"), "Authorize")
+
+ loaded, err := mgr.loadTokenFile("user@gmail.com")
+ require.NoError(err, "loadTokenFile")
+ assert.Equal("calendar-token", loaded.AccessToken, "access token")
+ assert.ElementsMatch(ScopesCalendar, loaded.Scopes, "saved scopes")
+}
+
// TestAuthorize_RejectsMismatch verifies that authorize() rejects
// tokens where the profile email is for a different account and
// does NOT persist a token file.
diff --git a/internal/oauth/profile_test.go b/internal/oauth/profile_test.go
index 2a642c3d9..05f5c1c75 100644
--- a/internal/oauth/profile_test.go
+++ b/internal/oauth/profile_test.go
@@ -28,6 +28,12 @@ func TestFetchTokenProfileEmail(t *testing.T) {
body: `{"emailAddress":"user@gmail.com"}`,
wantEmail: "user@gmail.com",
},
+ {
+ name: "calendar primary id",
+ statusCode: http.StatusOK,
+ body: `{"id":"user@gmail.com"}`,
+ wantEmail: "user@gmail.com",
+ },
{
name: "mismatch",
statusCode: http.StatusOK,
@@ -83,3 +89,18 @@ func TestFetchTokenProfileEmail(t *testing.T) {
})
}
}
+
+func TestTokenProfileEndpointForScopes(t *testing.T) {
+ assert := assertpkg.New(t)
+
+ calendar := tokenProfileEndpointForScopes(ScopesCalendar)
+ assert.Equal(defaultCalendarProfileURL, calendar.url, "Calendar-only grants validate through Calendar")
+ assert.Equal("Calendar API", calendar.serviceName, "Calendar-only service name")
+
+ gmail := tokenProfileEndpointForScopes(Scopes)
+ assert.Equal(defaultProfileURL, gmail.url, "Gmail grants keep Gmail profile validation")
+ assert.Equal("Gmail API", gmail.serviceName, "Gmail service name")
+
+ combined := tokenProfileEndpointForScopes(ScopesGmailCalendar)
+ assert.Equal(defaultProfileURL, combined.url, "combined Gmail+Calendar grants keep Gmail profile validation")
+}
diff --git a/internal/query/duckdb.go b/internal/query/duckdb.go
index e4a31059c..f24265170 100644
--- a/internal/query/duckdb.go
+++ b/internal/query/duckdb.go
@@ -1643,6 +1643,19 @@ func (e *DuckDBEngine) Search(ctx context.Context, q *search.Query, limit, offse
conditions = append(conditions, "m.has_attachments = 1")
}
+ // message_type: filter — keep the non-delegated DuckDB fallback in sync
+ // with the SQLite FTS path so message_type scoping is honored regardless
+ // of which engine serves the search.
+ if len(q.MessageTypes) > 0 {
+ placeholders := make([]string, len(q.MessageTypes))
+ for i, typ := range q.MessageTypes {
+ placeholders[i] = "?"
+ args = append(args, typ)
+ }
+ conditions = append(conditions,
+ "m.message_type IN ("+strings.Join(placeholders, ",")+")")
+ }
+
// Date range filters
if q.AfterDate != nil {
conditions = append(conditions, "m.sent_at >= CAST(? AS TIMESTAMP)")
@@ -2460,11 +2473,24 @@ func (e *DuckDBEngine) buildSearchConditions(q *search.Query, filter MessageFilt
var args []any
// Apply basic filter conditions (ignoring join flags for search - we handle those differently)
- if len(q.MessageTypes) == 0 {
+ conditions = append(conditions,
+ store.LiveMessagesWhere("msg", filter.HideDeletedFromSource),
+ )
+ messageTypes, noMessageTypeMatches := scopedMessageTypes(q.MessageTypes, filter.MessageType)
+ switch {
+ case noMessageTypeMatches:
+ conditions = append(conditions, "1=0")
+ case len(messageTypes) > 0:
+ placeholders := make([]string, len(messageTypes))
+ for i, typ := range messageTypes {
+ placeholders[i] = "?"
+ args = append(args, typ)
+ }
+ conditions = append(conditions, "msg.message_type IN ("+strings.Join(placeholders, ",")+")")
+ default:
// Restrict to email messages only; NULL and '' handle pre-message_type data.
conditions = append(conditions, emailOnlyFilterMsg)
}
- conditions = append(conditions, store.LiveMessagesWhere("msg", filter.HideDeletedFromSource))
conditions, args = appendSourceFilter(conditions, args, "msg.", filter.SourceID, filter.SourceIDs)
if filter.After != nil {
conditions = append(conditions, "msg.sent_at >= CAST(? AS TIMESTAMP)")
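Reviewer note on the duckdb.go changes: the same placeholder-building loop now appears in Search, buildSearchConditions, and the SQLite path. A small helper could express it once. The sketch below is only an illustration of that idea; `appendInClause` is a hypothetical name that does not exist in the patch, and whether the refactor is worth it is left to the author.

```go
package main

import (
	"fmt"
	"strings"
)

// appendInClause is a hypothetical helper (not part of the patch). It appends
// one "column IN (?,?,...)" condition and the matching bind arguments, and
// leaves both slices unchanged when values is empty.
func appendInClause(conditions []string, args []any, column string, values []string) ([]string, []any) {
	if len(values) == 0 {
		return conditions, args
	}
	placeholders := make([]string, len(values))
	for i, v := range values {
		placeholders[i] = "?"
		args = append(args, v)
	}
	conditions = append(conditions,
		column+" IN ("+strings.Join(placeholders, ",")+")")
	return conditions, args
}

func main() {
	conds, args := appendInClause(nil, nil, "m.message_type",
		[]string{"sms", "calendar_event"})
	fmt.Println(strings.Join(conds, " AND "))
	fmt.Println(args...)
}
```

Binding each value through a placeholder, as the patch already does, keeps the message types out of the SQL text itself.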
diff --git a/internal/query/duckdb_test.go b/internal/query/duckdb_test.go
index a10e428c9..37671f375 100644
--- a/internal/query/duckdb_test.go
+++ b/internal/query/duckdb_test.go
@@ -788,6 +788,64 @@ func TestDuckDBEngine_AggregateBySenderName_SearchByPhone(t *testing.T) {
assertpkg.Len(t, results, 1, "phone search should isolate the phone-only sender")
}
+func TestDuckDBEngine_SearchFast_MessageTypeConflictReturnsNoMatches(t *testing.T) {
+ b := NewTestDataBuilder(t)
+ b.AddSource("test@gmail.com")
+ b.AddMessage(MessageOpt{
+ Subject: "zzducktypeterm email",
+ SentAt: makeDate(1, 15),
+ SizeEstimate: 1000,
+ MessageType: "email",
+ })
+ b.AddMessage(MessageOpt{
+ Subject: "zzducktypeterm sms",
+ SentAt: makeDate(1, 16),
+ SizeEstimate: 1000,
+ MessageType: messageTypeSMS,
+ })
+ b.SetEmptyAttachments()
+ engine := b.BuildEngine()
+
+ q := &search.Query{
+ TextTerms: []string{"zzducktypeterm"},
+ MessageTypes: []string{"email"},
+ }
+ results, err := engine.SearchFast(context.Background(), q, MessageFilter{MessageType: messageTypeSMS}, 100, 0)
+
+ requirepkg.NoError(t, err)
+ assertpkg.Empty(t, results)
+ assertpkg.Equal(t, []string{"email"}, q.MessageTypes, "base query MessageTypes must not be mutated")
+}
+
+func TestDuckDBEngine_SearchFast_MessageTypeFilterReturnsScopedType(t *testing.T) {
+ b := NewTestDataBuilder(t)
+ b.AddSource("test@gmail.com")
+ b.AddMessage(MessageOpt{
+ Subject: "zzducktypeterm email",
+ SentAt: makeDate(1, 15),
+ SizeEstimate: 1000,
+ MessageType: "email",
+ })
+ b.AddMessage(MessageOpt{
+ Subject: "zzducktypeterm sms",
+ SentAt: makeDate(1, 16),
+ SizeEstimate: 1000,
+ MessageType: messageTypeSMS,
+ })
+ b.SetEmptyAttachments()
+ engine := b.BuildEngine()
+
+ q := &search.Query{
+ TextTerms: []string{"zzducktypeterm"},
+ MessageTypes: []string{messageTypeSMS},
+ }
+ results, err := engine.SearchFast(context.Background(), q, MessageFilter{}, 100, 0)
+
+ requirepkg.NoError(t, err)
+ requirepkg.Len(t, results, 1)
+ assertpkg.Equal(t, messageTypeSMS, results[0].MessageType)
+}
+
func TestDuckDBEngine_ListMessages_MatchEmptySenderName(t *testing.T) {
// Build Parquet data with a message that has no sender
b := NewTestDataBuilder(t)
diff --git a/internal/query/message_type_filter_test.go b/internal/query/message_type_filter_test.go
new file mode 100644
index 000000000..693e3fe69
--- /dev/null
+++ b/internal/query/message_type_filter_test.go
@@ -0,0 +1,165 @@
+package query
+
+import (
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "go.kenn.io/msgvault/internal/search"
+ "go.kenn.io/msgvault/internal/testutil/dbtest"
+)
+
+// TestSearch_MessageTypeFilterCoversMultipleTypes is a regression test for the local FTS search
+// path honoring the message_type: filter. Before this fix, buildSearchQueryParts
+// (and the DuckDB Search fallback) silently dropped q.MessageTypes, so
+// `msgvault search --mode=fts` ignored message_type scoping for every non-email
+// type (sms, whatsapp, calendar_event, ...). The store/api.go path already
+// filtered correctly; this brings the query engine in line.
+func TestSearch_MessageTypeFilterCoversMultipleTypes(t *testing.T) {
+ env := newTestEnv(t)
+
+ // A unique term so we only count the rows we add, independent of the
+ // standard seed data set.
+ const term = "zzmsgtypeterm"
+
+ emailID := env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " email edition",
+ SentAt: "2024-05-01 10:00:00",
+ MessageType: "email",
+ })
+ eventID := env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " calendar edition",
+ SentAt: "2024-05-02 10:00:00",
+ MessageType: "calendar_event",
+ })
+ smsID := env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " sms edition",
+ SentAt: "2024-05-03 10:00:00",
+ MessageType: messageTypeSMS,
+ })
+
+ // Index everything (one-time populate, so add rows first).
+ env.EnableFTS()
+
+ idsOf := func(results []MessageSummary) map[int64]bool {
+ m := make(map[int64]bool, len(results))
+ for _, r := range results {
+ m[r.ID] = true
+ }
+ return m
+ }
+
+ t.Run("no type filter returns all three (email unaffected)", func(t *testing.T) {
+ results := env.MustSearch(&search.Query{TextTerms: []string{term}}, 100, 0)
+ ids := idsOf(results)
+ assert.True(t, ids[emailID], "email should match when no type filter is set")
+ assert.True(t, ids[eventID], "calendar_event should match when no type filter is set")
+ assert.True(t, ids[smsID], "sms should match when no type filter is set")
+ })
+
+ t.Run("scope to calendar_event", func(t *testing.T) {
+ results := env.MustSearch(&search.Query{
+ TextTerms: []string{term},
+ MessageTypes: []string{"calendar_event"},
+ }, 100, 0)
+ require.Len(t, results, 1)
+ assert.Equal(t, eventID, results[0].ID)
+ })
+
+ t.Run("scope to email excludes calendar/sms", func(t *testing.T) {
+ results := env.MustSearch(&search.Query{
+ TextTerms: []string{term},
+ MessageTypes: []string{"email"},
+ }, 100, 0)
+ require.Len(t, results, 1)
+ assert.Equal(t, emailID, results[0].ID)
+ })
+
+ t.Run("multi-type IN filter", func(t *testing.T) {
+ assert := assert.New(t)
+ results := env.MustSearch(&search.Query{
+ TextTerms: []string{term},
+ MessageTypes: []string{"calendar_event", messageTypeSMS},
+ }, 100, 0)
+ ids := idsOf(results)
+ assert.Len(results, 2)
+ assert.True(ids[eventID])
+ assert.True(ids[smsID])
+ assert.False(ids[emailID], "email must be excluded by the IN filter")
+ })
+
+ t.Run("structured-only (no text term) still scopes by type", func(t *testing.T) {
+ results := env.MustSearch(&search.Query{
+ SubjectTerms: []string{term},
+ MessageTypes: []string{"calendar_event"},
+ }, 100, 0)
+ require.Len(t, results, 1)
+ assert.Equal(t, eventID, results[0].ID)
+ })
+}
+
+// TestMergeFilterIntoQuery_MessageType verifies that the SearchFast drill-down
+// path (MergeFilterIntoQuery) carries a MessageFilter.MessageType into the
+// query's MessageTypes, so type-scoped views (e.g. Texts mode) don't silently
+// widen back to all message types during an in-view search.
+func TestMergeFilterIntoQuery_MessageType(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ env := newTestEnv(t)
+ const term = "zzmergeterm"
+
+ _ = env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " email",
+ MessageType: "email",
+ })
+ smsID := env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " text",
+ MessageType: messageTypeSMS,
+ })
+ env.EnableFTS()
+
+ base := &search.Query{TextTerms: []string{term}}
+ merged := MergeFilterIntoQuery(base, MessageFilter{MessageType: messageTypeSMS})
+
+ // The original query must not be mutated.
+ assert.Empty(base.MessageTypes, "base query MessageTypes must not be mutated")
+ require.Equal([]string{messageTypeSMS}, merged.MessageTypes)
+
+ results := env.MustSearch(merged, 100, 0)
+ require.Len(results, 1)
+ assert.Equal(smsID, results[0].ID)
+}
+
+func TestMergeFilterIntoQuery_MessageTypeIntersectsExistingQueryTypes(t *testing.T) {
+ base := &search.Query{MessageTypes: []string{"email", messageTypeSMS}}
+ merged := MergeFilterIntoQuery(base, MessageFilter{MessageType: messageTypeSMS})
+
+ assert.Equal(t, []string{"email", messageTypeSMS}, base.MessageTypes,
+ "base query MessageTypes must not be mutated")
+ assert.Equal(t, []string{messageTypeSMS}, merged.MessageTypes)
+}
+
+func TestSearchFast_MessageTypeConflictReturnsNoMatches(t *testing.T) {
+ env := newTestEnv(t)
+ const term = "zzconflictingtype"
+
+ _ = env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " email",
+ MessageType: "email",
+ })
+ _ = env.AddMessage(dbtest.MessageOpts{
+ Subject: term + " sms",
+ MessageType: messageTypeSMS,
+ })
+ env.EnableFTS()
+
+ q := &search.Query{
+ TextTerms: []string{term},
+ MessageTypes: []string{"email"},
+ }
+ results, err := env.Engine.SearchFast(env.Ctx, q, MessageFilter{MessageType: messageTypeSMS}, 100, 0)
+
+ require.NoError(t, err)
+ assert.Empty(t, results)
+ assert.Equal(t, []string{"email"}, q.MessageTypes, "base query MessageTypes must not be mutated")
+}
diff --git a/internal/query/sqlite.go b/internal/query/sqlite.go
index 88d78b8de..430f28278 100644
--- a/internal/query/sqlite.go
+++ b/internal/query/sqlite.go
@@ -6,6 +6,7 @@ import (
"errors"
"fmt"
"log"
+ "slices"
"strings"
"sync"
"time"
@@ -1497,6 +1498,20 @@ func (e *SQLiteEngine) buildSearchQueryParts(ctx context.Context, q *search.Quer
}
}
+ // message_type: filter (e.g. sms, whatsapp, calendar_event). The store
+ // API path (store/api.go) honors q.MessageTypes; the FTS query path must
+ // too, or `--mode=fts` search silently ignores message_type scoping for
+ // every non-email type. Mirrors the store/api.go IN(...) clause.
+ if len(q.MessageTypes) > 0 {
+ placeholders := make([]string, len(q.MessageTypes))
+ for i, typ := range q.MessageTypes {
+ placeholders[i] = "?"
+ args = append(args, typ)
+ }
+ conditions = append(conditions,
+ "m.message_type IN ("+strings.Join(placeholders, ",")+")")
+ }
+
// Has attachment filter
if q.HasAttachment != nil && *q.HasAttachment {
conditions = append(conditions, e.dialect.BoolTrueExpr("m.has_attachments"))
@@ -1683,12 +1698,13 @@ func (e *SQLiteEngine) executeSearchQuery(ctx context.Context, conditions []stri
}
// MergeFilterIntoQuery combines a MessageFilter context with a search.Query.
-// Context filters are appended to existing query filters.
+// Most context filters are appended to existing query filters.
//
// Note on semantics: Appending to FromAddrs/ToAddrs produces OR semantics
// within each dimension (IN clause). Labels use per-term EXISTS subqueries
-// with AND semantics (message must have all labels). Context filters widen
-// the search within other constraints.
+// with AND semantics (message must have all labels). MessageType and date
+// filters are scoped intersections so an in-view search cannot widen outside
+// the current drill-down context.
func MergeFilterIntoQuery(q *search.Query, filter MessageFilter) *search.Query {
// Copy all fields from original query (preserves any future non-slice fields)
merged := *q
@@ -1702,6 +1718,7 @@ func MergeFilterIntoQuery(q *search.Query, filter MessageFilter) *search.Query {
merged.BccAddrs = append([]string(nil), q.BccAddrs...)
merged.SubjectTerms = append([]string(nil), q.SubjectTerms...)
merged.Labels = append([]string(nil), q.Labels...)
+ merged.MessageTypes = append([]string(nil), q.MessageTypes...)
// Deep-copy AccountIDs alongside the other slices so the merged
// query never aliases the original's slice header. Filter overrides
// below replace the deep-copied slice when set.
@@ -1736,6 +1753,17 @@ func MergeFilterIntoQuery(q *search.Query, filter MessageFilter) *search.Query {
merged.Labels = append(merged.Labels, filter.Label)
}
+ // message_type filter - scope FTS search to the drill-down context's
+ // type (e.g. Texts mode → sms/mms). Without this, SearchFast within a
+ // type-scoped view would silently widen back to all message types.
+ if filter.MessageType != "" {
+ messageTypes, noMatches := scopedMessageTypes(merged.MessageTypes, filter.MessageType)
+ merged.MessageTypes = messageTypes
+ if noMatches {
+ merged.AccountIDs = []int64{}
+ }
+ }
+
// Attachment filter - set if context requires attachments
if filter.WithAttachmentsOnly {
hasAttachment := true
@@ -1792,6 +1820,23 @@ func MergeFilterIntoQuery(q *search.Query, filter MessageFilter) *search.Query {
return &merged
}
+func containsMessageType(types []string, want string) bool {
+ return slices.Contains(types, want)
+}
+
+func scopedMessageTypes(queryTypes []string, filterType string) ([]string, bool) {
+ if filterType == "" {
+ return append([]string(nil), queryTypes...), false
+ }
+ if len(queryTypes) == 0 {
+ return []string{filterType}, false
+ }
+ if containsMessageType(queryTypes, filterType) {
+ return []string{filterType}, false
+ }
+ return []string{filterType}, true
+}
+
// timePeriodToBounds converts a time period string to half-open date
// bounds [after, before). Returns ok=false if the format is unrecognized.
func timePeriodToBounds(period string) (after, before time.Time, ok bool) {
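Reviewer note on scopedMessageTypes: the function has no doc comment, so its contract is easiest to read from its cases. The sketch below copies the logic from the diff (two branches condensed into one condition, with containsMessageType inlined as slices.Contains) and prints each case. The `main` driver and the case table are illustrative additions, not part of the patch.

```go
package main

import (
	"fmt"
	"slices"
)

// scopedMessageTypes follows the logic in the diff: it intersects the query's
// message types with a single drill-down filter type. The boolean reports a
// conflict, meaning the query named types and none of them is the filter type.
func scopedMessageTypes(queryTypes []string, filterType string) ([]string, bool) {
	if filterType == "" {
		// No drill-down filter: return a copy of the query's own types.
		return append([]string(nil), queryTypes...), false
	}
	if len(queryTypes) == 0 || slices.Contains(queryTypes, filterType) {
		// Unscoped query, or the filter type is among the query's types.
		return []string{filterType}, false
	}
	// Conflict: the caller must force an empty result set.
	return []string{filterType}, true
}

func main() {
	cases := []struct {
		query  []string
		filter string
	}{
		{nil, ""},
		{[]string{"email", "sms"}, ""},
		{nil, "sms"},
		{[]string{"email", "sms"}, "sms"},
		{[]string{"email"}, "sms"},
	}
	for _, c := range cases {
		types, noMatches := scopedMessageTypes(c.query, c.filter)
		fmt.Printf("query=%v filter=%q -> types=%v noMatches=%t\n",
			c.query, c.filter, types, noMatches)
	}
}
```

Only the last case reports a conflict. In the patch, buildSearchConditions turns that into a `1=0` condition, while MergeFilterIntoQuery sets `merged.AccountIDs` to an empty non-nil slice.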
diff --git a/internal/query/text_models.go b/internal/query/text_models.go
index dc9d7572d..9f4e9976d 100644
--- a/internal/query/text_models.go
+++ b/internal/query/text_models.go
@@ -93,9 +93,11 @@ type TextStatsOptions struct {
SearchQuery string
}
+const messageTypeSMS = "sms"
+
// TextMessageTypes lists the message_type values included in Texts mode.
var TextMessageTypes = []string{
- "whatsapp", "imessage", "sms", "mms", "google_voice_text",
+ "whatsapp", "imessage", messageTypeSMS, "mms", "google_voice_text",
}
// textSortFieldToSortField converts a TextSortField to the generic SortField
diff --git a/internal/remote/engine_test.go b/internal/remote/engine_test.go
index dae7242a6..f2e3a7ea3 100644
--- a/internal/remote/engine_test.go
+++ b/internal/remote/engine_test.go
@@ -95,7 +95,6 @@ func TestEngineGetMessageSummariesByIDs_CarriesFromAndAttachmentCount(t *testing
assert.Equal(2, s.AttachmentCount, "AttachmentCount")
assert.True(s.HasAttachments, "HasAttachments")
}
-
func TestEngineSearchSerializesMessageTypes(t *testing.T) {
require := requirepkg.New(t)
assert := assertpkg.New(t)
@@ -125,3 +124,35 @@ func TestEngineSearchSerializesMessageTypes(t *testing.T) {
}, 10, 0)
require.NoError(err, "Search")
}
+
+func TestEngineSearchForwardsMessageTypeOnlyTerms(t *testing.T) {
+ require := requirepkg.New(t)
+ assert := assertpkg.New(t)
+ called := false
+ srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ called = true
+ assert.Equal("/api/v1/search/deep", r.URL.Path, "path")
+ assert.Equal("message_type:sms", r.URL.Query().Get("q"), "q")
+ w.Header().Set("Content-Type", "application/json")
+ _ = json.NewEncoder(w).Encode(map[string]any{
+ "query": r.URL.Query().Get("q"),
+ "messages": []map[string]any{},
+ "count": 0,
+ "has_more": false,
+ "offset": 0,
+ "limit": 10,
+ })
+ }))
+ defer srv.Close()
+
+ store := &Store{baseURL: srv.URL, httpClient: srv.Client()}
+ engine := NewEngineFromStore(store)
+
+ msgs, err := engine.Search(context.Background(), &search.Query{
+ MessageTypes: []string{"sms"},
+ }, 10, 0)
+
+ require.NoError(err, "Search")
+ assert.Empty(msgs, "messages")
+ assert.True(called, "message_type-only searches must not be treated as empty")
+}
diff --git a/internal/store/api_search_test.go b/internal/store/api_search_test.go
index 32bea8bc2..d3808468c 100644
--- a/internal/store/api_search_test.go
+++ b/internal/store/api_search_test.go
@@ -148,7 +148,6 @@ func TestSearchMessages_LegacyRawString(t *testing.T) {
})
}
}
-
func TestSearchMessagesQuery_MessageTypeFilter(t *testing.T) {
require := requirepkg.New(t)
assert := assertpkg.New(t)
diff --git a/internal/store/calendar_helpers_test.go b/internal/store/calendar_helpers_test.go
new file mode 100644
index 000000000..b7433bd61
--- /dev/null
+++ b/internal/store/calendar_helpers_test.go
@@ -0,0 +1,116 @@
+package store_test
+
+import (
+ "database/sql"
+ "encoding/json"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "go.kenn.io/msgvault/internal/store"
+ "go.kenn.io/msgvault/internal/testutil"
+ "go.kenn.io/msgvault/internal/testutil/storetest"
+)
+
+// TestSetMessageMetadata_RoundTrip proves the new metadata write path persists
+// JSON to the messages.metadata column (JSON on SQLite, JSONB on PG) and reads
+// back semantically intact. Runs under both dialects (make test-pg) so the
+// JSONBindExpr cast is exercised on Postgres.
+func TestSetMessageMetadata_RoundTrip(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ f := storetest.New(t)
+ msgID := f.CreateMessage("cal-meta-1")
+
+ meta := `{"status":"confirmed","all_day":false,"end":"2024-05-01T11:00:00Z","recurrence":["RRULE:FREQ=WEEKLY"]}`
+ require.NoError(f.Store.SetMessageMetadata(msgID, sql.NullString{String: meta, Valid: true}))
+
+ var got sql.NullString
+ err := f.Store.DB().QueryRow(
+ f.Store.Rebind("SELECT metadata FROM messages WHERE id = ?"), msgID,
+ ).Scan(&got)
+ require.NoError(err)
+ require.True(got.Valid, "metadata should be non-NULL after write")
+
+ // Compare semantically: PG JSONB normalizes whitespace/key order, so a raw
+ // string compare would be brittle across dialects.
+ var parsed map[string]any
+ require.NoError(json.Unmarshal([]byte(got.String), &parsed))
+ assert.Equal("confirmed", parsed["status"])
+ assert.Equal(false, parsed["all_day"])
+ assert.Equal("2024-05-01T11:00:00Z", parsed["end"])
+ rec, ok := parsed["recurrence"].([]any)
+ require.True(ok, "recurrence should round-trip as a JSON array")
+ require.Len(rec, 1)
+ assert.Equal("RRULE:FREQ=WEEKLY", rec[0])
+}
+
+// TestSetMessageMetadata_Clear proves a sql.NullString with Valid=false writes
+// SQL NULL, clearing a previously set metadata value.
+func TestSetMessageMetadata_Clear(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ f := storetest.New(t)
+ msgID := f.CreateMessage("cal-meta-clear")
+ require.NoError(f.Store.SetMessageMetadata(msgID, sql.NullString{String: `{"a":1}`, Valid: true}))
+
+ require.NoError(f.Store.SetMessageMetadata(msgID, sql.NullString{}))
+
+ var got sql.NullString
+ err := f.Store.DB().QueryRow(
+ f.Store.Rebind("SELECT metadata FROM messages WHERE id = ?"), msgID,
+ ).Scan(&got)
+ require.NoError(err)
+ assert.False(got.Valid, "metadata should be NULL after clear")
+}
+
+// TestGetSourcesByTypeAndAccount proves the account-scoped source lookup that
+// calendar sync uses to enumerate one OAuth account's calendars: it filters by
+// source_type AND sync_config.account_email, decoupled from the per-source
+// identifier (the natural calendar key). A source with NULL/garbage sync_config
+// is skipped, not fatal.
+func TestGetSourcesByTypeAndAccount(t *testing.T) {
+ assert := assert.New(t)
+ require := require.New(t)
+ st := testutil.NewTestStore(t)
+
+ mk := func(identifier, accountEmail string) *store.Source {
+ src, err := st.GetOrCreateSource("gcal", identifier)
+ require.NoError(err)
+ cfg, err := json.Marshal(map[string]string{
+ "account_email": accountEmail,
+ "calendar_id": identifier,
+ })
+ require.NoError(err)
+ require.NoError(st.UpdateSourceSyncConfig(src.ID, string(cfg)))
+ return src
+ }
+
+ a1 := mk("a@example.com/primary", "a@example.com")
+ a2 := mk("a@example.com/work", "a@example.com")
+ bSrc := mk("b@example.com/primary", "b@example.com")
+
+ // A gcal source with NULL sync_config must be skipped (not matched, not fatal).
+ noCfg, err := st.GetOrCreateSource("gcal", "orphan-calendar")
+ require.NoError(err)
+ // A same-email gmail source must not bleed into the gcal-typed result.
+ _, err = st.GetOrCreateSource("gmail", "a@example.com")
+ require.NoError(err)
+
+ got, err := st.GetSourcesByTypeAndAccount("gcal", "a@example.com")
+ require.NoError(err)
+ require.Len(got, 2)
+ ids := map[int64]bool{}
+ for _, s := range got {
+ ids[s.ID] = true
+ assert.Equal("gcal", s.SourceType)
+ }
+ assert.True(ids[a1.ID])
+ assert.True(ids[a2.ID])
+ assert.False(ids[bSrc.ID], "account B's calendar must not be returned")
+ assert.False(ids[noCfg.ID], "NULL sync_config source must be skipped")
+
+ none, err := st.GetSourcesByTypeAndAccount("gcal", "nobody@example.com")
+ require.NoError(err)
+ assert.Empty(none)
+}
diff --git a/internal/store/embed_gen.go b/internal/store/embed_gen.go
index e7517283a..3218a0fbc 100644
--- a/internal/store/embed_gen.go
+++ b/internal/store/embed_gen.go
@@ -3,6 +3,7 @@ package store
import (
"context"
"fmt"
+ "sort"
"strings"
)
@@ -33,16 +34,26 @@ var embedGenStampChunkRows = 500
// the embeddings upsert — the worker orders the steps (upsert, then
// stamp) and relies on idempotency, see internal/vector/embed/worker.go.
func (s *Store) ScanForEmbedding(ctx context.Context, target int64, afterID int64, limit int) ([]int64, error) {
+ return s.ScanForEmbeddingWithMessageTypes(ctx, target, afterID, limit, nil)
+}
+
+// ScanForEmbeddingWithMessageTypes is ScanForEmbedding constrained to a
+// configured message_type scope. An empty messageTypes slice means unscoped.
+func (s *Store) ScanForEmbeddingWithMessageTypes(ctx context.Context, target int64, afterID int64, limit int, messageTypes []string) ([]int64, error) {
if limit <= 0 {
return nil, nil
}
- q := `SELECT id FROM messages
- WHERE (embed_gen IS NULL OR embed_gen <> ?)
+ where := `(embed_gen IS NULL OR embed_gen <> ?)
AND ` + LiveMessagesWhere("", true) + `
- AND id > ?
+ AND id > ?`
+ args := []any{target, afterID}
+ where, args = appendMessageTypeScope(where, args, messageTypes)
+ q := `SELECT id FROM messages
+ WHERE ` + where + `
ORDER BY id
LIMIT ?`
- rows, err := s.db.QueryContext(ctx, q, target, afterID, limit)
+ args = append(args, limit)
+ rows, err := s.db.QueryContext(ctx, q, args...)
if err != nil {
return nil, fmt.Errorf("scan for embedding: %w", err)
}
@@ -239,14 +250,22 @@ func (s *Store) ResetEmbedGen(ctx context.Context, ids []int64) error {
// activeGen == 0 means "no active/target generation"; then everything
// live is missing and stamped is 0.
func (s *Store) CoverageCounts(ctx context.Context, activeGen int64) (live, stamped, blank, missing int64, err error) {
- live, err = s.countLiveMessages(ctx)
+ return s.CoverageCountsWithMessageTypes(ctx, activeGen, nil)
+}
+
+// CoverageCountsWithMessageTypes is CoverageCounts constrained to a
+// configured message_type scope. An empty messageTypes slice means unscoped.
+func (s *Store) CoverageCountsWithMessageTypes(ctx context.Context, activeGen int64, messageTypes []string) (live, stamped, blank, missing int64, err error) {
+ live, err = s.countLiveMessagesWithMessageTypes(ctx, messageTypes)
if err != nil {
return 0, 0, 0, 0, err
}
if activeGen != 0 {
- q := `SELECT COUNT(*) FROM messages
- WHERE embed_gen = ? AND ` + LiveMessagesWhere("", true)
- if err := s.db.QueryRowContext(ctx, q, activeGen).Scan(&stamped); err != nil {
+ where := `embed_gen = ? AND ` + LiveMessagesWhere("", true)
+ args := []any{activeGen}
+ where, args = appendMessageTypeScope(where, args, messageTypes)
+ q := `SELECT COUNT(*) FROM messages WHERE ` + where
+ if err := s.db.QueryRowContext(ctx, q, args...).Scan(&stamped); err != nil {
return 0, 0, 0, 0, fmt.Errorf("count stamped: %w", err)
}
}
@@ -259,7 +278,13 @@ func (s *Store) CoverageCounts(ctx context.Context, activeGen int64) (live, stam
// activeGen). It is a thin accessor for the scheduler/CLI activation
// gates, which only consult the missing count; missing = live - stamped.
func (s *Store) MissingCount(ctx context.Context, activeGen int64) (int64, error) {
- live, err := s.countLiveMessages(ctx)
+ return s.MissingCountWithMessageTypes(ctx, activeGen, nil)
+}
+
+// MissingCountWithMessageTypes is MissingCount constrained to a configured
+// message_type scope. An empty messageTypes slice means unscoped.
+func (s *Store) MissingCountWithMessageTypes(ctx context.Context, activeGen int64, messageTypes []string) (int64, error) {
+ live, err := s.countLiveMessagesWithMessageTypes(ctx, messageTypes)
if err != nil {
return 0, err
}
@@ -267,21 +292,53 @@ func (s *Store) MissingCount(ctx context.Context, activeGen int64) (int64, error
return live, nil
}
var stamped int64
- q := `SELECT COUNT(*) FROM messages
- WHERE embed_gen = ? AND ` + LiveMessagesWhere("", true)
- if err := s.db.QueryRowContext(ctx, q, activeGen).Scan(&stamped); err != nil {
+ where := `embed_gen = ? AND ` + LiveMessagesWhere("", true)
+ args := []any{activeGen}
+ where, args = appendMessageTypeScope(where, args, messageTypes)
+ q := `SELECT COUNT(*) FROM messages WHERE ` + where
+ if err := s.db.QueryRowContext(ctx, q, args...).Scan(&stamped); err != nil {
return 0, fmt.Errorf("count stamped: %w", err)
}
return max(live-stamped, 0), nil
}
-// countLiveMessages returns the total live-message count. Shared by
-// CoverageCounts; kept separate so the live-predicate stays in one place.
-func (s *Store) countLiveMessages(ctx context.Context) (int64, error) {
+func (s *Store) countLiveMessagesWithMessageTypes(ctx context.Context, messageTypes []string) (int64, error) {
var n int64
- q := `SELECT COUNT(*) FROM messages WHERE ` + LiveMessagesWhere("", true)
- if err := s.db.QueryRowContext(ctx, q).Scan(&n); err != nil {
+ where, args := appendMessageTypeScope(LiveMessagesWhere("", true), nil, messageTypes)
+ q := `SELECT COUNT(*) FROM messages WHERE ` + where
+ if err := s.db.QueryRowContext(ctx, q, args...).Scan(&n); err != nil {
return 0, fmt.Errorf("count live messages: %w", err)
}
return n, nil
}
+
+func appendMessageTypeScope(where string, args []any, messageTypes []string) (string, []any) {
+ messageTypes = normalizeMessageTypes(messageTypes)
+ if len(messageTypes) == 0 {
+ return where, args
+ }
+ placeholders := make([]string, len(messageTypes))
+ for i, typ := range messageTypes {
+ placeholders[i] = "?"
+ args = append(args, typ)
+ }
+ return where + " AND message_type IN (" + strings.Join(placeholders, ",") + ")", args
+}
+
+func normalizeMessageTypes(messageTypes []string) []string {
+ seen := make(map[string]struct{}, len(messageTypes))
+ out := make([]string, 0, len(messageTypes))
+ for _, typ := range messageTypes {
+ typ = strings.TrimSpace(strings.ToLower(typ))
+ if typ == "" {
+ continue
+ }
+ if _, ok := seen[typ]; ok {
+ continue
+ }
+ seen[typ] = struct{}{}
+ out = append(out, typ)
+ }
+ sort.Strings(out)
+ return out
+}
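Review note: the scoping helper above can be exercised in isolation. The following is a minimal standalone sketch, assuming the same normalize-then-bind behavior as `appendMessageTypeScope`; the names `normalize` and `appendScope` are illustrative stand-ins, not the package's exported API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalize lowercases, trims, de-duplicates, and sorts message types so the
// generated SQL and its bound arguments are deterministic.
func normalize(types []string) []string {
	seen := make(map[string]struct{}, len(types))
	out := make([]string, 0, len(types))
	for _, t := range types {
		t = strings.TrimSpace(strings.ToLower(t))
		if t == "" {
			continue
		}
		if _, ok := seen[t]; ok {
			continue
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

// appendScope (hypothetical name) appends "AND message_type IN (?,...)" to an
// existing WHERE fragment and extends args with one value per placeholder.
// An empty scope leaves both inputs untouched.
func appendScope(where string, args []any, types []string) (string, []any) {
	types = normalize(types)
	if len(types) == 0 {
		return where, args
	}
	placeholders := make([]string, len(types))
	for i, t := range types {
		placeholders[i] = "?"
		args = append(args, t)
	}
	return where + " AND message_type IN (" + strings.Join(placeholders, ",") + ")", args
}

func main() {
	where, args := appendScope("id > ?", []any{int64(0)}, []string{" SMS", "mms", "sms", ""})
	fmt.Println(where) // id > ? AND message_type IN (?,?)
	fmt.Println(len(args))
}
```

Because the types are bound as arguments rather than spliced into the SQL text, the caller must append any trailing arguments (such as `LIMIT ?`) after calling the helper, which is the ordering `ScanForEmbeddingWithMessageTypes` follows.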
diff --git a/internal/store/messages.go b/internal/store/messages.go
index 60e6aba4f..e8d319b38 100644
--- a/internal/store/messages.go
+++ b/internal/store/messages.go
@@ -89,6 +89,41 @@ func (s *Store) MessageExistsBatch(sourceID int64, sourceMessageIDs []string) (m
return result, nil
}
+// SetMessageMetadata writes the messages.metadata JSON/JSONB column for an
+// already-persisted message. The column exists in both dialects (schema.sql:
+// `metadata JSON`, schema_pg.sql: `metadata JSONB`) but the hot upsertMessageSQL
+// path never writes it, so non-email importers that need structured per-message
+// metadata (e.g. calendar events: end/all_day/status/recurrence) call this
+// immediately after UpsertMessage returns the id. Passing an invalid
+// sql.NullString writes SQL NULL, clearing the column. The dialect supplies the
+// JSONB cast on PG (?::JSONB) and a bare ? on SQLite, so a JSON string binds in
+// both backends.
+func (s *Store) SetMessageMetadata(messageID int64, metadata sql.NullString) error {
+ _, err := s.db.Exec(fmt.Sprintf(`
+ UPDATE messages
+ SET metadata = %s
+ WHERE id = ?
+ `, s.dialect.JSONBindExpr()), metadata, messageID)
+ if err != nil {
+ return fmt.Errorf("set message metadata (id=%d): %w", messageID, err)
+ }
+ return nil
+}
+
+// GetMessageMetadata reads the messages.metadata column for a message. It is
+// the read counterpart to SetMessageMetadata; importers use it to merge a flag
+// into existing metadata (e.g. flipping a calendar event to status=cancelled)
+// without losing the rest of the stored JSON. Returns an invalid NullString when
+// the column is NULL.
+func (s *Store) GetMessageMetadata(messageID int64) (sql.NullString, error) {
+ var meta sql.NullString
+ err := s.db.QueryRow(`SELECT metadata FROM messages WHERE id = ?`, messageID).Scan(&meta)
+ if err != nil {
+ return sql.NullString{}, fmt.Errorf("get message metadata (id=%d): %w", messageID, err)
+ }
+ return meta, nil
+}
+
// GetMessageIDByRFC822ID returns the internal ID of a message
// with the given RFC822 Message-ID for this source, or 0 if
// no match exists.
@@ -496,8 +531,30 @@ func replaceMessageRecipientsTx(tx *loggedTx, messageID int64, rs RecipientSet)
return nil
}
+ // Collapse duplicate participants within this set. The table holds at most
+ // one row per (message_id, participant_id, recipient_type), so a participant
+ // repeated in one call — a calendar event listing the same attendee twice, or
+ // two address forms that resolve to the same participant — is redundant and
+ // would otherwise trip the UNIQUE constraint and abort the entire write. The
+ // first occurrence's display name wins.
+ seen := make(map[int64]struct{}, len(rs.ParticipantIDs))
+ ids := make([]int64, 0, len(rs.ParticipantIDs))
+ names := make([]string, 0, len(rs.ParticipantIDs))
+ for i, pid := range rs.ParticipantIDs {
+ if _, dup := seen[pid]; dup {
+ continue
+ }
+ seen[pid] = struct{}{}
+ ids = append(ids, pid)
+ name := ""
+ if i < len(rs.DisplayNames) {
+ name = rs.DisplayNames[i]
+ }
+ names = append(names, name)
+ }
+
return insertInChunks(tx, chunkInsert{
- totalRows: len(rs.ParticipantIDs),
+ totalRows: len(ids),
valuesPerRow: 4,
prefix: "INSERT INTO message_recipients (message_id, participant_id, recipient_type, display_name) VALUES ",
}, func(start, end int) ([]string, []any) {
@@ -505,11 +562,7 @@ func replaceMessageRecipientsTx(tx *loggedTx, messageID int64, rs RecipientSet)
args := make([]any, 0, (end-start)*4)
for i := start; i < end; i++ {
values[i-start] = "(?, ?, ?, ?)"
- displayName := ""
- if i < len(rs.DisplayNames) {
- displayName = rs.DisplayNames[i]
- }
- args = append(args, messageID, rs.ParticipantIDs[i], rs.Type, displayName)
+ args = append(args, messageID, ids[i], rs.Type, names[i])
}
return values, args
})
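Review note: the first-occurrence-wins collapse added to `replaceMessageRecipientsTx` can be checked without a database. This is a minimal sketch under the assumption that IDs and display names are parallel slices, as in `RecipientSet`; `dedupeRecipients` is a hypothetical helper name used only for illustration.

```go
package main

import "fmt"

// dedupeRecipients keeps the first occurrence of each participant ID together
// with the display name at the same index. A missing name (names shorter than
// ids) becomes the empty string.
func dedupeRecipients(ids []int64, names []string) ([]int64, []string) {
	seen := make(map[int64]struct{}, len(ids))
	outIDs := make([]int64, 0, len(ids))
	outNames := make([]string, 0, len(ids))
	for i, id := range ids {
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		outIDs = append(outIDs, id)
		name := ""
		if i < len(names) {
			name = names[i]
		}
		outNames = append(outNames, name)
	}
	return outIDs, outNames
}

func main() {
	ids, names := dedupeRecipients(
		[]int64{1, 1, 2},
		[]string{"Alice", "Alice (dup)", "Bob"},
	)
	fmt.Println(ids, names) // prints: [1 2] [Alice Bob]
}
```

Collapsing before the chunked insert keeps the SQL a plain `INSERT`, so no dialect-specific conflict clause is needed to respect the UNIQUE constraint.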
diff --git a/internal/store/sources.go b/internal/store/sources.go
index 0698a89e9..5d6acbc77 100644
--- a/internal/store/sources.go
+++ b/internal/store/sources.go
@@ -3,6 +3,7 @@ package store
import (
"context"
"database/sql"
+ "encoding/json"
"errors"
"fmt"
)
@@ -122,6 +123,40 @@ func (s *Store) GetSourcesByDisplayName(displayName string) ([]*Source, error) {
return sources, rows.Err()
}
+// GetSourcesByTypeAndAccount returns every source of the given source_type
+// whose sync_config JSON carries the given account_email.
+//
+// Config-driven sources (calendar, and any future per-account fan-out) decouple
+// their per-source identifier — a natural key like a calendarId — from the
+// OAuth account/token key, which lives in sync_config.account_email. A single
+// account may own many sources (e.g. several calendars), all sharing one token
+// file. Filtering happens in Go after a typed list query so it stays
+// dialect-portable (no SQLite json_extract vs PG ->> divergence); the set of one
+// account's sources is small, so this is not a hot path. A source whose
+// sync_config is NULL or unparseable is skipped rather than aborting the scan.
+func (s *Store) GetSourcesByTypeAndAccount(sourceType, accountEmail string) ([]*Source, error) {
+ all, err := s.ListSources(sourceType)
+ if err != nil {
+ return nil, fmt.Errorf("list sources by type %q: %w", sourceType, err)
+ }
+ var matched []*Source
+ for _, src := range all {
+ if !src.SyncConfig.Valid {
+ continue
+ }
+ var cfg struct {
+ AccountEmail string `json:"account_email"`
+ }
+ if err := json.Unmarshal([]byte(src.SyncConfig.String), &cfg); err != nil {
+ continue
+ }
+ if cfg.AccountEmail == accountEmail {
+ matched = append(matched, src)
+ }
+ }
+ return matched, nil
+}
+
// RemoveSource deletes a source and all its associated data.
// FTS5 rows are cleaned up explicitly (no FK cascade for virtual tables).
// CASCADE handles conversations, messages, labels, attachments, sync state.
diff --git a/internal/store/store_test.go b/internal/store/store_test.go
index f82be377c..1377e9639 100644
--- a/internal/store/store_test.go
+++ b/internal/store/store_test.go
@@ -806,6 +806,28 @@ func TestStore_ReplaceMessageRecipients_Empty(t *testing.T) {
f.AssertRecipientCount(msgID, "to", 0)
}
+// TestStore_ReplaceMessageRecipients_DuplicateParticipants is the regression for
+// the production crash where one recipient set contained the same participant
+// twice (e.g. a calendar event listing the same attendee twice, or two address
+// forms resolving to one participant). The plain INSERT tripped the
+// UNIQUE(message_id, participant_id, recipient_type) constraint and aborted the
+// write. Duplicates must collapse to a single row instead of erroring.
+func TestStore_ReplaceMessageRecipients_DuplicateParticipants(t *testing.T) {
+ f := storetest.New(t)
+
+ msgID := f.CreateMessage("msg-dup-recip")
+ pid1 := f.EnsureParticipant("alice@example.com", "Alice", "example.com")
+ pid2 := f.EnsureParticipant("bob@example.com", "Bob", "example.com")
+
+ // alice appears twice; this must not error and must store one row per
+ // distinct participant.
+ err := f.Store.ReplaceMessageRecipients(msgID, "to",
+ []int64{pid1, pid1, pid2}, []string{"Alice", "Alice (dup)", "Bob"})
+ requirepkg.NoError(t, err, "duplicate participant IDs must not error")
+
+ f.AssertRecipientCount(msgID, "to", 2)
+}
+
func TestStore_GetActiveSync_NoSync(t *testing.T) {
f := storetest.New(t)
f.AssertNoActiveSync()
diff --git a/internal/synctechsms/importer.go b/internal/synctechsms/importer.go
index c16141e7e..d414c97ab 100644
--- a/internal/synctechsms/importer.go
+++ b/internal/synctechsms/importer.go
@@ -98,39 +98,44 @@ func (i *Importer) importRecord(sourceID int64, record Record, summary *ImportSu
switch record.Kind {
case RecordSMS:
if i.opts.IncludeSMS && record.SMS != nil {
- if err := i.importSMS(sourceID, *record.SMS); err != nil {
+ msgID, err := i.importSMS(sourceID, *record.SMS)
+ if err != nil {
return err
}
+ summary.MessageIDs = append(summary.MessageIDs, msgID)
summary.SMSImported++
}
case RecordMMS:
if i.opts.IncludeMMS && record.MMS != nil {
- attachments, err := i.importMMS(sourceID, *record.MMS)
+ msgID, attachments, err := i.importMMS(sourceID, *record.MMS)
if err != nil {
return err
}
+ summary.MessageIDs = append(summary.MessageIDs, msgID)
summary.MMSImported++
summary.AttachmentsImported += attachments
}
case RecordCall:
if i.opts.IncludeCalls && record.Call != nil {
- if err := i.importCall(sourceID, *record.Call); err != nil {
+ msgID, err := i.importCall(sourceID, *record.Call)
+ if err != nil {
return err
}
+ summary.MessageIDs = append(summary.MessageIDs, msgID)
summary.CallsImported++
}
}
return nil
}
-func (i *Importer) importSMS(sourceID int64, sms SMS) error {
+func (i *Importer) importSMS(sourceID int64, sms SMS) (int64, error) {
remoteID, err := i.participantID(sms.Address, sms.ContactName.String)
if err != nil {
- return err
+ return 0, err
}
ownerID, err := i.participantID(i.opts.OwnerPhone, "Me")
if err != nil {
- return err
+ return 0, err
}
// Drafts are owner-authored messages that never made it out, but
// they still belong on the owner's side of the conversation. Without
@@ -144,36 +149,36 @@ func (i *Importer) importSMS(sourceID int64, sms SMS) error {
}
convID, err := i.ensureConversation(sourceID, textConversationKey([]int64{ownerID, remoteID}), sms.ContactName.String)
if err != nil {
- return err
+ return 0, err
}
if err := i.store.EnsureConversationParticipant(convID, remoteID, "member"); err != nil {
- return err
+ return 0, err
}
if err := i.store.EnsureConversationParticipant(convID, ownerID, "member"); err != nil {
- return err
+ return 0, err
}
msgID := stableID("sms", sms.Address, sms.Timestamp.String(), fmt.Sprint(sms.Type), sms.Body)
return i.upsertTextMessage(sourceID, convID, msgID, "sms", senderID, recipientIDs, fromMe, sms.Timestamp, sms.Body, sms.Body, 0, sms)
}
-func (i *Importer) importMMS(sourceID int64, mms MMS) (int, error) {
+func (i *Importer) importMMS(sourceID int64, mms MMS) (int64, int, error) {
ownerID, err := i.participantID(i.opts.OwnerPhone, "Me")
if err != nil {
- return 0, err
+ return 0, 0, err
}
participantIDs, senderID, recipientIDs, err := i.mmsParticipants(mms, ownerID)
if err != nil {
- return 0, err
+ return 0, 0, err
}
// Drafts belong to the owner — see the matching note in importSMS.
fromMe := mms.MessageBox == MMSBoxSent || mms.MessageBox == MMSBoxOutbox || mms.MessageBox == MMSBoxDraft
convID, err := i.ensureConversation(sourceID, textConversationKey(participantIDs), mms.ContactName.String)
if err != nil {
- return 0, err
+ return 0, 0, err
}
for _, participantID := range participantIDs {
if err := i.store.EnsureConversationParticipant(convID, participantID, "member"); err != nil {
- return 0, err
+ return 0, 0, err
}
}
body := mmsText(mms)
@@ -183,21 +188,26 @@ func (i *Importer) importMMS(sourceID int64, mms MMS) (int, error) {
}
msgID := stableID("mms", srcIDPart, mms.Timestamp.String(), sortedKey(participantIDs))
attachmentCount := countImportableAttachments(mms, i.opts.IncludeAttachments)
- if err := i.upsertTextMessage(sourceID, convID, msgID, "mms", senderID, recipientIDs, fromMe, mms.Timestamp, body, mms.Subject.String, attachmentCount, mms); err != nil {
- return 0, err
+ messageID, err := i.upsertTextMessage(sourceID, convID, msgID, "mms", senderID, recipientIDs, fromMe, mms.Timestamp, body, mms.Subject.String, attachmentCount, mms)
+ if err != nil {
+ return 0, 0, err
}
- return i.importMMSAttachments(sourceID, msgID, mms)
+ imported, err := i.importMMSAttachments(sourceID, msgID, mms)
+ if err != nil {
+ return 0, 0, err
+ }
+ return messageID, imported, nil
}
-func (i *Importer) importCall(sourceID int64, call Call) error {
+func (i *Importer) importCall(sourceID int64, call Call) (int64, error) {
remoteAddress := callParticipantAddress(call)
remoteID, err := i.participantID(remoteAddress, call.ContactName.String)
if err != nil {
- return err
+ return 0, err
}
ownerID, err := i.participantID(i.opts.OwnerPhone, "Me")
if err != nil {
- return err
+ return 0, err
}
fromMe := call.Type == CallOutgoing
senderID := remoteID
@@ -208,20 +218,20 @@ func (i *Importer) importCall(sourceID int64, call Call) error {
}
convID, err := i.ensureConversation(sourceID, "calls:"+canonicalAddress(remoteAddress), call.ContactName.String)
if err != nil {
- return err
+ return 0, err
}
if err := i.store.EnsureConversationParticipant(convID, remoteID, "member"); err != nil {
- return err
+ return 0, err
}
if err := i.store.EnsureConversationParticipant(convID, ownerID, "member"); err != nil {
- return err
+ return 0, err
}
body := fmt.Sprintf("Call %s, %d seconds", callTypeLabel(call.Type), call.DurationSeconds)
msgID := stableID("call", remoteAddress, call.Timestamp.String(), fmt.Sprint(call.Type), strconv.Itoa(call.DurationSeconds))
return i.upsertTextMessage(sourceID, convID, msgID, "synctech_sms_call", senderID, recipientIDs, fromMe, call.Timestamp, body, body, 0, call)
}
-func (i *Importer) upsertTextMessage(sourceID, convID int64, sourceMessageID, messageType string, senderID int64, recipientIDs []int64, fromMe bool, sentAt time.Time, body, subject string, attachmentCount int, raw any) error {
+func (i *Importer) upsertTextMessage(sourceID, convID int64, sourceMessageID, messageType string, senderID int64, recipientIDs []int64, fromMe bool, sentAt time.Time, body, subject string, attachmentCount int, raw any) (int64, error) {
msgID, err := i.store.UpsertMessage(&store.Message{
ConversationID: convID,
SourceID: sourceID,
@@ -237,33 +247,33 @@ func (i *Importer) upsertTextMessage(sourceID, convID int64, sourceMessageID, me
AttachmentCount: attachmentCount,
})
if err != nil {
- return fmt.Errorf("upsert message: %w", err)
+ return 0, fmt.Errorf("upsert message: %w", err)
}
if body != "" {
if err := i.store.UpsertMessageBody(msgID, sql.NullString{String: body, Valid: true}, sql.NullString{}); err != nil {
- return fmt.Errorf("upsert body: %w", err)
+ return 0, fmt.Errorf("upsert body: %w", err)
}
}
rawJSON, err := json.Marshal(raw)
if err != nil {
- return fmt.Errorf("marshal raw record: %w", err)
+ return 0, fmt.Errorf("marshal raw record: %w", err)
}
if err := i.store.UpsertMessageRawWithFormat(msgID, rawJSON, RawFormat); err != nil {
- return fmt.Errorf("upsert raw record: %w", err)
+ return 0, fmt.Errorf("upsert raw record: %w", err)
}
if err := i.store.ReplaceMessageRecipients(msgID, "from", []int64{senderID}, []string{""}); err != nil {
- return fmt.Errorf("replace from recipient: %w", err)
+ return 0, fmt.Errorf("replace from recipient: %w", err)
}
if err := i.store.ReplaceMessageRecipients(msgID, "to", recipientIDs, blankNames(len(recipientIDs))); err != nil {
- return fmt.Errorf("replace to recipient: %w", err)
+ return 0, fmt.Errorf("replace to recipient: %w", err)
}
if err := i.store.UpsertFTS(msgID, subject, body, "", "", ""); err != nil {
- // FTS is an index, not data: a failure to populate it must never abort
- // the import. Warn and continue, matching the other UpsertFTS callers
- // (sync.go, importer/ingest.go, fbmessenger, whatsapp). [C2]
- slog.Warn("failed to upsert FTS", "message", msgID, "error", err)
+ slog.Warn("synctech-sms: failed to upsert FTS; continuing",
+ "message_id", msgID,
+ "error", err,
+ )
}
- return nil
+ return msgID, nil
}
func (i *Importer) participantID(address, displayName string) (int64, error) {
diff --git a/internal/synctechsms/importer_test.go b/internal/synctechsms/importer_test.go
index 769ebefe9..a3994b8fe 100644
--- a/internal/synctechsms/importer_test.go
+++ b/internal/synctechsms/importer_test.go
@@ -7,6 +7,7 @@ import (
assertpkg "github.com/stretchr/testify/assert"
requirepkg "github.com/stretchr/testify/require"
"go.kenn.io/msgvault/internal/store"
+ "go.kenn.io/msgvault/internal/testutil"
"go.kenn.io/msgvault/internal/testutil/storetest"
)
@@ -67,6 +68,28 @@ func TestImporterRejectsMissingOwnerPhone(t *testing.T) {
requirepkg.Error(t, err, "ImportPath returned nil error")
}
+func TestImporterContinuesWhenFTSUpsertFails(t *testing.T) {
+ testutil.SkipIfPostgres(t, "drops SQLite's messages_fts virtual table; PostgreSQL FTS is a messages.search_fts column")
+ f := storetest.New(t)
+ if f.Store.FTS5Available() {
+ _, err := f.Store.DB().Exec(`DROP TABLE messages_fts`)
+ requirepkg.NoError(t, err, "drop FTS table")
+ }
+ dir := t.TempDir()
+ writeFile(t, filepath.Join(dir, "messages.xml"), `
+
+`)
+
+ imp := NewImporter(f.Store, ImportOptions{
+ OwnerPhone: "+15550000001",
+ IncludeSMS: true,
+ })
+ summary, err := imp.ImportPath(dir)
+ requirepkg.NoError(t, err, "ImportPath")
+ assertpkg.Equal(t, 1, summary.SMSImported, "summary = %#v", summary)
+ assertMessageCount(t, f.Store, 1)
+}
+
func TestImporterImportsCallWithBlankNumber(t *testing.T) {
f := storetest.New(t)
dir := t.TempDir()
diff --git a/internal/synctechsms/types.go b/internal/synctechsms/types.go
index c2d5d8eb3..1ac427444 100644
--- a/internal/synctechsms/types.go
+++ b/internal/synctechsms/types.go
@@ -160,4 +160,5 @@ type ImportSummary struct {
MMSImported int
CallsImported int
AttachmentsImported int
+ MessageIDs []int64
}
diff --git a/internal/testutil/dbtest/dbtest.go b/internal/testutil/dbtest/dbtest.go
index 04d3f911e..f2d454c34 100644
--- a/internal/testutil/dbtest/dbtest.go
+++ b/internal/testutil/dbtest/dbtest.go
@@ -292,6 +292,7 @@ type MessageOpts struct {
BccIDs []int64 // participant IDs for 'bcc' recipients
SourceID int64 // defaults to 1 if 0
ConversationID int64 // defaults to 1 if 0
+ MessageType string // defaults to "email" if empty
}
// AddMessage inserts a message with its from/to/cc recipients and returns the message ID.
@@ -334,9 +335,14 @@ func (tdb *TestDB) AddMessage(opts MessageOpts) int64 {
"AddMessage: SourceID %d does not match conversation %d source_id %d", srcID, convID, convSourceID)
}
+ messageType := opts.MessageType
+ if messageType == "" {
+ messageType = "email"
+ }
+
_, err := tdb.DB.Exec(
- `INSERT INTO messages (id, conversation_id, source_id, source_message_id, message_type, sent_at, subject, snippet, size_estimate, has_attachments) VALUES (?, ?, ?, ?, 'email', ?, ?, 'test', ?, ?)`,
- id, convID, srcID, sourceMessageID, sentAt, opts.Subject, size, opts.HasAttachments,
+ `INSERT INTO messages (id, conversation_id, source_id, source_message_id, message_type, sent_at, subject, snippet, size_estimate, has_attachments) VALUES (?, ?, ?, ?, ?, ?, ?, 'test', ?, ?)`,
+ id, convID, srcID, sourceMessageID, messageType, sentAt, opts.Subject, size, opts.HasAttachments,
)
require.NoError(tdb.T, err, "AddMessage")
diff --git a/internal/vector/build_scope.go b/internal/vector/build_scope.go
new file mode 100644
index 000000000..6b53cdbb1
--- /dev/null
+++ b/internal/vector/build_scope.go
@@ -0,0 +1,65 @@
+package vector
+
+import (
+ "slices"
+ "sort"
+ "strings"
+)
+
+// BuildScope limits which messages are eligible for an embedding
+// generation. A zero-value scope means the full corpus.
+type BuildScope struct {
+ MessageTypes []string
+}
+
+// NewBuildScope returns a normalized, stable scope. Message types are
+// lowercase, trimmed, de-duplicated, and sorted so fingerprints and SQL
+// bindings are deterministic.
+func NewBuildScope(messageTypes []string) BuildScope {
+ seen := make(map[string]struct{}, len(messageTypes))
+ out := make([]string, 0, len(messageTypes))
+ for _, typ := range messageTypes {
+ typ = strings.TrimSpace(strings.ToLower(typ))
+ if typ == "" {
+ continue
+ }
+ if _, ok := seen[typ]; ok {
+ continue
+ }
+ seen[typ] = struct{}{}
+ out = append(out, typ)
+ }
+ sort.Strings(out)
+ return BuildScope{MessageTypes: out}
+}
+
+func (s BuildScope) IsEmpty() bool {
+ return len(s.MessageTypes) == 0
+}
+
+func (s BuildScope) Fingerprint() string {
+ if s.IsEmpty() {
+ return ""
+ }
+ return "mt-" + strings.Join(s.MessageTypes, ",")
+}
+
+func (s BuildScope) ContainsMessageType(messageType string) bool {
+ messageType = strings.TrimSpace(strings.ToLower(messageType))
+ return slices.Contains(s.MessageTypes, messageType)
+}
+
+func (s BuildScope) AllowsMessageTypes(messageTypes []string) bool {
+ if s.IsEmpty() {
+ return true
+ }
+ if len(messageTypes) == 0 {
+ return false
+ }
+ for _, typ := range messageTypes {
+ if !s.ContainsMessageType(typ) {
+ return false
+ }
+ }
+ return true
+}
diff --git a/internal/vector/config.go b/internal/vector/config.go
index 5dd3ebd85..23ca84483 100644
--- a/internal/vector/config.go
+++ b/internal/vector/config.go
@@ -213,7 +213,8 @@ type EmbedConfig struct {
// stragglers from repair-encoding resets, transient errors, or crashes)
// in addition to the per-tick incremental scan. Zero uses the EmbedJob
// default (24h); a negative value disables the auto-backstop.
- BackstopInterval time.Duration `toml:"backstop_interval"`
+ BackstopInterval time.Duration `toml:"backstop_interval"`
+ Scope EmbedScopeConfig `toml:"scope"`
}
// EmbedScheduleConfig controls when the embed worker runs on its own
@@ -223,6 +224,16 @@ type EmbedScheduleConfig struct {
RunAfterSync bool `toml:"run_after_sync"` // trigger after each successful sync
}
+// EmbedScopeConfig limits which messages are included in newly-built
+// embedding generations. The zero value means the full corpus.
+type EmbedScopeConfig struct {
+ MessageTypes []string `toml:"message_types"`
+}
+
+func (s EmbedScopeConfig) BuildScope() BuildScope {
+ return NewBuildScope(s.MessageTypes)
+}
+
// Fingerprint returns the ":" identifier for the
// embedding endpoint half of the policy (§6.7 of the spec). Use
// Config.GenerationFingerprint when storing or comparing index
@@ -256,8 +267,12 @@ func (e EmbeddingsConfig) Fingerprint() string {
// in, an active generation built under the old single-vector policy
// would silently accept new chunked entries from an upgraded worker.
func (c *Config) GenerationFingerprint() string {
- return fmt.Sprintf("%s:%s:c%d:e%d",
+ fp := fmt.Sprintf("%s:%s:c%d:e%d",
c.Embeddings.Fingerprint(), c.Preprocess.Fingerprint(), c.Embeddings.MaxInputChars, embedPolicyVersion)
+ if scopeFP := c.Embed.Scope.BuildScope().Fingerprint(); scopeFP != "" {
+ fp = fmt.Sprintf("%s:s%s", fp, scopeFP)
+ }
+ return fp
}
// Validate returns a descriptive error if the config is unusable.
diff --git a/internal/vector/config_test.go b/internal/vector/config_test.go
index cba341454..dd76eee40 100644
--- a/internal/vector/config_test.go
+++ b/internal/vector/config_test.go
@@ -433,3 +433,18 @@ func TestConfig_GenerationFingerprint_IncludesEmbedPolicyVersion(t *testing.T) {
suffix := fmt.Sprintf(":e%d", embedPolicyVersion)
assertpkg.True(t, strings.HasSuffix(got, suffix), "GenerationFingerprint() = %q, want suffix %q", got, suffix)
}
+
+func TestConfig_GenerationFingerprint_IncludesEmbedScope(t *testing.T) {
+ base := Config{
+ Embeddings: EmbeddingsConfig{Model: "m", Dimension: 8, MaxInputChars: 6000},
+ }
+ baseline := base.GenerationFingerprint()
+
+ scoped := base
+ scoped.Embed.Scope.MessageTypes = []string{"MMS", "sms", "sms"}
+
+ assertpkg.NotEqual(t, baseline, scoped.GenerationFingerprint(),
+ "GenerationFingerprint should change when embedding is scoped")
+ assertpkg.Contains(t, scoped.GenerationFingerprint(), ":smt-mms,sms",
+ "scope fingerprint should normalize and sort message types")
+}
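Review note: the test above relies on the scope suffix being appended only when a scope is configured. A minimal sketch of that composition follows; the base fingerprint string is a made-up placeholder, and `scopeFingerprint` and `generationFingerprint` are illustrative free functions rather than the real methods on `BuildScope` and `Config`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scopeFingerprint normalizes the message types and returns "mt-<a>,<b>",
// or "" when the scope is empty (full corpus).
func scopeFingerprint(types []string) string {
	seen := make(map[string]struct{}, len(types))
	out := make([]string, 0, len(types))
	for _, t := range types {
		t = strings.TrimSpace(strings.ToLower(t))
		if t == "" {
			continue
		}
		if _, ok := seen[t]; ok {
			continue
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	if len(out) == 0 {
		return ""
	}
	sort.Strings(out)
	return "mt-" + strings.Join(out, ",")
}

// generationFingerprint appends ":s<scope>" to the base fingerprint only when
// a scope is present, so unscoped configs keep their existing fingerprint.
func generationFingerprint(base string, types []string) string {
	if fp := scopeFingerprint(types); fp != "" {
		return fmt.Sprintf("%s:s%s", base, fp)
	}
	return base
}

func main() {
	base := "example-base" // hypothetical stand-in for the unscoped fingerprint
	fmt.Println(generationFingerprint(base, nil))
	fmt.Println(generationFingerprint(base, []string{"MMS", "sms", "sms"}))
}
```

Leaving the unscoped fingerprint unchanged means existing generations built before this change should still match their config, while any scoped config yields a distinct fingerprint.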
diff --git a/internal/vector/embed/worker.go b/internal/vector/embed/worker.go
index 7bb0943e3..52dcc8d8c 100644
--- a/internal/vector/embed/worker.go
+++ b/internal/vector/embed/worker.go
@@ -48,6 +48,10 @@ type WorkStore interface {
SetEmbedGenIfUnchanged(ctx context.Context, items []store.EmbedGenStamp, target int64) (missed []int64, err error)
}
+type scopedWorkStore interface {
+ ScanForEmbeddingWithMessageTypes(ctx context.Context, target int64, afterID int64, limit int, messageTypes []string) ([]int64, error)
+}
+
// WorkerDeps bundles the collaborators a Worker needs. Backend, VectorsDB,
// MainDB, Store, and Client are required; the remaining fields have
// sensible defaults when zero: BatchSize defaults to 32,
@@ -61,7 +65,9 @@ type WorkerDeps struct {
// embedBatch's body-fetch query.
MainDB *sql.DB
// Store finds work and stamps coverage against MainDB. Required.
- Store WorkStore
+ Store WorkStore
+ BuildScope vector.BuildScope
+
Client EmbeddingClient
Preprocess PreprocessConfig
MaxInputChars int
@@ -309,7 +315,7 @@ func (w *Worker) run(ctx context.Context, gen vector.GenerationID, backstop bool
return res, fmt.Errorf("RunOnce: %w", err)
}
batchStart := time.Now()
- ids, err := w.deps.Store.ScanForEmbedding(ctx, int64(gen), afterID, w.deps.BatchSize)
+ ids, err := w.scanForEmbedding(ctx, int64(gen), afterID)
if err != nil {
return res, fmt.Errorf("scan for embedding: %w", err)
}
@@ -552,6 +558,18 @@ func (w *Worker) run(ctx context.Context, gen vector.GenerationID, backstop bool
}
}
+func (w *Worker) scanForEmbedding(ctx context.Context, target int64, afterID int64) ([]int64, error) {
+ scope := vector.NewBuildScope(w.deps.BuildScope.MessageTypes)
+ if scope.IsEmpty() {
+ return w.deps.Store.ScanForEmbedding(ctx, target, afterID, w.deps.BatchSize)
+ }
+ scoped, ok := w.deps.Store.(scopedWorkStore)
+ if !ok {
+ return nil, errors.New("scoped embedding build requires a store that supports message_type scans")
+ }
+ return scoped.ScanForEmbeddingWithMessageTypes(ctx, target, afterID, w.deps.BatchSize, scope.MessageTypes)
+}
+
// advanceWatermark persists the per-gen forward-scan cursor to id after a
// batch made forward progress. The backstop never persists (it scans from
// 0 by design and must not push the optimistic watermark backward or
diff --git a/internal/vector/errors.go b/internal/vector/errors.go
index b4d5a5ecf..03a7ba3e0 100644
--- a/internal/vector/errors.go
+++ b/internal/vector/errors.go
@@ -64,4 +64,10 @@ var (
// "transient backend slow" response so clients can retry instead
// of treating it as a permanent failure.
ErrEmbeddingTimeout = errors.New("embedding request timed out")
+
+ // ErrIndexScopeMismatch is returned when a scoped embedding index
+ // is used without an equivalent structured filter. For example, an
+ // index built only for message_type=sms must not answer an unscoped
+ // vector query over email + SMS.
+ ErrIndexScopeMismatch = errors.New("index scope mismatch")
)
diff --git a/internal/vector/hybrid/engine.go b/internal/vector/hybrid/engine.go
index f264c1ca1..7fe973075 100644
--- a/internal/vector/hybrid/engine.go
+++ b/internal/vector/hybrid/engine.go
@@ -63,7 +63,8 @@ type Config struct {
// participant/label lookup SQL that BuildFilter runs against mainDB.
// Pass PostgreSQLDialect.Rebind on PG (pgx rejects bare ?); leave nil
// (or SQLiteDialect.Rebind, which is identity) on SQLite.
- Rebind func(string) string
+ Rebind func(string) string
+ BuildScope vector.BuildScope
}
// Engine orchestrates the generation check, query embedding, and fusion
@@ -111,6 +112,9 @@ func (e *Engine) Search(ctx context.Context, req SearchRequest) ([]vector.FusedH
if err != nil {
return nil, ResultMeta{}, err
}
+ if err := e.validateBuildScope(req.Filter); err != nil {
+ return nil, ResultMeta{}, err
+ }
if req.FreeText == "" {
return nil, ResultMeta{}, errors.New("empty query")
@@ -192,6 +196,31 @@ func (e *Engine) Search(ctx context.Context, req SearchRequest) ([]vector.FusedH
}, nil
}
+func (e *Engine) validateBuildScope(filter vector.Filter) error {
+ return ValidateBuildScope(e.cfg.BuildScope, filter)
+}
+
+// ValidateBuildScope rejects filters that a scoped embedding index cannot
+// safely answer. A non-empty build scope covers only its message types, so
+// callers must make the query scope explicit and compatible before running ANN.
+func ValidateBuildScope(buildScope vector.BuildScope, filter vector.Filter) error {
+ scope := vector.NewBuildScope(buildScope.MessageTypes)
+ if scope.IsEmpty() {
+ return nil
+ }
+ if len(filter.MessageTypes) == 0 {
+ return fmt.Errorf("%w: index is scoped to message_type=%s; add a matching message_type filter",
+ vector.ErrIndexScopeMismatch, strings.Join(scope.MessageTypes, ","))
+ }
+ if !scope.AllowsMessageTypes(filter.MessageTypes) {
+ return fmt.Errorf("%w: index is scoped to message_type=%s, query requested message_type=%s",
+ vector.ErrIndexScopeMismatch,
+ strings.Join(scope.MessageTypes, ","),
+ strings.Join(vector.NewBuildScope(filter.MessageTypes).MessageTypes, ","))
+ }
+ return nil
+}
+
// vectorHitsToFused wraps pure-vector hits in the FusedHit schema.
// BM25Score and RRFScore are both set to math.NaN(): "not present in
// this signal." Pure vector mode never applies Reciprocal Rank Fusion
diff --git a/internal/vector/hybrid/engine_test.go b/internal/vector/hybrid/engine_test.go
index 3eaef6da3..8e424c184 100644
--- a/internal/vector/hybrid/engine_test.go
+++ b/internal/vector/hybrid/engine_test.go
@@ -165,6 +165,35 @@ func TestEngine_Hybrid_HappyPath(t *testing.T) {
assert.Equal(len(results), meta.ReturnedCount)
}
+func TestEngine_ScopedIndexRequiresMatchingMessageTypeFilter(t *testing.T) {
+ ctx := context.Background()
+ f := newEngineFixture(t)
+ f.Engine.cfg.BuildScope = vector.NewBuildScope([]string{"sms", "mms"})
+
+ _, _, err := f.Engine.Search(ctx, SearchRequest{
+ Mode: ModeVector,
+ FreeText: "lunch",
+ Limit: 5,
+ })
+ requirepkg.ErrorIs(t, err, vector.ErrIndexScopeMismatch)
+
+ _, _, err = f.Engine.Search(ctx, SearchRequest{
+ Mode: ModeVector,
+ FreeText: "lunch",
+ Limit: 5,
+ Filter: vector.Filter{MessageTypes: []string{"email"}},
+ })
+ requirepkg.ErrorIs(t, err, vector.ErrIndexScopeMismatch)
+
+ _, _, err = f.Engine.Search(ctx, SearchRequest{
+ Mode: ModeVector,
+ FreeText: "lunch",
+ Limit: 5,
+ Filter: vector.Filter{MessageTypes: []string{"sms"}},
+ })
+ requirepkg.NoError(t, err)
+}
+
// TestFTSTerms covers the FreeText → dialect-neutral term-slice
// tokenizer directly (no DB needed). FreeText is split on whitespace
// and terms the FTS5/tsquery tokenizers would drop entirely
diff --git a/internal/vector/pgvector/backend.go b/internal/vector/pgvector/backend.go
index fb0782e62..3f7ff07c7 100644
--- a/internal/vector/pgvector/backend.go
+++ b/internal/vector/pgvector/backend.go
@@ -39,6 +39,9 @@ type Options struct {
// per-dimension HNSW index on first migration. Optional; if zero
// the index is created on first CreateGeneration.
Dimension int
+ // BuildScope limits generation coverage to matching messages. Empty
+ // means the full corpus.
+ BuildScope vector.BuildScope
// SkipMigrate suppresses the privileged CREATE EXTENSION + full
// migrate. A WRITABLE open still applies the (extension-less) schema so
// the one-time upgrade lands — read-only-ness is now signalled by
@@ -74,7 +77,8 @@ type Options struct {
// with the pgvector extension. The same *sql.DB also serves the main
// msgvault schema (messages, message_recipients, message_labels).
type Backend struct {
- db *sql.DB
+ db *sql.DB
+ scope vector.BuildScope
}
// Open verifies the database is reachable, applies the embedding schema
@@ -85,7 +89,10 @@ func Open(ctx context.Context, opts Options) (*Backend, error) {
if opts.DB == nil {
return nil, errors.New("pgvector.Open: Options.DB is required")
}
- b := &Backend{db: opts.DB}
+ b := &Backend{
+ db: opts.DB,
+ scope: vector.NewBuildScope(opts.BuildScope.MessageTypes),
+ }
if !opts.SkipMigrate {
// serve / build / search: full migrate incl. CREATE EXTENSION (the
// extension step is gated by SkipExtension for managed PG). The eager
@@ -247,17 +254,23 @@ func isUniqueViolation(err error) bool {
return pgErr.Code == "23505"
}
-// missingForGenExistsClause is the coverage gate predicate: a generation
-// is fully covered when no live message still needs embedding for it
-// (embed_gen IS NULL OR embed_gen <> gen). Built once and reused by
-// ActivateGeneration (in-tx, single-DB on PG) and Stats. The $N ordinal
-// of the generation id is supplied by the caller.
-func missingForGenExistsClause(genArg string) string {
- return fmt.Sprintf(`EXISTS (
- SELECT 1 FROM messages
- WHERE (embed_gen IS NULL OR embed_gen <> %s)
- AND %s
- )`, genArg, store.LiveMessagesWhere("", true))
+// missingWhere is the coverage predicate: a generation is fully covered
+// when no scoped live message still needs embedding for it
+// (embed_gen IS NULL OR embed_gen <> gen). genArg and scopeArg are
+// PostgreSQL placeholders; scopeArg is only emitted for a non-empty scope.
+func (b *Backend) missingWhere(genArg, scopeArg string) string {
+ where := fmt.Sprintf("(embed_gen IS NULL OR embed_gen <> %s) AND %s",
+ genArg, store.LiveMessagesWhere("", true))
+ if !b.scope.IsEmpty() {
+ where += fmt.Sprintf(" AND message_type = ANY(%s::text[])", scopeArg)
+ }
+ return where
+}
+
+// missingForGenExistsClause wraps missingWhere as an EXISTS clause for
+// activation gating and detailed activation errors.
+func (b *Backend) missingForGenExistsClause(genArg, scopeArg string) string {
+ return "EXISTS (SELECT 1 FROM messages WHERE " + b.missingWhere(genArg, scopeArg) + ")"
}
// ActivateGeneration atomically retires the current active generation
@@ -321,18 +334,22 @@ func (b *Backend) ActivateGeneration(ctx context.Context, gen vector.GenerationI
// phase, which scan-and-fill no longer has, so a legacy/crashed gen with
// seeded_at=NULL but full coverage must be activatable. Coverage
// (missing==0) is the real gate.
+ activateArgs := []any{now, now, int64(gen), force}
+ if !b.scope.IsEmpty() {
+ activateArgs = append(activateArgs, textArray(b.scope.MessageTypes))
+ }
res, err := tx.ExecContext(ctx,
`UPDATE index_generations
SET state = 'active', activated_at = $1, completed_at = COALESCE(completed_at, $2)
WHERE id = $3 AND state = 'building'
- AND ($4 OR NOT `+missingForGenExistsClause("$3")+`)`,
- now, now, int64(gen), force)
+ AND ($4 OR NOT `+b.missingForGenExistsClause("$3", "$5")+`)`,
+ activateArgs...)
if err != nil {
return fmt.Errorf("activate: %w", err)
}
n, _ := res.RowsAffected()
if n == 0 {
- return activateGateError(ctx, tx, gen, force)
+ return b.activateGateError(ctx, tx, gen, force)
}
if err := tx.Commit(); err != nil {
return fmt.Errorf("commit activate generation %d: %w", gen, err)
@@ -346,7 +363,7 @@ func (b *Backend) ActivateGeneration(ctx context.Context, gen vector.GenerationI
// also satisfies the coverage predicate (embed_gen <> gen is true for an
// unknown gen id), so checking coverage first would surface the misleading
// "messages needing embedding" error instead of the real lifecycle reason.
-func activateGateError(ctx context.Context, tx *sql.Tx, gen vector.GenerationID, force bool) error {
+func (b *Backend) activateGateError(ctx context.Context, tx *sql.Tx, gen vector.GenerationID, force bool) error {
var state vector.GenerationState
if err := tx.QueryRowContext(ctx,
`SELECT state FROM index_generations WHERE id = $1`, int64(gen)).Scan(&state); err != nil {
@@ -361,8 +378,12 @@ func activateGateError(ctx context.Context, tx *sql.Tx, gen vector.GenerationID,
// Gen exists and is building, so the only remaining reason the gated
// promote affected zero rows is the coverage term.
var missing bool
+ missingArgs := []any{int64(gen)}
+ if !b.scope.IsEmpty() {
+ missingArgs = append(missingArgs, textArray(b.scope.MessageTypes))
+ }
if err := tx.QueryRowContext(ctx,
- `SELECT `+missingForGenExistsClause("$1"), int64(gen)).Scan(&missing); err != nil {
+ `SELECT `+b.missingForGenExistsClause("$1", "$2"), missingArgs...).Scan(&missing); err != nil {
return fmt.Errorf("check coverage for generation %d: %w", gen, err)
}
if missing && !force {
@@ -1185,11 +1206,13 @@ func (b *Backend) Stats(ctx context.Context, gen vector.GenerationID) (vector.St
// generation, so it reports 0 — the StatsView consumer sums per-gen
// pending across the active/building generations anyway.
if gen != 0 {
+ missingArgs := []any{int64(gen)}
+ if !b.scope.IsEmpty() {
+ missingArgs = append(missingArgs, textArray(b.scope.MessageTypes))
+ }
if err := b.db.QueryRowContext(ctx,
- `SELECT COUNT(*) FROM messages
- WHERE (embed_gen IS NULL OR embed_gen <> $1)
- AND `+store.LiveMessagesWhere("", true),
- int64(gen)).Scan(&s.PendingCount); err != nil {
+ `SELECT COUNT(*) FROM messages WHERE `+b.missingWhere("$1", "$2"),
+ missingArgs...).Scan(&s.PendingCount); err != nil {
return s, fmt.Errorf("count missing: %w", err)
}
}
@@ -1227,14 +1250,18 @@ func (b *Backend) Stats(ctx context.Context, gen vector.GenerationID) (vector.St
// store.LiveMessagesWhere's predicate.
func (b *Backend) EmbeddedMessageCount(ctx context.Context, gen vector.GenerationID) (int64, error) {
var n int64
- if err := b.db.QueryRowContext(ctx,
- `SELECT COUNT(DISTINCT e.message_id)
- FROM embeddings e
- JOIN messages m ON m.id = e.message_id
- WHERE e.generation_id = $1
- AND m.embed_gen = $1
- AND `+store.LiveMessagesWhere("m", true),
- int64(gen)).Scan(&n); err != nil {
+ q := `SELECT COUNT(DISTINCT e.message_id)
+ FROM embeddings e
+ JOIN messages m ON m.id = e.message_id
+ WHERE e.generation_id = $1
+ AND m.embed_gen = $1
+ AND ` + store.LiveMessagesWhere("m", true)
+ args := []any{int64(gen)}
+ if !b.scope.IsEmpty() {
+ q += ` AND m.message_type = ANY($2::text[])`
+ args = append(args, textArray(b.scope.MessageTypes))
+ }
+ if err := b.db.QueryRowContext(ctx, q, args...).Scan(&n); err != nil {
return 0, fmt.Errorf("count embedded messages: %w", err)
}
return n, nil
diff --git a/internal/vector/pgvector/backend_filter_test.go b/internal/vector/pgvector/backend_filter_test.go
index dfebe2056..d9d68de98 100644
--- a/internal/vector/pgvector/backend_filter_test.go
+++ b/internal/vector/pgvector/backend_filter_test.go
@@ -58,6 +58,11 @@ func TestBackendSearchStructuredFilters(t *testing.T) {
filter vector.Filter
want []int64
}{
+ {
+ name: "message type",
+ filter: vector.Filter{MessageTypes: []string{"sms"}},
+ want: []int64{2},
+ },
{
name: "sender group",
filter: vector.Filter{SenderGroups: [][]int64{{100}}},
diff --git a/internal/vector/pgvector/backend_testhelpers_test.go b/internal/vector/pgvector/backend_testhelpers_test.go
index d6fcec92a..8613b03e4 100644
--- a/internal/vector/pgvector/backend_testhelpers_test.go
+++ b/internal/vector/pgvector/backend_testhelpers_test.go
@@ -92,10 +92,10 @@ func testSetupPGSchema(t *testing.T, db *sql.DB) {
_, err := db.Exec(`
CREATE TABLE messages (
id BIGINT PRIMARY KEY,
+ message_type TEXT NOT NULL DEFAULT 'email',
source_id BIGINT,
sender_id BIGINT,
subject TEXT,
- message_type TEXT NOT NULL DEFAULT 'email',
has_attachments BOOLEAN DEFAULT FALSE,
size_estimate BIGINT,
sent_at TIMESTAMPTZ,
diff --git a/internal/vector/pgvector/ext_stub.go b/internal/vector/pgvector/ext_stub.go
index f82e31309..05c0980ca 100644
--- a/internal/vector/pgvector/ext_stub.go
+++ b/internal/vector/pgvector/ext_stub.go
@@ -27,6 +27,7 @@ type Options struct {
SkipMigrate bool
ReadOnly bool
SkipExtension bool
+ BuildScope vector.BuildScope
}
// Backend is a placeholder type so non-pgvector builds can compile
diff --git a/internal/vector/pgvector/filter_test.go b/internal/vector/pgvector/filter_test.go
new file mode 100644
index 000000000..5589c4170
--- /dev/null
+++ b/internal/vector/pgvector/filter_test.go
@@ -0,0 +1,27 @@
+//go:build pgvector
+
+package pgvector
+
+import (
+ "fmt"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ "go.kenn.io/msgvault/internal/vector"
+)
+
+func TestBuildPGFilterClausesMessageTypes(t *testing.T) {
+ var args []any
+ bind := func(v any) string {
+ args = append(args, v)
+ return fmt.Sprintf("$%d", len(args))
+ }
+
+ clauses := buildPGFilterClauses(vector.Filter{MessageTypes: []string{"sms", "mms"}}, bind)
+
+ require.Len(t, clauses, 1)
+ assert.Equal(t, "m.message_type = ANY($1::text[])", clauses[0])
+ require.Len(t, args, 1)
+ assert.Equal(t, `{"sms","mms"}`, args[0])
+}
diff --git a/internal/vector/pgvector/parity_test.go b/internal/vector/pgvector/parity_test.go
index 532f7372c..37aca933a 100644
--- a/internal/vector/pgvector/parity_test.go
+++ b/internal/vector/pgvector/parity_test.go
@@ -69,6 +69,7 @@ func buildSqlitevecParity(t *testing.T, corpus []parityDoc) (*sqlitevec.Backend,
CREATE TABLE messages (
id INTEGER PRIMARY KEY,
subject TEXT,
+ message_type TEXT NOT NULL DEFAULT 'email',
source_id INTEGER,
sender_id INTEGER,
has_attachments INTEGER DEFAULT 0,
diff --git a/internal/vector/sqlitevec/backend.go b/internal/vector/sqlitevec/backend.go
index a38c19f05..316cdc11f 100644
--- a/internal/vector/sqlitevec/backend.go
+++ b/internal/vector/sqlitevec/backend.go
@@ -28,10 +28,11 @@ var _ vector.Backend = (*Backend)(nil)
// Options configures how Open establishes a Backend.
type Options struct {
- Path string
- MainPath string // filesystem path to msgvault.db; required for FusedSearch
- Dimension int // default dimension for EnsureVectorTable at open
- MainDB *sql.DB // handle to the main msgvault.db
+ Path string
+ MainPath string // filesystem path to msgvault.db; required for FusedSearch
+ Dimension int // default dimension for EnsureVectorTable at open
+ MainDB *sql.DB // handle to the main msgvault.db
+ BuildScope vector.BuildScope // empty means full corpus
// ReadOnly indicates the main DB handle (MainDB) was opened read-only
// — e.g. the MCP server's store.OpenReadOnly (_query_only=true). When
// set, Open SKIPS BackfillEmbedGenForUpgrade, which would otherwise
@@ -54,6 +55,7 @@ type Backend struct {
// one-time upgrade backfill self-guards on it so it never writes
// through the read-only main handle. See Options.ReadOnly.
readOnly bool
+ scope vector.BuildScope
}
// Open opens vectors.db, runs migrations, and retains the main database
@@ -77,6 +79,7 @@ func Open(ctx context.Context, opts Options) (*Backend, error) {
mainPath: opts.MainPath,
dim: opts.Dimension,
readOnly: opts.ReadOnly,
+ scope: vector.NewBuildScope(opts.BuildScope.MessageTypes),
}
// Orphaned-stamp reset (vectors.db-recreate safety): clear embed_gen for
// any message whose stamp points to a generation id that no longer exists
@@ -264,6 +267,21 @@ func isUniqueConstraintErr(err error) bool {
sqliteErr.ExtendedCode == sqlite3.ErrConstraintPrimaryKey
}
+func scopedMissingWhere(gen int64, messageTypes []string) (string, []any) {
+ where := `(embed_gen IS NULL OR embed_gen <> ?) AND ` + store.LiveMessagesWhere("", true)
+ args := []any{gen}
+ scope := vector.NewBuildScope(messageTypes)
+ if !scope.IsEmpty() {
+ placeholders := make([]string, len(scope.MessageTypes))
+ for i, typ := range scope.MessageTypes {
+ placeholders[i] = "?"
+ args = append(args, typ)
+ }
+ where += " AND message_type IN (" + strings.Join(placeholders, ",") + ")"
+ }
+ return where, args
+}
+
// hasMissingForGen reports whether any live message in the main DB still
// needs embedding for gen (embed_gen IS NULL OR embed_gen <> gen). This is
// the scan-and-fill coverage gate. On SQLite the messages live in the main
@@ -273,13 +291,10 @@ func isUniqueConstraintErr(err error) bool {
// intentionally-non-atomic scheduler gate). The full-scan backstop covers
// any TOCTOU window between this read and the flip.
func (b *Backend) hasMissingForGen(ctx context.Context, gen vector.GenerationID) (bool, error) {
+ where, args := scopedMissingWhere(int64(gen), b.scope.MessageTypes)
var exists int
err := b.mainDB.QueryRowContext(ctx,
- `SELECT EXISTS (
- SELECT 1 FROM messages
- WHERE (embed_gen IS NULL OR embed_gen <> ?)
- AND `+store.LiveMessagesWhere("", true)+`
- )`, int64(gen)).Scan(&exists)
+ `SELECT EXISTS (SELECT 1 FROM messages WHERE `+where+`)`, args...).Scan(&exists)
if err != nil {
return false, fmt.Errorf("check missing coverage for generation %d: %w", gen, err)
}
@@ -1364,11 +1379,10 @@ func (b *Backend) Stats(ctx context.Context, gen vector.GenerationID) (vector.St
// (e.g. management commands that open the backend without the main
// handle) reports 0 rather than failing Stats.
if gen != 0 && b.mainDB != nil {
+ where, missingArgs := scopedMissingWhere(int64(gen), b.scope.MessageTypes)
if err := b.mainDB.QueryRowContext(ctx,
- `SELECT COUNT(*) FROM messages
- WHERE (embed_gen IS NULL OR embed_gen <> ?)
- AND `+store.LiveMessagesWhere("", true),
- int64(gen)).Scan(&s.PendingCount); err != nil {
+ `SELECT COUNT(*) FROM messages WHERE `+where,
+ missingArgs...).Scan(&s.PendingCount); err != nil {
return s, fmt.Errorf("count missing: %w", err)
}
}
@@ -1439,13 +1453,22 @@ func (b *Backend) EmbeddedMessageCount(ctx context.Context, gen vector.Generatio
if err != nil {
return 0, fmt.Errorf("encode embedded ids: %w", err)
}
+ where := `id IN (SELECT value FROM json_each(?))
+ AND embed_gen = ?
+ AND ` + store.LiveMessagesWhere("", true)
+ args := []any{string(blob), int64(gen)}
+ if !b.scope.IsEmpty() {
+ placeholders := make([]string, len(b.scope.MessageTypes))
+ for i, typ := range b.scope.MessageTypes {
+ placeholders[i] = "?"
+ args = append(args, typ)
+ }
+ where += " AND message_type IN (" + strings.Join(placeholders, ",") + ")"
+ }
var n int64
if err := b.mainDB.QueryRowContext(ctx,
- `SELECT COUNT(*) FROM messages
- WHERE id IN (SELECT value FROM json_each(?))
- AND embed_gen = ?
- AND `+store.LiveMessagesWhere("", true),
- string(blob), int64(gen)).Scan(&n); err != nil {
+ `SELECT COUNT(*) FROM messages WHERE `+where,
+ args...).Scan(&n); err != nil {
return 0, fmt.Errorf("count live embedded messages: %w", err)
}
return n, nil
diff --git a/internal/vector/sqlitevec/backend_test.go b/internal/vector/sqlitevec/backend_test.go
index 697fd6d14..16045852a 100644
--- a/internal/vector/sqlitevec/backend_test.go
+++ b/internal/vector/sqlitevec/backend_test.go
@@ -292,6 +292,57 @@ func TestBackend_ActivateGeneration_NullSeededAtActivatesWithCoverage(t *testing
assertpkg.Equal(t, vector.GenerationActive, genStateSV(t, b, gen), "now active")
}
+func TestBackend_CreateGeneration_ScopeLimitsCoverage(t *testing.T) {
+ ctx := context.Background()
+ main, err := sql.Open("sqlite3", ":memory:")
+ requirepkg.NoError(t, err, "open main")
+ t.Cleanup(func() { _ = main.Close() })
+ _, err = main.Exec(`CREATE TABLE messages (
+ id INTEGER PRIMARY KEY,
+ message_type TEXT NOT NULL,
+ embed_gen INTEGER,
+ deleted_at DATETIME,
+ deleted_from_source_at DATETIME
+ )`)
+ requirepkg.NoError(t, err, "create messages")
+ _, err = main.Exec(`
+ INSERT INTO messages (id, message_type, deleted_from_source_at) VALUES
+ (1, 'email', NULL),
+ (2, 'sms', NULL),
+ (3, 'mms', NULL),
+ (4, 'sms', CURRENT_TIMESTAMP)`)
+ requirepkg.NoError(t, err, "insert messages")
+
+ b, err := Open(ctx, Options{
+ Path: filepath.Join(t.TempDir(), "vectors.db"),
+ Dimension: 768,
+ MainDB: main,
+ BuildScope: vector.NewBuildScope([]string{"sms", "mms"}),
+ })
+ requirepkg.NoError(t, err, "Open")
+ t.Cleanup(func() { _ = b.Close() })
+
+ gid, err := b.CreateGeneration(ctx, "m", 768, "")
+ requirepkg.NoError(t, err, "Create")
+ stats, err := b.Stats(ctx, gid)
+ requirepkg.NoError(t, err, "Stats")
+ assertpkg.Equal(t, int64(2), stats.PendingCount, "only scoped live messages should count as missing")
+
+ missing, err := b.hasMissingForGen(ctx, gid)
+ requirepkg.NoError(t, err, "hasMissingForGen")
+ assertpkg.True(t, missing, "scoped messages still need embeddings")
+
+ _, err = main.Exec(`UPDATE messages SET embed_gen = ? WHERE id IN (2, 3)`, int64(gid))
+ requirepkg.NoError(t, err, "stamp scoped messages")
+
+ stats, err = b.Stats(ctx, gid)
+ requirepkg.NoError(t, err, "Stats after stamp")
+ assertpkg.Equal(t, int64(0), stats.PendingCount, "unscoped email must not block scoped coverage")
+ missing, err = b.hasMissingForGen(ctx, gid)
+ requirepkg.NoError(t, err, "hasMissingForGen after stamp")
+ assertpkg.False(t, missing, "unscoped email must not block scoped activation")
+}
+
// TestBackend_CreateGeneration_ResumesBuilding confirms that calling
// CreateGeneration while a building row already exists with the same
// fingerprint returns the existing id instead of failing on the unique
diff --git a/internal/vector/sqlitevec/ext_stub.go b/internal/vector/sqlitevec/ext_stub.go
index 081a8398b..55b50a811 100644
--- a/internal/vector/sqlitevec/ext_stub.go
+++ b/internal/vector/sqlitevec/ext_stub.go
@@ -32,11 +32,12 @@ func Available() bool { return false }
// can reference sqlitevec.Options without a compile error; the struct is
// never populated at runtime when the PG code path is taken.
type Options struct {
- Path string
- MainPath string
- Dimension int
- MainDB *sql.DB
- ReadOnly bool
+ Path string
+ MainPath string
+ Dimension int
+ MainDB *sql.DB
+ BuildScope vector.BuildScope
+ ReadOnly bool
}
// Backend is the stub backend type for builds without sqlite_vec.