35 changes: 33 additions & 2 deletions README.md
@@ -15,11 +15,12 @@ Archive a lifetime of email. Analytics and search in milliseconds, entirely offl

 Your messages are yours. Decades of correspondence, attachments, and history shouldn't be locked behind a web interface or an API. msgvault downloads a complete local copy and then everything runs offline. Search, analytics, and the MCP server all work against local data with no network access required.

-Currently supports Gmail and IMAP sync, plus offline imports from MBOX exports and Apple Mail (.emlx) directories.
+Currently supports Gmail, Google Calendar, and IMAP sync, plus offline imports from MBOX exports and Apple Mail (.emlx) directories.

 ## Features

 - **Full Gmail backup**: raw MIME, attachments, labels, and metadata
+- **Google Calendar sync**: archive events, organizers, and attendees; searchable alongside email
 - **IMAP sync**: archive mail from any standard IMAP server
 - **MBOX / Apple Mail import**: import email from MBOX exports or Apple Mail (.emlx) directories
 - **Interactive TUI**: drill-down analytics over your entire message history, powered by DuckDB over Parquet — connects to a remote `msgvault serve` instance or runs locally
@@ -87,6 +88,8 @@ msgvault tui
 | `add-account EMAIL` | Authorize a Gmail account (use `--headless` for servers) or add an IMAP account |
 | `sync-full EMAIL` | Full sync (`--limit N`, `--after`/`--before` for date ranges) |
 | `sync EMAIL` | Sync only new/changed messages |
+| `add-calendar EMAIL` | Authorize read-only Google Calendar access and register calendars |
+| `sync-calendar NAME\|EMAIL` | Sync Google Calendar events (full first run, then incremental) |
 | `tui` | Launch the interactive TUI (`--account` to filter, `--local` to force local) |
 | `search QUERY` | Search messages (`--account` to filter, `--json` for machine output) |
 | `show-message ID` | View full message details (`--json` for machine output) |
@@ -118,6 +121,8 @@ A separate MCP tool, `find_similar_messages`, returns nearest neighbors for a se

 > **Run only one embedding process at a time.** Don't run `msgvault embeddings build`/`resume` or `repair-encoding` concurrently with a `msgvault serve` daemon — they write the same embedding state, and concurrent writers are not coordinated across processes.

+Large archives can scope an embedding generation with `[vector.embed.scope] message_types = ["sms", "mms"]`. Scoped vector and hybrid searches must include a matching `message_type` filter so a partial index is never used as if it covered the whole archive.
+
 ## Importing from MBOX or Apple Mail

 Import email from providers that offer MBOX exports or from a local Apple Mail data directory:
@@ -173,6 +178,27 @@ msgvault import-synctech-sms --owner-phone +15550000001 ~/Downloads/sms-backup.z

 SMS and MMS messages appear in text-message search. Call logs are imported as searchable call records with `message_type = synctech_sms_call`, so missed and outgoing calls do not mix into normal text threads.

+### Google Calendar
+
+Archive your calendars alongside email. Events become searchable (full-text and, when vector search is enabled, semantic) and join the same contact graph as your email, so organizers and attendees dedupe with the people you email.
+
+```bash
+# Authorize read-only Calendar access and register your calendars.
+# If the account already has Gmail access, the consent screen asks for
+# Gmail + Calendar together — keep BOTH checked so Gmail access is kept.
+msgvault add-calendar you@gmail.com
+
+# First run does a full sync; later runs are incremental.
+msgvault sync-calendar you@gmail.com
+msgvault sync-calendar you@gmail.com --full # force a full re-sync
+msgvault sync-calendar you@gmail.com --all-calendars # include subscribed/holiday calendars
+
+# Find events
+msgvault search "standup" --message-type calendar_event
+```
+
+By default only calendars you own or can write to are synced (add `--all-calendars` for subscribed and holiday calendars). Calendar sync is read-only and never modifies your Google Calendar. Cancelled events are kept (marked cancelled), not deleted, so your archive preserves that a meeting once existed. The Calendar API must be enabled on your Google Cloud OAuth project.
+
 Msgvault stores Google OAuth refresh tokens under the Msgvault home directory with file permissions restricted to the current user. Tokens and client secrets are not written into `config.toml`, logs, README examples, or exported fixtures.

 ## Configuration
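The README hunk above describes a default calendar selection that can be sketched as a filter on each calendar's access role. The sketch assumes the rule maps onto the Google Calendar API's CalendarList `accessRole` field, whose values are `owner`, `writer`, `reader`, and `freeBusyReader`. `calendarEntry` and `selectCalendars` are hypothetical and are not msgvault's code.

```go
package main

// calendarEntry mirrors two fields of a Calendar API CalendarList entry.
// Hypothetical type for illustration only.
type calendarEntry struct {
	ID         string
	AccessRole string // "owner", "writer", "reader", or "freeBusyReader"
}

// selectCalendars returns the IDs of calendars to sync. By default it keeps
// calendars the user owns or can write to; allCalendars (the assumed meaning
// of --all-calendars) keeps subscribed and holiday calendars too.
func selectCalendars(entries []calendarEntry, allCalendars bool) []string {
	var ids []string
	for _, e := range entries {
		if allCalendars || e.AccessRole == "owner" || e.AccessRole == "writer" {
			ids = append(ids, e.ID)
		}
	}
	return ids
}
```

The README does not say whether `--all-calendars` also picks up `freeBusyReader` calendars, which expose no event details. A real implementation might skip those.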
@@ -225,7 +251,7 @@ Workspace admins can use a Google service account with domain-wide delegation in
 service_account_key = "/secure/path/service-account.json"
 ```

-In Google Admin Console, authorize the service account client for `https://www.googleapis.com/auth/gmail.readonly` and `https://www.googleapis.com/auth/gmail.modify`. If you will run `delete-staged` with permanent deletion, also authorize `https://mail.google.com/`. Keep the key file owner-only, for example `chmod 600 /secure/path/service-account.json`.
+In Google Admin Console, authorize the service account client for `https://www.googleapis.com/auth/gmail.readonly` and `https://www.googleapis.com/auth/gmail.modify`. If you will archive Google Calendar, also authorize `https://www.googleapis.com/auth/calendar.readonly`. If you will run `delete-staged` with permanent deletion, also authorize `https://mail.google.com/`. Keep the key file owner-only, for example `chmod 600 /secure/path/service-account.json`.

 ```bash
 msgvault add-account you@acme.com --oauth-app acme
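The Admin Console step in the hunk above needs a different set of scopes depending on which features you use. This small Go sketch builds that set from the README's rules. The function names are hypothetical. The comma-joined output assumes the domain-wide delegation form takes a comma-delimited list of scopes.

```go
package main

import "strings"

// OAuth scope URLs exactly as listed in the README.
const (
	scopeGmailReadonly    = "https://www.googleapis.com/auth/gmail.readonly"
	scopeGmailModify      = "https://www.googleapis.com/auth/gmail.modify"
	scopeCalendarReadonly = "https://www.googleapis.com/auth/calendar.readonly"
	scopeFullMailAccess   = "https://mail.google.com/"
)

// delegationScopes returns the scopes to authorize for the service account
// client: Gmail read/modify always, Calendar and full mail access only on demand.
func delegationScopes(archiveCalendar, permanentDelete bool) []string {
	scopes := []string{scopeGmailReadonly, scopeGmailModify}
	if archiveCalendar {
		scopes = append(scopes, scopeCalendarReadonly)
	}
	if permanentDelete {
		scopes = append(scopes, scopeFullMailAccess)
	}
	return scopes
}

// adminConsoleScopeList joins the scopes for pasting into the delegation form.
func adminConsoleScopeList(archiveCalendar, permanentDelete bool) string {
	return strings.Join(delegationScopes(archiveCalendar, permanentDelete), ",")
}
```

Keeping `https://mail.google.com/` opt-in follows the README: only permanent deletion via `delete-staged` needs that broad scope.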
@@ -252,6 +278,11 @@ email = "you@gmail.com"
 schedule = "0 2 * * *" # 2am daily (cron)
 enabled = true

+[[gcal]] # scheduled Google Calendar sync
+email = "you@gmail.com"
+schedule = "0 */6 * * *" # every 6 hours
+enabled = true
+
 [server]
 api_port = 8080
 bind_addr = "0.0.0.0"
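The README states that cancelled events are kept and marked, not deleted. Below is a minimal sketch of applying an incremental batch that way. It assumes changes arrive as Calendar API event resources whose `status` is `cancelled` for deletions. Such entries may carry little beyond the event ID, so the sketch keeps the previously archived copy and only flips its status. The `event` type and `applyIncremental` are hypothetical.

```go
package main

// event keeps only the fields this sketch needs; it is not msgvault's schema.
type event struct {
	ID      string
	Summary string
	Status  string // Calendar API values: "confirmed", "tentative", "cancelled"
}

// applyIncremental merges one batch of changed events into the archive.
// A cancelled event is re-marked rather than removed, so the archive still
// shows the meeting existed, with the details captured before cancellation.
func applyIncremental(archive map[string]event, changes []event) {
	for _, ch := range changes {
		if ch.Status != "cancelled" {
			archive[ch.ID] = ch // new or updated event: store the latest copy
			continue
		}
		prev, seen := archive[ch.ID]
		if !seen {
			prev = ch // never archived before: keep the sparse record
		}
		prev.Status = "cancelled"
		archive[ch.ID] = prev
	}
}
```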
38 changes: 19 additions & 19 deletions cmd/msgvault/cmd/add_synctech_sms_drive.go
@@ -128,7 +128,7 @@ func runConfiguredSynctechSMSSourceWithStore(ctx context.Context, st *store.Stor
 		}
 		_, err = synctechsms.NewImporter(st, opts).ImportPath(src.Path)
 	case "drive":
-		err = runSynctechSMSDriveSource(ctx, st, src, opts)
+		_, err = runSynctechSMSDriveSource(ctx, st, src, opts)
 	default:
 		return fmt.Errorf("unsupported synctech-sms backend %q", src.Backend)
 	}
@@ -167,28 +167,28 @@ func validateSynctechSMSDriveSource(src config.SynctechSMSSource) error {
 	return nil
 }

-func runSynctechSMSDriveSource(ctx context.Context, st *store.Store, src config.SynctechSMSSource, opts synctechsms.ImportOptions) error {
+func runSynctechSMSDriveSource(ctx context.Context, st *store.Store, src config.SynctechSMSSource, opts synctechsms.ImportOptions) (synctechsms.ImportSummary, error) {
 	if err := validateSynctechSMSDriveSource(src); err != nil {
-		return err
+		return synctechsms.ImportSummary{}, err
 	}
 	client, err := newSynctechSMSDriveClient(ctx, src)
 	if err != nil {
-		return err
+		return synctechsms.ImportSummary{}, err
 	}
 	return runSynctechSMSDriveSourceWithClient(ctx, st, src, opts, client)
 }

-func runSynctechSMSDriveSourceWithClient(ctx context.Context, st *store.Store, src config.SynctechSMSSource, opts synctechsms.ImportOptions, client synctechsms.DriveClient) (retErr error) {
+func runSynctechSMSDriveSourceWithClient(ctx context.Context, st *store.Store, src config.SynctechSMSSource, opts synctechsms.ImportOptions, client synctechsms.DriveClient) (summary synctechsms.ImportSummary, retErr error) {
 	if err := validateSynctechSMSDriveSource(src); err != nil {
-		return err
+		return summary, err
 	}
 	source, err := ensureConfiguredSynctechSMSSource(st, src, opts)
 	if err != nil {
-		return err
+		return summary, err
 	}
 	syncID, err := st.StartSync(source.ID, synctechsms.AdapterName)
 	if err != nil {
-		return fmt.Errorf("start sync: %w", err)
+		return summary, fmt.Errorf("start sync: %w", err)
 	}
 	completed := false
 	defer func() {
@@ -204,55 +204,55 @@ func runSynctechSMSDriveSourceWithClient(ctx context.Context, st *store.Store, s
 	}()
 	files, err := client.ListBackupFiles(ctx, src.FolderID)
 	if err != nil {
-		return fmt.Errorf("list Drive backup files: %w", err)
+		return summary, fmt.Errorf("list Drive backup files: %w", err)
 	}
 	imported, err := st.ListImportedSourceItemChecksums(source.ID, "drive")
 	if err != nil {
-		return fmt.Errorf("list imported Drive checksums: %w", err)
+		return summary, fmt.Errorf("list imported Drive checksums: %w", err)
 	}
 	stableAfter, err := time.ParseDuration(src.StableAfter)
 	if err != nil {
-		return fmt.Errorf("parse stable_after: %w", err)
+		return summary, fmt.Errorf("parse stable_after: %w", err)
 	}
 	selected := synctechsms.SelectStableDriveFiles(files, time.Now(), stableAfter, imported)
 	stagingDir := filepath.Join(cfg.Data.DataDir, "imports", "synctech-sms", src.Name)
 	if err := os.MkdirAll(stagingDir, 0o700); err != nil {
-		return fmt.Errorf("create staging directory: %w", err)
+		return summary, fmt.Errorf("create staging directory: %w", err)
 	}
 	imp := synctechsms.NewImporter(st, opts)
-	var summary synctechsms.ImportSummary
 	for _, file := range selected {
 		fileSummary, err := importOneDriveBackup(ctx, st, imp, client, source.ID, file, stagingDir)
 		if err != nil {
-			return err
+			return summary, err
 		}
 		summary.FilesSeen += fileSummary.FilesSeen
 		summary.FilesImported += fileSummary.FilesImported
 		summary.SMSImported += fileSummary.SMSImported
 		summary.MMSImported += fileSummary.MMSImported
 		summary.CallsImported += fileSummary.CallsImported
 		summary.AttachmentsImported += fileSummary.AttachmentsImported
+		summary.MessageIDs = append(summary.MessageIDs, fileSummary.MessageIDs...)
 	}
 	if summary.FilesImported > 0 {
 		if err := st.RecomputeConversationStats(source.ID); err != nil {
-			return fmt.Errorf("recompute conversation stats: %w", err)
+			return summary, fmt.Errorf("recompute conversation stats: %w", err)
 		}
 	}
 	totalRecords := int64(summary.SMSImported + summary.MMSImported + summary.CallsImported)
 	if err := st.UpdateSyncCheckpoint(syncID, &store.Checkpoint{
 		MessagesProcessed: totalRecords,
 		MessagesAdded: totalRecords,
 	}); err != nil {
-		return fmt.Errorf("update sync checkpoint: %w", err)
+		return summary, fmt.Errorf("update sync checkpoint: %w", err)
 	}
 	if err := st.TouchSourceLastSyncAt(source.ID); err != nil {
-		return fmt.Errorf("touch source last sync: %w", err)
+		return summary, fmt.Errorf("touch source last sync: %w", err)
 	}
 	if err := st.CompleteSync(syncID, ""); err != nil {
-		return fmt.Errorf("complete sync: %w", err)
+		return summary, fmt.Errorf("complete sync: %w", err)
 	}
 	completed = true
-	return nil
+	return summary, nil
 }

 func importOneDriveBackup(ctx context.Context, st *store.Store, imp *synctechsms.Importer, client synctechsms.DriveClient, sourceID int64, file synctechsms.DriveFile, stagingDir string) (synctechsms.ImportSummary, error) {
37 changes: 33 additions & 4 deletions cmd/msgvault/cmd/add_synctech_sms_drive_test.go
@@ -75,8 +75,9 @@ func TestSynctechSMSDriveRunUsesSingleOuterSyncRun(t *testing.T) {
 		},
 	}

-	err := runSynctechSMSDriveSourceWithClient(context.Background(), f.Store, src, synctechImportOptions(src), client)
+	summary, err := runSynctechSMSDriveSourceWithClient(context.Background(), f.Store, src, synctechImportOptions(src), client)
 	require.NoError(err, "runSynctechSMSDriveSourceWithClient")
+	require.Len(summary.MessageIDs, 1, "summary message IDs")

 	source := getSynctechSource(t, f.Store, src.OwnerPhone)
 	assert.Equal(1, countSyncRuns(t, f.Store, source.ID), "sync run count")
@@ -113,7 +114,7 @@ func TestSynctechSMSDriveRunSetsUpIdentityAndPostSourceMigration(t *testing.T) {
 	src := synctechDriveTestSource()
 	client := fakeSynctechDriveClient{}

-	err = runSynctechSMSDriveSourceWithClient(context.Background(), st, src, synctechImportOptions(src), client)
+	_, err = runSynctechSMSDriveSourceWithClient(context.Background(), st, src, synctechImportOptions(src), client)
 	require.NoError(err, "runSynctechSMSDriveSourceWithClient")

 	synctechSource := getSynctechSource(t, st, src.OwnerPhone)
@@ -154,7 +155,7 @@ func TestSynctechSMSDriveRunRecordsZeroSelectedPoll(t *testing.T) {
 		}},
 	}

-	err := runSynctechSMSDriveSourceWithClient(context.Background(), f.Store, src, synctechImportOptions(src), client)
+	_, err := runSynctechSMSDriveSourceWithClient(context.Background(), f.Store, src, synctechImportOptions(src), client)
 	require.NoError(err, "runSynctechSMSDriveSourceWithClient")

 	source := getSynctechSource(t, f.Store, src.OwnerPhone)
@@ -188,7 +189,7 @@ func TestSynctechSMSDriveRunMarksOuterSyncFailedOnDownloadError(t *testing.T) {
 		downloadErr: downloadErr,
 	}

-	err := runSynctechSMSDriveSourceWithClient(context.Background(), f.Store, src, synctechImportOptions(src), client)
+	_, err := runSynctechSMSDriveSourceWithClient(context.Background(), f.Store, src, synctechImportOptions(src), client)
 	require.ErrorIs(err, downloadErr, "runSynctechSMSDriveSourceWithClient")

 	source := getSynctechSource(t, f.Store, src.OwnerPhone)
@@ -204,6 +205,34 @@ func TestSynctechSMSDriveRunMarksOuterSyncFailedOnDownloadError(t *testing.T) {
 	assert.Contains(item.ErrorMessage.String, downloadErr.Error(), "source import error")
 }

+func TestConfiguredSynctechSMSCompletesAfterImport(t *testing.T) {
+	require := requirepkg.New(t)
+	assert := assertpkg.New(t)
+	home := t.TempDir()
+	savedCfg := cfg
+	t.Cleanup(func() { cfg = savedCfg })
+	cfg = config.NewDefaultConfig()
+	cfg.HomeDir = home
+	cfg.Data.DataDir = home
+
+	f := storetest.New(t)
+	xmlPath := filepath.Join(home, "sms.xml")
+	require.NoError(os.WriteFile(xmlPath, []byte(`<smses count="1">
+<sms address="+15551234567" date="1717214400000" type="1" body="hello from local" read="1" status="-1" contact_name="Alice" />
+</smses>`), 0o600), "write sms fixture")
+	src := synctechDriveTestSource()
+	src.Backend = "local"
+	src.Path = xmlPath
+
+	err := runConfiguredSynctechSMSSourceWithStore(context.Background(), f.Store, src)
+
+	require.NoError(err, "configured synctech-sms import")
+	source := getSynctechSource(t, f.Store, src.OwnerPhone)
+	run := getOnlySyncRun(t, f.Store, source.ID)
+	assert.Equal(store.SyncStatusCompleted, run.Status, "sync status")
+	assertSourceMessageCount(t, f.Store, source.ID, 1)
+}
+
 type fakeSynctechDriveClient struct {
 	files []synctechsms.DriveFile
 	downloads map[string]string