Skip to content

fix: normalize importer message/reaction dates to UTC#399

Merged
wesm merged 3 commits into
kenn-io:mainfrom
njt:fix/sync-oldest-date-utc
Jun 22, 2026
Merged

fix: normalize importer message/reaction dates to UTC#399
wesm merged 3 commits into
kenn-io:mainfrom
njt:fix/sync-oldest-date-utc

Conversation

@njt

@njt njt commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

What

Several importers built time.Time values from epoch timestamps with time.Unix/time.UnixMilli but without .UTC(), leaving them in the runner's local zone — while the rest of each importer stores dates in UTC. Any code reading the calendar day (or the Parquet year partition) is then off by one in zones east of UTC.

Fixes:

  • internal/sync/sync.goprocessBatch oldest-message date (progress tracking).
  • internal/whatsapp/mapping.go — message SentAt.
  • internal/whatsapp/importer.go — reaction createdAt.

Why it matters

TestProcessBatch_OldestDatePropagation fails on any machine east of UTC (e.g. NZ): the fixture 2024-01-10T12:00:00Z reads back as Jan 11 local. The tests are correct; the production code was the bug. Adds TestMapMessageSentAtIsUTC (asserts the stored zone is UTC, machine-independent).

Possible later fixes (out of scope here)

The same time.Unix(...)-without-.UTC() pattern also appears in the embedding-generation status timestamps, but these are operator-facing status values round-tripped from unix-int columns (not message dates), so they don't affect partitioning/dedup/cross-system date semantics. Local-time display is arguably fine; normalizing them to UTC would be a consistency-only follow-up. Sites:

  • cmd/msgvault/cmd/embeddings_manage.goStartedAt, SeededAt, CompletedAt, ActivatedAt.
  • internal/vector/pgvector/backend.goStartedAt, CompletedAt, ActivatedAt.
  • internal/vector/sqlitevec/backend.goStartedAt, CompletedAt, ActivatedAt.

Left unchanged here to avoid churning working code on a style call; documented so a future pass can decide.

Scope

Independent of the Teams PR (#398) — branched from main, touches only internal/sync and internal/whatsapp.

processBatch tracked the oldest message date via time.UnixMilli without
.UTC(), leaving it in the local zone — so code reading its calendar day
(and TestProcessBatch_OldestDatePropagation) was off by one in zones east
of UTC. Match the .UTC() normalization already used for the stored
message date and parsed.Date.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UVTtwc4MNS8L4ztnJ87QMj
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (7ac97e1)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

Same class as the sync oldest-date bug: WhatsApp message SentAt
(mapping.go) and reaction createdAt (importer.go) were built with
time.Unix without .UTC(), leaving them in the local zone while every
other importer stores UTC. Off-by-one calendar day and wrong Parquet
year-partition near boundaries east of UTC. Adds TestMapMessageSentAtIsUTC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UVTtwc4MNS8L4ztnJ87QMj
@njt njt changed the title fix(sync): normalize oldest-message date to UTC fix: normalize importer message/reaction dates to UTC Jun 20, 2026
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (fa5ba1a)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 1m15s), codex_security (claude-code/security, done, 28s) | Total: 1m43s

@wesm

wesm commented Jun 22, 2026

Copy link
Copy Markdown
Member

Great, thank you!

@wesm

wesm commented Jun 22, 2026

Copy link
Copy Markdown
Member

fixing CI

The WhatsApp UTC regression test used enough direct testify package calls to trip the repo's custom assertion-helper lint in CI, even though the behavior test itself passed. Use a local assert helper in that assertion-heavy test so the regression coverage stays in place while matching the enforced test style.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
@roborev-ci

roborev-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown

roborev: Combined Review (bb9a7c0)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m6s), codex_security (codex/security, done, 21s) | Total: 2m27s

@wesm wesm merged commit 7a73f39 into kenn-io:main Jun 22, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants