fix(fbmessenger): don't collapse multiple attachments to one row #406
njt wants to merge 2 commits
…e row

UpsertAttachment dedupes attachments with an empty content_hash to a single row per message (SELECT-then-insert on message_id). The fbmessenger importer stored an empty content_hash whenever an attachment file was missing or no attachments dir was configured, so a Messenger message with several photos whose files were absent recorded only ONE attachment row; the rest were silently dropped.

Give hashless attachments a stable synthetic content_hash (sha256 of the export-relative URI, not file bytes) so siblings coexist and re-imports stay idempotent. storage_path stays empty, so no stored content is implied and the file-cleanup paths (which filter on non-empty storage_path) are unaffected.

Audit of the other UpsertAttachment callers found no further instances: gmail and the generic ingest path always carry a real MIME content hash, synctechsms always hashes the part bytes, whatsapp stores at most one attachment per message, and teams already uses a synthetic link hash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012Ri7QdvXXUMQPke9wrLjSS
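A minimal sketch of the synthetic hash described in this commit message, assuming the key is simply the hex-encoded SHA-256 of the export-relative URI string. The helper name `syntheticHash` and the example URIs are hypothetical; they are not taken from the importer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// syntheticHash is a hypothetical helper. It hashes the export-relative
// URI string, not the file bytes, so it works when the file is absent.
func syntheticHash(uri string) string {
	sum := sha256.Sum256([]byte(uri))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Illustrative URIs only; real export paths may look different.
	uris := []string{
		"messages/inbox/chat_1/photos/a.jpg",
		"messages/inbox/chat_1/photos/b.jpg",
	}
	for _, u := range uris {
		// The same URI always yields the same key (idempotent re-import);
		// different URIs yield different keys (siblings coexist).
		fmt.Println(u, syntheticHash(u))
	}
}
```

Because the key depends only on the URI, a re-import of the same export produces the same keys without needing the attachment files on disk.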
looking at this
Messenger imports still need a stable per-attachment identity when files are missing, rejected, or skipped because no attachments directory is configured. Without that identity, the store's empty-hash fallback collapses multiple hashless attachments on the same message into one row.

The previous fallback identity looked exactly like a SHA-256 content hash even though no bytes were stored. Prefix the synthetic key so JSON output and export flows cannot confuse it with content-addressed attachment data, while preserving idempotent re-import behavior.

Validation: focused Messenger attachment tests were run red/green; the new assertions failed against bare 64-hex fallback keys before the importer change and passed after prefixing them.

Generated with Codex (GPT-5)
Co-authored-by: Codex <codex@openai.com>
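The prefixing idea can be sketched as follows. The commit message does not state the actual prefix, so the `synthetic:` value here is a placeholder assumption, as are the names `syntheticKey` and `looksLikeContentHash`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// syntheticPrefix is a placeholder; the real prefix is not given in the PR text.
const syntheticPrefix = "synthetic:"

// syntheticKey (hypothetical) derives a stable key from the URI and marks it
// with a prefix so it cannot pass for a content-addressed hash.
func syntheticKey(uri string) string {
	sum := sha256.Sum256([]byte(uri))
	return syntheticPrefix + hex.EncodeToString(sum[:])
}

// looksLikeContentHash (hypothetical) reports whether s has the shape of a
// bare SHA-256 digest: exactly 64 hexadecimal characters.
func looksLikeContentHash(s string) bool {
	if len(s) != 64 {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

func main() {
	key := syntheticKey("messages/inbox/chat_1/photos/a.jpg")
	fmt.Println(strings.HasPrefix(key, syntheticPrefix))
	fmt.Println(looksLikeContentHash(key))
}
```

A consumer that checks the key shape, as the test assertions in this commit reportedly do, can then tell a synthetic identity from a real digest without consulting storage_path.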
Facebook Messenger messages that carry multiple attachments without downloaded bytes (files missing from the DYI export, or no attachments dir configured) were recording only one attachment row — the rest were silently dropped.
UpsertAttachment dedupes rows with an empty content_hash to one per message. These link/missing attachments now get a stable synthetic content_hash derived from the export-relative URI, so siblings coexist and re-imports stay idempotent. storage_path stays empty, so no bytes are implied and file-cleanup paths are unaffected.

🤖 Generated with Claude Code
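To illustrate the collapse and the fix, here is a toy in-memory model of the dedupe rule. It assumes rows are unique per (message ID, content hash) pair, which is consistent with the description above but is not the actual store code; the `store`, `attachment`, and `upsert` names are hypothetical.

```go
package main

import "fmt"

// attachment is a simplified stand-in for a stored attachment row.
type attachment struct {
	messageID   int64
	contentHash string
	filename    string
}

// store is a toy in-memory model, not the real database layer.
type store struct {
	rows []attachment
}

// upsert skips the insert when a row with the same message ID and the same
// content hash already exists. With empty hashes, every sibling matches the
// first row, so only one row per message survives.
func (s *store) upsert(a attachment) {
	for _, r := range s.rows {
		if r.messageID == a.messageID && r.contentHash == a.contentHash {
			return
		}
	}
	s.rows = append(s.rows, a)
}

func main() {
	before := &store{}
	before.upsert(attachment{1, "", "a.jpg"})
	before.upsert(attachment{1, "", "b.jpg"}) // collapses into the first row
	fmt.Println("empty hashes:", len(before.rows))

	after := &store{}
	for i := 0; i < 2; i++ { // second pass models a re-import
		after.upsert(attachment{1, "key-for-a", "a.jpg"})
		after.upsert(attachment{1, "key-for-b", "b.jpg"})
	}
	fmt.Println("synthetic keys:", len(after.rows))
}
```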