feat(whatsapp): comprehensive media recovery pipeline + anti-ban hardening by CyPack · Pull Request #11 · ownpilot/OwnPilot

CyPack · 2026-03-06T09:19:36Z

Summary

Comprehensive WhatsApp reliability improvements across multiple sessions:

Core Fixes (S28 — Latest)

type='append' silent drop fix: Previously ALL offline/reconnect messages were silently discarded. WhatsApp delivers missed messages via messages.upsert type='append' after reconnect — these now save to DB with offlineSync: true metadata, no AI response triggered.
Connection gap tracking: lastDisconnectedAt instance var logs reconnect gap ("Reconnected after Xs gap")
Helpers extracted: addToProcessedMsgIds() (FIFO cap eviction), parseMessageTimestamp() (null-safe)

Integration Test Evidence (S29)

docker logs: UPSERT EVENT received — type: append, count: 1
docker logs: Offline sync saved 1/1 messages to DB
DB: metadata.offlineSync = true, id = channel.whatsapp:3EB0AF2E7FCAE4B2CD19A3

Unit Tests (S29 — 17 scenarios)

New file: packages/gateway/src/channels/plugins/whatsapp/whatsapp-api.test.ts

B-series (7): core offline handler — DB save, processedMsgIds seeding, no AI response, metadata-only media, empty message skip
C-series (6): edge cases — empty batch, no pushName, group without participant, fromMe filter, self-chat pass-through, stub message skip
D-series (2): FIFO cap eviction at 5000 entries
E-series (2): reconnect dedup — append then same notify deduped

Media Recovery Pipeline (S21-S26)

retryMediaFromMetadata() — 15/15 SOR files recovered
Short-circuit optimization: 40-87s → 1.06s per file (60x faster, skip history sync when mediaKey stored)
enrichMediaMetadata() — fixes ON CONFLICT DO NOTHING silent drop of mediaKey updates
POST /channels/:id/recover-media — throttled bulk recovery endpoint
POST /channels/:id/batch-retry-media — sequential batch with safety caps

Anti-Ban & Connection Hardening (S4, S22)

cleanupSocket() + isReconnecting guard prevents zombie connections and concurrent reconnects
440 (connectionReplaced) special handling: 10s base delay prevents immediate reconnect loop
30s timeout on updateMediaMessage (Baileys has no built-in timeout)
LID → display name resolution via channel_users lookup

Test Results

11959 pass / 2 fail (pre-existing rate-limit.test.ts at lines 1357, unrelated to these changes)
TypeScript: 0 errors

Test Plan

Integration test: docker stop → send messages → docker start → verify type='append' events
DB verification: SELECT ... WHERE metadata->>'offlineSync' = 'true'
No AI response triggered for offline messages (handleIncomingMessage not called)
Unit tests: 17/17 pass
Full test suite: 11959/11961 pass (2 pre-existing failures)

🤖 Generated with Claude Code

- Add downloadMediaWithRetry() method with 410/404 retry logic - Update real-time handler to download image/video/document/sticker - Update history sync handler to download all attachment types - Use sock.updateMediaMessage for re-uploading expired media URLs

- Add ChannelMessageAttachmentInput type and serializeAttachments() helper that correctly converts Uint8Array/Buffer to base64 before JSON.stringify - Update createBatch() and create() to use serializeAttachments internally - Fix service-impl.ts to pass mimeType/filename/data through to DB - Fix whatsapp-api.ts to use ChannelMessageAttachmentInput[] type - Add GET /channels/messages/:messageId/media/:index endpoint Previously Uint8Array serialized as {"0":255,"1":216,...} and data was lost. Now stored as base64 string in JSONB attachments column. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Extract message-parser.ts for SOR/document content fallback (filename instead of [Attachment]) - Enrich metadata.document with filename, mimeType, size, hasMediaKey, hasUrl, hasDirectPath - Add retry-media endpoint for cold-cache recovery of missing attachments - Add SOR export endpoint for batch downloading attachment data - 129/129 tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ages Both history sync and real-time handlers now use the media descriptor's filename as content fallback before falling back to generic [Attachment]. This ensures SOR files show their actual filename (e.g. 2728JA_45_V1.SOR) in the content field. Also backfilled 72 existing rows and enriched metadata.document for 80 rows. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Store actual base64-encoded mediaKey, CDN directPath, and URL in ParsedWhatsAppMessageMetadata instead of just boolean flags. This enables media re-upload requests for messages with expired CDN URLs. Also adds PROTO-DIAG logging for document messages during history sync to verify mediaKey presence from the WhatsApp protocol layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add retryMediaFromMetadata() to WhatsAppChannelAPI that reconstructs a WAMessage from DB-stored mediaKey/directPath/url and explicitly calls sock.updateMediaMessage() to request the sender's phone to re-upload expired media to CDN. This works around a Baileys 7.0.0-rc.9 bug where downloadMediaMessage checks error.status for 410/404, but Boom errors store it in output.statusCode — so automatic reuploadRequest never triggers. The retry-media endpoint now falls back to stored metadata when the in-memory cache miss + history sync repair path both fail. Verified: 5/5 SOR files from Dec 2025 successfully recovered (20-101KB). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ersinkoc · 2026-03-06T10:20:09Z

Thanks for the contribution! After reviewing against our current main branch, here's what we found:

Already merged in main

The bug fixes listed in this PR are already resolved in main:

440 reconnect fix — merged in 57785d9 (anti-ban, cleanupSocket(), consecutive 440 tracking, backoff)
getMessage callback — already implemented with message cache
History sync — passive (messaging-history.set) + on-demand (fetchMessageHistory) both implemented
Group/chat listing — merged in dae789f

These were part of v0.1.5 and the commits leading to v0.1.6.

New feature work (not yet in main)

The following are genuinely new additions that we don't have:

Feature	Description
`message-parser.ts`	Document metadata extraction (mediaKey, directPath, URL)
`downloadMediaWithRetry()`	Binary media download with retry + Baileys RC9 `updateMediaMessage` workaround
`retryMediaFromMetadata()`	Reconstruct WAMessage from stored metadata for expired CDN URLs
`serializeAttachments()`	Base64 binary attachment serialization for DB storage
`POST /retry-media`	Endpoint to trigger media re-download from stored metadata
`GET /media/:index`	Serve stored binary media
`historyAnchorByJid`	Anchor-based pagination for incremental history sync

Recommendation

This PR's branch is based on an older version of main (before the 440/anti-ban fixes were merged). The diff will have significant conflicts with the current code.

Could you please:

Rebase onto current main (which already has the 440, anti-ban, and history sync fixes)
Submit a clean feature PR with just the media download/storage/retry additions
Update the PR title to reflect the actual new work (e.g., feat(whatsapp): media binary storage, download retry, and CDN recovery)

This will make the review much cleaner. The media recovery feature (especially the Baileys RC9 workaround) looks useful — we'd like to review it as a focused changeset.

…apper - Short-circuit: skip history sync when stored mediaKey exists in DB, go directly to retryMediaFromMetadata(). Reduces per-file time from 40-87s to ~1s (60x improvement). - Batch endpoint: POST /channels/:id/batch-retry-media with configurable throttle (default 5s), max 50 messages per batch, sequential processing. - Timeout: 30s Promise.race on updateMediaMessage to prevent infinite hang when sender phone is offline (Baileys has no built-in timeout). Results: 31/35 media files recovered (4 NOT_FOUND — sender deleted). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

During WhatsApp history sync, messages are re-delivered with updated fields (mediaKey, directPath, url) that weren't present in the original delivery. However, createBatch() uses INSERT ... ON CONFLICT DO NOTHING, which silently drops these updates when the row already exists. This caused hundreds of messages to remain without downloadable media metadata despite the data being available. Fix: - Add enrichMediaMetadata() to merge mediaKey/directPath/url into existing rows after createBatch(), so re-delivered metadata is never lost - Call enrichment pass automatically in the history sync handler - Add getAttachmentsNeedingRecovery() with flexible filters (groupJid, date range, needsKey, needsData) and SQL LIMIT New endpoint: - POST /channels/:id/recover-media — targeted media recovery pipeline that queries DB, triggers sync, waits for enrichment, and batch downloads with throttle. Safety defaults: limit=20 (max 50), throttleMs=5000 (min 2000), syncWaitMs=8000 (max 30000), dryRun mode. Includes date validation and platformMessageId null guard. Docs: - WHATSAPP-GUIDE.md: add media recovery pipeline architecture, endpoint reference, safety guidelines, and known limitation documenting history sync chunk boundary edge case (~2% of date ranges may arrive without media metadata) - Remove personal data from guide examples (use placeholders) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hardcoded WhatsApp group JID in test files with a generic placeholder to avoid leaking real identifiers in the public repo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…, shared type - Extract WhatsAppDocumentMetadata interface from inline casts (4 files) - Add enrichMediaMetadataBatch() — CTE+VALUES single SQL replaces N+1 loop - Add media recovery concurrency guard (channelId-level lock with 5min TTL) Guards both recover-media and batch-retry-media endpoints (409 Conflict) - Fix parseJsonBody: remove broken Content-Type check (Hono c.req.json() ignores Content-Type; the 415 response was silently discarded) - Fix ui-auth login/password routes: replace ?? {} anti-pattern with proper null check so parse errors propagate correctly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

History sync messages stored WhatsApp Linked IDs (LIDs) as sender_name instead of human-readable names because pushName is often empty in history sync payloads. This made 6400+ messages show numeric IDs. - Add resolveDisplayName() helper with 10-min TTL cache - Lookup channel_users.display_name when pushName is missing or numeric - Apply to both history sync and real-time message handlers - DB batch fix already applied: UPDATE 6408 rows via LID→name join Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ntly dropping Root cause: messages.upsert handler had `if (upsert.type !== 'notify') return;` which silently dropped ALL offline/reconnect messages. WhatsApp delivers missed messages via type='append' after reconnect with full payload. Changes: - Handle type='append' via new handleOfflineMessages() method - Batch-collect + single createBatch() (ON CONFLICT DO NOTHING) - Metadata-only for media (no download — ban risk from burst CDN hits) - Serialized via historySyncQueue (prevents race with messaging-history.set) - SAFETY: Never emits MESSAGE_RECEIVED (no AI auto-reply for offline msgs) - Extract addToProcessedMsgIds() helper (DRY: notify, history, offline) - Extract parseMessageTimestamp() helper (number/BigInt/Long) - Add lastDisconnectedAt tracking with reconnection gap logging Research: 10 specialist agents analyzed Baileys source, Evolution API patterns, dedup safety, event bus chain, media ban risk, and DB schema. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…/C/D/E series) Covers the offline sync path (type='append') introduced in 735bbfc: - B-series (7): core behavior — DB save, processedMsgIds seeding, no AI response, metadata-only media, empty message skip - C-series (6): edge cases — empty batch, no pushName, group without participant, fromMe filtering, self-chat pass-through, stub message skip - D-series (2): FIFO cap eviction at 5000 entries - E-series (2): reconnect dedup — append then notify for same message deduped Fix: vi.fn(function() {...}) required for constructor mock in vitest 4.x (vi.fn().mockImplementation(() => ({})) arrow function causes silent constructor failure) Integration test evidence (S29): docker logs: 'UPSERT EVENT received — type: append, count: 1' docker logs: 'Offline sync saved 1/1 messages to DB' DB: metadata.offlineSync = true, no duplicate rows Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- CREATE TABLE sor_queue with status/message_id/channel_id/filename cols - UNIQUE(message_id) constraint for idempotent enqueue - enqueue_sor_message() PG trigger fires AFTER INSERT on channel_messages - Filters: direction=inbound, content ILIKE %.sor, attachments not empty, attachments[0].data present, channel jid=120363423491841999@g.us - COALESCE(attachments, '[]'::jsonb) safe null handling - ON CONFLICT DO NOTHING for replay safety - Indexes: idx_sor_queue_status, idx_sor_queue_created_at Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… config The FTTH group JID is stored in channel_messages.metadata->>'jid', not in channels.config->>'jid'. Updated enqueue_sor_message() trigger to use: COALESCE(NEW.metadata, '{}')::jsonb->>'jid' = '120363423491841999@g.us' Verified with docker exec psql — trigger correctly enqueues matching rows. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…load audit log P0 — Bug fixes (data loss prevention): - Trigger: AFTER INSERT OR UPDATE OF attachments — history sync SORs no longer lost - Trigger fn: guard for UPDATE when binary already existed (OLD.data IS NOT NULL) - Stale processing cleanup: rows stuck >10min reset to pending on each cron run - retry_count column + SELECT includes error rows with retry_count < 3 - All mark_error paths: retry_count = retry_count + 1 P1 — Observability: - content_hash TEXT column on sor_queue (SHA-256, populated after b64decode) - Dedup: same binary with status=done skips re-upload, logged as 'skipped' - sor_upload_log table: full audit history per upload attempt (outcome, content_hash, content_size, opdracht_id, error_message) - _log_attempt() helper called on success, failure, and duplicate skip - Indexes: idx_sor_queue_content_hash (partial), idx_sor_upload_log_* MIGRATIONS_SQL: all changes idempotent (IF NOT EXISTS guards) Note: pre-existing CLI typecheck errors unrelated to this change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ng, PG LISTEN/NOTIFY processing_started_at: - New TIMESTAMP column on sor_queue — set when status→processing - Stale cleanup uses COALESCE(processing_started_at, created_at) for accuracy JID → DB config (no more hardcoded trigger): - Trigger reads COALESCE(NULLIF(current_setting('app.sor_jid', TRUE), ''), fallback) - ALTER DATABASE ownpilot SET "app.sor_jid" = '...' persists across connections - MIGRATIONS_SQL sets default value; override without redeployment PG LISTEN/NOTIFY: - Trigger: PERFORM pg_notify('sor_new_file', NEW.id) on every sor_queue INSERT - Voorinfra MCP: daemon thread (sor-notify-listener) starts at server startup - Thread: psycopg2 LISTEN + select.select(5s timeout) + asyncio.run(process_queue) - Latency: 60s (cron) → <100ms (event-driven) for new SOR arrivals - Cron remains as fallback for reliability Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…point - ChannelMessageAttachment + ChannelMessageAttachmentInput: add local_path?: string - serializeAttachments(): pass through local_path from input to stored JSONB - WhatsAppAPI: add writeSorToDisk() — writes .sor files to /app/data/sor-files/{messageId}.sor - WhatsAppAPI: inject disk write at both download call sites (history sync + live message) - GET /api/v1/sor-files/:messageId — auth-protected file download endpoint (streams from disk) - base64 JSONB fallback preserved (no breakage for old messages) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…story/offline sync ON CONFLICT DO NOTHING skips existing rows — local_path was never persisted for re-delivered SOR messages. After createBatch, iterate rows with local_path set and call updateAttachments() so disk path is reachable from DB. Applied to both history sync and offline sync paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

historyAnchorByJid in-memory cache tracks NEWEST message per chat (needed for dedup/upsert). fetchMessageHistory requires OLDEST known message as anchor to page backward in time — using newest yields empty ON_DEMAND batches (WhatsApp treats history as already synced). Fix: fetchGroupHistory now always loads from DB via loadHistoryAnchorFromDatabase (getOldestByChat ASC) instead of checking in-memory cache first. Result: /groups/:jid/sync now returns 50 messages instead of 0, correctly paging backward from the oldest known DB message. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add contacts.upsert and contacts.update event handlers to whatsapp-api.ts - Add contacts sync to messaging-history.set handler (passive sync) - Add syncContactsToDb() private method with name resolution hierarchy (name > notify > verifiedName > phone fallback) - Add upsertByExternal() to ContactsRepository (ON CONFLICT upsert) - Add unique constraint (external_id, external_source) to contacts table - Skip group JIDs and status broadcasts in contact sync - Retroactive migration: 41 contacts populated from existing messages Resolves: 190 individual chats showing "(bilinmiyor)" for contact names. Evolution API + WAHA reference patterns applied. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Devil's Advocate finding: contact.id can be LID format (e.g. 179203@lid) which produces garbage phone numbers. Now prefers contact.phoneNumber (real phone JID) and skips contacts with only LID identifiers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@lid

…ution, history sync mapping 1. Early return fix: PUSH_NAME type (0 messages, 822 contacts) was skipped entirely because `messages.length === 0` returned early before contacts sync block. Now: skip only when BOTH messages AND contacts are empty. 2. LID phoneNumber resolution (WAHA pattern): contacts with @lid JIDs may have a `phoneNumber` field containing the real @s.whatsapp.net JID. Use phoneNumber when available, skip only pure LID contacts with no real phone. 3. History sync contacts: removed redundant filter/map that dropped contacts without name/notify before passing to syncContactsToDb. The method already handles fallback naming (phone number as name). Before: contacts.upsert 585 OK, history sync 0/582, PUSH_NAME skipped entirely After: contacts.upsert 585 OK, history sync 476/582, PUSH_NAME 620/822 — total 970 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1. messages.upsert handler: extract pushName from incoming messages and sync to contacts DB. Uses softName mode to avoid overwriting phonebook names with pushName. 2. upsertByExternal softName option: when true, only updates name if existing name is just a phone number (regex ^[0-9+]+$ or name=phone). Preserves higher-quality phonebook names from contacts.upsert. 3. syncContactsToDb: accepts options.softName parameter, forwarded to upsertByExternal. Before: individual chat names showed phone numbers only After: pushName from messages enriches contacts with display names Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Root cause: container rebuild lost /app/data/sor-files/ (no named volume). 18 phantom files: local_path in DB, disk gone, data=NULL → PG trigger never fired. Recovery system documented: - retryMediaFromMetadata(): explicit sock.updateMediaMessage() (Baileys RC9 bug workaround) - POST /api/v1/channels/:id/recover-media: batch endpoint - Auth bypass: Authorization: Bearer bypass (AUTH_TYPE=none + ui_password_hash conflict) - mediaKey MUST be Uint8Array not Buffer - sor_queue ON CONFLICT needs manual reset after recovery Results 2026-03-08: 181/210 SOR binary recovered (86.2%). 29 permanently lost. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

CyPack and others added 6 commits March 4, 2026 22:37

CyPack and others added 2 commits March 6, 2026 11:24

CyPack changed the title ~~feat(whatsapp): media recovery via re-upload + history sync hardening~~ feat(whatsapp): comprehensive media recovery pipeline + anti-ban hardening Mar 6, 2026

CyPack and others added 17 commits March 6, 2026 13:13

chore: replace real group JID with placeholder in test fixtures

206c091

Replace hardcoded WhatsApp group JID in test files with a generic placeholder to avoid leaking real identifiers in the public repo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(whatsapp): comprehensive media recovery pipeline + anti-ban hardening#11

feat(whatsapp): comprehensive media recovery pipeline + anti-ban hardening#11
CyPack wants to merge 25 commits intoownpilot:mainfrom
CyPack:fix/whatsapp-440-reconnect-loop

CyPack commented Mar 6, 2026 •

edited

Loading

Uh oh!

ersinkoc commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

CyPack commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Core Fixes (S28 — Latest)

Integration Test Evidence (S29)

Unit Tests (S29 — 17 scenarios)

Media Recovery Pipeline (S21-S26)

Anti-Ban & Connection Hardening (S4, S22)

Test Results

Test Plan

Uh oh!

ersinkoc commented Mar 6, 2026

Already merged in main

New feature work (not yet in main)

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CyPack commented Mar 6, 2026 •

edited

Loading