5fa3f02
feat(whatsapp): download media with retry on expired URLs
Mar 4, 2026
7cf8105
feat(whatsapp): fix media binary storage and add serve endpoint
Mar 4, 2026
1d58af8
feat(whatsapp): add message parser + document metadata enrichment
Mar 6, 2026
ddf9e95
fix(whatsapp): use filename instead of [Attachment] for document mess…
Mar 6, 2026
d50eccc
feat(whatsapp): persist mediaKey/directPath/url in message metadata
Mar 6, 2026
11cadff
feat(whatsapp): implement media re-upload from stored metadata
Mar 6, 2026
b59c45a
feat(whatsapp): short-circuit retry-media, batch endpoint, timeout wr…
Mar 6, 2026
1e57144
fix(whatsapp): recover media metadata lost by ON CONFLICT DO NOTHING
Mar 6, 2026
206c091
chore: replace real group JID with placeholder in test fixtures
Mar 6, 2026
a34399b
fix(whatsapp): batch enrichment, concurrency guard, parseJsonBody fix…
Mar 6, 2026
9c2beaf
fix(whatsapp): resolve LID sender names via channel_users lookup
Mar 6, 2026
735bbfc
fix(whatsapp): process offline messages (type=append) instead of sile…
Mar 6, 2026
d6c5a32
test(whatsapp): unit tests for handleOfflineMessages (17 scenarios, B…
Mar 6, 2026
1074ce2
feat(sor-pipeline): add sor_queue table + PG trigger to schema.ts
Mar 7, 2026
d9880e8
fix(sor-pipeline): correct JID filter — use metadata->jid not channel…
Mar 7, 2026
da8e72d
fix(sor-pipeline): P0+P1 — trigger fix, retry, content_hash dedup, up…
Mar 7, 2026
ea0f278
feat(sor-pipeline): P2 — processing_started_at, JID config, PII maski…
Mar 7, 2026
4cd022f
feat(sor-pipeline): S38 — SOR binary disk storage + HTTP download end…
Mar 7, 2026
52311f5
fix(sor-pipeline): S39 — update local_path for existing msgs after hi…
Mar 7, 2026
659dd2e
fix(whatsapp): use DB oldest anchor for on-demand history fetch
Mar 8, 2026
0f05660
feat(whatsapp): contact sync — Baileys contacts.upsert/update → DB
Mar 8, 2026
ef9a752
fix(whatsapp): use phoneNumber JID for LID contacts, skip pure LID
Mar 8, 2026
1c3e65a
fix(whatsapp): contact sync — early return bug, LID phoneNumber resol…
Mar 8, 2026
c6e7971
feat(whatsapp): pushName enrichment from messages + softName upsert
Mar 8, 2026
c21a30d
docs(whatsapp): SOR media recovery runbook + research notes
Mar 8, 2026
268 changes: 268 additions & 0 deletions RESEARCH-whatsapp-media-retry.md
@@ -0,0 +1,268 @@
# WhatsApp Web Old Media Download Mechanism - Research

**Researched:** 2026-03-06
**Domain:** WhatsApp protocol, media encryption, multi-device sync
**Confidence:** HIGH (verified across whatsmeow Go source, Baileys JS source, Meta engineering blog, protobuf definitions)

## Executive Summary

WhatsApp Web CAN download old media that is no longer on CDN servers. The mechanism is called **Media Retry** (internally `MediaRetryNotification`). When a linked device clicks "Download" on an old message and gets a 404/410 from the CDN, it sends a **media retry receipt** to the primary phone, which **re-uploads** the media to the CDN with a **new directPath** (URL) but using the **same mediaKey**. The linked device then downloads from the new URL and decrypts with the original mediaKey.

**Critical insight:** The mediaKey is NOT re-generated. It stays the same. Only the CDN path changes. This means: if you have the mediaKey (from history sync), you can trigger a re-upload and download the media.

**Primary finding:** Baileys already has this mechanism built-in via `sock.updateMediaMessage()`, and OwnPilot already uses it via `reuploadRequest` in `downloadMediaWithRetry()`. However, the current implementation has a flaw -- it doesn't explicitly call `updateMediaMessage` before retrying, it just passes it as an option.

## How It Works: Complete Protocol Flow

### Step 1: Initial State
- Media message arrives during history sync
- Message contains: `mediaKey`, `directPath`, `url`, `fileEncSha256`, `fileSha256`, `fileLength`, `mimetype`
- The `directPath` points to WhatsApp CDN (`mmg.whatsapp.net`)
- CDN retains media for ~30 days (varies), then returns 404/410

### Step 2: Download Attempt Fails (404/410)
- Linked device tries `GET https://mmg.whatsapp.net/{directPath}`
- Server returns 404 (Not Found) or 410 (Gone)
- Media file has been purged from CDN

### Step 3: Media Retry Receipt (Key Mechanism)
The linked device sends an encrypted retry receipt to the primary phone:

```
Binary Node:
  tag: "receipt"
  attrs: { id: messageId, to: chatJID, type: "server-error" }
  content: [
    { tag: "encrypt", content: [
      { tag: "enc_p", content: AES-GCM-encrypted(ServerErrorReceipt{stanzaId: messageId}) },
      { tag: "enc_iv", content: IV }
    ]}
  ]
```

**Encryption details:**
- Key derivation: HKDF-SHA256 from mediaKey with info string "WhatsApp Media Retry Notification"
- Cipher: AES-256-GCM
- The mediaKey itself is NOT sent -- both sides already have it
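
The key derivation and cipher listed above can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not the Baileys or whatsmeow implementation: the function names are hypothetical, and the empty HKDF salt, the 12-byte IV, the use of additional authenticated data, and the 16-byte tag appended to the ciphertext are assumptions of this sketch rather than details confirmed by these notes.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from 'node:crypto';

// Info string taken from the notes above.
const RETRY_INFO = 'WhatsApp Media Retry Notification';

// Derive a 32-byte AES key from the mediaKey (HKDF-SHA256, empty salt assumed).
export function deriveRetryKey(mediaKey: Uint8Array): Buffer {
  return Buffer.from(hkdfSync('sha256', mediaKey, Buffer.alloc(0), RETRY_INFO, 32));
}

// Encrypt a payload with AES-256-GCM. `aad` is an assumption of this sketch.
export function encryptRetryPayload(
  mediaKey: Uint8Array,
  plaintext: Uint8Array,
  aad: Uint8Array,
): { iv: Buffer; ciphertext: Buffer } {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', deriveRetryKey(mediaKey), iv);
  cipher.setAAD(aad);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Assumed layout: ciphertext followed by the 16-byte authentication tag.
  return { iv, ciphertext: Buffer.concat([body, cipher.getAuthTag()]) };
}

// Reverse of encryptRetryPayload; throws if the key, IV, AAD, or tag do not match.
export function decryptRetryPayload(
  mediaKey: Uint8Array,
  iv: Uint8Array,
  ciphertext: Uint8Array,
  aad: Uint8Array,
): Buffer {
  const data = Buffer.from(ciphertext);
  const tag = data.subarray(data.length - 16);
  const body = data.subarray(0, data.length - 16);
  const decipher = createDecipheriv('aes-256-gcm', deriveRetryKey(mediaKey), iv);
  decipher.setAAD(aad);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}
```

Because both sides derive the same key from a mediaKey they already hold, the receipt and the notification can be protected without ever putting the mediaKey on the wire.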

### Step 4: Phone Re-uploads Media
- Primary phone receives the retry receipt
- Phone looks up the media in its local storage (filesystem)
- Phone re-uploads the media to CDN (encrypted with SAME mediaKey)
- CDN returns a NEW directPath

### Step 5: MediaRetryNotification Response
Phone sends back an encrypted `MediaRetryNotification` protobuf:

```protobuf
message MediaRetryNotification {
  optional string stanzaId = 1;
  optional string directPath = 2;
  optional Result result = 3;

  enum Result {
    GENERAL_ERROR = 0;
    SUCCESS = 1;
    NOT_FOUND = 2; // Media not on phone either!
    DECRYPTION_ERROR = 3;
  }
}
```

### Step 6: Download with New Path
- Linked device decrypts the notification using same HKDF-derived key
- If result == SUCCESS: use new `directPath` to download
- Decrypt downloaded file with ORIGINAL `mediaKey` (unchanged)
- File is recovered!

### Possible Failure: `NOT_FOUND` (result = 2)
- Phone no longer has the media file locally
- User deleted it, or phone was wiped
- **This is the ONLY case where recovery is truly impossible**
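
A caller has to turn the `result` field into a decision. The sketch below mirrors the enum from Step 5 and maps each value to an action. The type and function names are hypothetical, and treating `GENERAL_ERROR` and `DECRYPTION_ERROR` as retryable is an assumption of this sketch, not documented protocol behavior.

```typescript
// Mirrors the Result enum of the MediaRetryNotification protobuf.
enum RetryResult {
  GENERAL_ERROR = 0,
  SUCCESS = 1,
  NOT_FOUND = 2,
  DECRYPTION_ERROR = 3,
}

interface RetryNotification {
  stanzaId?: string;
  directPath?: string;
  result?: RetryResult;
}

type RetryOutcome =
  | { kind: 'download'; directPath: string }
  | { kind: 'unrecoverable'; reason: string }
  | { kind: 'retry-later'; reason: string };

function interpretRetryNotification(n: RetryNotification): RetryOutcome {
  switch (n.result) {
    case RetryResult.SUCCESS:
      // SUCCESS is only useful if the phone also returned the new CDN path.
      return n.directPath
        ? { kind: 'download', directPath: n.directPath }
        : { kind: 'retry-later', reason: 'SUCCESS without directPath' };
    case RetryResult.NOT_FOUND:
      // The phone no longer has the file: nothing left to recover from.
      return { kind: 'unrecoverable', reason: 'media not on phone' };
    case RetryResult.DECRYPTION_ERROR:
      return { kind: 'retry-later', reason: 'phone could not decrypt the receipt' };
    default:
      // GENERAL_ERROR, or a missing result field.
      return { kind: 'retry-later', reason: 'general error' };
  }
}
```

Keeping `NOT_FOUND` separate from the other failures lets a caller stop retrying a message permanently instead of re-sending receipts for media that cannot come back.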

## Where Are Media Keys Stored?

| Location | What's Stored | Confidence |
|----------|--------------|------------|
| Phone local DB (msgstore.db) | mediaKey embedded in message metadata | HIGH |
| Linked device local DB (IndexedDB on Web) | mediaKey received during history sync | HIGH |
| WhatsApp CDN server | Encrypted media blob (NOT the key) | HIGH |
| WhatsApp routing server | Nothing - no keys, no media | HIGH |

**Key insight:** mediaKey is generated by the SENDER at send time, embedded in the message protobuf, and distributed to all devices via E2E encrypted channels. WhatsApp servers NEVER see the mediaKey.

## How History Sync Delivers Media Keys

1. When linked device connects (QR scan), primary phone bundles recent messages
2. Bundle includes FULL message protobufs WITH mediaKey, directPath, etc.
3. Bundle is E2E encrypted and transferred to linked device
4. Linked device stores messages locally (IndexedDB on Web, or in-memory/DB in Baileys)
5. **After this, mediaKey is available on the linked device permanently**

The media FILE is NOT transferred during history sync -- only metadata including mediaKey. The linked device must download the actual file from CDN using directPath. If CDN has purged it, media retry kicks in.
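
Since only metadata arrives during history sync, that metadata is what has to be persisted for a later retry. The following sketch pulls the relevant fields out of a message and encodes the binary ones as base64. The interfaces are simplified local stand-ins for the Baileys message types, and `extractMediaMetadata` and `StoredMediaMetadata` are hypothetical names, not part of OwnPilot or Baileys.

```typescript
// Simplified stand-in for the media sub-message of a WAMessage.
interface RawMedia {
  mediaKey?: Uint8Array | null;
  directPath?: string | null;
  url?: string | null;
  mimetype?: string | null;
  fileEncSha256?: Uint8Array | null;
  fileSha256?: Uint8Array | null;
  fileLength?: { toString(): string } | null; // number or Long-like value
}

const MEDIA_TYPES = [
  'imageMessage', 'videoMessage', 'audioMessage', 'documentMessage', 'stickerMessage',
] as const;
type MediaType = (typeof MEDIA_TYPES)[number];
type RawMessage = Partial<Record<MediaType, RawMedia | null>>;

interface StoredMediaMetadata {
  mediaType: MediaType;
  mediaKey: string; // base64
  directPath: string | null;
  url: string | null;
  mimetype: string | null;
  fileEncSha256: string | null; // base64
  fileSha256: string | null; // base64
  fileLength: number | null;
}

const toBase64 = (bytes?: Uint8Array | null): string | null =>
  bytes ? Buffer.from(bytes).toString('base64') : null;

// Returns the metadata of the first media sub-message that carries a mediaKey.
function extractMediaMetadata(message: RawMessage): StoredMediaMetadata | null {
  for (const mediaType of MEDIA_TYPES) {
    const media = message[mediaType];
    if (!media?.mediaKey) continue;
    return {
      mediaType,
      mediaKey: Buffer.from(media.mediaKey).toString('base64'),
      directPath: media.directPath || null, // empty string is treated as missing
      url: media.url || null,
      mimetype: media.mimetype ?? null,
      fileEncSha256: toBase64(media.fileEncSha256),
      fileSha256: toBase64(media.fileSha256),
      fileLength: media.fileLength != null ? Number(media.fileLength.toString()) : null,
    };
  }
  return null;
}
```

Recording `mediaType` alongside the fields makes it possible to rebuild the message under the correct key later instead of assuming every attachment is an image.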

## Current OwnPilot Implementation Analysis

File: `packages/gateway/src/channels/plugins/whatsapp/whatsapp-api.ts` (lines 1501-1556)

### What Works
- `downloadMediaWithRetry()` passes `reuploadRequest: this.sock.updateMediaMessage` to `downloadMediaMessage()`
- Baileys internally handles the retry: if download fails with 404/410, it calls `updateMediaMessage` which sends the media retry receipt

### What's Broken / Suboptimal

1. **The retry logic re-calls `downloadMediaMessage` with the SAME options** -- it doesn't explicitly call `sock.updateMediaMessage(msg)` first to get the updated message. The `reuploadRequest` option should handle this internally, but there's a known bug in Baileys RC versions.

2. **Known Baileys bug:** Issue #507 reports "Download Media reupload not working" with error "Unsupported state or unable to authenticate data" -- this is an encryption/decryption error in the media retry receipt exchange.

3. **History sync messages arrive without url field** -- only `directPath` is present. The current code checks `hasUrl` but for history sync messages, url is often empty while directPath is set.

4. **The phone must be online** -- if the primary phone is off or disconnected, the media retry receipt can't be delivered, and the re-upload never happens. This is why history sync media often fails for users who linked a device and then turned off their phone.
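
Items 3 and 4 suggest two small guards, sketched below: an eligibility check that accepts either `url` or `directPath`, and a timeout wrapper so that a retry request to an offline phone does not hang forever. Both helpers and their names are hypothetical, and the timeout value is left to the caller.

```typescript
interface MediaLocator {
  url?: string | null;
  directPath?: string | null;
  mediaKey?: Uint8Array | null;
}

// History-sync messages often have an empty url but a usable directPath,
// so either one counts as a location. A mediaKey is always required.
function canAttemptDownload(media: MediaLocator): boolean {
  const hasLocation = Boolean(media.url) || Boolean(media.directPath);
  return hasLocation && Boolean(media.mediaKey?.length);
}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A timed-out retry is best treated as temporary, because the same request may succeed once the primary phone reconnects.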

## Baileys API for Media Retry

### Method 1: Automatic (via downloadMediaMessage)
```typescript
import { downloadMediaMessage } from '@whiskeysockets/baileys';

// `logger` is assumed to be an existing logger instance (e.g. pino).
const buffer = await downloadMediaMessage(
  msg, // WAMessage with mediaKey
  'buffer',
  {},
  {
    logger,
    reuploadRequest: sock.updateMediaMessage // Baileys handles retry
  }
);
```

### Method 2: Explicit (manual control)
```typescript
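// Declared outside the try block so the result is usable afterwards.
let buffer: Buffer;
try {
  // 1. Try download
  buffer = await downloadMediaMessage(msg, 'buffer', {});
} catch (err) {
  // The message-text check is kept from these notes; depending on the
  // Baileys version the status may be exposed as a numeric field instead.
  const text = err instanceof Error ? err.message : String(err);
  if (!text.includes('404') && !text.includes('410')) throw err;
  // 2. Request re-upload explicitly
  const updatedMsg = await sock.updateMediaMessage(msg);
  // 3. Download with updated directPath
  buffer = await downloadMediaMessage(updatedMsg, 'buffer', {});
}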
```

### Method 3: On-demand for stored messages (what we need)
```typescript
// For messages stored in DB with mediaKey but data=null:
// 1. Reconstruct WAMessage from DB fields
const waMessage = {
  key: { remoteJid: chatJid, id: externalId, fromMe: false },
  message: {
    // Use the key that matches the stored media type
    // (imageMessage, videoMessage, documentMessage, ...).
    imageMessage: {
      mediaKey: Buffer.from(storedMediaKey, 'base64'),
      directPath: storedDirectPath,
      url: storedUrl || '',
      mimetype: storedMimeType,
      fileEncSha256: storedFileEncSha256,
      fileSha256: storedFileSha256,
      fileLength: storedFileLength,
    }
  }
};

// 2. Try download (will fail with 404/410 for old media)
// 3. updateMediaMessage sends retry receipt to phone
// 4. Phone re-uploads, returns new directPath
// 5. Download succeeds
const buffer = await downloadMediaMessage(waMessage, 'buffer', {}, {
  logger, // passed alongside reuploadRequest, as in Method 1
  reuploadRequest: sock.updateMediaMessage
});
```

## Critical Requirements for Protocol-Based Recovery

| Requirement | Status in OwnPilot | Notes |
|-------------|-------------------|-------|
| mediaKey stored in DB | PARTIALLY | Stored in message content as base64 but not extracted separately |
| directPath stored in DB | NO | Not stored - only url field |
| fileEncSha256 stored | NO | Not stored |
| fileSha256 stored | NO | Not stored |
| Primary phone online | EXTERNAL | User must have phone connected |
| Primary phone has media locally | EXTERNAL | If user deleted media from phone, recovery impossible |
| Baileys sock connected | YES | OwnPilot maintains connection |

## Answers to Specific Questions

### a) What happens when you click the "Download" button on an old message in WhatsApp Web?
1. Web client tries to download from CDN using stored `directPath`
2. If 404/410: sends `MediaRetryReceipt` to primary phone (encrypted with mediaKey-derived key)
3. Phone re-uploads media to CDN with new `directPath`
4. Phone sends `MediaRetryNotification` back with new `directPath`
5. Web client downloads from new URL, decrypts with original `mediaKey`

### b) Is the mediaKey stored on the phone or on the WhatsApp server?
**On the phone** (and on linked devices). The WhatsApp server NEVER sees the mediaKey. The mediaKey is carried inside the message protobuf, transmitted E2E encrypted, and stored in each device's local DB.

### c) Does the linked device receive the mediaKey during the initial sync, or does it request it on demand?
**During the initial sync.** The history sync bundle contains full message protobufs, including mediaKey, directPath, and fileEncSha256. What is requested on demand is not the mediaKey but the **new directPath** (after re-upload).

### d) How does media sync work in the multi-device architecture?
- Client fanout: the sender encrypts the message separately for each of the N devices and sends it to each one
- Every device receives its own copy of the mediaKey
- The media file exists as a single (encrypted) copy on the CDN
- After it is deleted from the CDN: the phone re-uploads it via the retry mechanism
- Linked devices can download independently of each other

### e) Does requestPlaceholderResend() return the media key?
**No.** `requestPlaceholderResend()` is a different mechanism; it is for CTWA (Click-to-WhatsApp) ads. When the message itself arrives as a placeholder (without an enc node), it requests the full message from the phone. The mechanism used for media retry is `sock.updateMediaMessage()`, or `SendMediaRetryReceipt()` in whatsmeow.

### f) What is the MediaRetryNotification protobuf structure for?
It lets the phone report the result of the media re-upload to the linked device. It contains:
- `stanzaId`: Which message the media belongs to
- `directPath`: The new path on the CDN (after re-upload)
- `result`: SUCCESS, NOT_FOUND (not on the phone), GENERAL_ERROR, DECRYPTION_ERROR

## Implementation Recommendation for OwnPilot

### What Needs to Change

1. **DB schema**: Store `mediaKey`, `directPath`, `fileEncSha256`, `fileSha256`, `fileLength` alongside media data in channel_messages

2. **On-demand retry endpoint**: When a message has `data=null` but has `mediaKey`:
- Reconstruct WAMessage from stored metadata
- Call `downloadMediaMessage` with `reuploadRequest: sock.updateMediaMessage`
- If successful, update DB with binary data

3. **History sync handler**: Extract and store ALL media metadata fields, not just url

4. **Prerequisite**: Primary phone must be online and still have the media
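
The on-demand retry path in item 2 could be structured roughly as follows. This is a sketch under stated assumptions: `StoredMediaRow`, `RecoveryDeps`, `rowToWAMessage`, and `recoverStoredMedia` are hypothetical names, the row shape is not the actual `channel_messages` schema, and the download step is injected so that the real implementation can wrap `downloadMediaMessage` with `reuploadRequest: sock.updateMediaMessage`.

```typescript
interface StoredMediaRow {
  externalId: string;
  chatJid: string;
  fromMe: boolean;
  mediaType: 'imageMessage' | 'videoMessage' | 'audioMessage' | 'documentMessage' | 'stickerMessage';
  mediaKey: string | null; // base64
  directPath: string | null;
  url: string | null;
  mimetype: string | null;
}

interface MinimalWAMessage {
  key: { remoteJid: string; id: string; fromMe: boolean };
  message: Record<string, unknown>;
}

interface RecoveryDeps {
  // Expected to attempt the download and trigger a re-upload on 404/410.
  download(msg: MinimalWAMessage): Promise<Buffer>;
  // Expected to persist the recovered bytes for the given message.
  save(externalId: string, data: Buffer): Promise<void>;
}

// Rebuild just enough of the message for a download attempt.
function rowToWAMessage(row: StoredMediaRow): MinimalWAMessage | null {
  if (!row.mediaKey) return null; // without the key nothing can be decrypted
  return {
    key: { remoteJid: row.chatJid, id: row.externalId, fromMe: row.fromMe },
    message: {
      [row.mediaType]: {
        mediaKey: Buffer.from(row.mediaKey, 'base64'),
        directPath: row.directPath ?? '',
        url: row.url ?? '',
        mimetype: row.mimetype ?? undefined,
      },
    },
  };
}

async function recoverStoredMedia(
  row: StoredMediaRow,
  deps: RecoveryDeps,
): Promise<'recovered' | 'skipped' | 'failed'> {
  const msg = rowToWAMessage(row);
  if (!msg) return 'skipped';
  try {
    const data = await deps.download(msg);
    await deps.save(row.externalId, data);
    return 'recovered';
  } catch {
    // Phone offline, media gone from the phone, or a decryption problem.
    return 'failed';
  }
}
```

Injecting `download` and `save` keeps the flow testable without a live socket, and a `'failed'` result can be retried later when the phone is back online.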

### Expected Success Rate

| Scenario | Success Probability | Reason |
|----------|-------------------|--------|
| Recent media (< 30 days) | ~95% | CDN still has it, direct download works |
| Old media, phone has file | ~80% | Re-upload works if phone online |
| Old media, phone wiped/changed | 0% | Nobody has the file anymore |
| Old media, phone offline | 0% (temporary) | Will work when phone comes online |

## Sources

### Primary (HIGH confidence)
- [whatsmeow mediaretry.go](https://github.com/tulir/whatsmeow/blob/main/mediaretry.go) - Go implementation of SendMediaRetryReceipt and DecryptMediaRetryNotification
- [whatsmeow Go package docs](https://pkg.go.dev/go.mau.fi/whatsmeow) - Official API documentation
- [Baileys example.ts](https://github.com/WhiskeySockets/Baileys/blob/master/Example/example.ts) - Official Baileys usage examples
- [Baileys npm docs](https://www.npmjs.com/package/@whiskeysockets/baileys) - downloadMediaMessage + reuploadRequest API
- [Baileys PR #2334](https://github.com/WhiskeySockets/Baileys/pull/2334) - requestPlaceholderResend implementation details
- [Meta Engineering: WhatsApp Multi-Device](https://engineering.fb.com/2021/07/14/security/whatsapp-multi-device/) - Official multi-device architecture

### Secondary (MEDIUM confidence)
- [Baileys Issue #507](https://github.com/WhiskeySockets/Baileys/issues/507) - Known bug: re-upload not working
- [mautrix/whatsapp Issue #374](https://github.com/mautrix/whatsapp/issues/374) - History sync media 404 failures
- [wa-proto](https://github.com/wppconnect-team/wa-proto) - WhatsApp Web protobuf definitions
- [whatsapp-media-decrypt](https://github.com/ddz/whatsapp-media-decrypt) - Media encryption analysis
- [WABetaInfo: re-download deleted media](https://wabetainfo.com/whatsapp-allows-to-redownload-deleted-media/) - CDN retention behavior

### Tertiary (LOW confidence)
- [Mazzo.li WhatsApp backup](https://mazzo.li/posts/whatsapp-backup.html) - Reverse engineering observations
- [Android WhatsApp Forensics](https://belkasoft.com/android-whatsapp-forensics-analysis) - Database structure analysis