Programmatic analysis and visualization of iMessages (macOS chat.db).
Example questions you can answer:
- How many people have you corresponded with in the last month / year?
- Who's quickest (or slowest) to text back?
- Who have you messaged most during a particular period?
- Have you lost touch with anyone?
- Who are your newest contacts? (people whose first-ever message is most recent)
Target audience: any iMessage user. The project reads directly from ~/Library/Messages/chat.db, so it requires minor technical skills and a macOS terminal with Full Disk Access granted.
The ETL never writes to ~/Library/Messages/chat.db:
- It opens the original read-only via the file:…?mode=ro SQLite URI (the OS will reject any write attempt at the file-descriptor level).
- It uses SQLite's backup() API to create a consistent timestamped snapshot in ~/.imessage_analysis/snapshots/, which is safe even while Messages.app is writing.
- All extraction and analysis runs against the snapshot, not the original. Only SELECT statements touch the source.
- Analysis results are written to a separate file: ~/.imessage_analysis/analysis.db.
- Snapshots are reused for 7 days, so the original is read at most once per week.
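The read-only open and the backup step can be illustrated with Python's standard sqlite3 module. This is a minimal sketch, not the project's snapshot.py: the function name snapshot_read_only and the timestamp format are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_read_only(source: Path, snapshots_dir: Path) -> Path:
    """Copy `source` into a timestamped snapshot without writing to it.

    Hypothetical helper; the naming scheme chat_<ts>.db follows the
    architecture diagram, the timestamp format is an assumption.
    """
    snapshots_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = snapshots_dir / f"chat_{stamp}.db"

    # as_uri() percent-encodes the path; mode=ro opens the file read-only,
    # so any write through this connection raises sqlite3.OperationalError.
    src = sqlite3.connect(f"{source.resolve().as_uri()}?mode=ro", uri=True)
    dst = sqlite3.connect(target)
    try:
        # Connection.backup() wraps SQLite's online backup API and copies
        # the whole database into the destination connection.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

All later queries would then open the returned snapshot path rather than the original chat.db.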
You can verify by comparing stat -f "%Sm %z" ~/Library/Messages/chat.db before and after a run — the mtime will not change.
# 1. One-time setup: Python 3.12 venv, editable install, frontend npm deps
./init.sh
# 2. Run everything: ETL (if stale) + FastAPI backend + Vite frontend
./run.sh
# 3. Run the full test suite (black, mypy, pytest, coverage, integration)
./test.sh
Then open http://127.0.0.1:5173.
- macOS (for chat.db at ~/Library/Messages/chat.db and AddressBook at ~/Library/Application Support/AddressBook)
- Python 3.12 (Homebrew)
- Node.js + npm (for the frontend)
- Full Disk Access granted to whichever terminal app you run ./run.sh from (System Settings → Privacy & Security → Full Disk Access). Contacts permission is also recommended, so AddressBook names can be matched to phone/email handles.
On macOS 26+ (Tahoe), Homebrew's python@3.12 bottle can ship linked against a newer libexpat than the system library, breaking pip with a _XML_SetAllocTrackerActivationThreshold symbol error. If you hit this during ./init.sh:
brew install expat
install_name_tool -change /usr/lib/libexpat.1.dylib \
  /opt/homebrew/opt/expat/lib/libexpat.1.dylib \
  $(brew --prefix)/Cellar/python@3.12/*/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/pyexpat.cpython-312-darwin.so
codesign --force --sign - \
  $(brew --prefix)/Cellar/python@3.12/*/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/pyexpat.cpython-312-darwin.so
Then re-run ./init.sh.
┌──────────────────────────┐     ┌────────────────────────────┐
│ ~/Library/Messages/      │     │ ~/Library/Application      │
│ chat.db (read-only)      │     │ Support/AddressBook/       │
└───────────┬──────────────┘     │ **/AddressBook-v*.abcddb   │
            │ SQLite backup API  └─────────────┬──────────────┘
            ▼                                  │
┌────────────────────────────────┐             │
│ ~/.imessage_analysis/          │             │
│ snapshots/chat_<ts>.db         │             │
└───────────┬────────────────────┘             │
            │ ETL pipeline                     │
            │ (snapshot-first, read-only)      │
            ▼                                  ▼
┌─────────────────────────────────────────────────────────────┐
│ ETL: extract → normalize → load → resolve identity          │
│ - fact_message, dim_handle, dim_person, dim_contact_method  │
│ - pick richest AddressBook source via contacts_discovery    │
└───────────┬─────────────────────────────────────────────────┘
            ▼
┌──────────────────────────┐
│ ~/.imessage_analysis/    │
│ analysis.db              │ ◄──── FastAPI (imessage_analysis/api.py)
└──────────────────────────┘          │
                                      ▼
                             ┌──────────────────┐
                             │ frontend (Vite + │
                             │ React + TS)      │
                             └──────────────────┘
macOS stores contacts as multiple SQLite databases under ~/Library/Application Support/AddressBook:
- AddressBook-v22.abcddb: local-only source (often nearly empty)
- Sources/<uuid-1>/AddressBook-v22.abcddb: e.g. iCloud
- Sources/<uuid-2>/AddressBook-v22.abcddb: e.g. Google / CardDAV
imessage_analysis.etl.contacts_discovery enumerates every .abcddb under that root, counts ZABCDRECORD/ZABCDPHONENUMBER/ZABCDEMAILADDRESS, and picks the richest one (so the ETL ingests the source with the most real contacts, not the empty local-only one).
Diagnostic CLI:
# Show a table of every AddressBook DB and its record counts
.venv/bin/python -m imessage_analysis.etl.contacts_discovery
# Print just the chosen path (for shell pipelines)
.venv/bin/python -m imessage_analysis.etl.contacts_discovery --path-only
# JSON output
.venv/bin/python -m imessage_analysis.etl.contacts_discovery --json
If this prints nothing, your terminal lacks Contacts / Full Disk Access. Grant it in System Settings and restart the terminal.
| Script | Purpose |
|---|---|
| ./init.sh | Create .venv, install Python deps editable + [dev], install frontend npm deps. Idempotent. |
| ./run.sh | Check prerequisites (venv, npm, FDA). Rebuild ETL if analysis.db is missing or older than 7 days. Start FastAPI backend (127.0.0.1:8000) and Vite frontend (127.0.0.1:5173). Handles Ctrl-C cleanup. |
| ./test.sh | 11-stage suite: black, mypy, import checks, bandit security, pytest unit + coverage (90% threshold), ETL tests, Hypothesis property tests, API endpoint tests, integration tests (skipped if no live chat.db). |
imessage-analysis/
├── imessage_analysis/              # Python package
│   ├── api.py                      # FastAPI endpoints
│   ├── analysis.py, queries.py     # Higher-level analysis + SQL
│   ├── database.py, config.py      # Connection management, config
│   ├── snapshot.py                 # chat.db snapshot utilities
│   ├── visualization.py, utils.py
│   └── etl/
│       ├── pipeline.py             # Orchestration (run_etl_with_snapshot)
│       ├── extractors.py           # chat.db + AddressBook SELECT-only extractors
│       ├── normalizers.py          # Phone / email canonicalization
│       ├── identity.py             # Handle → person resolution
│       ├── loaders.py              # Writes to analysis.db
│       ├── schema.py               # analysis.db DDL
│       ├── validation.py           # Post-ETL integrity checks
│       └── contacts_discovery.py   # Enumerate & pick AddressBook sources
├── frontend/                       # Vite + React + TS + Tailwind + shadcn
│   └── src/
│       ├── App.tsx, api.ts
│       └── components/
│           ├── contacts/           # ContactsTable, etc.
│           └── recent/             # RecentConversationsList
├── tests/                          # pytest + hypothesis; 460+ tests
├── init.sh / run.sh / test.sh
├── main.py                         # CLI entry point for ad-hoc analysis
├── pyproject.toml, setup.py, requirements.txt
└── *.md                            # README, IMPROVEMENTS, LEARNINGS, …
from pathlib import Path

from imessage_analysis.etl.pipeline import run_etl_with_snapshot
from imessage_analysis.etl.contacts_discovery import discover_contacts_dbs, pick_best

# Use the richest AddressBook source, or no contacts DB if none was found.
best = pick_best(discover_contacts_dbs())
contacts_path = best.path if best else None

result = run_etl_with_snapshot(
    source_db_path=Path.home() / "Library/Messages/chat.db",
    analysis_db_path=Path.home() / ".imessage_analysis/analysis.db",
    snapshots_dir=Path.home() / ".imessage_analysis/snapshots",
    contacts_db_path=contacts_path,
    force_full=False,
)
print(result)

# Newest contacts (most recent first_msg first)
sqlite3 -header -column ~/.imessage_analysis/analysis.db "
  SELECT h.value_raw, MIN(m.date_utc) AS first_msg, COUNT(*) AS msgs
  FROM dim_handle h JOIN fact_message m ON m.handle_id = h.handle_id
  GROUP BY h.handle_id
  HAVING SUM(m.is_from_me=0) >= 1 AND SUM(m.is_from_me=1) >= 1
  ORDER BY first_msg DESC LIMIT 20;"
- GET /health: health check
- GET /summary: overall counts
- GET /latest?limit=N: most recent messages
- GET /top-chats?limit=N: chats by message count
- GET /contacts: all contacts sorted by most recent communication (last_received_message DESC)
- GET /contacts/{handle_id}: detail view for a single contact
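A small standard-library client for these endpoints might look like the sketch below. The base URL is the backend address that ./run.sh starts; build_url and get_json are hypothetical helper names, and the shape of each JSON response is not documented here, so the sketch simply returns whatever the server sends.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # FastAPI backend started by ./run.sh


def build_url(path: str, **params: object) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path: str, **params: object):
    """GET an endpoint and decode the JSON body (requires a running backend)."""
    with urlopen(build_url(path, **params), timeout=5) as resp:
        return json.load(resp)
```

With the backend running, get_json("/health") or get_json("/top-chats", limit=5) would fetch the corresponding endpoint.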
- DATA_ARCHITECTURE.md: analysis.db schema, dim/fact design, identity resolution
- LEARNINGS.md: schema reverse-engineering notes on chat.db and AddressBook
- IMPROVEMENTS.md: changelog of structural improvements
- MIGRATION.md: historical migration from the pre-package layout
- CHATDB.md, DB_ANALYSIS.md: raw DB exploration notes