
pieteradejong/imessage-analysis

iMessage Analysis

Purpose and audience

Programmatic analysis and visualization of iMessages (macOS chat.db).

Example questions you can answer:

  • How many people have you corresponded with in the last month / year?
  • Who's quickest (or slowest) to text back?
  • Who have you messaged with most during a particular time?
  • Have you lost touch with anyone?
  • Who are your newest contacts? (people whose first-ever message is most recent)
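The "quickest to text back" question comes down to measuring reply latency. Below is a minimal pure-Python sketch. It assumes messages arrive as hypothetical (handle, timestamp, is_from_me) tuples rather than using the project's actual analysis.db schema, and it counts a reply as the other person's first message after your earliest unanswered one.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

# Hypothetical input shape: (handle, sent_at, is_from_me). Not the analysis.db schema.
Message = tuple[str, datetime, bool]


def median_reply_seconds(messages: list[Message]) -> dict[str, float]:
    """Median seconds each handle takes to answer a message you sent them."""
    by_handle: dict[str, list[Message]] = defaultdict(list)
    for msg in messages:
        by_handle[msg[0]].append(msg)

    result: dict[str, float] = {}
    for handle, msgs in by_handle.items():
        msgs.sort(key=lambda m: m[1])  # chronological order per conversation
        gaps: list[float] = []
        waiting_since: datetime | None = None
        for _, sent_at, is_from_me in msgs:
            if is_from_me:
                # Start the clock at your first unanswered message.
                if waiting_since is None:
                    waiting_since = sent_at
            elif waiting_since is not None:
                # Their first message after yours counts as the reply.
                gaps.append((sent_at - waiting_since).total_seconds())
                waiting_since = None
        if gaps:
            result[handle] = median(gaps)
    return result


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    demo = [
        ("alice", t0, True),
        ("alice", t0 + timedelta(minutes=5), False),
        ("alice", t0 + timedelta(hours=1), True),
        ("alice", t0 + timedelta(hours=1, minutes=1), True),
        ("alice", t0 + timedelta(hours=1, minutes=15), False),
        ("bob", t0, True),
        ("bob", t0 + timedelta(hours=2), False),
    ]
    print(median_reply_seconds(demo))  # {'alice': 600.0, 'bob': 7200.0}
```

Using the median rather than the mean keeps a single overnight gap from dominating a handle's score.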

Target audience: any iMessage user. Because the project reads directly from ~/Library/Messages/chat.db, it requires basic terminal skills and a macOS terminal with Full Disk Access granted.

Safety — your original chat.db is never modified

The ETL never writes to ~/Library/Messages/chat.db:

  1. It opens the original read-only via the file:…?mode=ro SQLite URI (the OS will reject any write attempt at the file-descriptor level).
  2. It uses SQLite's backup() API to create a consistent timestamped snapshot in ~/.imessage_analysis/snapshots/ — safe even while Messages.app is writing.
  3. All extraction and analysis run against the snapshot, not the original, and the extractors issue only SELECT statements.
  4. Analysis results are written to a separate file: ~/.imessage_analysis/analysis.db.
  5. Snapshots are reused for 7 days, so the original is read at most once per 7-day window, however many runs happen in between.

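You can verify this by comparing the output of stat -f "%Sm %z" ~/Library/Messages/chat.db before and after a run: the mtime will not change. Quit Messages.app first, since Messages itself writes to chat.db and can change its mtime independently of this project.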

Quick start

# 1. One-time setup: Python 3.12 venv, editable install, frontend npm deps
./init.sh

# 2. Run everything: ETL (if stale) + FastAPI backend + Vite frontend
./run.sh

# 3. Run the full test suite (black, mypy, pytest, coverage, integration)
./test.sh

Then open http://127.0.0.1:5173.

Requirements

  • macOS (for chat.db at ~/Library/Messages/chat.db and AddressBook at ~/Library/Application Support/AddressBook)
  • Python 3.12 (Homebrew)
  • Node.js + npm (for the frontend)
  • Full Disk Access granted to whichever terminal app you run ./run.sh from (System Settings → Privacy & Security → Full Disk Access). Contacts permission is also recommended, so AddressBook names can be matched to phone/email handles.
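Whether Full Disk Access is in effect can be probed by trying to read chat.db. macOS refuses the open() with "Operation not permitted" when the terminal lacks the permission, which Python surfaces as PermissionError. This is a hypothetical helper, not necessarily how ./run.sh performs its check.

```python
from pathlib import Path


def can_read(path: Path) -> bool:
    """Return True if this process can open `path` for reading.

    On macOS, a terminal without Full Disk Access gets PermissionError
    ("Operation not permitted") when opening ~/Library/Messages/chat.db.
    """
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        return True
    except (PermissionError, FileNotFoundError):
        return False


if __name__ == "__main__":
    chat_db = Path.home() / "Library/Messages/chat.db"
    if can_read(chat_db):
        print("chat.db is readable: Full Disk Access looks OK")
    else:
        print("cannot read chat.db (missing, or this terminal lacks Full Disk Access)")
```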

A note on macOS / Homebrew Python + pyexpat

On macOS 26+ (Tahoe), Homebrew's python@3.12 bottle can ship a pyexpat module built against a newer libexpat than the system's /usr/lib/libexpat.1.dylib, which breaks pip with a missing _XML_SetAllocTrackerActivationThreshold symbol. If you hit this during ./init.sh, relink pyexpat against Homebrew's expat and re-sign it:

brew install expat
install_name_tool -change /usr/lib/libexpat.1.dylib \
  /opt/homebrew/opt/expat/lib/libexpat.1.dylib \
  $(brew --prefix)/Cellar/python@3.12/*/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/pyexpat.cpython-312-darwin.so
codesign --force --sign - \
  $(brew --prefix)/Cellar/python@3.12/*/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload/pyexpat.cpython-312-darwin.so

Then re-run ./init.sh.
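To confirm the patch took effect, you can import pyexpat directly with the Homebrew interpreter, for example by saving the lines below as check_expat.py (a hypothetical name) and running $(brew --prefix)/bin/python3.12 check_expat.py. If the symbol mismatch persists, the import line itself fails with an ImportError; otherwise the script prints the linked expat version and parses a trivial document.

```python
# check_expat.py (hypothetical name); run it with the Homebrew python3.12.
# If pyexpat is still linked against the old system libexpat, this import
# fails with an ImportError mentioning the missing symbol.
import pyexpat

print("expat version:", pyexpat.EXPAT_VERSION)

seen: list[str] = []
parser = pyexpat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: seen.append(name)
parser.Parse("<ok/>", True)
print("parsed elements:", seen)  # parsed elements: ['ok']
```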

Architecture

┌──────────────────────────┐       ┌────────────────────────────┐
│ ~/Library/Messages/      │       │ ~/Library/Application      │
│   chat.db  (read-only)   │       │   Support/AddressBook/     │
└───────────┬──────────────┘       │   **/AddressBook-v*.abcddb │
            │ SQLite backup API    └─────────────┬──────────────┘
            ▼                                    │
┌────────────────────────────────┐               │
│ ~/.imessage_analysis/          │               │
│   snapshots/chat_<ts>.db       │               │
└───────────┬────────────────────┘               │
            │        ETL pipeline                │
            │  (snapshot-first, read-only)       │
            ▼                                    ▼
┌─────────────────────────────────────────────────────────────┐
│ ETL: extract → normalize → load → resolve identity          │
│  - fact_message, dim_handle, dim_person, dim_contact_method │
│  - pick richest AddressBook source via contacts_discovery   │
└───────────┬─────────────────────────────────────────────────┘
            ▼
┌──────────────────────────┐
│ ~/.imessage_analysis/    │
│   analysis.db            │ ◄──── FastAPI (imessage_analysis/api.py)
└──────────────────────────┘              │
                                          ▼
                                ┌──────────────────┐
                                │ frontend (Vite + │
                                │   React + TS)    │
                                └──────────────────┘

AddressBook / Contacts ingestion

macOS stores contacts as multiple SQLite databases under ~/Library/Application Support/AddressBook:

  • AddressBook-v22.abcddb — local-only source (often nearly empty)
  • Sources/<uuid-1>/AddressBook-v22.abcddb — e.g. iCloud
  • Sources/<uuid-2>/AddressBook-v22.abcddb — e.g. Google / CardDAV

imessage_analysis.etl.contacts_discovery enumerates every .abcddb under that root, counts the rows in its ZABCDRECORD, ZABCDPHONENUMBER and ZABCDEMAILADDRESS tables, and picks the richest source, so the ETL ingests the database with the most real contacts rather than the near-empty local-only one.

Diagnostic CLI:

# Show a table of every AddressBook DB and its record counts
.venv/bin/python -m imessage_analysis.etl.contacts_discovery

# Print just the chosen path (for shell pipelines)
.venv/bin/python -m imessage_analysis.etl.contacts_discovery --path-only

# JSON output
.venv/bin/python -m imessage_analysis.etl.contacts_discovery --json

If this prints nothing, your terminal lacks Contacts / Full Disk Access — grant it in System Settings and restart the terminal.

Scripts

| Script | Purpose |
|---|---|
| ./init.sh | Create .venv, install Python deps editable + [dev], install frontend npm deps. Idempotent. |
| ./run.sh | Check prerequisites (venv, npm, Full Disk Access). Rebuild the ETL if analysis.db is missing or older than 7 days. Start the FastAPI backend (127.0.0.1:8000) and the Vite frontend (127.0.0.1:5173). Handles Ctrl-C cleanup. |
| ./test.sh | 11-stage suite, including black, mypy, import checks, bandit security scan, pytest unit tests with coverage (90% threshold), ETL tests, Hypothesis property tests, API endpoint tests, and integration tests (skipped if no live chat.db is available). |

Project structure

imessage-analysis/
├── imessage_analysis/                 # Python package
│   ├── api.py                         # FastAPI endpoints
│   ├── analysis.py, queries.py        # Higher-level analysis + SQL
│   ├── database.py, config.py         # Connection management, config
│   ├── snapshot.py                    # chat.db snapshot utilities
│   ├── visualization.py, utils.py
│   └── etl/
│       ├── pipeline.py                # Orchestration (run_etl_with_snapshot)
│       ├── extractors.py              # chat.db + AddressBook SELECT-only extractors
│       ├── normalizers.py             # Phone / email canonicalization
│       ├── identity.py                # Handle → person resolution
│       ├── loaders.py                 # Writes to analysis.db
│       ├── schema.py                  # analysis.db DDL
│       ├── validation.py              # Post-ETL integrity checks
│       └── contacts_discovery.py      # Enumerate & pick AddressBook sources
├── frontend/                          # Vite + React + TS + Tailwind + shadcn
│   └── src/
│       ├── App.tsx, api.ts
│       └── components/
│           ├── contacts/              # ContactsTable, etc.
│           └── recent/                # RecentConversationsList
├── tests/                             # pytest + hypothesis; 460+ tests
├── init.sh / run.sh / test.sh
├── main.py                            # CLI entry point for ad-hoc analysis
├── pyproject.toml, setup.py, requirements.txt
└── *.md                               # README, IMPROVEMENTS, LEARNINGS, …

Programmatic usage

from pathlib import Path
from imessage_analysis.etl.pipeline import run_etl_with_snapshot
from imessage_analysis.etl.contacts_discovery import discover_contacts_dbs, pick_best

# Choose the AddressBook source with the most records; fall back to None if none is readable.
best = pick_best(discover_contacts_dbs())
contacts_path = best.path if best else None

result = run_etl_with_snapshot(
    source_db_path=Path.home() / "Library/Messages/chat.db",
    analysis_db_path=Path.home() / ".imessage_analysis/analysis.db",
    snapshots_dir=Path.home() / ".imessage_analysis/snapshots",
    contacts_db_path=contacts_path,
    force_full=False,
)
print(result)

Quick ad-hoc queries

# Newest contacts: two-way conversations, ordered by each handle's first message (most recent first)
sqlite3 -header -column ~/.imessage_analysis/analysis.db "
SELECT h.value_raw, MIN(m.date_utc) AS first_msg, COUNT(*) AS msgs
FROM dim_handle h JOIN fact_message m ON m.handle_id = h.handle_id
GROUP BY h.handle_id
HAVING SUM(m.is_from_me=0) >= 1 AND SUM(m.is_from_me=1) >= 1
ORDER BY first_msg DESC LIMIT 20;"
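The same query can be run from Python with a bound LIMIT parameter, and ordering by each handle's latest message instead gives a rough answer to "Have you lost touch with anyone?". This sketch uses only the dim_handle and fact_message columns shown above and assumes date_utc values sort chronologically (for example ISO-8601 text); newest_contacts and gone_quiet are illustrative names.

```python
import sqlite3
from pathlib import Path

# Only columns from the ad-hoc query above; date_utc is assumed to sort chronologically.
TWO_WAY_HANDLES = """
SELECT h.value_raw,
       MIN(m.date_utc) AS first_msg,
       MAX(m.date_utc) AS last_msg,
       COUNT(*)        AS msgs
FROM dim_handle h
JOIN fact_message m ON m.handle_id = h.handle_id
GROUP BY h.handle_id
HAVING SUM(m.is_from_me = 0) >= 1 AND SUM(m.is_from_me = 1) >= 1
"""


def newest_contacts(con: sqlite3.Connection, limit: int = 20) -> list[tuple]:
    """Two-way conversations whose first-ever message is most recent."""
    return con.execute(TWO_WAY_HANDLES + "ORDER BY first_msg DESC LIMIT ?", (limit,)).fetchall()


def gone_quiet(con: sqlite3.Connection, limit: int = 20) -> list[tuple]:
    """Two-way conversations whose latest message is oldest: 'lost touch' candidates."""
    return con.execute(TWO_WAY_HANDLES + "ORDER BY last_msg ASC LIMIT ?", (limit,)).fetchall()


if __name__ == "__main__":
    db = Path.home() / ".imessage_analysis/analysis.db"
    con = sqlite3.connect(f"{db.as_uri()}?mode=ro", uri=True)
    try:
        for row in newest_contacts(con, limit=10):
            print(row)
    finally:
        con.close()
```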

API endpoints

  • GET /health — health check
  • GET /summary — overall counts
  • GET /latest?limit=N — most recent messages
  • GET /top-chats?limit=N — chats by message count
  • GET /contacts — all contacts sorted by most recent communication (last_received_message DESC)
  • GET /contacts/{handle_id} — detail view for a single contact

Further reading

Appendix — Inspiration
