Automate the boring parts of expensing: this toolkit signs into your mailbox, fetches receipts, extracts structured information with AI, and patches up missing totals from PDFs.
| Script | Purpose |
|---|---|
| `expense_launcher.py` | Friendly entry point that creates `.env`, runs the collector, and offers to fix zero-amount rows with the PDF scraper. |
| `collect_expenses.py` | Talks to IMAP (password or Gmail OAuth), downloads attachments, sends emails through the AI analyzer, and streams rows into `expenses_<year>.csv`. |
| `pdf_amount_scraper.py` | Revisits rows where the amount is missing but a PDF exists and asks the AI to read the invoice. |
| `email_backup.py` | Bulk-exports raw `.eml` files from selected folders. |
| `retrieve_conversation.py` | Pulls an entire thread for a specific sender and saves it as markdown. |
| `mcp_email_server.py` | MCP server exposing email tools for AI assistants (search, read, send, retrieve conversations, read PDFs). |
| `merge_conversations.py` | Merges multiple conversation markdown exports into a single deduplicated, chronological file. |

The heavy lifting lives in ai_expense_analyzer.py, which supports Anthropic Claude and OpenAI models, and oauth_client.py / oauth_microsoft.py for Gmail and Microsoft OAuth flows.

- Install prerequisites
  - Python 3.10+
  - uv for dependency management (install once with `pip install uv`)
- Install project dependencies: `uv sync`
  - This creates a virtual environment under `.venv` and installs everything from `pyproject.toml`.
- Launch the guided setup: `uv run expense-launcher`. The launcher will:
  - Collect your OpenAI key (required) and optional Anthropic / Gmail app password
  - Generate a `.env` file beside the scripts
  - Ask which year to process
  - Walk through IMAP password or Gmail OAuth sign-in
  - Run the collector and, afterwards, offer to fill zero-amount rows via the PDF scraper

Need to re-run without setup? `uv run collect-expenses` (or `python collect_expenses.py`) starts the collector directly.

- Copy `.env.example` to `.env` if you want to edit values manually.
- Supported keys:
  - `OPENAI_API_KEY` (required for current flow)
  - `ANTHROPIC_API_KEY` (optional alternative provider)
  - `AI_PROVIDER` (`openai` or `anthropic`; defaults to `openai`)
  - `GOOGLE_APP_PASSWORD` (optional Gmail app password)
  - `MICROSOFT_APP_PASSWORD` (optional Outlook/Hotmail app password)
  - `MICROSOFT_OAUTH_CLIENT_ID` (optional for Microsoft OAuth)
- The collector automatically prefers whichever provider has a key in `.env`; you can still override choices interactively.
- Gmail OAuth stores refresh tokens in `%USERPROFILE%\.email-backup\` so you only authorise once per account. Place `client_secret.json` in the project root (already ignored by git) or point `GOOGLE_OAUTH_CLIENT_SECRETS` to your downloaded file.

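The provider preference described above can be sketched roughly as follows. This is an illustration only, not the collector's actual code: the function name `choose_provider` is hypothetical, and the fallback order is an assumption based on the keys listed above.

```python
import os


def choose_provider(env=None):
    """Pick an AI provider name from environment-style settings (hypothetical helper)."""
    env = os.environ if env is None else env
    keys = {
        "openai": env.get("OPENAI_API_KEY"),
        "anthropic": env.get("ANTHROPIC_API_KEY"),
    }
    # AI_PROVIDER defaults to openai, as documented above.
    requested = env.get("AI_PROVIDER", "openai").strip().lower()
    if keys.get(requested):
        return requested
    # Assumed fallback: use whichever provider actually has a key.
    for name, key in keys.items():
        if key:
            return name
    raise RuntimeError("No AI key detected; check that .env exists")
```

With only `ANTHROPIC_API_KEY` set, this sketch returns `"anthropic"` even though `AI_PROVIDER` defaults to `openai`.
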
flowchart TD
A[Launch script] --> B{Credentials}
B -->|Password| C[IMAP login]
B -->|OAuth| D[Gmail OAuth flow]
C --> E[Search for expense mail]
D --> E
E --> F[Download PDFs]
F --> G[AI analysis]
G --> H[Stream rows into CSV]
H --> I{Zero-amount rows with PDFs?}
I -- Yes --> J[Offer PDF scraper]
I -- No --> K[Finish]
Outputs land in expenses_<year>/:
expenses_2025/
├── expenses_2025.csv # Structured data
├── attachments/ # Saved PDFs
└── expense_collector.log # Detailed run log
Run on-demand when you're missing totals: `uv run python pdf_amount_scraper.py`

It scans the latest CSV, finds rows with `amount == 0` and PDF attachments, asks the configured AI to read the invoice, then updates the CSV in place (with a backup of the original).

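The row selection step could look something like the sketch below. It assumes the `amount` and `attachment_files` columns from the CSV schema; the helper name `rows_needing_pdf` and the rule of matching `.pdf` anywhere in the filename field are assumptions, not the scraper's real logic.

```python
import csv
import io


def rows_needing_pdf(rows):
    """Yield rows whose amount is zero or blank but which list a PDF attachment."""
    for row in rows:
        try:
            amount = float(row.get("amount") or 0)
        except ValueError:
            amount = 0.0  # treat unparseable amounts as missing
        files = row.get("attachment_files") or ""
        if amount == 0 and ".pdf" in files.lower():
            yield row


# Made-up sample data for illustration only.
sample = io.StringIO(
    "vendor,amount,attachment_files\n"
    "Acme,0,invoice_1.pdf\n"
    "Globex,12.50,invoice_2.pdf\n"
    "Initech,0,\n"
)
pending = list(rows_needing_pdf(csv.DictReader(sample)))
print([row["vendor"] for row in pending])  # ['Acme']
```
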
- `uv run python email_backup.py` exports whole folders to `mail_export/`.
- `uv run python retrieve_conversation.py` fetches all messages to/from a contact and writes `conversation_<email>.md`.
- `uv run python merge_conversations.py <file1> <file2> ...` merges multiple conversation exports into one deduplicated file.

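As a rough picture of what "deduplicated, chronological" merging involves, here is a minimal sketch. It assumes each message has already been parsed into a dict with `date` (ISO 8601 string), `sender`, and `body` keys; that representation and the function name `merge_messages` are hypothetical, and the real script works on markdown exports.

```python
from datetime import datetime


def merge_messages(*exports):
    """Merge lists of message dicts, dropping exact duplicates and sorting by date."""
    seen = set()
    merged = []
    for export in exports:
        for msg in export:
            # Assumed identity of a message: same timestamp, sender, and body text.
            key = (msg["date"], msg["sender"], msg["body"].strip())
            if key not in seen:
                seen.add(key)
                merged.append(msg)
    merged.sort(key=lambda m: datetime.fromisoformat(m["date"]))
    return merged
```
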
The MCP server lets AI assistants (e.g. GitHub Copilot, Claude Desktop) interact with your mailbox directly. Start it with: `uv run python mcp_email_server.py`

Or configure it in your editor's MCP settings. Exposed tools:

| Tool | Description |
|---|---|
| `list_folders` | List all IMAP mailbox folders |
| `search_emails` | Search emails by sender, subject, date range, folder |
| `get_email_by_uid` | Read a specific email by UID |
| `retrieve_conversation` | Export full conversation with a contact as markdown |
| `list_conversation_attachments` | List attachments from a saved conversation |
| `download_email_attachment` | Download a specific attachment from an email |
| `read_email_attachment_pdf` | Read PDF attachment content directly from an email |
| `read_pdf` | Extract text from a local PDF file |
| `convert_pdf_to_markdown` | Convert a PDF to markdown |
| `send_email` | Send an email via SMTP with OAuth |
| `draft_reply` | Draft a reply to an existing email |

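To give a feel for what a tool like `search_emails` has to do underneath, the sketch below builds an IMAP `SEARCH` criteria string from sender, subject, and date range. The helper names are hypothetical and this is not the server's implementation; it only uses standard IMAP search keys (`FROM`, `SUBJECT`, `SINCE`, `BEFORE`).

```python
from datetime import date

# IMAP dates use English month abbreviations regardless of locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(day):
    """Format a date as dd-Mon-yyyy, the form IMAP SEARCH expects."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"


def build_search_criteria(sender=None, subject=None, since=None, before=None):
    """Return an IMAP SEARCH criteria string; 'ALL' when no filter is given."""
    parts = []
    if sender:
        parts += ["FROM", f'"{sender}"']
    if subject:
        parts += ["SUBJECT", f'"{subject}"']
    if since:
        parts += ["SINCE", imap_date(since)]
    if before:
        parts += ["BEFORE", imap_date(before)]
    return " ".join(parts) if parts else "ALL"


print(build_search_criteria(sender="billing@example.com", since=date(2025, 1, 5)))
# FROM "billing@example.com" SINCE 05-Jan-2025
```

The resulting string could be passed to `imaplib`'s `IMAP4.search(None, criteria)` after selecting a folder. Values containing double quotes would need escaping, which this sketch does not handle.
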
| Option | When to use | Notes |
|---|---|---|
| IMAP password | Works with most providers | Gmail requires an App Password. Microsoft accounts may need an App Password. Prompted securely each run. |
| Gmail OAuth | No app password needed | Install google-auth / google-auth-oauthlib (already included). Place client_secret.json in the project root; tokens cached in %USERPROFILE%\.email-backup. |
| Microsoft OAuth | Outlook/Hotmail/Live/Office365 | Install msal (already included). Register an app in Azure Portal, add IMAP.AccessAsUser.All permission, set MICROSOFT_OAUTH_CLIENT_ID in .env; tokens cached in %USERPROFILE%\.email-backup. |
The collector auto-detects the appropriate OAuth provider based on your email domain.
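Domain-based detection of this kind might be sketched as below. The domain sets and the function name `detect_oauth_provider` are assumptions for illustration; the collector's actual list of recognised domains may differ.

```python
# Assumed domain lists; extend as needed.
GOOGLE_DOMAINS = {"gmail.com", "googlemail.com"}
MICROSOFT_DOMAINS = {"outlook.com", "hotmail.com", "live.com"}


def detect_oauth_provider(email):
    """Return 'google', 'microsoft', or None based on the address's domain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in GOOGLE_DOMAINS:
        return "google"
    if domain in MICROSOFT_DOMAINS:
        return "microsoft"
    return None  # unknown domain: fall back to asking the user
```
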
- Go to Azure App Registrations
- Click "New registration"
- Choose "Personal Microsoft accounts only" or "Accounts in any organizational directory and personal Microsoft accounts"
- Add "Mobile and desktop applications" platform with redirect URI: `http://localhost`
- Under "API permissions", add `IMAP.AccessAsUser.All` (Office 365 Exchange Online)
- Copy the Application (client) ID and set it in `.env` as `MICROSOFT_OAUTH_CLIENT_ID`
- Run the collector and choose OAuth; it will launch a browser for device code authentication

- Default provider comes from `AI_PROVIDER`; the launcher auto-selects the provider whose key is present.
- OpenAI usage is summarised at the end of each run (tokens + estimated cost for supported models).
- AI safeguards normalise amounts, vendors, currencies, and refuse suspicious outputs.
- For Anthropic support you'll need the `anthropic` Python package (installed via `uv sync`).

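A minimal sketch of the kind of safeguard mentioned above, covering amount normalisation and the 0.2 confidence cap described for the `confidence` column. Both function names are hypothetical, and the parsing rules (commas treated as thousands separators, negative or non-numeric values refused) are assumptions rather than the analyzer's real behaviour.

```python
from decimal import Decimal, InvalidOperation


def normalise_amount(raw):
    """Parse an amount like '$1,234.50'; return None for suspicious values."""
    text = str(raw).strip().replace(",", "")  # assumes ',' is a thousands separator
    for symbol in "$€£":
        text = text.replace(symbol, "")
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    if not value.is_finite() or value < 0:
        return None  # refuse NaN, infinities, and negative totals
    return value


def cap_confidence(confidence, used_attachments):
    """Clamp to [0, 1]; cap at 0.2 when no attachment backed the extraction."""
    confidence = min(max(float(confidence), 0.0), 1.0)
    return confidence if used_attachments else min(confidence, 0.2)
```
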
| Column | Meaning |
|---|---|
| `date` | Email date stamp |
| `subject`, `from` | Message metadata |
| `vendor`, `amount`, `currency`, `category`, `description` | AI-extracted fields |
| `confidence` | Normalised confidence (max 0.2 unless attachments were used) |
| `has_attachments`, `attachment_files` | Attachment presence and filenames |
| `folder` | Source mailbox folder |

The collector writes incrementally to minimise data loss if interrupted.
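Incremental writing can be pictured with the following sketch, which appends one row at a time and flushes it to disk before moving on. The column order is taken from the table above; the function name `append_row` and the use of `os.fsync` are assumptions, not necessarily how the collector does it.

```python
import csv
import os

FIELDS = [
    "date", "subject", "from", "vendor", "amount", "currency", "category",
    "description", "confidence", "has_attachments", "attachment_files", "folder",
]


def append_row(path, row):
    """Append a single row, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)  # missing columns are written as empty strings
        handle.flush()
        os.fsync(handle.fileno())  # push the row to disk before continuing
```

Because every row is on disk as soon as it is processed, an interrupted run loses at most the email currently being analysed.
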
- OAuth keeps opening a browser: delete cached tokens in `%USERPROFILE%\.email-backup` and retry.
- Collector cannot find client secret: ensure `client_secret.json` lives beside the scripts or set `GOOGLE_OAUTH_CLIENT_SECRETS` to an absolute path.
- No AI key detected: confirm `.env` exists, or rerun the launcher to regenerate it.
- PDF scraper skipped rows: check `pdf_amount_scraper.log` inside the expenses folder for errors (missing PDFs, unreadable files, etc.).
- Dependency issues: rerun `uv sync`; uv will recreate the `.venv` with all required packages.

MIT License. Do whatever helps you tame your inbox—pull requests welcome!