A local-first NLP engine that analyzes text-based communication for behavioral patterns — built with Python, FastAPI, SQLite, React, and a three-layer AI agent architecture.
Privacy First: This tool runs 100% locally. No data leaves your machine unless you configure an optional LLM API key. No cloud. No telemetry. No accounts.
- Behavioral pattern detection engine — regex-based NLP with context-aware filtering, detecting 30+ patterns across 6 research-backed categories (DARVO, Gottman's Four Horsemen, Coercive Control, and more).
- Three-layer Agentic RAG architecture — Structured Query → Keyword Retrieval → LLM Deep Analysis, so users can ask natural-language questions about their data.
- 726 tests, 0 lint errors, 0 type errors — 33 test files covering pattern detection, API endpoints, agent routing, edge cases, database integrity, and malformed data resilience.
- Full-stack local application — Python analysis engine, FastAPI REST API, React/TypeScript dashboard, SQLite persistence, Celery async processing.
Stack: Python 3.11 · FastAPI · SQLite · React · TypeScript · Celery · Redis · structlog · Pydantic · defusedxml · Docker
| Claim | Proof |
|---|---|
| 726 tests collected | python -m pytest --co -q |
| 693 passing, 0 collection errors | python -m pytest --tb=short -q |
| 0 lint errors | ruff check . → "All checks passed!" |
| 0 type errors | mypy engine api --ignore-missing-imports → "Success: no issues found in 36 source files" |
| Pattern detection accuracy | 400+ test cases across 5 dedicated pattern test suites |
| Data never leaves machine | No outbound calls except optional LLM queries (user-provided API key). See DATA_PRIVACY.md |
| API works | uvicorn api.main:app --reload → Swagger UI at localhost:8000/docs |
| Dashboard builds | cd dashboard && npm run build → production bundle |
- Research-grounded patterns — every detection category cites peer-reviewed sources (Freyd 1997, Gottman 1999, Stark 2007, Johnson 2008)
- Context-aware false-positive filtering — apologies, self-directed negativity, de-escalation, and banter are suppressed before flagging
- Bidirectional analysis — both parties are analyzed equally; no assumption about who is "the bad actor"
- Supportive pattern detection — not just negatives; empathy, validation, accountability, and appreciation are tracked
- Structured logging — every request gets a UUID, every agent answer logs layer, duration, and confidence via structlog
- Security hardened — BasicAuth middleware, rate limiting (slowapi), path traversal protection, defusedxml for XML parsing, encryption-at-rest support
| Talking Point | Detail |
|---|---|
| NLP / text analysis | Regex-based behavioral pattern detection with 30+ rules, context-aware false-positive filtering, and peer-reviewed academic citations |
| Data engineering | Multi-format parser pipeline (XML, SQLite, CSV, JSON), SQLite WAL persistence, structured data pipeline from raw exports to analytics |
| API design | FastAPI with Pydantic schemas, rate limiting, auth middleware, structured error handling, async task queue (Celery/Redis) |
| AI agent architecture | Three-layer Agentic RAG — most questions answered without any LLM call; 225 test questions validate routing logic |
| Testing discipline | 726 tests across 33 files: pattern detection, API, agent, fuzz, golden snapshots, malformed data resilience, encryption |
| Security awareness | defusedxml (XXE prevention), path traversal checks, Fernet encryption-at-rest, request tracing, rate limiting |
| Full-stack ownership | Python engine, FastAPI API, React/TS dashboard, SQLite storage, Docker config, structured logging, all built by one person |
| Self-awareness | Honest roadmap, known limitations documented, self-audit report that identifies what's broken and what's not done |
I built this as a solo project to learn NLP, data engineering, and full-stack architecture. I use AI-assisted tooling for development, but I review every change, write tests, and validate behavior with static analysis and manual testing.
- Role: Sole developer — design, implementation, testing, documentation
- Standard: ruff clean, mypy clean, 95%+ test pass rate before any commit
- AI policy: AI-assisted code is reviewed, tested, and understood. I can explain and extend every component.
- Honest gaps: 16 of 726 tests are skipped (no API key / no sample data). 17 fail — 15 are documented planned features (see Roadmap), 2 are edge-case assertion refinements.
Each part is also available as a standalone document:
| Part | Standalone |
|---|---|
| Summary | docs/summary.md |
| Technical Stack | docs/tech_stack.md |
| Quick Start | docs/quick_start.md |
| Tutorial | docs/tutorial.md |
| What you want | Where to find it | Time |
|---|---|---|
| Stack + key claims | Impact ↑ | 30 sec |
| Proof it works | Evidence ↑ | 1 min |
| Architecture decisions | Architecture ↓ | 2 min |
| What I'd improve | Roadmap ↓ + Known Limitations ↓ | 2 min |
| Run it yourself | Quick Start ↓ | 3 min |
| What you want | Where to find it | Time |
|---|---|---|
| System architecture | Architecture ↓ | 2 min |
| Pattern detection internals | Pattern Detection ↓ | 3 min |
| AI agent design | Agentic RAG ↓ | 3 min |
| Security + privacy posture | Security ↓ + DATA_PRIVACY.md | 2 min |
| Testing strategy | Testing ↓ | 2 min |
| Data flow | Data Flow ↓ | 2 min |
| File map | File Map ↓ | 1 min |
| What you want | Where to find it | Time |
|---|---|---|
| Install + run analysis | Quick Start ↓ | 3 min |
| Start the dashboard | Dashboard ↓ | 2 min |
| Ask AI questions | AI Assistant ↓ | 2 min |
| Non-technical walkthrough | Tutorial ↓ | 10 min |
┌─────────────────────────────────────────────────────────┐
│                     React Dashboard                     │
│              (TypeScript, Vite, Recharts)               │
└────────────────────────┬────────────────────────────────┘
                         │ HTTP (localhost:8000)
┌────────────────────────▼────────────────────────────────┐
│                     FastAPI Backend                     │
│  ┌──────────┐  ┌──────────┐  ┌─────────┐  ┌─────────┐   │
│  │ REST API │  │ Chat API │  │  Auth   │  │  Rate   │   │
│  │ /cases/* │  │  /ask    │  │  Basic  │  │ Limiter │   │
│  └────┬─────┘  └────┬─────┘  └─────────┘  └─────────┘   │
│       │             │                                   │
│  ┌────▼─────┐ ┌─────▼──────────────────────────┐        │
│  │  Case    │ │ Analysis Agent (RAG)           │        │
│  │  Storage │ │ L1: Structured → L2: Retrieval │        │
│  │ (SQLite) │ │ → L3: LLM Deep Analysis        │        │
│  └──────────┘ └────────────────────────────────┘        │
└─────────────────────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────┐
│                     Analysis Engine                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────┐   │
│  │ Parsers  │  │ Pattern  │  │Supportive│  │ Report │   │
│  │ XML/SQL/ │  │ Detector │  │ Pattern  │  │ Writer │   │
│  │ CSV/JSON │  │  (30+)   │  │ Detector │  │ MD/JSON│   │
│  └──────────┘  └──────────┘  └──────────┘  └────────┘   │
└─────────────────────────────────────────────────────────┘
| Decision | Why |
|---|---|
| Local-first, no cloud | Communication data is deeply personal. Everything runs on the user's machine, so their data never touches a network unless they explicitly configure an LLM API key. |
| Regex over ML for pattern detection | Interpretable, auditable, no training data needed. Every flag can be traced to a specific regex rule and academic source. ML would be a black box. |
| Three-layer agent (not just LLM) | Layer 1 handles stats without any API call. Layer 2 retrieves context locally. Only Layer 3 calls an external LLM — and only if configured. Most questions never leave the machine. |
| SQLite over Postgres | Single-user local tool. SQLite is zero-config, portable, and handles the data volumes here (tens of thousands of messages) easily. WAL mode enabled for concurrent reads. |
| Bidirectional analysis | Both parties are analyzed with the same rules. The tool doesn't assume who is "right." This is a deliberate design choice to reduce bias. |
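The SQLite row above mentions WAL mode. As a minimal sketch of what enabling it can look like with the standard-library sqlite3 module (the `open_case_db` helper and the `case.db` file name are hypothetical and not taken from the project's db.py):

```python
import sqlite3
import tempfile
from pathlib import Path


def open_case_db(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    # PRAGMA journal_mode returns the mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    return conn


def demo_journal_mode() -> str:
    """Create a throwaway database file and report its journal mode."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_case_db(Path(tmp) / "case.db")  # hypothetical file name
        try:
            return conn.execute("PRAGMA journal_mode;").fetchone()[0]
        finally:
            conn.close()
```

WAL applies to file-backed databases, which is why the demo writes to a temporary directory rather than using an in-memory connection.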
The engine uses regex-based detection grounded in peer-reviewed behavioral science. Every pattern includes the academic citation.
DARVO (Freyd, 1997)
- Deny — Denying events that occurred
- Attack — Attacking the accuser to deflect blame
- Reverse Victim & Offender — Claiming victimhood when confronted
Gottman's Four Horsemen (Gottman & Silver, 1999)
- Criticism — Attacking character rather than behavior
- Contempt — Expressions of superiority, mockery, or disgust
- Defensiveness — Counter-blaming or playing the victim
- Stonewalling — Withdrawal and refusal to engage
Coercive Control (Stark, 2007)
- Isolation — Controlling who the person sees or talks to
- Financial Control — Using money as leverage
- Weaponizing Health — Using illness or trauma to manipulate
Additional Manipulation Patterns
- Gaslighting, guilt-tripping, love bombing, future faking, triangulation, silent treatment threats, minimizing, blame shifting, deflection
Supportive Patterns
- Validation — Acknowledging the other person's feelings
- Empathy — Showing emotional understanding
- Appreciation — Expressing gratitude or value
- Accountability — Owning one's actions
- De-escalation — Attempting to reduce conflict
To reduce false positives, negative flags are suppressed when the engine detects:
- Apologies ("I'm sorry", "My bad")
- Self-directed negativity ("I hate myself" vs. "I hate you")
- De-escalation ("Let's take a break")
- Banter/jokes (laughter, emojis, reciprocal tone)
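The suppression step described above can be sketched roughly as follows. The rule names and regular expressions here are hypothetical stand-ins chosen to mirror the examples in this section; the real rules live in engine/patterns.py and are more extensive.

```python
import re
from dataclasses import dataclass

# Hypothetical negative rules, for illustration only.
NEGATIVE_RULES = {
    "deny": re.compile(r"\bthat never happened\b", re.IGNORECASE),
    "gaslighting": re.compile(
        r"\byou(?:'re| are) (?:crazy|imagining things)\b", re.IGNORECASE
    ),
    "hostility": re.compile(r"\bi hate you\b", re.IGNORECASE),
}

# Hypothetical context filters: a match here suppresses negative flags.
SUPPRESSORS = {
    "apology": re.compile(r"\b(?:i'?m sorry|my bad)\b", re.IGNORECASE),
    "self_directed": re.compile(r"\bi hate myself\b", re.IGNORECASE),
    "de_escalation": re.compile(r"\blet'?s take a break\b", re.IGNORECASE),
    "banter": re.compile(r"\b(?:lol|haha+)\b|😂", re.IGNORECASE),
}


@dataclass(frozen=True)
class Detection:
    flags: tuple[str, ...]          # negative patterns that survive filtering
    suppressed_by: tuple[str, ...]  # context filters that cancelled the flags


def detect(message: str) -> Detection:
    """Run the negative rules, then drop the hits if any context filter matches."""
    hits = tuple(name for name, rx in NEGATIVE_RULES.items() if rx.search(message))
    context = tuple(name for name, rx in SUPPRESSORS.items() if rx.search(message))
    if hits and context:
        return Detection(flags=(), suppressed_by=context)
    return Detection(flags=hits, suppressed_by=())
```

In this sketch any matching context filter cancels every flag on the message. Whether the engine suppresses per pattern or per message is not stated in this document, so treat that as an assumption.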
The AI assistant uses a three-layer architecture to answer questions about case data:
| Layer | Name | What it does | Needs LLM? |
|---|---|---|---|
| L1 | Structured Query | Handles stat questions ("How many messages?", "Who sent more?") via direct SQL queries | No |
| L2 | Keyword Retrieval | Searches messages by date, pattern, direction, severity, keywords | No |
| L3 | Deep Analysis | Constructs a context-rich prompt with retrieved messages and sends to OpenAI/Anthropic | Yes |
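As a rough illustration of Layer 1, a stat question such as "Who sent more?" reduces to a single aggregate query. The `messages` table, its `direction` column, and the "sent"/"received" values are assumptions made for this sketch; the project's actual schema may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, direction TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages (direction, body) VALUES (?, ?)",
        [
            ("sent", "hi"),
            ("received", "hello"),
            ("sent", "how are you"),
            ("sent", "ok"),
            ("received", "fine"),
        ],
    )
    return conn


def count_by_direction(conn: sqlite3.Connection) -> dict[str, int]:
    """Answer "How many messages?" per direction with one GROUP BY query."""
    rows = conn.execute(
        "SELECT direction, COUNT(*) FROM messages GROUP BY direction"
    ).fetchall()
    return dict(rows)


def who_sent_more(
    conn: sqlite3.Connection, user_label: str = "Me", contact_label: str = "Them"
) -> str:
    """Turn the counts into a short natural-language answer."""
    counts = count_by_direction(conn)
    sent, received = counts.get("sent", 0), counts.get("received", 0)
    if sent == received:
        return f"{user_label} and {contact_label} sent the same number of messages ({sent})."
    if sent > received:
        return f"{user_label} sent more messages ({sent} vs. {received})."
    return f"{contact_label} sent more messages ({received} vs. {sent})."
```

Because the answer comes straight from the local database, this path needs neither retrieval nor an LLM.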
How it works: Every question enters at L1. If the StructuredQueryEngine can answer it (counts, comparisons, breakdowns), it returns immediately with no network call. If L1 can't handle it, L2 retrieves relevant messages and formats them. If an LLM is configured, L3 sends the retrieved context and the question to the provider for a more nuanced answer. If no LLM is configured, L2's retrieval result is returned directly.
207 of 225 agent test questions pass. The 18 failures are planned L1 features documented in the Roadmap.
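The fallback order described above can be sketched as a small routing function. This is an illustration under stated assumptions, not the code in api/agent.py: `answer_question`, `Answer`, and the two demo layers are hypothetical names, and the layers are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Answer:
    layer: int  # 1, 2 or 3: which layer produced the text
    text: str


def answer_question(
    question: str,
    structured: Callable[[str], Optional[str]],
    retrieve: Callable[[str], Sequence[str]],
    llm: Optional[Callable[[str, Sequence[str]], str]] = None,
) -> Answer:
    # Layer 1: try a direct, local answer first.
    direct = structured(question)
    if direct is not None:
        return Answer(layer=1, text=direct)
    # Layer 2: gather relevant messages locally.
    context = retrieve(question)
    if llm is None:
        # No LLM configured: return the retrieval result as-is.
        return Answer(layer=2, text="\n".join(context))
    # Layer 3: the only branch that hands data to an external model.
    return Answer(layer=3, text=llm(question, context))


# Stand-in layers for demonstration only.
def demo_structured(question: str) -> Optional[str]:
    return "42 messages" if "how many" in question.lower() else None


def demo_retrieve(question: str) -> list[str]:
    return ["2024-03-15 18:42 Contact: That never happened"]
```

Keeping the LLM as an optional argument makes the "works without an API key" behavior a property of the control flow rather than a special case.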
1. User provides data (XML/SQLite/CSV/JSON)
│
2. Parser extracts messages + calls
│
3. Pattern detector runs on each message (both directions)
│
4. Results stored: SQLite DB + JSON + Markdown reports
│
5. FastAPI serves data to React dashboard
│
6. User asks questions → Agent answers via L1/L2/L3
What goes to the network: Nothing, unless you configure an LLM API key in .env. If configured, only the specific messages relevant to your question (plus the question itself) are sent to the LLM provider. Full details in DATA_PRIVACY.md.
| Source | Format | Direction | Notes |
|---|---|---|---|
| SMS Backup & Restore | XML | Both | Standard Android backup format |
| Call Logs | XML | Both | Standard Android call log format |
| Signal Desktop | SQLite | Both | Decrypted via sqlcipher |
| CSV | CSV | Both | Custom import (Date, Direction, Body) |
| Manual JSON | JSON | Both | For transcribing handwritten notes/images |
| Text Paste | Plain text | Both | Paste chat logs directly via API |
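For the CSV row in the table above, a minimal import sketch might look like this. The column names (Date, Direction, Body) come from the table; the ISO 8601 date format, the `Message` dataclass, and the `parse_csv` helper are assumptions for illustration and not the project's parser.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

REQUIRED_COLUMNS = {"Date", "Direction", "Body"}


@dataclass(frozen=True)
class Message:
    timestamp: datetime
    direction: str
    body: str


def parse_csv(text: str) -> list[Message]:
    """Parse CSV text with Date, Direction and Body columns into messages."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    messages = []
    for row in reader:
        body = (row["Body"] or "").strip()
        if not body:
            continue  # skip rows without text instead of failing the import
        # Assumes ISO 8601 dates; fromisoformat raises ValueError otherwise.
        timestamp = datetime.fromisoformat((row["Date"] or "").strip())
        direction = (row["Direction"] or "").strip().lower()
        messages.append(Message(timestamp, direction, body))
    return messages
```

Failing early on a missing column gives a clearer error than a KeyError deep inside the row loop.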
| Layer | Implementation |
|---|---|
| Authentication | HTTP Basic Auth on all endpoints (configurable credentials) |
| Rate Limiting | slowapi — 60/min for reads, 20/min for analysis, 5/min for deletes |
| XML Safety | defusedxml — prevents XXE, billion laughs, and entity expansion attacks |
| Path Traversal | All case paths are resolved and validated against the cases root directory |
| Request Tracing | Every request gets a UUID via middleware, propagated through structlog |
| Encryption at Rest | Fernet symmetric encryption for stored data (optional, key in .env) |
| Data Isolation | Each case is a separate directory. Delete a case = delete the directory. |
| No Telemetry | Zero analytics, zero tracking, zero outbound connections (unless LLM configured) |
See SECURITY_HARDENING.md for the full hardening checklist.
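The path traversal check in the table above can be illustrated with pathlib. This is a sketch of the general technique (resolve, then verify containment), assuming a hypothetical `resolve_case_path` helper; it is not copied from the API code.

```python
from pathlib import Path


def resolve_case_path(cases_root: Path, case_id: str) -> Path:
    """Return the directory for case_id, rejecting anything outside cases_root."""
    root = cases_root.resolve()
    # Resolving collapses ".." segments and follows symlinks where they exist.
    candidate = (root / case_id).resolve()
    # is_relative_to (Python 3.9+) is True only for root itself or paths below it.
    if candidate == root or not candidate.is_relative_to(root):
        raise ValueError(f"invalid case id: {case_id!r}")
    return candidate
```

An absolute `case_id` such as "/etc/passwd" replaces the root when joined, so it fails the containment check just like a "../" sequence does.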
726 tests collected across 33 test files.
| Category | Tests | What's covered |
|---|---|---|
| Pattern detection (DARVO) | 35 | deny, attack, reverse victim/offender |
| Pattern detection (Gaslighting) | 28 | reality denial, memory manipulation, crazy-making |
| Pattern detection (Gottman) | 36 | criticism, contempt, defensiveness, stonewalling |
| Pattern detection (Coercive Control) | 28 | isolation, financial control, health weaponization |
| Pattern detection (Manipulation) | 37 | guilt-tripping, love bombing, future faking, triangulation |
| Supportive patterns | 85 | validation, empathy, accountability, appreciation |
| Context filters | 46 | apology suppression, self-directed, de-escalation, banter |
| Agent questions (225 NL queries) | 225 | stat queries, pattern searches, date filtering, edge cases |
| API endpoints | 5 | cases, summary, timeline, patterns, hurtful |
| Upload + ingestion | 8 | file upload, text paste, extension validation, case creation |
| Async integration | 3 | Celery task dispatch, task status |
| Database integrity | 6 | WAL mode, schema init, CRUD operations |
| Edge cases + malformed data | 36 | empty input, None values, missing keys, garbage data |
| Encryption | 1 | Fernet encrypt/decrypt round-trip |
| Caching | 1 | Agent cache hit/miss/invalidation |
| Rate limiting | 1 | Endpoint rate limit enforcement |
| Hurtful language | 30 | severity classification, bidirectional |
| Bidirectional analysis | 30 | both-party pattern detection |
| Parsers | 22 | XML, Signal SQLite, CSV, JSON parsing |
| Observability + tracing | 3 | structured logging, request ID propagation |
What's not tested (honest gaps):
- No browser-level E2E tests for the React dashboard (the dashboard is a read-only viewer)
- No load/stress testing (single-user local tool — not a scaling concern at this stage)
- 16 tests skip due to missing API keys or sample data
- 17 tests fail: 15 are documented planned L1 agent features, 2 are assertion refinements
python -m pytest # full suite
python -m pytest --tb=short -q # compact output
python -m pytest tests/test_patterns_darvo.py -v # specific suite
ruff check . # 0 errors
mypy engine api --ignore-missing-imports # 0 errors
- Python 3.11+
- Node.js 18+ (for dashboard, optional)
- Redis (for async tasks, optional)
git clone https://github.com/beautifulplanet/Communication-Analysis-Toolkit.git
cd Communication-Analysis-Toolkit
pip install -r requirements.txt
1. Prepare your data
Place your sms_backup.xml (from SMS Backup & Restore) into a new folder:
cases/my_case/source_data/sms_backup.xml
2. Create a config file
Create cases/my_case/config.json:
{
  "case_name": "My Analysis",
  "user_label": "Me",
  "contact_label": "Them",
  "sms_xml": "./cases/my_case/source_data/sms_backup.xml",
  "output_dir": "./cases/my_case/output",
  "date_start": "2020-01-01",
  "date_end": "2025-12-31"
}
3. Run the engine
python -m engine.cli --config cases/my_case/config.json
The engine displays a legal consent notice on first run. Output files are generated in cases/my_case/output/.
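Before running the engine it can help to sanity-check the config file. The sketch below validates the keys shown in the example config and the date range; treating every key as required, and the `CaseConfig` / `parse_config` names, are assumptions for illustration rather than the engine's actual validation.

```python
import json
from dataclasses import dataclass
from datetime import date

# Keys taken from the example config; assumed here to all be required.
REQUIRED_KEYS = (
    "case_name",
    "user_label",
    "contact_label",
    "sms_xml",
    "output_dir",
    "date_start",
    "date_end",
)


@dataclass(frozen=True)
class CaseConfig:
    case_name: str
    user_label: str
    contact_label: str
    sms_xml: str
    output_dir: str
    date_start: date
    date_end: date


def parse_config(raw: str) -> CaseConfig:
    """Parse config JSON text and check required keys and date order."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"config is missing keys: {missing}")
    start = date.fromisoformat(data["date_start"])
    end = date.fromisoformat(data["date_end"])
    if start > end:
        raise ValueError("date_start must not be after date_end")
    return CaseConfig(
        case_name=data["case_name"],
        user_label=data["user_label"],
        contact_label=data["contact_label"],
        sms_xml=data["sms_xml"],
        output_dir=data["output_dir"],
        date_start=start,
        date_end=end,
    )
```

A typical use would be `parse_config(Path("cases/my_case/config.json").read_text())` (with `from pathlib import Path`) before invoking the CLI.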
cd dashboard
npm install
npm run dev
Then start the API:
uvicorn api.main:app --reload --port 8000
Open http://localhost:5173 to see the dashboard.
To enable natural-language questions about your data:
- Copy .env.example to .env
- Add your API key:
  OPENAI_API_KEY=sk-... # or ANTHROPIC_API_KEY=sk-ant-...
- Restart the API server
- Use the chat panel in the dashboard or POST to /api/cases/{case_id}/ask
Without an API key, the agent still works — it just uses Layer 1 (structured queries) and Layer 2 (keyword retrieval) without LLM interpretation.
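For scripted use, the POST mentioned above can be prepared with the standard library alone. The endpoint path and HTTP Basic Auth come from this document; the JSON body shape (a single "question" field), the `build_ask_request` helper, and the example credentials are assumptions, so check the Swagger UI at localhost:8000/docs for the real schema.

```python
import base64
import json
import urllib.request


def build_ask_request(
    base_url: str, case_id: str, question: str, username: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /ask endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # Hypothetical payload shape: the field name "question" is an assumption.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/cases/{case_id}/ask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

With the API running, `urllib.request.urlopen(build_ask_request("http://localhost:8000", "my_case", "How many messages?", "admin", "secret"))` would send it; the credentials here are placeholders for whatever is configured locally.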
| File | What it contains |
|---|---|
| TIMELINE.md | Day-by-day narrative of the relationship, flagging incidents |
| ANALYSIS.md | Statistics: message volume, response times, pattern breakdowns |
| EVIDENCE.md | Every flagged message with severity rating and pattern classification |
| DATA.json | Machine-readable dataset of the entire analysis |
| AI_PROMPTS.md | Ready-made prompts for external AI auditing (ChatGPT, Claude, Gemini) |
TIMELINE.md (excerpt):
2024-03-15 — Escalation Point
18:42 - Contact: "That never happened" [DENY]
18:44 - Contact: "You're imagining things" [GASLIGHTING]
18:47 - User: "I have the texts right here"
18:49 - Contact: "You're crazy" [GASLIGHTING - SEVERE]
Communication-Analysis-Toolkit/
├── engine/  # Core analysis engine (Python)
│   ├── analyzer.py  # Main analysis pipeline (~1,050 lines)
│   ├── patterns.py  # Negative pattern detection (30+ rules)
│   ├── patterns_supportive.py  # Positive pattern detection
│   ├── relationship_health.py  # Gottman ratio + health scoring
│   ├── reporting.py  # Markdown + JSON report generation
│   ├── storage.py  # CaseStorage DAO (SQLite)
│   ├── db.py  # Database connection + schema init
│   ├── crypto.py  # Fernet encryption at rest
│   ├── logger.py  # Structured logging (structlog)
│   ├── llm.py  # LLM client (OpenAI/Anthropic)
│   ├── cli.py  # CLI entry point with consent flow
│   └── tasks.py  # Celery async task definitions
│
├── api/  # FastAPI backend
│   ├── main.py  # App setup, middleware, routes
│   ├── agent.py  # Three-layer Analysis Agent
│   ├── retriever.py  # Message retrieval + filtering
│   ├── data.py  # CaseDataReader
│   ├── services.py  # Agent caching layer
│   ├── config.py  # Pydantic settings
│   ├── dependencies.py  # Shared deps + case data loading
│   ├── middleware.py  # Auth + Request ID middleware
│   ├── schemas.py  # Pydantic response models
│   ├── errors.py  # Error handlers
│   └── routers/  # Endpoint modules
│       ├── chat.py  # /ask endpoint
│       ├── cases.py  # Case CRUD
│       ├── ingestion.py  # Async analysis trigger
│       ├── messages.py  # Message browsing
│       ├── upload.py  # File + text upload
│       └── health.py  # Health check + metrics
│
├── dashboard/  # React frontend (TypeScript)
│   └── src/
│       ├── App.tsx  # Main app with routing
│       ├── components/  # UI components
│       └── api/  # API client
│
├── tests/  # 33 test files, 726 tests
│   ├── test_patterns_*.py  # Pattern detection suites (5 files)
│   ├── test_supportive_*.py  # Positive pattern tests
│   ├── test_agent_questions.py  # 225 NL query tests
│   ├── test_api.py  # API endpoint tests
│   ├── test_malformed_data.py  # Resilience tests
│   └── ...  # 25 more test files
│
├── cases/  # Case data directory (gitignored)
├── docs/  # Architecture Decision Records
├── DATA_PRIVACY.md  # Privacy policy
├── SECURITY_HARDENING.md  # Security checklist
├── CONTRIBUTING.md  # Contribution guidelines
├── CHANGELOG.md  # Version history
└── requirements.txt  # Python dependencies
| Metric | Value |
|---|---|
| Python source files | 100 |
| Python LOC (engine + api + tests) | ~9,660 |
| Dashboard LOC (TypeScript) | ~760 |
| Test files | 33 |
| Test count | 726 |
| Pass rate | 693/726 (95.5%) |
| Pattern rules | 30+ negative, 10+ positive |
| Supported data formats | 6 (XML, SQLite, CSV, JSON, text paste, call logs) |
| ruff errors | 0 |
| mypy errors | 0 |
These are features where the agent currently falls back to Layer 2 (keyword search) instead of answering directly via Layer 1 (structured SQL). They're tracked as failing tests in test_agent_questions.py:
| Feature | Test class | What it needs |
|---|---|---|
| Gap detection queries | TestGaps (3 tests) | L1 handler for "Were there communication gaps?" that queries the gaps table |
| Monthly breakdown queries | TestMonthly (4 tests) | L1 handler for "Show monthly stats" that aggregates by month |
| Top pattern queries | TestBreakdowns (2 tests) | L1 handler for "Most common pattern?" that ranks pattern counts |
| Contact day enrichment | TestDayCounts (1 test) | Include "contact" label in day count responses |
| Hurtful by-party breakdown | TestHurtfulCounts (1 test) | Include user/contact names in hurtful count responses |
| RAG retriever SQLite integration | TestAgentIntegration (2 tests) | Connect L2 retriever to SQLite instead of dict-based data |
| Silent day detection | TestNoContactDays (1 test) | L1 handler for "silent days" / "no contact days" |
| Empty data edge cases | TestAgentWithEmptyData (2 tests) | Refine L1 responses when case has zero data |
- Deployment plan — Docker Compose for single-command startup; potential hosted version
- Dashboard E2E tests — Playwright tests for the React frontend
- ML-enhanced detection — Train a classifier on labeled data to complement regex rules
- Multi-contact analysis — Analyze multiple relationships within a single case
- Export to PDF — Generate printable reports from the Markdown output
- Regex-based detection has limits. Sarcasm, coded language, and cultural context can cause false positives/negatives. The tool is a starting point, not a conclusion.
- Supportive scores are text-only. Acts of service, quality time, physical affection, and non-verbal cues are not captured. A low supportive score does not mean support is absent.
- Not a diagnostic tool. This does not replace professional legal or psychological advice. Pattern detection is heuristic, so always review flagged content in full context.
- Single-user local tool. Not designed for multi-tenant deployment. No user accounts, no role-based access.
- LLM answers depend on the model. Layer 3 quality varies by provider and model version. The tool is designed to work well without any LLM.
If you're not a developer, here's how to use this tool step by step.
For Android Users (SMS/MMS):
- Download SMS Backup & Restore from the Play Store
- Tap Backup → select "Text Messages" (and "Call Logs" if desired)
- Choose "Local Backup" and save the XML file
- Transfer the file to your computer
For Signal Users:
Use the signal_desktop_extractor.py script if you have Signal Desktop installed.
- Download and install Python from python.org (check "Add Python to PATH")
- Download this toolkit as a ZIP from GitHub and extract it
- Open your terminal and navigate to the folder:
  cd Downloads/Communication-Analysis-Toolkit
- Install dependencies:
  pip install -r requirements.txt
- Create cases/my_case/source_data/ and copy your data file there
- Create cases/my_case/config.json (see Quick Start for the template)
python -m engine.cli --config cases/my_case/config.json
A consent notice will appear on first run. Type y to proceed.
Open cases/my_case/output/:
- TIMELINE.md — Day-by-day view with flagged incidents
- ANALYSIS.md — The numbers: message counts, patterns, ratios
- EVIDENCE.md — Every flagged message with severity and classification
MIT License. For personal and educational use. Not a substitute for professional legal or psychological advice.