English | 日本語
A personal AI that quietly watches your day-to-day life, remembers everything, and helps you understand how you spend your time.
Prerequisites: Python 3.12+, Node.js 22+, uv, and a Gemini API key. Don't have these yet? See the full setup guide for installation instructions.
Windows (PowerShell) — 5 min
# 1. Clone and install
git clone https://github.com/Andyyyy64/david.git
cd david
uv sync
cd web; npm install; cd ..
# 2. Set your API key
"GEMINI_API_KEY=your-key-here" | Out-File -Encoding utf8 .env
# 3. Launch the desktop app
cd web; npm run electron:startPermissions: When prompted, allow Camera and Microphone access in Settings → Privacy & Security.
macOS (Terminal) — 5 min
# 1. Clone and install
git clone https://github.com/Andyyyy64/david.git
cd david
uv sync
cd web && npm install && cd ..
# 2. Set your API key
echo "GEMINI_API_KEY=your-key-here" > .env
# 3. Launch the desktop app
cd web && npm run electron:startPermissions: Grant Camera, Microphone, Screen Recording, and Accessibility access for your terminal in System Settings → Privacy & Security. See the macOS permission guide for details.
Linux / WSL2
git clone https://github.com/Andyyyy64/david.git
cd david
uv sync
cd web && npm install && cd ..
echo "GEMINI_API_KEY=your-key-here" > .env
# Start daemon + web UI
./start.sh
# Open http://localhost:3001For WSL2 camera setup (usbipd), see the full guide.
Verify it's working:
life look # Capture + analyze a single frame
life status # Check daemon is runningOpen http://localhost:3001 — you should see the timeline with your first captured frame.
- Vision
- Features
- Architecture
- Project Structure
- Setup — Requirements, Configuration, Docker
- CLI Commands
- Configuration Reference
- Web API
- Database Schema
- Tech Stack
"Monitor, manage, and analyze your life."
Three pillars:
- Monitoring — Continuous, automatic recording of your day. Camera, screen, audio, and app focus — all captured without any manual input.
- Management — Instantly answer "what was I doing then?" An externalized, searchable memory that replaces journaling.
- Analysis — See "how focused was I?" and "where did my time go?" in concrete patterns — daily, weekly, and monthly.
- Interval capture — Webcam + screen + audio captured every 30 seconds (configurable). Between ticks, change detection checks every 1 second and saves extra frames/screenshots when significant visual changes occur (screen: 10% threshold, camera: 15% perceptual hash difference).
- Foreground window tracking — Persistent PowerShell process monitors app focus changes every 500ms via Win32 P/Invoke (
GetForegroundWindow), recording process name and window title for precise per-app duration tracking. - Presence detection — Haar cascade face detection + MOG2 motion analysis with hysteresis state machine (present → absent → sleeping). Requires 3 consecutive ticks without face before transitioning. Sleep state detected by low brightness during configured night hours.
- Audio capture & transcription — ALSA recording with auto-device detection. Silence trimming (500 amplitude threshold, min 0.3s voice) keeps only meaningful audio. Transcription via LLM with user context awareness.
- Live feed — MJPEG streaming server on port 3002 at ~30fps, independent of the main capture interval.
- Frame analysis — Each tick sends camera image + screen capture + audio + foreground window info to LLM (Gemini or Claude). Returns structured JSON with activity category and natural language description.
- Activity classification — LLM freely generates activity category names. Existing categories are shown as examples for consistency, and new categories are accepted and registered automatically. Fuzzy matching (LCS similarity ≥ 0.7) normalizes variants to existing categories. All activity → meta-category mappings are stored in the
activity_mappingsDB table. - Meta-categories — Activities are dynamically mapped to 6 meta-categories for productivity scoring: focus, communication, entertainment, browsing, break, idle. The LLM outputs the meta-category alongside each activity. The mapping is stored in DB and served to the frontend via API.
- Multi-scale summaries — Hierarchical generation: 10m (from raw frames) → 30m → 1h → 6h → 12h → 24h (includes keyframe images + transcriptions + improvement suggestions). Each scale builds from the one below.
- Daily reports — Auto-generated on day change. Includes activity breakdown, timeline narrative, focus percentage (focus frames / active frames), and event list. Delivered via webhook.
- Context awareness — User profile (
data/context.md) and recent 5-frame history included in every LLM prompt for continuity.
- Timeline — Frames grouped by hour, sized by motion score, colored by activity meta-category. Keyboard navigation (arrow keys) and scroll-wheel frame switching.
- Detail panel — Camera image (click to expand), screen captures (main + change-detected extras with thumbnail strip), audio player with transcription, foreground window info, and all metadata.
- Summary panel — Browse summaries by scale (10m–24h) with expand/collapse. Click a summary to highlight its time range on the timeline.
- Dashboard — Focus score %, pie chart by meta-category, activity list with duration bars, top 10 app usage with switch counts, weekly stacked bar chart, gantt-style session timeline.
- Search — FTS5 trigram full-text search across frame descriptions, transcriptions, activities, window titles, and summaries. Click results to jump to date/frame.
- Activity heatmap — Frames-per-hour intensity visualization across 24 hours.
- Live feed — Real-time MJPEG stream with LIVE/OFFLINE indicator, expandable to full-screen modal.
- Mobile — Responsive layout with tab switching (Summaries / Timeline / Detail) on narrow viewports.
- Auto-refresh — 30-second polling for new frames, summaries, and events when viewing today's data.
Collects conversations from external chat platforms to enrich the "externalized memory." Knowing what you were discussing — and with whom — adds a critical dimension to daily activity tracking.
Architecture: Adapter pattern with a unified ChatSource interface. Each platform has its own adapter; users enable only what they use via life.toml.
| Platform | Status | Method | DMs | Servers/Groups |
|---|---|---|---|---|
| Discord | Implemented | REST API polling (user token) | Yes | Yes |
| LINE | Planned | Chat export import | Yes | Yes |
| Slack | Planned | Bot token + Events API | — | Yes |
| Telegram | Planned | Bot API / TDLib | Yes | Yes |
| Planned | Chat export import | Yes | Yes | |
| Teams | Planned | Microsoft Graph API | Yes | Yes |
How it works:
- Platform adapter polls for new messages in a background thread
- Messages are stored in
chat_messagestable with unified schema (platform, channel, author, content, timestamp) - Recent conversations are injected into LLM prompts — frame analysis sees "user was discussing X on Discord" alongside screen/camera data
- Daily reports include a chat activity summary (message counts per channel)
Discord specifics: Uses a user token for full access (servers + DMs + group DMs). On first run, backfills past N months of history (backfill_months, default 3) by paginating backwards through all channels. Thereafter, polls every 60s (configurable) for new messages, comparing last_message_id per channel to fetch only deltas. Handles rate limiting with automatic retry.
- Discord — Daily reports via webhook embed (4000 char limit, purple accent)
- LINE Notify — Daily reports via LINE API (1000 char limit)
- Test with
life notify-test
daemon/ (Python) web/ (Node.js/Hono) frontend (React)
├─ Camera capture ├─ REST API ├─ Timeline view
├─ Screen capture ├─ SQLite read-only ├─ Frame detail
├─ Audio capture ├─ Media serving ├─ Summary panel
├─ Window monitor ├─ MJPEG proxy ├─ Live feed
├─ Presence detection └─ Static file serving ├─ Dashboard
├─ LLM analysis ├─ Search
├─ Summary generation ├─ Activity heatmap
├─ Report generation └─ Mobile responsive
├─ Chat integration
├─ Change detection
├─ SQLite write
└─ MJPEG live server (port 3002)
- Daemon writes to SQLite, web reads it (WAL mode for concurrent access)
- Window monitor runs a persistent PowerShell process with its own SQLite connection
- Shared
data/directory:frames/,screens/,audio/,life.db - LLM provider abstracted: Gemini or Claude, configured in
life.toml
| Thread | Purpose | Rate |
|---|---|---|
| Main loop | Capture + analysis + summaries | Every 30s (configurable) |
| Live feed | Webcam → MJPEG stream | ~30fps |
| Audio recording | ALSA capture during interval | Per tick |
| Window monitor | PowerShell → window_events table |
500ms polls |
| Change detection | Screen/camera hash comparison | Every 1s between ticks |
| Chat poller | Discord/etc → chat_messages table |
Every 60s (configurable) |
| Live HTTP server | Serve MJPEG to clients | On demand |
Click to expand
daemon/ # Python package
├─ cli.py # CLI entry point (Click)
├─ daemon.py # Main observer loop
├─ config.py # TOML config loading
├─ analyzer.py # Frame analysis + summary generation
├─ activity.py # ActivityManager: DB-backed normalization + meta-category mapping
├─ report.py # Daily report generation
├─ notify.py # Discord / LINE webhook notifications
├─ live.py # MJPEG streaming server
├─ chat/ # Chat platform integration
│ ├─ base.py # Abstract ChatSource interface
│ ├─ discord.py # Discord adapter (user token, REST polling)
│ └─ manager.py # ChatManager: orchestrates adapters
├─ llm/ # LLM provider abstraction
│ ├─ base.py # Abstract base class
│ ├─ gemini.py # Google Gemini (image + audio support)
│ └─ claude.py # Anthropic Claude (via CLI)
├─ capture/ # Data capture modules
│ ├─ camera.py # Webcam (V4L2 / MJPEG)
│ ├─ screen.py # Screen capture (PowerShell)
│ ├─ audio.py # Audio recording (ALSA)
│ ├─ window.py # Foreground window monitor (PowerShell + Win32)
│ └─ frame_store.py # JPEG file storage
├─ analysis/ # Local analysis (no LLM)
│ ├─ motion.py # MOG2 background subtraction
│ ├─ scene.py # Brightness classification
│ ├─ change.py # Perceptual hash change detection
│ ├─ presence.py # Face detection + state machine
│ └─ transcribe.py # Audio → text via LLM
├─ summary/ # Summary formatting
│ ├─ formatter.py # CLI output formatting
│ └─ timeline.py # Timeline data builder
├─ claude/ # Claude-specific features
│ ├─ analyzer.py # Review analysis
│ └─ review.py # Daily review package generator
└─ storage/ # Database layer
├─ database.py # SQLite schema, migrations, queries
└─ models.py # Frame, Event, Summary, Report dataclasses
web/ # Node.js web application
├─ server/
│ ├─ index.ts # Hono app setup, media serving
│ ├─ db.ts # SQLite connection (read-only)
│ └─ routes/ # API route handlers
└─ src/
├─ App.tsx # Main SPA orchestrator
├─ components/ # React components
├─ hooks/ # Data fetching with 30s polling
└─ lib/ # API client, types, activity module, utilities
data/ # Runtime data (gitignored)
├─ frames/ # Camera JPEGs (YYYY-MM-DD/*.jpg)
├─ screens/ # Screen PNGs (YYYY-MM-DD/*.png)
├─ audio/ # Audio WAVs (YYYY-MM-DD/*.wav)
├─ live/ # Current MJPEG stream frame
├─ context.md # User profile for LLM context
├─ life.db # SQLite database (WAL mode)
└─ life.pid # Daemon PID file
See getting-started.md for full platform-specific instructions (日本語版).
| Platform | Guide |
|---|---|
| Windows (Native) | getting-started.md#windows-native |
| Windows (WSL2) | getting-started.md#windows-wsl2 |
| Mac | getting-started.md#mac |
| Windows (Native) | Windows (WSL2) | Mac | |
|---|---|---|---|
| Python | 3.12+ (Windows) | 3.12+ (in WSL2) | 3.12+ |
| Node.js | 22+ (Windows) | 22+ (in WSL2) | 22+ |
| Camera | Built-in / USB (DirectShow) | External USB via usbipd | Built-in |
| Microphone | Built-in / USB (WASAPI) | External USB via usbipd | Built-in |
| Screen capture | PowerShell + Windows Forms | PowerShell + Windows Forms | screencapture (built-in) |
| Window tracking | PowerShell + Win32 API | PowerShell + Win32 API | osascript (built-in) |
| Gemini API key | Required | Required | Required |
All settings live in life.toml (created automatically on first launch, or create manually):
[llm]
provider = "gemini"
gemini_model = "gemini-2.5-flash"
[capture]
interval_sec = 30
[presence]
enabled = trueTip: Create data/context.md with your name, occupation, and habits — the AI uses this for more accurate activity descriptions.
See the full configuration reference below for all options.
docker compose upFor camera/audio device passthrough, configure docker-compose.override.yml. See Docker setup.
| Command | Description |
|---|---|
life start [-d] |
Start the observer daemon (-d for background) |
life stop |
Stop the running daemon |
life status |
Show status (frame count, summaries, disk usage) |
life capture |
Capture a single test frame |
life look |
Capture and analyze a frame immediately |
life recent [-n 5] |
Show recent frame analyses |
life today [DATE] |
Show timeline for the day |
life stats [DATE] |
Show daily statistics |
life summaries [DATE] [--scale 1h] |
Show summaries (10m/30m/1h/6h/12h/24h) |
life events [DATE] |
List detected events |
life report [DATE] |
Generate daily diary report |
life review [DATE] [--json] |
Generate review package |
life notify-test |
Test webhook notification |
All options in life.toml:
data_dir = "data"
[capture]
device = 0 # camera device ID (/dev/videoN)
interval_sec = 30 # capture interval (seconds)
width = 640
height = 480
jpeg_quality = 85
audio_device = "" # ALSA device (empty = auto-detect)
audio_sample_rate = 44100
[analysis]
motion_threshold = 0.02 # MOG2 foreground pixel ratio
brightness_dark = 40.0 # below = DARK scene
brightness_bright = 180.0 # above = BRIGHT scene
[llm]
provider = "gemini" # "gemini" or "claude"
claude_model = "haiku"
gemini_model = "gemini-2.5-flash"
[presence]
enabled = true
absent_threshold_ticks = 3 # ticks before absent state
sleep_start_hour = 23
sleep_end_hour = 8
[notify]
provider = "discord" # "discord" or "line"
webhook_url = ""
enabled = false
[chat]
enabled = false # master switch for chat integration
[chat.discord]
enabled = false
user_token = "" # Discord user token
user_id = "" # Your Discord user ID
poll_interval = 60 # seconds between polls
backfill_months = 3 # fetch past N months on first run (0 = skip)| Endpoint | Description |
|---|---|
GET /api/frames?date=YYYY-MM-DD |
List frames for a date |
GET /api/frames/latest |
Get latest frame |
GET /api/frames/:id |
Get frame by ID |
GET /api/summaries?date=...&scale=... |
List summaries |
GET /api/events?date=... |
List events |
GET /api/stats?date=... |
Daily statistics (counts, averages, hourly activity) |
GET /api/stats/activities?date=... |
Activity breakdown with duration and hourly detail |
GET /api/stats/apps?date=... |
App usage from window events (duration, switch count) |
GET /api/stats/dates |
List dates with data |
GET /api/stats/range?from=...&to=... |
Per-day stats with meta-category breakdown |
GET /api/sessions?date=... |
Activity sessions (consecutive frame grouping) |
GET /api/reports?date=... |
Get daily report |
GET /api/reports |
List recent reports |
GET /api/activities |
List activity categories with meta-categories |
GET /api/activities/mappings |
Activity → meta-category mapping table |
GET /api/search?q=...&from=...&to=... |
Full-text search (frames + summaries) |
GET /api/export/frames?date=...&format=csv|json |
Export frames as CSV or JSON |
GET /api/export/summaries?from=...&to=...&format=csv|json |
Export summaries |
GET /api/export/report?date=... |
Export daily report as JSON |
GET /api/live/stream |
MJPEG stream proxy |
GET /api/live/frame |
Single JPEG snapshot |
GET /health |
Health check |
GET /media/{path} |
Serve image/audio files |
Click to expand
Core capture data: timestamp, camera path, screen path, extra screen paths, audio path, transcription, brightness, motion score, scene type, LLM description, activity category, foreground window.
Focus change events recorded by the window monitor: timestamp, process name, window title. Used for precise app usage duration calculation via LEAD() window function.
Multi-scale summaries (10m to 24h) with timestamp, scale, content, and frame count.
Detected events: scene changes, motion spikes, presence state changes. Linked to source frame.
Dynamic activity → meta-category mapping. Primary key is activity name, with meta_category, first_seen timestamp, and frame_count. Seeded from existing frame data on first migration. Updated automatically as the LLM generates new activities.
Daily auto-generated reports with content, frame count, and focus percentage.
Messages collected from chat platforms: platform, platform-specific message ID, channel/guild info, author, is_self flag, content, timestamp, and JSON metadata for attachments/embeds. Unique constraint on (platform, platform_message_id).
Daily user memos: date (primary key), content, updated_at. Editable only for today, read-only for past dates.
frames_fts (trigram) over description, transcription, activity, foreground_window. summaries_fts (trigram) over content.
- Daemon: Python 3.12 / Click / OpenCV / SQLite (WAL mode)
- LLM: Google Gemini (image + audio) / Anthropic Claude (via CLI)
- Window tracking: PowerShell / Win32 P/Invoke (
GetForegroundWindow) - Web server: Hono 4 / better-sqlite3 / Node.js 22
- Frontend: React 19 / TypeScript / Vite 6
- Infra: Docker Compose / WSL2