diff --git a/apps/docs/content/docs/concepts/local-storage.mdx b/apps/docs/content/docs/concepts/local-storage.mdx new file mode 100644 index 0000000..70e860c --- /dev/null +++ b/apps/docs/content/docs/concepts/local-storage.mdx @@ -0,0 +1,140 @@ +--- +title: Local Storage +description: How GitMem uses the local filesystem, what persists across sessions, and container deployment considerations. +--- + +import { Callout } from 'fumadocs-ui/components/callout' + +# Local Storage + +GitMem writes to the local filesystem for session state, caching, and free-tier data storage. This page maps exactly what lives where so you can make informed decisions about persistence — especially in containers. + +## Storage Locations + +| Location | What | Owner | +|----------|------|-------| +| `/.gitmem/` | Session state, threads, config, caches | GitMem MCP server | +| `~/.cache/gitmem/` | Search result cache (15-min TTL) | GitMem MCP server | + +## File Inventory + +``` +.gitmem/ ++-- active-sessions.json # Process lifecycle ++-- config.json # Project defaults ++-- sessions.json # Recent session index (free tier SOT) ++-- threads.json # Thread state cache / free tier SOT ++-- suggested-threads.json # AI-suggested threads ++-- closing-payload.json # (ephemeral -- deleted after use) ++-- cache/ +| +-- hook-scars.json # Local scar copy for hooks plugin ++-- hooks-state/ +| +-- start_time # Session start timestamp +| +-- tool_call_count # Recall nag counter +| +-- last_nag_time # Last recall reminder time +| +-- stop_hook_active # Lock file (re-entrancy guard) +| +-- audit.jsonl # Hook execution log ++-- sessions/ + +-- / + +-- session.json # Per-session state (scars, confirmations) +``` + +**Total typical footprint: ~530KB** (dominated by `cache/hook-scars.json`). + +## File Lifecycle + +| File | Created | Survives Session Close? 
| +|------|---------|------------------------| +| `active-sessions.json` | `session_start` | Yes — multi-session registry | +| `config.json` | First `session_start` | Yes | +| `sessions.json` | `session_close` (free tier) | Yes | +| `threads.json` | `session_close` | Yes | +| `suggested-threads.json` | `session_close` | Yes | +| `closing-payload.json` | Agent writes before close | **No** — ephemeral | +| `cache/hook-scars.json` | Hooks plugin startup | Yes | +| `sessions//session.json` | `session_start` | **No** — cleaned up on close | + +## Cross-Session Data Flow + +### What `session_start` loads + +| Data | Pro/Dev Source | Free Source | +|------|---------------|-------------| +| Last session (decisions, reflection) | Supabase `sessions` | `.gitmem/sessions.json` | +| Open threads | Supabase `threads` | `.gitmem/threads.json` | +| Recent decisions | Supabase `decisions` | `.gitmem/sessions.json` (embedded) | +| Scars for recall | Supabase `learnings` | `.gitmem/learnings.json` | +| Suggested threads | `.gitmem/suggested-threads.json` | `.gitmem/suggested-threads.json` | + +### What `recall` searches + +| Tier | Source | Search Method | +|------|--------|---------------| +| Pro/Dev | Supabase `learnings` | Semantic (embedding cosine similarity) | +| Pro/Dev (cached) | `~/.cache/gitmem/results/` | Local vector search (15-min TTL) | +| Free | `.gitmem/learnings.json` | Keyword tokenization match | + +### What `session_close` persists + +| Data | Pro/Dev Destination | Free Destination | +|------|--------------------|--------------------| +| Session record | Supabase `sessions` | `.gitmem/sessions.json` | +| New learnings | Supabase `learnings` | `.gitmem/learnings.json` | +| Decisions | Supabase `decisions` | `.gitmem/decisions.json` | +| Thread state | Supabase `threads` + local | `.gitmem/threads.json` | +| Scar usage | Supabase `scar_usage` | `.gitmem/scar_usage.json` | +| Transcript | Supabase storage bucket | Not captured | + +## Container Deployments + +### 
Ephemeral container per session + +``` +Container A (session 1) -> writes .gitmem/ -> container destroyed +Container B (session 2) -> fresh .gitmem/ -> no history +``` + +| Tier | Cross-Session Memory | What Breaks | +|------|---------------------|-------------| +| **Pro/Dev** | **Works** — Supabase is SOT | Hooks plugin cold-starts each time. Suggested threads lost. Minor UX friction, no data loss. | +| **Free** | **Completely broken** — all memory is local files | No scars, no threads, no session history. Each session is amnesic. | + +### Persistent volume mount + +```bash +docker run -v gitmem-data:/app/.gitmem ... +``` + +Both tiers work. Free tier: local files ARE the SOT. Pro tier: local files are caches, Supabase is SOT. + +### Shared container (long-running) + +Container stays alive across multiple `claude` invocations. Both tiers work. `.gitmem/` persists because the container persists. + +## Recommendations + +### Free tier + +Mount a volume for `.gitmem/`: + +```yaml +volumes: + - gitmem-state:/workspace/.gitmem +``` + +Files that MUST persist: `learnings.json`, `threads.json`, `sessions.json`, `decisions.json`. + +### Pro/Dev tier + +**Nothing required.** Supabase is the source of truth. A fresh `.gitmem/` each session works — just slightly slower (cache cold start). + +Optional for better UX: + +```yaml +volumes: + - gitmem-cache:/workspace/.gitmem/cache # Avoids scar cache re-download +``` + + +`active-sessions.json` tracks process lifecycle (PIDs, hostnames) — inherently local. `sessions//session.json` survives context compaction when the LLM loses state. `cache/hook-scars.json` is needed by shell-based hooks that can't call Supabase directly. `closing-payload.json` avoids MCP tool call size limits. 
+ diff --git a/apps/docs/content/docs/concepts/meta.json b/apps/docs/content/docs/concepts/meta.json index c7bd526..0cd72d3 100644 --- a/apps/docs/content/docs/concepts/meta.json +++ b/apps/docs/content/docs/concepts/meta.json @@ -1,4 +1,4 @@ { "title": "Concepts", - "pages": ["index", "scars", "sessions", "threads", "learning-types", "tiers"] + "pages": ["index", "scars", "sessions", "threads", "learning-types", "tiers", "local-storage"] } diff --git a/apps/docs/content/docs/concepts/threads.mdx b/apps/docs/content/docs/concepts/threads.mdx index e4a234d..32da193 100644 --- a/apps/docs/content/docs/concepts/threads.mdx +++ b/apps/docs/content/docs/concepts/threads.mdx @@ -1,11 +1,19 @@ --- title: Threads -description: Track unresolved work across sessions with GitMem threads. +description: Track unresolved work across sessions with lifecycle management, vitality scoring, and semantic deduplication. --- +import { Callout } from 'fumadocs-ui/components/callout' + # Threads -**Threads** track unresolved work that carries across sessions. When you can't finish something in the current session, create a thread so the next session picks it up. +**Threads** are persistent work items that carry across sessions. They track what's unresolved, what's blocked, and what needs follow-up — surviving session boundaries so nothing gets lost. + +## Why Threads Exist + +Sessions end, but work doesn't. Before threads, open items lived as plain strings inside session records. They had no IDs, no lifecycle, no way to mark something as done. You'd see the same stale item surfaced session after session with no way to clear it. + +Threads give open items identity (`t-XXXXXXXX`), lifecycle status, vitality scoring, and a resolution trail. ## Creating Threads @@ -13,44 +21,185 @@ description: Track unresolved work across sessions with GitMem threads. 
create_thread({ text: "Auth middleware needs rate limiting before production deploy" }) ``` -Threads include: -- A unique thread ID (e.g., `t-a1b2c3d4`) -- Description text -- Creation timestamp -- Optional Linear issue link +Threads are created in three ways: -## Semantic Deduplication - -GitMem uses cosine similarity (threshold > 0.85) to prevent duplicate threads. If you try to create a thread that's semantically identical to an existing one, GitMem returns the existing thread instead. +1. **Explicitly** via `create_thread` — mid-session when you identify a new open item +2. **Implicitly** via `session_close` — when the closing payload includes `open_threads` +3. **Promoted** from a suggestion via `promote_suggestion` — when a recurring topic is confirmed ## Thread Lifecycle +Threads progress through a 5-stage state machine based on vitality scoring and age: + ``` -create → surface at session_start → resolve +create_thread / session_close payload + | + v + [ EMERGING ] -- first 24 hours, high visibility + | + v (age > 24h) + [ ACTIVE ] -- vitality > 0.5, actively referenced + | + v (vitality decays) + [ COOLING ] -- 0.2 <= vitality <= 0.5, fading from use + | + v (vitality < 0.2) + [ DORMANT ] -- vitality < 0.2, no recent touches + | + v (dormant 30+ days) + [ ARCHIVED ] -- auto-archived, hidden from session_start + +Any state --(explicit resolve_thread)--> [ RESOLVED ] ``` -1. **Create** — `create_thread` during a session -2. **Surface** — Open threads appear in the next `session_start` banner -3. 
**Resolve** — `resolve_thread` with a resolution note when complete +### Transitions -## Managing Threads +| Transition | Condition | +|-----------|-----------| +| any -> emerging | Thread age < 24 hours | +| emerging -> active | Thread age >= 24 hours, vitality > 0.5 | +| active -> cooling | Vitality drops to [0.2, 0.5] | +| cooling -> active | Touch refreshes vitality above 0.5 | +| cooling -> dormant | Vitality drops below 0.2 | +| dormant -> active | Touch refreshes vitality above 0.5 | +| dormant -> archived | Dormant for 30+ consecutive days | +| any -> resolved | Explicit `resolve_thread` call | -| Tool | Purpose | -|------|---------| -| `list_threads` | See all open threads | -| `resolve_thread` | Mark a thread as done | -| `cleanup_threads` | Triage by health (active/cooling/dormant) | +**Terminal states:** Archived and resolved threads do not transition. To reopen an archived topic, create a new thread. -### Thread Health +## Vitality Scoring -`cleanup_threads` categorizes threads by vitality: +Every thread has a vitality score (0.0 to 1.0) computed from two components: -- **Active** — Recently created or referenced -- **Cooling** — Not referenced in a while -- **Dormant** — Untouched for 30+ days (auto-archivable) +``` +vitality = 0.55 * recency + 0.45 * frequency +``` -### Suggested Threads +### Recency -`session_start` may suggest threads based on session context. 
You can: -- **Promote** — `promote_suggestion` converts it to a real thread -- **Dismiss** — `dismiss_suggestion` suppresses it (3 dismissals = permanent suppression) +Exponential decay based on thread class half-life: + +``` +recency = e^(-ln(2) * days_since_touch / half_life) +``` + +| Thread Class | Half-Life | Use Case | +|-------------|-----------|----------| +| operational | 3 days | Deploys, fixes, incidents, blockers | +| backlog | 21 days | Research, long-running improvements | + +Thread class is auto-detected from keywords in the thread text ("deploy", "fix", "debug", "hotfix", "urgent", "broken", "incident", "blocker" = operational). + +### Frequency + +Log-scaled touch count normalized against thread age: + +``` +frequency = min(log(touch_count + 1) / log(days_alive + 1), 1.0) +``` + +### Status Thresholds + +| Vitality Score | Status | +|---------------|--------| +| > 0.5 | active | +| 0.2 - 0.5 | cooling | +| < 0.2 | dormant | + +Threads touched during a session have their `touch_count` incremented and `last_touched_at` refreshed, which revives decayed vitality. + +## Carry-Forward + +On `session_start`, open threads appear with vitality info: + +``` +Open threads (3): + t-abc12345: Fix auth timeout [ACTIVE 0.82] (operational, 2d ago) + t-def67890: Improve test coverage [COOLING 0.35] (backlog, 12d ago) + t-ghi11111: New thread just created [EMERGING 0.95] (backlog, today) +``` + +## Resolution + +Threads are resolved via `resolve_thread`: +- **By ID** (preferred): `resolve_thread({ thread_id: "t-a1b2c3d4" })` +- **By text match** (fallback): `resolve_thread({ text_match: "package name" })` + +Resolution records a timestamp, the resolving session, and an optional note. Knowledge graph triples are written to track the resolution relationship. + +## Semantic Deduplication + +When `create_thread` is called, the new thread text is compared against all open threads using embedding cosine similarity before creation. 
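Here is a minimal TypeScript sketch of that duplicate check. It assumes each open thread may carry an embedding vector. `OpenThread`, `findDuplicate` and `normalizeText` are hypothetical names, not GitMem's internal API. Two details are also assumptions: punctuation stripping is ASCII-only, and the text fallback runs only when no embedding is available for the new thread.

```typescript
// Hypothetical shapes and names; not GitMem's internal API.
interface OpenThread {
  id: string;
  text: string;
  embedding?: number[]; // present when embeddings are available (e.g. with Supabase)
}

// "Above this = duplicate": similarity must be strictly greater than 0.85.
const DEDUP_THRESHOLD = 0.85;

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for all-zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Text fallback: lowercase, strip punctuation (ASCII-only here), collapse whitespace.
function normalizeText(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the existing open thread that the new text duplicates, if any.
function findDuplicate(
  newText: string,
  newEmbedding: number[] | undefined,
  openThreads: OpenThread[],
): OpenThread | undefined {
  if (newEmbedding !== undefined) {
    // Embedding path: keep the most similar thread whose score clears the threshold.
    let best: OpenThread | undefined;
    let bestScore = DEDUP_THRESHOLD;
    for (const thread of openThreads) {
      if (thread.embedding === undefined) continue;
      const score = cosineSimilarity(newEmbedding, thread.embedding);
      if (score > bestScore) {
        best = thread;
        bestScore = score;
      }
    }
    return best;
  }
  // Fallback path: exact match after normalization.
  const key = normalizeText(newText);
  return openThreads.find((thread) => normalizeText(thread.text) === key);
}
```

On a match, the documented behavior is to return the existing thread with `deduplicated: true` and touch it. The sketch omits that bookkeeping.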
+ +| Threshold | Value | Meaning | +|-----------|-------|---------| +| Dedup similarity | 0.85 | Above this = duplicate | + +**Dedup methods** (in priority order): +1. **Embedding-based** — cosine similarity of text embeddings (when Supabase available) +2. **Text normalization fallback** — exact match after lowercasing, stripping punctuation, collapsing whitespace + +When a duplicate is detected, the existing thread is returned (with `deduplicated: true`) and touched to keep it vital. + +## Suggested Threads + +At `session_close`, session embeddings are compared to detect recurring topics that should become threads. + +### Detection Algorithm + +1. Compare current session embedding against the last 20 sessions (30-day window) +2. Find sessions with cosine similarity >= 0.70 +3. If 3+ sessions cluster (current + 2 historical): + - Check if an open thread already covers the topic (>= 0.80) -> skip + - Check if a pending suggestion already matches (>= 0.80) -> add evidence + - Otherwise, create a new suggestion + +Suggestions appear at `session_start`: + +``` +Suggested threads (2) -- recurring topics not yet tracked: + ts-a1b2c3d4: Recurring auth timeout pattern (3 sessions) + ts-e5f6g7h8: Build performance regression (4 sessions) + Use promote_suggestion or dismiss_suggestion to manage. +``` + +| Action | Tool | Effect | +|--------|------|--------| +| Promote | `promote_suggestion` | Converts to a real thread | +| Dismiss | `dismiss_suggestion` | Suppresses (3x = permanent) | + +## Knowledge Graph Integration + +Thread creation and resolution generate knowledge graph triples: + +| Predicate | Subject | Object | When | +|-----------|---------|--------|------| +| `created_thread` | Session | Thread | Thread created | +| `resolves_thread` | Session | Thread | Thread resolved | +| `relates_to_thread` | Thread | Issue | Thread linked to Linear issue | + +Use `graph_traverse` to query these relationships with 4 lenses: `connected_to`, `produced_by`, `provenance`, `stats`. 
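The following TypeScript sketch covers the vitality formula, class half-lives, keyword-based class detection and status thresholds from the Vitality Scoring section. The function names are hypothetical. Keyword detection is assumed to be a case-insensitive substring match. The guard for a zero-day-old thread is also an assumption: in that case the documented frequency formula divides by `log(1) = 0`.

```typescript
// Illustrative implementation of the documented formulas; names are hypothetical.
type ThreadClass = "operational" | "backlog";
type VitalityStatus = "active" | "cooling" | "dormant";

const HALF_LIFE_DAYS: Record<ThreadClass, number> = { operational: 3, backlog: 21 };

const OPERATIONAL_KEYWORDS = ["deploy", "fix", "debug", "hotfix", "urgent", "broken", "incident", "blocker"];

// Assumed: substring match on lowercased text.
function detectThreadClass(text: string): ThreadClass {
  const lower = text.toLowerCase();
  return OPERATIONAL_KEYWORDS.some((k) => lower.includes(k)) ? "operational" : "backlog";
}

// recency = e^(-ln(2) * days_since_touch / half_life)
function recency(daysSinceTouch: number, halfLifeDays: number): number {
  return Math.exp((-Math.LN2 * daysSinceTouch) / halfLifeDays);
}

// frequency = min(log(touch_count + 1) / log(days_alive + 1), 1.0)
function frequency(touchCount: number, daysAlive: number): number {
  const denominator = Math.log(daysAlive + 1);
  if (denominator <= 0) return touchCount > 0 ? 1 : 0; // assumed guard for daysAlive = 0
  return Math.min(Math.log(touchCount + 1) / denominator, 1.0);
}

interface ThreadActivity {
  threadClass: ThreadClass;
  daysSinceTouch: number;
  touchCount: number;
  daysAlive: number;
}

// vitality = 0.55 * recency + 0.45 * frequency
function vitality(a: ThreadActivity): number {
  const r = recency(a.daysSinceTouch, HALF_LIFE_DAYS[a.threadClass]);
  const f = frequency(a.touchCount, a.daysAlive);
  return 0.55 * r + 0.45 * f;
}

// > 0.5 active, 0.2 to 0.5 cooling, < 0.2 dormant
function vitalityStatus(v: number): VitalityStatus {
  if (v > 0.5) return "active";
  if (v >= 0.2) return "cooling";
  return "dormant";
}
```

Under these assumptions, take an operational thread last touched 3 days ago (one half-life) that was touched once during its single day of life. It scores `0.55 * 0.5 + 0.45 * 1 = 0.725`, which reports as `active`.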
+ +## Managing Threads + +| Tool | Purpose | +|------|---------| +| [`create_thread`](/docs/tools/create-thread) | Create a new open thread | +| [`resolve_thread`](/docs/tools/resolve-thread) | Mark a thread as done | +| [`list_threads`](/docs/tools/list-threads) | See all open threads | +| [`cleanup_threads`](/docs/tools/cleanup-threads) | Triage by health (active/cooling/dormant) | +| [`promote_suggestion`](/docs/tools/promote-suggestion) | Convert suggestion to real thread | +| [`dismiss_suggestion`](/docs/tools/dismiss-suggestion) | Suppress a suggestion | + +## Storage + +| Location | Purpose | Tier | +|----------|---------|------| +| `.gitmem/threads.json` | Runtime cache / free tier SOT | All | +| `.gitmem/suggested-threads.json` | Pending suggestions | All | +| Supabase `threads` table | Source of truth (full vitality, lifecycle, embeddings) | Pro/Dev | +| Supabase `sessions.open_threads` | Legacy fallback | Pro/Dev | + + +On free tier, `.gitmem/threads.json` IS the source of truth. On pro/dev tier, it's a cache — Supabase is authoritative. + diff --git a/apps/docs/content/docs/contributing/compliance.mdx b/apps/docs/content/docs/contributing/compliance.mdx new file mode 100644 index 0000000..6f604c4 --- /dev/null +++ b/apps/docs/content/docs/contributing/compliance.mdx @@ -0,0 +1,118 @@ +--- +title: MCP Protocol Compliance +description: Full MCP protocol compliance report — 36/36 tests passing. +--- + +import { Callout } from 'fumadocs-ui/components/callout' + +# MCP Protocol Compliance + + +GitMem passes all MCP protocol compliance tests as of v1.0.3. + + +## Running the Suite + +```bash +GITMEM_TIER=free node tests/compliance/mcp-protocol-compliance.mjs +``` + +The compliance test script spawns the MCP server as a child process via STDIO, performs the full JSON-RPC 2.0 handshake, validates tool schemas, executes tool calls, and tests error handling. 
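The handshake and response-format checks in the tables below can be written as plain TypeScript functions. MCP's stdio transport exchanges newline-delimited JSON-RPC messages, so each message is serialized as one line. The `protocolVersion` value and client name are assumptions, since the compliance script's actual values are not shown here. `makeRequest`, `encodeMessage` and `checkResponse` are hypothetical helpers, not the script's real code.

```typescript
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number; // omitted for notifications
  method: string;
  params?: Record<string, unknown>;
}

function makeRequest(id: number, method: string, params?: Record<string, unknown>): JsonRpcMessage {
  return params === undefined ? { jsonrpc: "2.0", id, method } : { jsonrpc: "2.0", id, method, params };
}

function makeNotification(method: string): JsonRpcMessage {
  return { jsonrpc: "2.0", method };
}

// stdio framing: one JSON object per line (JSON.stringify escapes embedded newlines).
function encodeMessage(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Message order exercised by the "Protocol Handshake" and "Tool Listing" categories.
const handshake: JsonRpcMessage[] = [
  makeRequest(1, "initialize", {
    protocolVersion: "2024-11-05", // assumed version string
    capabilities: {},
    clientInfo: { name: "compliance-sketch", version: "0.0.0" },
  }),
  makeNotification("notifications/initialized"),
  makeRequest(2, "tools/list"),
];

const METHOD_NOT_FOUND = -32601;

// Mirrors the "Response Format" and "Error Handling" checks; returns a list of failures.
function checkResponse(request: JsonRpcMessage, response: any): string[] {
  const failures: string[] = [];
  const isObject = typeof response === "object" && response !== null;
  if (!isObject || response.jsonrpc !== "2.0") failures.push('missing jsonrpc: "2.0"');
  if (!isObject || response.id !== request.id) failures.push("id does not match request");
  const hasResult = isObject && "result" in response;
  const hasError = isObject && "error" in response;
  if (hasResult === hasError) failures.push("expected exactly one of result or error");
  if (hasError && typeof response.error?.code !== "number") failures.push("error code is not numeric");
  return failures;
}
```

A check such as "error code is -32601" then reduces to comparing `response.error.code` with `METHOD_NOT_FOUND`. That is the standard JSON-RPC 2.0 code for an unknown method.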
+ +## Results Summary + +| Category | Tests | Result | +|----------|-------|--------| +| Protocol Handshake | 9 | 9/9 | +| Tool Listing | 4 | 4/4 | +| Schema Validation | 3 | 3/3 | +| Tool Execution | 10 | 10/10 | +| Error Handling | 4 | 4/4 | +| Response Format | 6 | 6/6 | +| **Total** | **36** | **36/36** | + +## Category Details + +### 1. Protocol Handshake (9/9) + +Tests JSON-RPC 2.0 `initialize` method and `notifications/initialized` lifecycle. + +| Test | Result | +|------|--------| +| initialize returns result | PASS | +| has protocolVersion | PASS | +| protocolVersion is string | PASS | +| has serverInfo | PASS | +| serverInfo.name exists | PASS | +| serverInfo.version exists | PASS | +| has capabilities | PASS | +| capabilities.tools exists | PASS | +| initialized notification accepted | PASS | + +### 2. Tool Listing (4/4) + +| Test | Result | +|------|--------| +| tools/list returns result | PASS | +| result has tools array | PASS | +| at least 1 tool registered | PASS | +| tool count (21) is reasonable (5-100) | PASS | + +21 tools in free tier. Pro tier exposes 67, dev tier 73. + +### 3. Tool Schema Validation (3/3) + +Every tool's `inputSchema` validated against JSON Schema and MCP spec requirements. + +| Test | Result | +|------|--------| +| all tool schemas valid (type, required, property types, descriptions) | PASS | +| all descriptions >= 30 chars | PASS | +| no duplicate tool names | PASS | + +### 4. Tool Execution (10/10) + +Live tool calls via MCP STDIO transport. + +| Test | Result | +|------|--------| +| gitmem-help returns result | PASS | +| result has content array | PASS | +| content[0].type === "text" | PASS | +| content[0].text is non-empty | PASS | +| search returns result | PASS | +| search result has content | PASS | +| search content is text type | PASS | +| recall returns result | PASS | +| recall has content array | PASS | +| log returns result | PASS | + +### 5. 
Error Handling (4/4) + +| Test | Result | +|------|--------| +| unknown tool returns error | PASS | +| unknown method returns JSON-RPC error | PASS | +| error has numeric code | PASS | +| error code is -32601 (Method not found) | PASS | + +### 6. Response Format Compliance (6/6) + +| Test | Result | +|------|--------| +| all responses include jsonrpc: "2.0" | PASS | +| all responses include matching id | PASS | +| content block has type field | PASS | +| text block has text field | PASS | +| successful calls have isError=false or undefined | PASS | +| resources/list returns -32601 (not implemented) | PASS | + +## Optional Features Not Implemented + +These are valid omissions — the MCP spec does not require servers to implement all capabilities: + +| Feature | Status | Reason | +|---------|--------|--------| +| `resources/list` | Not implemented (-32601) | No resources exposed; tools-only server | +| `prompts/list` | Not implemented (-32601) | No prompt templates; tool-driven UX | +| `resources/templates/list` | Not implemented | No dynamic resources | diff --git a/apps/docs/content/docs/contributing/meta.json b/apps/docs/content/docs/contributing/meta.json index eb274b3..49e4a92 100644 --- a/apps/docs/content/docs/contributing/meta.json +++ b/apps/docs/content/docs/contributing/meta.json @@ -1,4 +1,4 @@ { "title": "Contributing", - "pages": ["index", "testing"] + "pages": ["index", "testing", "compliance"] } diff --git a/apps/docs/content/docs/contributing/testing.mdx b/apps/docs/content/docs/contributing/testing.mdx index 8e14fcb..4e205d8 100644 --- a/apps/docs/content/docs/contributing/testing.mdx +++ b/apps/docs/content/docs/contributing/testing.mdx @@ -1,85 +1,328 @@ --- title: Testing -description: GitMem's test pyramid and how to run each tier. +description: GitMem's 6-tier test pyramid, CI pipeline, and how to run each tier. 
--- +import { Tabs, Tab } from 'fumadocs-ui/components/tabs' +import { Callout } from 'fumadocs-ui/components/callout' + # Testing -GitMem uses a 6-tier test pyramid. All runnable tests must pass before shipping. +GitMem uses a 6-tier testing pyramid. Each tier adds cost/time but tests closer to the real user experience. -## Test Tiers +## Test Pyramid -### Tier 1: Unit Tests +| Tier | Command | Tests | Speed | Cost | What it tests | +|------|---------|-------|-------|------|---------------| +| **1 - Unit** | `npm run test:unit` | 597 | ~3s | Free | Schema validation, pure functions, golden regressions | +| **2 - Smoke** | `npm run test:smoke` | 9 | ~5s | Free | MCP server boot, tool registration, basic tool calls via stdio | +| **3 - Integration** | `npm run test:integration` | 63 | ~30s | Free (Docker) | Real PostgreSQL, session lifecycle, cache behavior, query plans | +| **4 - E2E** | `npm run test:e2e` | 68 | ~90s | Free (Docker for pro) | CLI install flow, hooks, free/pro tier MCP via stdio | +| **5 - User Journey** | `npm run test:e2e -- tests/e2e/user-journey.test.ts` | 6 | ~60s | API calls | Real Claude session via Agent SDK | +| **6 - Performance** | `npm run test:perf` | benchmarks | ~30s | Free | Cold start, recall latency, cache hit rate microbenchmarks | -```bash -npm run test:unit -``` +**Run all:** `npm run test:all` (runs tiers 1-4 + 6; excludes user-journey) -- **Count:** 660+ tests -- **Speed:** ~3 seconds -- **Dependencies:** None -- **Coverage:** Schema validation, pure functions, golden regressions + +Always run `npm run test:unit` at minimum. Before shipping to npm, run tiers 1-5. Tier 5 (User Journey) is the most important gate. 
+ -### Tier 2: Smoke Tests +## Quick Run (No Docker) + +For development, run the tests that don't need Docker: ```bash -npx vitest run --config vitest.smoke.config.ts +npm run test:unit && npx vitest run --config vitest.smoke.config.ts ``` -- **Count:** 9 tests (4 free + 5 pro) -- **Speed:** ~5 seconds -- **Dependencies:** None (free) / Supabase (pro) -- **Coverage:** MCP server boot, tool registration, basic stdio calls +This covers 670+ tests in under 10 seconds. + +--- + +## Tier Details + +### Tier 1 — Unit Tests (597 tests, 34 files) + +Pure unit tests with no external dependencies. Fast, deterministic, run everywhere. + +| Category | Files | What it covers | +|----------|-------|----------------| +| **Schemas** | 13 files | Zod schema validation for all tool inputs | +| **Services** | 11 files | Thread manager, active sessions, file locks, gitmem-dir, timezone | +| **Tools** | 2 files | absorb-observations, prepare-context | +| **Hooks** | 2 files | format-utils, quick-retrieve | +| **Diagnostics** | 4 files | anonymizer, channels, check-command, collector | +| **Golden Regressions** | 1 file | 11 tests replaying specific historical bugs | +| **Standalone** | 3 files | Variant assignment and enforcement (21 tests) | + +### Tier 2 — Smoke Tests (9 tests, 2 files) + +Boot the MCP server via stdio transport and verify basic functionality. + +| Suite | Tests | What it covers | +|-------|-------|----------------| +| `smoke-free.test.ts` | 4 | Free tier server boot, tool list, basic recall | +| `smoke-pro.test.ts` | 5 | Pro tier server boot (skips without Supabase credentials) | + +### Tier 3 — Integration Tests (63 tests, 5 files) + +Tests against a real PostgreSQL database via Testcontainers. Catches issues mocks would miss: missing indexes, query plan regressions, schema drift. + +All tests share a single Testcontainers setup that starts `pgvector/pgvector:pg16`, stubs `auth.role()` for Supabase compatibility, and loads `schema/setup.sql`. 
+ +| Suite | Tests | What it covers | +|-------|-------|----------------| +| `fresh-install.test.ts` | ~12 | Empty database behavior, first session, first learning | +| `session-lifecycle.test.ts` | ~15 | Session create/close, concurrent sessions, close compliance | +| `cache-behavior.test.ts` | 9 | Cache file operations, TTL expiry, cache symmetry | +| `query-plans.test.ts` | ~12 | Index usage verification (EXPLAIN), query performance at scale | +| `scale-profiles.test.ts` | ~15 | Behavior at 0, 15, 100, 500, 1000 scars | + +### Tier 4 — E2E Tests (68 tests, 6 files) -### Tier 3: Integration Tests +Tests CLI commands and MCP protocol end-to-end. Pro tests spawn Testcontainers. + +| Suite | Tests | What it covers | +|-------|-------|----------------| +| `cli-fresh-install.test.ts` | 27 | `gitmem init`, `gitmem check`, `gitmem install-hooks`, output sanitization | +| `free-tier.test.ts` | 15 | Free tier MCP: session lifecycle, recall, create_learning, parameter validation | +| `pro-fresh.test.ts` | 11 | Pro tier with PostgreSQL: tool registration, recall, session lifecycle | +| `pro-mature.test.ts` | 7 | Pro tier at scale (1000 scars): performance, cache hit rate, throughput | +| `organic-discovery.test.ts` | 2 | Multi-session organic adoption measurement (API calls) | +| `user-journey.test.ts` | 6 | Real Claude session (see Tier 5) | + +### Tier 5 — User Journey (6 tests) + +The most important pre-ship gate. Spawns a real Claude session and verifies the full user experience using the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/sdk). + +What it verifies: +1. SessionStart hook fires with correct ceremony wording +2. MCP tools are registered and connected +3. Agent calls `session_start` and `recall` +4. No internal references leak into output +5. Session completes successfully + +### Tier 6 — Performance Benchmarks + +Vitest `bench()` microbenchmarks measuring operation latency with statistical rigor. 
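Vitest's `bench()` reports timing statistics. The Performance Baselines table and the 1.5x alert rule in this section can turn those numbers into a pass/fail check, as in the sketch below. The docs do not say which statistic is compared (mean, p99, and so on), so the sketch takes a single measured value in milliseconds. `checkBaseline` and `failingComponents` are hypothetical helpers.

```typescript
// Baselines from the Performance Baselines table (milliseconds).
const BASELINES_MS = {
  session_start_total: 750,
  recall_with_scars: 2000,
  recall_empty: 500,
  scar_search_local: 100,
  session_close_total: 1500,
  cache_hit: 5,
} as const;

type Component = keyof typeof BASELINES_MS;

// A measurement fails once it exceeds baseline x 1.5 (the alert threshold).
const ALERT_MULTIPLIER = 1.5;

interface BaselineResult {
  component: Component;
  measuredMs: number;
  limitMs: number;
  pass: boolean;
}

function checkBaseline(component: Component, measuredMs: number): BaselineResult {
  const limitMs = BASELINES_MS[component] * ALERT_MULTIPLIER;
  return { component, measuredMs, limitMs, pass: measuredMs <= limitMs };
}

// Returns the components whose measurement breaches the alert threshold.
function failingComponents(measurements: Partial<Record<Component, number>>): Component[] {
  return (Object.keys(measurements) as Component[]).filter((c) => {
    const ms = measurements[c];
    return ms !== undefined && !checkBaseline(c, ms).pass;
  });
}
```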
+ +| Suite | What it benchmarks | +|-------|--------------------| +| `cold-start.bench.ts` | Cache initialization, first session start | +| `recall.bench.ts` | Local vector search at 15 and 1000 scars | +| `cache.bench.ts` | Cache key generation | +| `session-start.bench.ts` | Session start components | + +#### Performance Baselines + +| Component | Baseline (ms) | +|-----------|--------------| +| `session_start_total` | 750 | +| `recall_with_scars` | 2000 | +| `recall_empty` | 500 | +| `scar_search_local` | 100 | +| `session_close_total` | 1500 | +| `cache_hit` | 5 | + +Tests fail if measurement exceeds baseline x 1.5 (alert threshold). + +--- + +## CI Pipeline + +Source: `.github/workflows/ci.yml` + +**Triggers:** Push to `main`, `v*` tags, PRs against `main`. + +### Build Job (matrix: Node 18, 20, 22) + +| Step | Command | What it does | +|------|---------|-------------| +| Type check | `npm run typecheck` | `tsc --noEmit` | +| Build | `npm run build` | `tsc` — compile to `dist/` | +| Unit tests | `npm run test:unit` | 764 tests via vitest | +| Smoke test | `npm run test:smoke:free` | 4 MCP integration tests | + +### Publish Job (tag pushes only) + +Runs after all 3 build matrix jobs pass. Only fires on `v*` tag pushes. + +### Release Workflow ```bash -npx vitest run --config vitest.integration.config.ts +# 1. Make changes, commit +# 2. Bump version +npm version patch +# 3. 
Push the version commit and tag created by npm version +git push origin main --tags +# CI builds -> tests -> publishes automatically ``` -- **Count:** 63 tests -- **Speed:** ~30 seconds -- **Dependencies:** Docker (Supabase PostgreSQL) -- **Coverage:** Database operations, session lifecycle, cache, query plans +### What's NOT in CI -### Tier 4: E2E Tests +| Test tier | Why not | How to run | +|-----------|---------|-----------| +| Integration (Tier 3) | Needs Docker | `npm run test:integration` locally | +| E2E pro (Tier 4) | Needs Docker | `npm run test:e2e` locally | +| User Journey (Tier 5) | Needs Claude API key | `npm run test:e2e -- tests/e2e/user-journey.test.ts` | +| Performance (Tier 6) | Benchmarks, not pass/fail | `npm run test:perf` | -```bash -npx vitest run --config vitest.e2e.config.ts +--- + +## Prerequisites + +### Docker (Tiers 3-4) + +Integration and pro E2E tests use [Testcontainers](https://node.testcontainers.org/) to spin up `pgvector/pgvector:pg16` PostgreSQL containers. + +- Docker daemon must be running (`docker info` must succeed) +- Tests skip gracefully when Docker is unavailable +- Docker-in-Docker: mount the host Docker socket (`/var/run/docker.sock`) + +### Auth Schema Stub + +Plain pgvector PostgreSQL doesn't include Supabase's `auth` schema. All Docker-based tests must stub it: + +```sql +CREATE SCHEMA IF NOT EXISTS auth; +CREATE OR REPLACE FUNCTION auth.role() RETURNS TEXT AS $$ + SELECT 'service_role'::TEXT; +$$ LANGUAGE sql; ``` -### Claude CLI (Tier 5) +User Journey tests require the Claude CLI installed and authenticated. Detection: `claude --version`.
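Both detection commands mentioned above, `docker info` for Docker and `claude --version` for the Claude CLI, come down to one question: does the command run and exit with status 0? Below is a minimal Node/TypeScript sketch using `child_process.spawnSync`. The helper names are hypothetical. The Supabase credential check is omitted because the environment variable names are not given here.

```typescript
import { spawnSync } from "node:child_process";

// True when `command args...` starts and exits with status 0.
// result.error is set when the binary is missing (ENOENT) or the call times out.
function commandSucceeds(command: string, args: string[], timeoutMs = 10000): boolean {
  const result = spawnSync(command, args, { stdio: "ignore", timeout: timeoutMs });
  return result.error === undefined && result.status === 0;
}

// Probes for the external dependencies behind the skip conditions.
function detectTestDependencies(): { docker: boolean; claudeCli: boolean } {
  return {
    docker: commandSucceeds("docker", ["info"]),
    claudeCli: commandSucceeds("claude", ["--version"]),
  };
}
```

In a Vitest suite, the result can gate a whole block, for example `describe.skipIf(!detectTestDependencies().docker)(...)`. That is one way to get the graceful skipping described here.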
+ +--- + +## Skip Conditions + +Tests skip gracefully when dependencies are missing: + +| Test | Skip condition | Detection | +|------|---------------|-----------| +| `user-journey.test.ts` | Claude CLI not installed | `claude --version` | +| `organic-discovery.test.ts` | Claude CLI not installed | `claude --version` | +| `pro-fresh.test.ts` | Docker not available | `docker info` | +| `pro-mature.test.ts` | Docker not available | `docker info` | +| `smoke-pro.test.ts` | No Supabase credentials | env check | +| All integration tests | Docker not available | `docker info` | + +--- + +## Tool Tier Gating + +The MCP server gates tools by tier: + +| Tier | Tool Count | Includes | +|------|-----------|----------| +| **free** | 55 | Core tools only | +| **pro** | 67 | + analyze, cache management, graph traverse | +| **dev** | 73 | + batch operations, transcripts | + +--- + +## Mapping Changes to Test Tiers + +Test at the tier where your change first touches a real boundary. Every change gets Tier 1. Then add the tier that matches the highest boundary crossed. -```bash -npx vitest run tests/e2e/user-journey.test.ts --config vitest.e2e.config.ts +``` +Did you change... + +- a pure function or schema? -> Tier 1 (always) + +- how the MCP server responds? -> + Tier 2 (5s, free) + +- a database query or migration? -> + Tier 3 (30s, Docker) + +- a CLI command or hook script? -> + Tier 4 (90s, Docker for pro) + +- what the agent experiences? -> + Tier 5 (60s, ~$0.30) + +- performance-sensitive code? 
-> + Tier 6 (30s, free) ``` -- **Count:** 6 tests -- **Speed:** ~60 seconds -- **Dependencies:** Claude API key -- **Coverage:** Real Claude session verifying hooks, tools, and ceremony +### Pre-commit Minimum -### Tier 6: Performance +| Situation | Run | +|-----------|-----| +| Any code change | `npm run test:unit` (Tier 1) | +| MCP server or tool changes | + `npm run test:smoke:free` (Tier 2) | +| Before pushing to GitHub | Tiers 1 + 2 minimum | +| Before npm publish | Tiers 1-5 (Tier 5 is the ship gate) | -```bash -npx vitest bench --config vitest.perf.config.ts +--- + +## Agent SDK Testing Pattern + +The `user-journey.test.ts` file establishes a reusable pattern for testing Claude CLI integrations: + +```typescript +import { query } from "@anthropic-ai/claude-agent-sdk"; +import type { SDKMessage, HookCallback, PreToolUseHookInput } from "@anthropic-ai/claude-agent-sdk"; + +const toolCalls: string[] = []; +const hookObserver: HookCallback = async (input) => { + if (input.hook_event_name === "PreToolUse") { + toolCalls.push((input as PreToolUseHookInput).tool_name); + } + return {}; +}; + +for await (const msg of query({ + prompt: "Do something", + options: { + cwd: "/path/to/project", + model: "haiku", + maxTurns: 5, + maxBudgetUsd: 1.0, + permissionMode: "bypassPermissions", + allowDangerouslySkipPermissions: true, + persistSession: false, + settingSources: ["project"], + thinking: { type: "disabled" }, + hooks: { + PreToolUse: [{ hooks: [hookObserver] }], + }, + }, +})) { + // Process messages... 
+} + +expect(toolCalls).toContain("mcp__gitmem__session_start"); ``` -- **Count:** 4 benchmark files -- **Speed:** ~30 seconds -- **Coverage:** Cold start latency, recall speed, cache performance +### Key SDK Options for Testing -## Quick Run (No Docker) +| Option | Value | Why | +|--------|-------|-----| +| `model` | `"haiku"` | Fastest, cheapest | +| `maxTurns` | 2-5 | Prevent runaway | +| `maxBudgetUsd` | 1.0 | Hard cost cap | +| `permissionMode` | `"bypassPermissions"` | No interactive prompts | +| `persistSession` | `false` | No disk state | +| `settingSources` | `["project"]` | Load project hooks | -For development, run the tests that don't need Docker: +--- + +## Adding New Tests + +| What you're adding | Tier | Location | +|-------------------|------|----------| +| Schema validation / pure logic | 1 | `tests/unit/schemas/` or `tests/unit/services/` | +| Database behavior | 3 | `tests/integration/` | +| Free tier CLI / hooks | 4 | `cli-fresh-install.test.ts` or `free-tier.test.ts` | +| Pro tier MCP | 4 | `pro-fresh.test.ts` or `pro-mature.test.ts` | +| Agent behavior | 5 | `user-journey.test.ts` | +| Performance regression | 6 | `tests/performance/` | +| Hook scripts | — | `hooks/tests/test-hooks.sh` | + + +For Tier 5 user-journey tests, keep prompts simple and use `appendSystemPrompt` to constrain agent behavior. Test that tools are _called_, not that the agent says specific words. + + +## MCP Protocol Compliance + +A dedicated compliance suite validates gitmem-mcp against the MCP protocol specification. See [Compliance Report](/docs/contributing/compliance) for the full report. + +**Latest result:** 36/36 PASS ```bash -npm run test:unit && npx vitest run --config vitest.smoke.config.ts +GITMEM_TIER=free node tests/compliance/mcp-protocol-compliance.mjs ``` - -This covers 670+ tests in under 10 seconds. 
diff --git a/apps/docs/public/llms-full.txt b/apps/docs/public/llms-full.txt index 9b45035..e2bbda4 100644 --- a/apps/docs/public/llms-full.txt +++ b/apps/docs/public/llms-full.txt @@ -167,6 +167,148 @@ Severity applies to scars and determines surfacing priority: | `medium` | Moderate friction or repeated issue | "Always validate UUIDs before DB lookup" | | `low` | Minor optimization or preference | "Use descriptive branch names" | +======================================================================== +# Local Storage +URL: https://gitmem.ai/docs/concepts/local-storage +Description: How GitMem uses the local filesystem, what persists across sessions, and container deployment considerations. +======================================================================== + +import { Callout } from 'fumadocs-ui/components/callout' + +# Local Storage + +GitMem writes to the local filesystem for session state, caching, and free-tier data storage. This page maps exactly what lives where so you can make informed decisions about persistence — especially in containers. 
+ +## Storage Locations + +| Location | What | Owner | +|----------|------|-------| +| `/.gitmem/` | Session state, threads, config, caches | GitMem MCP server | +| `~/.cache/gitmem/` | Search result cache (15-min TTL) | GitMem MCP server | + +## File Inventory + +``` +.gitmem/ ++-- active-sessions.json # Process lifecycle ++-- config.json # Project defaults ++-- sessions.json # Recent session index (free tier SOT) ++-- threads.json # Thread state cache / free tier SOT ++-- suggested-threads.json # AI-suggested threads ++-- closing-payload.json # (ephemeral -- deleted after use) ++-- cache/ +| +-- hook-scars.json # Local scar copy for hooks plugin ++-- hooks-state/ +| +-- start_time # Session start timestamp +| +-- tool_call_count # Recall nag counter +| +-- last_nag_time # Last recall reminder time +| +-- stop_hook_active # Lock file (re-entrancy guard) +| +-- audit.jsonl # Hook execution log ++-- sessions/ + +-- / + +-- session.json # Per-session state (scars, confirmations) +``` + +**Total typical footprint: ~530KB** (dominated by `cache/hook-scars.json`). + +## File Lifecycle + +| File | Created | Survives Session Close? 
| +|------|---------|------------------------| +| `active-sessions.json` | `session_start` | Yes — multi-session registry | +| `config.json` | First `session_start` | Yes | +| `sessions.json` | `session_close` (free tier) | Yes | +| `threads.json` | `session_close` | Yes | +| `suggested-threads.json` | `session_close` | Yes | +| `closing-payload.json` | Agent writes before close | **No** — ephemeral | +| `cache/hook-scars.json` | Hooks plugin startup | Yes | +| `sessions//session.json` | `session_start` | **No** — cleaned up on close | + +## Cross-Session Data Flow + +### What `session_start` loads + +| Data | Pro/Dev Source | Free Source | +|------|---------------|-------------| +| Last session (decisions, reflection) | Supabase `sessions` | `.gitmem/sessions.json` | +| Open threads | Supabase `threads` | `.gitmem/threads.json` | +| Recent decisions | Supabase `decisions` | `.gitmem/sessions.json` (embedded) | +| Scars for recall | Supabase `learnings` | `.gitmem/learnings.json` | +| Suggested threads | `.gitmem/suggested-threads.json` | `.gitmem/suggested-threads.json` | + +### What `recall` searches + +| Tier | Source | Search Method | +|------|--------|---------------| +| Pro/Dev | Supabase `learnings` | Semantic (embedding cosine similarity) | +| Pro/Dev (cached) | `~/.cache/gitmem/results/` | Local vector search (15-min TTL) | +| Free | `.gitmem/learnings.json` | Keyword tokenization match | + +### What `session_close` persists + +| Data | Pro/Dev Destination | Free Destination | +|------|--------------------|--------------------| +| Session record | Supabase `sessions` | `.gitmem/sessions.json` | +| New learnings | Supabase `learnings` | `.gitmem/learnings.json` | +| Decisions | Supabase `decisions` | `.gitmem/decisions.json` | +| Thread state | Supabase `threads` + local | `.gitmem/threads.json` | +| Scar usage | Supabase `scar_usage` | `.gitmem/scar_usage.json` | +| Transcript | Supabase storage bucket | Not captured | + +## Container Deployments + +### 
Ephemeral container per session + +``` +Container A (session 1) -> writes .gitmem/ -> container destroyed +Container B (session 2) -> fresh .gitmem/ -> no history +``` + +| Tier | Cross-Session Memory | What Breaks | +|------|---------------------|-------------| +| **Pro/Dev** | **Works** — Supabase is SOT | Hooks plugin cold-starts each time. Suggested threads lost. Minor UX friction, no data loss. | +| **Free** | **Completely broken** — all memory is local files | No scars, no threads, no session history. Each session is amnesic. | + +### Persistent volume mount + +```bash +docker run -v gitmem-data:/app/.gitmem ... +``` + +Both tiers work. Free tier: local files ARE the SOT. Pro tier: local files are caches, Supabase is SOT. + +### Shared container (long-running) + +Container stays alive across multiple `claude` invocations. Both tiers work. `.gitmem/` persists because the container persists. + +## Recommendations + +### Free tier + +Mount a volume for `.gitmem/`: + +```yaml +volumes: + - gitmem-state:/workspace/.gitmem +``` + +Files that MUST persist: `learnings.json`, `threads.json`, `sessions.json`, `decisions.json`. + +### Pro/Dev tier + +**Nothing required.** Supabase is the source of truth. A fresh `.gitmem/` each session works — just slightly slower (cache cold start). + +Optional for better UX: + +```yaml +volumes: + - gitmem-cache:/workspace/.gitmem/cache # Avoids scar cache re-download +``` + + +`active-sessions.json` tracks process lifecycle (PIDs, hostnames) — inherently local. `sessions//session.json` survives context compaction when the LLM loses state. `cache/hook-scars.json` is needed by shell-based hooks that can't call Supabase directly. `closing-payload.json` avoids MCP tool call size limits. + + ======================================================================== # Scars URL: https://gitmem.ai/docs/concepts/scars @@ -312,12 +454,20 @@ These reflections become context for the next session. 
======================================================================== # Threads URL: https://gitmem.ai/docs/concepts/threads -Description: Track unresolved work across sessions with GitMem threads. +Description: Track unresolved work across sessions with lifecycle management, vitality scoring, and semantic deduplication. ======================================================================== +import { Callout } from 'fumadocs-ui/components/callout' + # Threads -**Threads** track unresolved work that carries across sessions. When you can't finish something in the current session, create a thread so the next session picks it up. +**Threads** are persistent work items that carry across sessions. They track what's unresolved, what's blocked, and what needs follow-up — surviving session boundaries so nothing gets lost. + +## Why Threads Exist + +Sessions end, but work doesn't. Before threads, open items lived as plain strings inside session records. They had no IDs, no lifecycle, no way to mark something as done. You'd see the same stale item surfaced session after session with no way to clear it. + +Threads give open items identity (`t-XXXXXXXX`), lifecycle status, vitality scoring, and a resolution trail. ## Creating Threads @@ -325,47 +475,188 @@ Description: Track unresolved work across sessions with GitMem threads. create_thread({ text: "Auth middleware needs rate limiting before production deploy" }) ``` -Threads include: -- A unique thread ID (e.g., `t-a1b2c3d4`) -- Description text -- Creation timestamp -- Optional Linear issue link +Threads are created in three ways: + +1. **Explicitly** via `create_thread` — mid-session when you identify a new open item +2. **Implicitly** via `session_close` — when the closing payload includes `open_threads` +3. 
**Promoted** from a suggestion via `promote_suggestion` — when a recurring topic is confirmed + +## Thread Lifecycle + +Threads progress through a 5-stage state machine based on vitality scoring and age: + +``` +create_thread / session_close payload + | + v + [ EMERGING ] -- first 24 hours, high visibility + | + v (age > 24h) + [ ACTIVE ] -- vitality > 0.5, actively referenced + | + v (vitality decays) + [ COOLING ] -- 0.2 <= vitality <= 0.5, fading from use + | + v (vitality < 0.2) + [ DORMANT ] -- vitality < 0.2, no recent touches + | + v (dormant 30+ days) + [ ARCHIVED ] -- auto-archived, hidden from session_start + +Any state --(explicit resolve_thread)--> [ RESOLVED ] +``` + +### Transitions + +| Transition | Condition | +|-----------|-----------| +| any -> emerging | Thread age < 24 hours | +| emerging -> active | Thread age >= 24 hours, vitality > 0.5 | +| active -> cooling | Vitality drops to [0.2, 0.5] | +| cooling -> active | Touch refreshes vitality above 0.5 | +| cooling -> dormant | Vitality drops below 0.2 | +| dormant -> active | Touch refreshes vitality above 0.5 | +| dormant -> archived | Dormant for 30+ consecutive days | +| any -> resolved | Explicit `resolve_thread` call | + +**Terminal states:** Archived and resolved threads do not transition. To reopen an archived topic, create a new thread. + +## Vitality Scoring + +Every thread has a vitality score (0.0 to 1.0) computed from two components: + +``` +vitality = 0.55 * recency + 0.45 * frequency +``` + +### Recency + +Exponential decay based on thread class half-life: + +``` +recency = e^(-ln(2) * days_since_touch / half_life) +``` + +| Thread Class | Half-Life | Use Case | +|-------------|-----------|----------| +| operational | 3 days | Deploys, fixes, incidents, blockers | +| backlog | 21 days | Research, long-running improvements | + +Thread class is auto-detected from keywords in the thread text ("deploy", "fix", "debug", "hotfix", "urgent", "broken", "incident", "blocker" = operational). 
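The following is a minimal sketch of the recency term and thread-class detection. It assumes keywords match case-insensitively at the start of a word, so "deployment" counts as operational but "prefix" does not. The docs do not specify the matching rule. The names `detectThreadClass` and `recency` are hypothetical and are not GitMem's API.

```typescript
// Half-lives in days, per the thread-class table.
const HALF_LIFE_DAYS = { operational: 3, backlog: 21 } as const;
type ThreadClass = keyof typeof HALF_LIFE_DAYS;

// Keywords that mark a thread as operational.
const OPERATIONAL_KEYWORDS = ["deploy", "fix", "debug", "hotfix", "urgent", "broken", "incident", "blocker"];

// Assumption: match at a word start, case-insensitive (\b = word boundary).
const OPERATIONAL_PATTERN = new RegExp(`\\b(${OPERATIONAL_KEYWORDS.join("|")})`, "i");

function detectThreadClass(text: string): ThreadClass {
  return OPERATIONAL_PATTERN.test(text) ? "operational" : "backlog";
}

// recency = e^(-ln(2) * days_since_touch / half_life)
// Equals 1.0 at touch time and halves every half-life.
function recency(daysSinceTouch: number, cls: ThreadClass): number {
  return Math.exp((-Math.LN2 * daysSinceTouch) / HALF_LIFE_DAYS[cls]);
}
```

Under this sketch, an operational thread untouched for 3 days and a backlog thread untouched for 21 days both sit at a recency of about 0.5.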
+ +### Frequency + +Log-scaled touch count normalized against thread age: + +``` +frequency = min(log(touch_count + 1) / log(days_alive + 1), 1.0) +``` + +### Status Thresholds + +| Vitality Score | Status | +|---------------|--------| +| > 0.5 | active | +| 0.2 - 0.5 | cooling | +| < 0.2 | dormant | + +Threads touched during a session have their `touch_count` incremented and `last_touched_at` refreshed, which revives decayed vitality. + +## Carry-Forward + +On `session_start`, open threads appear with vitality info: + +``` +Open threads (3): + t-abc12345: Fix auth timeout [ACTIVE 0.82] (operational, 2d ago) + t-def67890: Improve test coverage [COOLING 0.35] (backlog, 12d ago) + t-ghi11111: New thread just created [EMERGING 0.95] (backlog, today) +``` + +## Resolution + +Threads are resolved via `resolve_thread`: +- **By ID** (preferred): `resolve_thread({ thread_id: "t-a1b2c3d4" })` +- **By text match** (fallback): `resolve_thread({ text_match: "package name" })` + +Resolution records a timestamp, the resolving session, and an optional note. Knowledge graph triples are written to track the resolution relationship. ## Semantic Deduplication -GitMem uses cosine similarity (threshold > 0.85) to prevent duplicate threads. If you try to create a thread that's semantically identical to an existing one, GitMem returns the existing thread instead. +When `create_thread` is called, the new thread text is compared against all open threads using embedding cosine similarity before creation. -## Thread Lifecycle +| Threshold | Value | Meaning | +|-----------|-------|---------| +| Dedup similarity | 0.85 | Above this = duplicate | + +**Dedup methods** (in priority order): +1. **Embedding-based** — cosine similarity of text embeddings (when Supabase available) +2. 
**Text normalization fallback** — exact match after lowercasing, stripping punctuation, collapsing whitespace + +When a duplicate is detected, the existing thread is returned (with `deduplicated: true`) and touched to keep it vital. + +## Suggested Threads + +At `session_close`, session embeddings are compared to detect recurring topics that should become threads. + +### Detection Algorithm + +1. Compare current session embedding against the last 20 sessions (30-day window) +2. Find sessions with cosine similarity >= 0.70 +3. If 3+ sessions cluster (current + 2 historical): + - Check if an open thread already covers the topic (>= 0.80) -> skip + - Check if a pending suggestion already matches (>= 0.80) -> add evidence + - Otherwise, create a new suggestion + +Suggestions appear at `session_start`: ``` -create → surface at session_start → resolve +Suggested threads (2) -- recurring topics not yet tracked: + ts-a1b2c3d4: Recurring auth timeout pattern (3 sessions) + ts-e5f6g7h8: Build performance regression (4 sessions) + Use promote_suggestion or dismiss_suggestion to manage. ``` -1. **Create** — `create_thread` during a session -2. **Surface** — Open threads appear in the next `session_start` banner -3. **Resolve** — `resolve_thread` with a resolution note when complete +| Action | Tool | Effect | +|--------|------|--------| +| Promote | `promote_suggestion` | Converts to a real thread | +| Dismiss | `dismiss_suggestion` | Suppresses (3x = permanent) | + +## Knowledge Graph Integration + +Thread creation and resolution generate knowledge graph triples: + +| Predicate | Subject | Object | When | +|-----------|---------|--------|------| +| `created_thread` | Session | Thread | Thread created | +| `resolves_thread` | Session | Thread | Thread resolved | +| `relates_to_thread` | Thread | Issue | Thread linked to Linear issue | + +Use `graph_traverse` to query these relationships with 4 lenses: `connected_to`, `produced_by`, `provenance`, `stats`. 
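The suggested-thread detection algorithm described above can be sketched roughly as follows. This sketch makes several assumptions. Embeddings are plain `number[]` vectors. `history` is sorted newest first. The "already covered" checks compare the current session embedding against open-thread and pending-suggestion embeddings. All names (`detectSuggestion`, `SuggestionDecision`, and so on) are hypothetical.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Embedded {
  id: string;
  embedding: number[];
}

interface PastSession extends Embedded {
  closedAt: Date;
}

type SuggestionDecision =
  | { action: "none" }
  | { action: "skip"; coveredBy: string }
  | { action: "add_evidence"; suggestionId: string; sessionIds: string[] }
  | { action: "create"; sessionIds: string[] };

// Thresholds and window sizes from the algorithm description.
const CLUSTER_THRESHOLD = 0.7;
const COVERAGE_THRESHOLD = 0.8;
const MAX_HISTORY = 20;
const WINDOW_MS = 30 * 24 * 60 * 60 * 1000;

function detectSuggestion(
  current: Embedded,
  history: PastSession[], // assumption: sorted newest first
  openThreads: Embedded[],
  pendingSuggestions: Embedded[],
  now: Date,
): SuggestionDecision {
  // Step 1: up to 20 most recent sessions inside the 30-day window.
  const cutoff = now.getTime() - WINDOW_MS;
  const candidates = history
    .filter((s) => s.closedAt.getTime() >= cutoff)
    .slice(0, MAX_HISTORY);

  // Step 2: historical sessions similar to the current one.
  const similar = candidates.filter(
    (s) => cosineSimilarity(current.embedding, s.embedding) >= CLUSTER_THRESHOLD,
  );

  // Step 3: require a cluster of 3+ sessions (current + at least 2 historical).
  if (similar.length < 2) return { action: "none" };
  const sessionIds = [current.id, ...similar.map((s) => s.id)];

  // Topic already tracked by an open thread: skip.
  const covering = openThreads.find(
    (t) => cosineSimilarity(current.embedding, t.embedding) >= COVERAGE_THRESHOLD,
  );
  if (covering) return { action: "skip", coveredBy: covering.id };

  // Topic matches a pending suggestion: add this cluster as evidence.
  const pending = pendingSuggestions.find(
    (p) => cosineSimilarity(current.embedding, p.embedding) >= COVERAGE_THRESHOLD,
  );
  if (pending) return { action: "add_evidence", suggestionId: pending.id, sessionIds };

  // Otherwise propose a new suggestion.
  return { action: "create", sessionIds };
}
```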
## Managing Threads | Tool | Purpose | |------|---------| -| `list_threads` | See all open threads | -| `resolve_thread` | Mark a thread as done | -| `cleanup_threads` | Triage by health (active/cooling/dormant) | +| [`create_thread`](/docs/tools/create-thread) | Create a new open thread | +| [`resolve_thread`](/docs/tools/resolve-thread) | Mark a thread as done | +| [`list_threads`](/docs/tools/list-threads) | See all open threads | +| [`cleanup_threads`](/docs/tools/cleanup-threads) | Triage by health (active/cooling/dormant) | +| [`promote_suggestion`](/docs/tools/promote-suggestion) | Convert suggestion to real thread | +| [`dismiss_suggestion`](/docs/tools/dismiss-suggestion) | Suppress a suggestion | -### Thread Health - -`cleanup_threads` categorizes threads by vitality: - -- **Active** — Recently created or referenced -- **Cooling** — Not referenced in a while -- **Dormant** — Untouched for 30+ days (auto-archivable) +## Storage -### Suggested Threads +| Location | Purpose | Tier | +|----------|---------|------| +| `.gitmem/threads.json` | Runtime cache / free tier SOT | All | +| `.gitmem/suggested-threads.json` | Pending suggestions | All | +| Supabase `threads` table | Source of truth (full vitality, lifecycle, embeddings) | Pro/Dev | +| Supabase `sessions.open_threads` | Legacy fallback | Pro/Dev | -`session_start` may suggest threads based on session context. You can: -- **Promote** — `promote_suggestion` converts it to a real thread -- **Dismiss** — `dismiss_suggestion` suppresses it (3 dismissals = permanent suppression) + +On free tier, `.gitmem/threads.json` IS the source of truth. On pro/dev tier, it's a cache — Supabase is authoritative. 
+ ======================================================================== # Tiers @@ -410,6 +701,126 @@ These tools will be available when Pro launches: | `cache_health` | Compare local cache vs remote | | `cache_flush` | Force reload from Supabase | +======================================================================== +# MCP Protocol Compliance +URL: https://gitmem.ai/docs/contributing/compliance +Description: Full MCP protocol compliance report — 36/36 tests passing. +======================================================================== + +import { Callout } from 'fumadocs-ui/components/callout' + +# MCP Protocol Compliance + + +GitMem passes all MCP protocol compliance tests as of v1.0.3. + + +## Running the Suite + +```bash +GITMEM_TIER=free node tests/compliance/mcp-protocol-compliance.mjs +``` + +The compliance test script spawns the MCP server as a child process via STDIO, performs the full JSON-RPC 2.0 handshake, validates tool schemas, executes tool calls, and tests error handling. + +## Results Summary + +| Category | Tests | Result | +|----------|-------|--------| +| Protocol Handshake | 9 | 9/9 | +| Tool Listing | 4 | 4/4 | +| Schema Validation | 3 | 3/3 | +| Tool Execution | 10 | 10/10 | +| Error Handling | 4 | 4/4 | +| Response Format | 6 | 6/6 | +| **Total** | **36** | **36/36** | + +## Category Details + +### 1. Protocol Handshake (9/9) + +Tests JSON-RPC 2.0 `initialize` method and `notifications/initialized` lifecycle. + +| Test | Result | +|------|--------| +| initialize returns result | PASS | +| has protocolVersion | PASS | +| protocolVersion is string | PASS | +| has serverInfo | PASS | +| serverInfo.name exists | PASS | +| serverInfo.version exists | PASS | +| has capabilities | PASS | +| capabilities.tools exists | PASS | +| initialized notification accepted | PASS | + +### 2. 
Tool Listing (4/4) + +| Test | Result | +|------|--------| +| tools/list returns result | PASS | +| result has tools array | PASS | +| at least 1 tool registered | PASS | +| tool count (21) is reasonable (5-100) | PASS | + +21 tools in free tier. Pro tier exposes 67, dev tier 73. + +### 3. Tool Schema Validation (3/3) + +Every tool's `inputSchema` validated against JSON Schema and MCP spec requirements. + +| Test | Result | +|------|--------| +| all tool schemas valid (type, required, property types, descriptions) | PASS | +| all descriptions >= 30 chars | PASS | +| no duplicate tool names | PASS | + +### 4. Tool Execution (10/10) + +Live tool calls via MCP STDIO transport. + +| Test | Result | +|------|--------| +| gitmem-help returns result | PASS | +| result has content array | PASS | +| content[0].type === "text" | PASS | +| content[0].text is non-empty | PASS | +| search returns result | PASS | +| search result has content | PASS | +| search content is text type | PASS | +| recall returns result | PASS | +| recall has content array | PASS | +| log returns result | PASS | + +### 5. Error Handling (4/4) + +| Test | Result | +|------|--------| +| unknown tool returns error | PASS | +| unknown method returns JSON-RPC error | PASS | +| error has numeric code | PASS | +| error code is -32601 (Method not found) | PASS | + +### 6. 
Response Format Compliance (6/6) + +| Test | Result | +|------|--------| +| all responses include jsonrpc: "2.0" | PASS | +| all responses include matching id | PASS | +| content block has type field | PASS | +| text block has text field | PASS | +| successful calls have isError=false or undefined | PASS | +| resources/list returns -32601 (not implemented) | PASS | + +## Optional Features Not Implemented + +These are valid omissions — the MCP spec does not require servers to implement all capabilities: + +| Feature | Status | Reason | +|---------|--------|--------| +| `resources/list` | Not implemented (-32601) | No resources exposed; tools-only server | +| `prompts/list` | Not implemented (-32601) | No prompt templates; tool-driven UX | +| `resources/templates/list` | Not implemented | No dynamic resources | + ======================================================================== # Contributing URL: https://gitmem.ai/docs/contributing @@ -487,90 +898,333 @@ gitmem/ ======================================================================== # Testing URL: https://gitmem.ai/docs/contributing/testing -Description: GitMem's test pyramid and how to run each tier. +Description: GitMem's 6-tier test pyramid, CI pipeline, and how to run each tier. ======================================================================== +import { Tabs, Tab } from 'fumadocs-ui/components/tabs' +import { Callout } from 'fumadocs-ui/components/callout' + # Testing -GitMem uses a 6-tier test pyramid. All runnable tests must pass before shipping. +GitMem uses a 6-tier testing pyramid. Each tier adds cost/time but tests closer to the real user experience. 
-## Test Tiers +## Test Pyramid -### Tier 1: Unit Tests +| Tier | Command | Tests | Speed | Cost | What it tests | +|------|---------|-------|-------|------|---------------| +| **1 - Unit** | `npm run test:unit` | 597 | ~3s | Free | Schema validation, pure functions, golden regressions | +| **2 - Smoke** | `npm run test:smoke` | 9 | ~5s | Free | MCP server boot, tool registration, basic tool calls via stdio | +| **3 - Integration** | `npm run test:integration` | 63 | ~30s | Free (Docker) | Real PostgreSQL, session lifecycle, cache behavior, query plans | +| **4 - E2E** | `npm run test:e2e` | 68 | ~90s | Free (Docker for pro) | CLI install flow, hooks, free/pro tier MCP via stdio | +| **5 - User Journey** | `npm run test:e2e -- tests/e2e/user-journey.test.ts` | 6 | ~60s | API calls | Real Claude session via Agent SDK | +| **6 - Performance** | `npm run test:perf` | benchmarks | ~30s | Free | Cold start, recall latency, cache hit rate microbenchmarks | -```bash -npm run test:unit -``` +**Run all:** `npm run test:all` (runs tiers 1-4 + 6; excludes user-journey) -- **Count:** 660+ tests -- **Speed:** ~3 seconds -- **Dependencies:** None -- **Coverage:** Schema validation, pure functions, golden regressions + +Always run `npm run test:unit` at minimum. Before shipping to npm, run tiers 1-5. Tier 5 (User Journey) is the most important gate. + + +## Quick Run (No Docker) -### Tier 2: Smoke Tests +For development, run the tests that don't need Docker: ```bash -npx vitest run --config vitest.smoke.config.ts +npm run test:unit && npx vitest run --config vitest.smoke.config.ts ``` -- **Count:** 9 tests (4 free + 5 pro) -- **Speed:** ~5 seconds -- **Dependencies:** None (free) / Supabase (pro) -- **Coverage:** MCP server boot, tool registration, basic stdio calls +This covers 670+ tests in under 10 seconds. + +--- + +## Tier Details + +### Tier 1 — Unit Tests (597 tests, 34 files) + +Pure unit tests with no external dependencies. Fast, deterministic, run everywhere. 
+ +| Category | Files | What it covers | +|----------|-------|----------------| +| **Schemas** | 13 files | Zod schema validation for all tool inputs | +| **Services** | 11 files | Thread manager, active sessions, file locks, gitmem-dir, timezone | +| **Tools** | 2 files | absorb-observations, prepare-context | +| **Hooks** | 2 files | format-utils, quick-retrieve | +| **Diagnostics** | 4 files | anonymizer, channels, check-command, collector | +| **Golden Regressions** | 1 file | 11 tests replaying specific historical bugs | +| **Standalone** | 3 files | Variant assignment and enforcement (21 tests) | + +### Tier 2 — Smoke Tests (9 tests, 2 files) + +Boot the MCP server via stdio transport and verify basic functionality. + +| Suite | Tests | What it covers | +|-------|-------|----------------| +| `smoke-free.test.ts` | 4 | Free tier server boot, tool list, basic recall | +| `smoke-pro.test.ts` | 5 | Pro tier server boot (skips without Supabase credentials) | + +### Tier 3 — Integration Tests (63 tests, 5 files) + +Tests against a real PostgreSQL database via Testcontainers. Catches issues mocks would miss: missing indexes, query plan regressions, schema drift. + +All tests share a single Testcontainers setup that starts `pgvector/pgvector:pg16`, stubs `auth.role()` for Supabase compatibility, and loads `schema/setup.sql`. + +| Suite | Tests | What it covers | +|-------|-------|----------------| +| `fresh-install.test.ts` | ~12 | Empty database behavior, first session, first learning | +| `session-lifecycle.test.ts` | ~15 | Session create/close, concurrent sessions, close compliance | +| `cache-behavior.test.ts` | 9 | Cache file operations, TTL expiry, cache symmetry | +| `query-plans.test.ts` | ~12 | Index usage verification (EXPLAIN), query performance at scale | +| `scale-profiles.test.ts` | ~15 | Behavior at 0, 15, 100, 500, 1000 scars | + +### Tier 4 — E2E Tests (68 tests, 6 files) + +Tests CLI commands and MCP protocol end-to-end. 
Pro tests spawn Testcontainers. + +| Suite | Tests | What it covers | +|-------|-------|----------------| +| `cli-fresh-install.test.ts` | 27 | `gitmem init`, `gitmem check`, `gitmem install-hooks`, output sanitization | +| `free-tier.test.ts` | 15 | Free tier MCP: session lifecycle, recall, create_learning, parameter validation | +| `pro-fresh.test.ts` | 11 | Pro tier with PostgreSQL: tool registration, recall, session lifecycle | +| `pro-mature.test.ts` | 7 | Pro tier at scale (1000 scars): performance, cache hit rate, throughput | +| `organic-discovery.test.ts` | 2 | Multi-session organic adoption measurement (API calls) | +| `user-journey.test.ts` | 6 | Real Claude session (see Tier 5) | + +### Tier 5 — User Journey (6 tests) + +The most important pre-ship gate. Spawns a real Claude session and verifies the full user experience using the [Claude Agent SDK](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/sdk). + +What it verifies: +1. SessionStart hook fires with correct ceremony wording +2. MCP tools are registered and connected +3. Agent calls `session_start` and `recall` +4. No internal references leak into output +5. Session completes successfully + +### Tier 6 — Performance Benchmarks + +Vitest `bench()` microbenchmarks measuring operation latency with statistical rigor. + +| Suite | What it benchmarks | +|-------|--------------------| +| `cold-start.bench.ts` | Cache initialization, first session start | +| `recall.bench.ts` | Local vector search at 15 and 1000 scars | +| `cache.bench.ts` | Cache key generation | +| `session-start.bench.ts` | Session start components | + +#### Performance Baselines + +| Component | Baseline (ms) | +|-----------|--------------| +| `session_start_total` | 750 | +| `recall_with_scars` | 2000 | +| `recall_empty` | 500 | +| `scar_search_local` | 100 | +| `session_close_total` | 1500 | +| `cache_hit` | 5 | + +Tests fail if measurement exceeds baseline x 1.5 (alert threshold). 
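The alert rule reduces to a single comparison against `baseline * 1.5`. Here is a sketch with the baselines copied from the table above. The `checkAgainstBaseline` helper is hypothetical. This sketch does not show how the real suite wires the rule into Vitest `bench()`.

```typescript
// Baselines in milliseconds, from the Performance Baselines table.
const BASELINES_MS: Record<string, number> = {
  session_start_total: 750,
  recall_with_scars: 2000,
  recall_empty: 500,
  scar_search_local: 100,
  session_close_total: 1500,
  cache_hit: 5,
};

// A measurement fails only if it exceeds baseline x 1.5.
const ALERT_MULTIPLIER = 1.5;

interface BaselineResult {
  component: string;
  measuredMs: number;
  limitMs: number;
  passed: boolean;
}

function checkAgainstBaseline(component: string, measuredMs: number): BaselineResult {
  const baseline: number | undefined = BASELINES_MS[component];
  if (baseline === undefined) throw new Error(`No baseline for component: ${component}`);
  const limitMs = baseline * ALERT_MULTIPLIER;
  return { component, measuredMs, limitMs, passed: measuredMs <= limitMs };
}
```

For example, `session_start_total` tolerates up to 1125 ms before it fails, and `cache_hit` tolerates up to 7.5 ms.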
+ +--- + +## CI Pipeline -### Tier 3: Integration Tests +Source: `.github/workflows/ci.yml` + +**Triggers:** Push to `main`, `v*` tags, PRs against `main`. + +### Build Job (matrix: Node 18, 20, 22) + +| Step | Command | What it does | +|------|---------|-------------| +| Type check | `npm run typecheck` | `tsc --noEmit` | +| Build | `npm run build` | `tsc` — compile to `dist/` | +| Unit tests | `npm run test:unit` | 764 tests via vitest | +| Smoke test | `npm run test:smoke:free` | 4 MCP integration tests | + +### Publish Job (tag pushes only) + +Runs after all 3 build matrix jobs pass. Only fires on `v*` tag pushes. + +### Release Workflow ```bash -npx vitest run --config vitest.integration.config.ts +# 1. Make changes, commit +# 2. Bump version +npm version patch +# 3. Tag and push +git tag v1.0.X +git push origin main --tags +# CI builds -> tests -> publishes automatically ``` -- **Count:** 63 tests -- **Speed:** ~30 seconds -- **Dependencies:** Docker (Supabase PostgreSQL) -- **Coverage:** Database operations, session lifecycle, cache, query plans +### What's NOT in CI -### Tier 4: E2E Tests +| Test tier | Why not | How to run | +|-----------|---------|-----------| +| Integration (Tier 3) | Needs Docker | `npm run test:integration` locally | +| E2E pro (Tier 4) | Needs Docker | `npm run test:e2e` locally | +| User Journey (Tier 5) | Needs Claude API key | `npm run test:e2e -- tests/e2e/user-journey.test.ts` | +| Performance (Tier 6) | Benchmarks, not pass/fail | `npm run test:perf` | -```bash -npx vitest run --config vitest.e2e.config.ts +--- + +## Prerequisites + +### Docker (Tiers 3-4) + +Integration and pro E2E tests use [Testcontainers](https://node.testcontainers.org/) to spin up `pgvector/pgvector:pg16` PostgreSQL containers. 
+ +- Docker daemon must be running (`docker info` must succeed) +- Tests skip gracefully when Docker is unavailable +- Docker-in-Docker: mount the host Docker socket (`/var/run/docker.sock`) + +### Auth Schema Stub + +Plain pgvector PostgreSQL doesn't include Supabase's `auth` schema. All Docker-based tests must stub it: + +```sql +CREATE SCHEMA IF NOT EXISTS auth; +CREATE OR REPLACE FUNCTION auth.role() RETURNS TEXT AS $$ + SELECT 'service_role'::TEXT; +$$ LANGUAGE sql; ``` -- **Count:** 60 tests -- **Speed:** ~75 seconds -- **Dependencies:** Varies by suite -- **Coverage:** Full install flow, hooks, MCP via stdio +### Claude CLI (Tier 5) -### Tier 5: User Journey +User Journey tests require the Claude CLI installed and authenticated. Detection: `claude --version`. -```bash -npx vitest run tests/e2e/user-journey.test.ts --config vitest.e2e.config.ts +--- + +## Skip Conditions + +Tests skip gracefully when dependencies are missing: + +| Test | Skip condition | Detection | +|------|---------------|-----------| +| `user-journey.test.ts` | Claude CLI not installed | `claude --version` | +| `organic-discovery.test.ts` | Claude CLI not installed | `claude --version` | +| `pro-fresh.test.ts` | Docker not available | `docker info` | +| `pro-mature.test.ts` | Docker not available | `docker info` | +| `smoke-pro.test.ts` | No Supabase credentials | env check | +| All integration tests | Docker not available | `docker info` | + +--- + +## Tool Tier Gating + +The MCP server gates tools by tier: + +| Tier | Tool Count | Includes | +|------|-----------|----------| +| **free** | 55 | Core tools only | +| **pro** | 67 | + analyze, cache management, graph traverse | +| **dev** | 73 | + batch operations, transcripts | + +--- + +## Mapping Changes to Test Tiers + +Test at the tier where your change first touches a real boundary. Every change gets Tier 1. Then add the tier that matches the highest boundary crossed. + +``` +Did you change... + +- a pure function or schema? 
-> Tier 1 (always) + +- how the MCP server responds? -> + Tier 2 (5s, free) + +- a database query or migration? -> + Tier 3 (30s, Docker) + +- a CLI command or hook script? -> + Tier 4 (90s, Docker for pro) + +- what the agent experiences? -> + Tier 5 (60s, ~$0.30) + +- performance-sensitive code? -> + Tier 6 (30s, free) ``` -- **Count:** 6 tests -- **Speed:** ~60 seconds -- **Dependencies:** Claude API key -- **Coverage:** Real Claude session verifying hooks, tools, and ceremony +### Pre-commit Minimum -### Tier 6: Performance +| Situation | Run | +|-----------|-----| +| Any code change | `npm run test:unit` (Tier 1) | +| MCP server or tool changes | + `npm run test:smoke:free` (Tier 2) | +| Before pushing to GitHub | Tiers 1 + 2 minimum | +| Before npm publish | Tiers 1-5 (Tier 5 is the ship gate) | -```bash -npx vitest bench --config vitest.perf.config.ts +--- + +## Agent SDK Testing Pattern + +The `user-journey.test.ts` file establishes a reusable pattern for testing Claude CLI integrations: + +```typescript +import { query } from "@anthropic-ai/claude-agent-sdk"; +import type { SDKMessage, HookCallback, PreToolUseHookInput } from "@anthropic-ai/claude-agent-sdk"; + +const toolCalls: string[] = []; +const hookObserver: HookCallback = async (input) => { + if (input.hook_event_name === "PreToolUse") { + toolCalls.push((input as PreToolUseHookInput).tool_name); + } + return {}; +}; + +for await (const msg of query({ + prompt: "Do something", + options: { + cwd: "/path/to/project", + model: "haiku", + maxTurns: 5, + maxBudgetUsd: 1.0, + permissionMode: "bypassPermissions", + allowDangerouslySkipPermissions: true, + persistSession: false, + settingSources: ["project"], + thinking: { type: "disabled" }, + hooks: { + PreToolUse: [{ hooks: [hookObserver] }], + }, + }, +})) { + // Process messages... 
+} + +expect(toolCalls).toContain("mcp__gitmem__session_start"); ``` -- **Count:** 4 benchmark files -- **Speed:** ~30 seconds -- **Coverage:** Cold start latency, recall speed, cache performance +### Key SDK Options for Testing -## Quick Run (No Docker) +| Option | Value | Why | +|--------|-------|-----| +| `model` | `"haiku"` | Fastest, cheapest | +| `maxTurns` | 2-5 | Prevent runaway | +| `maxBudgetUsd` | 1.0 | Hard cost cap | +| `permissionMode` | `"bypassPermissions"` | No interactive prompts | +| `persistSession` | `false` | No disk state | +| `settingSources` | `["project"]` | Load project hooks | -For development, run the tests that don't need Docker: +--- + +## Adding New Tests + +| What you're adding | Tier | Location | +|-------------------|------|----------| +| Schema validation / pure logic | 1 | `tests/unit/schemas/` or `tests/unit/services/` | +| Database behavior | 3 | `tests/integration/` | +| Free tier CLI / hooks | 4 | `cli-fresh-install.test.ts` or `free-tier.test.ts` | +| Pro tier MCP | 4 | `pro-fresh.test.ts` or `pro-mature.test.ts` | +| Agent behavior | 5 | `user-journey.test.ts` | +| Performance regression | 6 | `tests/performance/` | +| Hook scripts | — | `hooks/tests/test-hooks.sh` | + + +For Tier 5 user-journey tests, keep prompts simple and use `appendSystemPrompt` to constrain agent behavior. Test that tools are _called_, not that the agent says specific words. + + +## MCP Protocol Compliance + +A dedicated compliance suite validates gitmem-mcp against the MCP protocol specification. See [Compliance Report](/docs/contributing/compliance) for the full report. + +**Latest result:** 36/36 PASS ```bash -npm run test:unit && npx vitest run --config vitest.smoke.config.ts +GITMEM_TIER=free node tests/compliance/mcp-protocol-compliance.mjs ``` -This covers 670+ tests in under 10 seconds. 
- ======================================================================== # Configuration URL: https://gitmem.ai/docs/getting-started/configuration diff --git a/apps/docs/public/llms.txt b/apps/docs/public/llms.txt index f6ece5e..bc6fdac 100644 --- a/apps/docs/public/llms.txt +++ b/apps/docs/public/llms.txt @@ -46,12 +46,14 @@ Full docs: https://gitmem.ai/llms-full.txt - [Changelog](https://gitmem.ai/docs/changelog): GitMem release history. - [Core Concepts](https://gitmem.ai/docs/concepts): Understand the building blocks of GitMem's institutional memory system. - [Learning Types](https://gitmem.ai/docs/concepts/learning-types): The four types of institutional memory in GitMem. +- [Local Storage](https://gitmem.ai/docs/concepts/local-storage): How GitMem uses the local filesystem, what persists across sessions, and container deployment considerations. - [Scars](https://gitmem.ai/docs/concepts/scars): How GitMem captures and surfaces mistakes as institutional memory. - [Sessions](https://gitmem.ai/docs/concepts/sessions): How GitMem tracks bounded work periods with context loading and closing ceremonies. -- [Threads](https://gitmem.ai/docs/concepts/threads): Track unresolved work across sessions with GitMem threads. +- [Threads](https://gitmem.ai/docs/concepts/threads): Track unresolved work across sessions with lifecycle management, vitality scoring, and semantic deduplication. - [Tiers](https://gitmem.ai/docs/concepts/tiers): GitMem's tier system and how it affects available features. +- [MCP Protocol Compliance](https://gitmem.ai/docs/contributing/compliance): Full MCP protocol compliance report — 36/36 tests passing. - [Contributing](https://gitmem.ai/docs/contributing): How to contribute to GitMem. -- [Testing](https://gitmem.ai/docs/contributing/testing): GitMem's test pyramid and how to run each tier. +- [Testing](https://gitmem.ai/docs/contributing/testing): GitMem's 6-tier test pyramid, CI pipeline, and how to run each tier. 
- [Configuration](https://gitmem.ai/docs/getting-started/configuration): Configure GitMem with environment variables and config files. - [Your First Session](https://gitmem.ai/docs/getting-started/first-session): Walk through a complete GitMem session from start to close. - [Free vs Pro](https://gitmem.ai/docs/getting-started/free-vs-pro): What you get today with the free tier and what's coming with Pro. diff --git a/docs/architecture/local-storage.md b/docs/architecture/local-storage.md deleted file mode 100644 index b6b5ba6..0000000 --- a/docs/architecture/local-storage.md +++ /dev/null @@ -1,185 +0,0 @@ -# GitMem Local Storage Architecture - -> How GitMem uses the local filesystem, what persists across sessions, and what breaks in ephemeral containers. - -## The Problem - -GitMem's value proposition is **cross-session institutional memory**. But users running in locked-down containers (CI, ephemeral dev environments, Docker-per-session) lose all local state between invocations. This document maps exactly what lives where so operators can make informed decisions about persistence. - -## Storage Locations - -GitMem writes to two locations: - -| Location | What | Owner | -|----------|------|-------| -| `/.gitmem/` | Session state, threads, config, caches | GitMem MCP server | -| `~/.cache/gitmem/` | Search result cache (15-min TTL) | GitMem MCP server | - -A third location exists but is **not created by GitMem**: - -| Location | What | Owner | -|----------|------|-------| -| `~/.claude/projects//*.jsonl` | Conversation transcripts | Claude Code CLI | - -GitMem reads from the Claude Code transcripts during `session_close` (transcript capture), but never writes to or manages them. 
- -## File-by-File Inventory - -### `.gitmem/` — Core State - -``` -.gitmem/ -├── active-sessions.json # 478B Process lifecycle -├── config.json # 63B Project defaults -├── sessions.json # 644B Recent session index (free tier SOT) -├── threads.json # ~5KB Thread state cache / free tier SOT -├── suggested-threads.json # ~2B AI-suggested threads -├── closing-payload.json # (ephemeral — deleted after use) -├── cache/ -│ └── hook-scars.json # ~517KB Local scar copy for hooks plugin -├── hooks-state/ -│ ├── start_time # 11B Session start timestamp -│ ├── tool_call_count # 2B Recall nag counter -│ ├── last_nag_time # 2B Last recall reminder time -│ ├── stop_hook_active # 0B Lock file (re-entrancy guard) -│ └── audit.jsonl # ~4KB Hook execution log -└── sessions/ - └── / - └── session.json # ~6KB Per-session state (scars, confirmations) -``` - -**Total typical footprint: ~530KB** (dominated by `cache/hook-scars.json`). - -### File Lifecycle - -| File | Created | Updated | Deleted | Survives Session Close? 
| -|------|---------|---------|---------|------------------------| -| `active-sessions.json` | `session_start` | Every session start/close | Never (entries pruned) | Yes — multi-session registry | -| `config.json` | First `session_start` | Rarely | Never | Yes | -| `sessions.json` | `session_close` (free tier) | Each close | Never | Yes | -| `threads.json` | `session_close` | Each close | Never | Yes | -| `suggested-threads.json` | `session_close` | Each close | Never | Yes | -| `closing-payload.json` | Agent writes before close | Never | `session_close` deletes it | **No** — ephemeral | -| `cache/hook-scars.json` | Hooks plugin startup | Periodically refreshed | Never | Yes | -| `hooks-state/*` | Session start | During session | `start_time` reset each session | Partially | -| `sessions//session.json` | `session_start` | `recall`, `confirm_scars` | `session_close` cleans up | **No** — cleaned up on close | - -## Cross-Session Data Flow - -### What the next session needs - -When `session_start` runs, it loads context from these sources: - -| Data | Pro/Dev Tier Source | Free Tier Source | -|------|--------------------|--------------------| -| Last session (decisions, reflection) | Supabase `sessions` | `.gitmem/sessions.json` | -| Open threads | Supabase `threads` | `.gitmem/threads.json` | -| Recent decisions | Supabase `decisions` | `.gitmem/sessions.json` (embedded) | -| Scars for recall | Supabase `learnings` | `.gitmem/learnings.json` | -| Suggested threads | `.gitmem/suggested-threads.json` | `.gitmem/suggested-threads.json` | - -### What `recall` needs - -| Tier | Source | Search Method | -|------|--------|---------------| -| Pro/Dev | Supabase `learnings` | Semantic (embedding cosine similarity) | -| Pro/Dev (cached) | `~/.cache/gitmem/results/` | Local vector search (15-min TTL) | -| Free | `.gitmem/learnings.json` | Keyword tokenization match | - -### What `session_close` persists - -| Data | Pro/Dev Destination | Free Destination | 
-|------|--------------------|--------------------| -| Session record | Supabase `sessions` | `.gitmem/sessions.json` | -| New learnings | Supabase `learnings` | `.gitmem/learnings.json` | -| Decisions | Supabase `decisions` | `.gitmem/decisions.json` | -| Thread state | Supabase `threads` + `.gitmem/threads.json` | `.gitmem/threads.json` | -| Scar usage | Supabase `scar_usage` | `.gitmem/scar_usage.json` | -| Transcript | Supabase storage bucket | Not captured | - -## The Container Problem - -### Scenario: Ephemeral container per session - -``` -Container A (session 1) → writes .gitmem/ → container destroyed -Container B (session 2) → fresh .gitmem/ → no history -``` - -**Impact by tier:** - -| Tier | Cross-Session Memory | What Breaks | -|------|---------------------|-------------| -| **Pro/Dev** | **Works** — Supabase is SOT | Hooks plugin cold-starts each time (re-downloads scar cache). Suggested threads lost. Minor UX friction, no data loss. | -| **Free** | **Completely broken** — all memory is local files | No scars, no threads, no session history, no decisions. Each session is amnesic. | - -### Scenario: Persistent volume mount - -``` -docker run -v gitmem-data:/app/.gitmem ... -``` - -| Tier | Cross-Session Memory | Notes | -|------|---------------------|-------| -| **Pro/Dev** | **Works perfectly** | Local files are caches; Supabase is SOT | -| **Free** | **Works** | Local files ARE the SOT; volume mount preserves them | - -### Scenario: Shared container (long-running) - -``` -Container stays alive across multiple `claude` invocations -``` - -Both tiers work. `.gitmem/` persists because the container persists. 
- -## Recommendations for Container Deployments - -### Minimum viable persistence (free tier) - -Mount a volume for `.gitmem/`: -```yaml -volumes: - - gitmem-state:/workspace/.gitmem -``` - -Files that MUST persist for free tier cross-session: -- `learnings.json` (scars — the whole point) -- `threads.json` (open work tracking) -- `sessions.json` (session history for context) -- `decisions.json` (decision log) - -### Minimum viable persistence (pro/dev tier) - -**Nothing required.** Supabase is the source of truth. Local files are caches/working state. A fresh `.gitmem/` each session works — just slightly slower (cache cold start). - -Optional for better UX: -```yaml -volumes: - - gitmem-cache:/workspace/.gitmem/cache # Avoids scar cache re-download -``` - -### What about Claude Code transcripts? - -The `~/.claude/projects/` directory accumulates conversation transcripts (~2MB each). In long-running containers, these can grow to hundreds of megabytes. These are: -- Created by Claude Code, not GitMem -- Read by GitMem during transcript capture (pro/dev `session_close`) -- Never cleaned up automatically -- Not required for GitMem to function - -For ephemeral containers: don't bother persisting these. GitMem's transcript capture uploads them to Supabase before the container dies (if pro/dev tier). For free tier, transcripts aren't captured at all. - -## Architecture Decision: Why Local Files Exist at All - -Given that pro/dev tier uses Supabase, why does GitMem write local files? - -1. **`active-sessions.json`** — Process lifecycle tracking (PIDs, hostnames). This is inherently local — Supabase can't know if a process is still alive. - -2. **`sessions//session.json`** — Survives context compaction. When Claude Code compresses the conversation, the MCP server's in-memory state is fine, but the LLM loses context. The agent reads this file to recover session_id and surfaced scars. This is a Claude Code architectural constraint, not a GitMem choice. - -3. 
**`threads.json`** — Cache/fallback. If Supabase is down or slow, session_start can still show threads. - -4. **`cache/hook-scars.json`** — The hooks plugin runs as shell scripts (not MCP), so it can't call Supabase directly. It needs a local scar copy for fast pattern matching. - -5. **`closing-payload.json`** — MCP tool calls have size limits. Writing the payload to a file and passing only the session_id keeps the MCP call clean. - -6. **Free tier files** — The entire free tier runs without Supabase. Local JSON files ARE the database. diff --git a/docs/compliance/mcp-protocol-compliance.md b/docs/compliance/mcp-protocol-compliance.md deleted file mode 100644 index e96f3f7..0000000 --- a/docs/compliance/mcp-protocol-compliance.md +++ /dev/null @@ -1,169 +0,0 @@ -# MCP Protocol Compliance Report - -> **Date:** 2026-02-16 -> **Version:** v1.0.3 (`d7f4876`) -> **Tier Tested:** free -> **Tool:** MCP Inspector v0.15.0 (`@modelcontextprotocol/inspector`) + custom compliance suite -> **Verdict:** PASS — full MCP protocol compliance (36/36) - ---- - -## Test Results - -| Category | Tests | Result | -|----------|-------|--------| -| Protocol Handshake | 9 | 9/9 | -| Tool Listing | 4 | 4/4 | -| Schema Validation | 3 | 3/3 | -| Tool Execution | 10 | 10/10 | -| Error Handling | 4 | 4/4 | -| Response Format | 6 | 6/6 | -| **Total** | **36** | **36/36** | - ---- - -## 1. Protocol Handshake (9/9) - -Tests JSON-RPC 2.0 `initialize` method and `notifications/initialized` lifecycle. - -| Test | Result | -|------|--------| -| initialize returns result | PASS | -| has protocolVersion | PASS | -| protocolVersion is string | PASS | -| has serverInfo | PASS | -| serverInfo.name exists | PASS | -| serverInfo.version exists | PASS | -| has capabilities | PASS | -| capabilities.tools exists | PASS | -| initialized notification accepted | PASS | - -## 2. 
Tool Listing (4/4) - -| Test | Result | -|------|--------| -| tools/list returns result | PASS | -| result has tools array | PASS | -| at least 1 tool registered | PASS | -| tool count (21) is reasonable (5-100) | PASS | - -**Note:** 21 tools in free tier. Pro tier exposes 67, dev tier 73. - -## 3. Tool Schema Validation (3/3) - -Every tool's `inputSchema` validated against JSON Schema and MCP spec requirements. - -| Test | Result | -|------|--------| -| all tool schemas valid (type, required, property types, descriptions) | PASS | -| all descriptions >= 30 chars | PASS | -| no duplicate tool names | PASS | - -### Per-Tool Schema Detail - -| Tool | Params | Required | Description Length | -|------|--------|----------|-------------------| -| recall | 5 | 1 | 163 chars | -| confirm_scars | 1 | 1 | 287 chars | -| session_start | 7 | 0 | 327 chars | -| session_refresh | 1 | 0 | 406 chars | -| session_close | 4 | 2 | 689 chars | -| create_learning | 13 | 3 | 58 chars | -| create_decision | 9 | 3 | 62 chars | -| record_scar_usage | 11 | 4 | 52 chars | -| search | 5 | 1 | 171 chars | -| log | 5 | 0 | 111 chars | -| prepare_context | 5 | 2 | 147 chars | -| absorb_observations | 2 | 1 | 181 chars | -| list_threads | 3 | 0 | 172 chars | -| resolve_thread | 3 | 0 | 144 chars | -| create_thread | 2 | 1 | 233 chars | -| promote_suggestion | 2 | 1 | 146 chars | -| dismiss_suggestion | 1 | 1 | 114 chars | -| cleanup_threads | 2 | 0 | 229 chars | -| health | 1 | 0 | 210 chars | -| gitmem-help | 0 | 0 | 59 chars | -| archive_learning | 2 | 1 | 196 chars | - -## 4. Tool Execution (10/10) - -Live tool calls via MCP STDIO transport. 
- -| Test | Result | -|------|--------| -| gitmem-help returns result | PASS | -| result has content array | PASS | -| content[0].type === "text" | PASS | -| content[0].text is non-empty | PASS | -| search returns result | PASS | -| search result has content | PASS | -| search content is text type | PASS | -| recall returns result | PASS | -| recall has content array | PASS | -| log returns result | PASS | - -## 5. Error Handling (4/4) - -| Test | Result | -|------|--------| -| unknown tool returns error | PASS | -| unknown method returns JSON-RPC error | PASS | -| error has numeric code | PASS | -| error code is -32601 (Method not found) | PASS | - -## 6. Response Format Compliance (6/6) - -| Test | Result | -|------|--------| -| all responses include jsonrpc: "2.0" | PASS | -| all responses include matching id | PASS | -| content block has type field | PASS | -| text block has text field | PASS | -| successful calls have isError=false or undefined | PASS | -| resources/list returns -32601 (not implemented) | PASS | - ---- - -## Protocol Features Not Implemented - -These are optional MCP capabilities that gitmem does not expose: - -| Feature | Status | Reason | -|---------|--------|--------| -| `resources/list` | Not implemented (-32601) | No resources exposed; tools-only server | -| `prompts/list` | Not implemented (-32601) | No prompt templates; tool-driven UX | -| `resources/templates/list` | Not implemented | No dynamic resources | - -These are valid omissions — the MCP spec does not require servers to implement all capabilities. - ---- - -## Test Infrastructure - -The compliance test script lives at `tests/compliance/mcp-protocol-compliance.mjs`. It: - -1. Spawns the MCP server as a child process via STDIO -2. Performs the full JSON-RPC 2.0 handshake (initialize → notifications/initialized) -3. Validates tool schemas against MCP spec -4. Executes tool calls and validates response format -5. Tests error handling for unknown tools/methods -6. 
Reports pass/fail with colored terminal output - -### Running - -```bash -cd /workspace/gitmem -GITMEM_TIER=free node tests/compliance/mcp-protocol-compliance.mjs -``` - ---- - -## MCP Testing Tools Evaluated - -| Tool | Source | Used | Notes | -|------|--------|------|-------| -| **MCP Inspector** | `@modelcontextprotocol/inspector` (npm) | Yes | Official Anthropic tool. `--cli` mode for headless. v0.15.0 | -| **Custom compliance suite** | `tests/compliance/mcp-protocol-compliance.mjs` | Yes | 36 tests covering full protocol spec | -| **Janix-ai/mcp-validator** | GitHub only | No | No npm package; clone-only. Supports STDIO + HTTP + OAuth 2.1 | -| **RHEcosystemAppEng/mcp-validation** | GitHub only | No | Protocol + security scanning via mcp-scan | -| **mcp-testing-framework** | GitHub only | No | Multi-model batch evaluation of tool call quality | diff --git a/docs/launch-test-plan.md b/docs/launch-test-plan.md deleted file mode 100644 index d97d9e8..0000000 --- a/docs/launch-test-plan.md +++ /dev/null @@ -1,163 +0,0 @@ -# GitMem v1.0.1 Launch Test Plan (Human Verification) - -**Date:** 2026-02-16 -**Pre-req:** Commit `b284061` on main, version bumped to 1.0.1, published to npm - ---- - -## Phase 1: Local Build Verification - -### Setup - -```bash -cd /path/to/gitmem -git pull origin main -npm run build -``` - -Point Claude Desktop at local build in `~/Library/Application Support/Claude/claude_desktop_config.json`: - -```json -{ - "mcpServers": { - "gitmem": { - "command": "node", - "args": ["/absolute/path/to/gitmem/dist/server.js"] - } - } -} -``` - -Restart Claude Desktop (Cmd+Q, reopen). 
- -### Test 1: No stdout corruption - -Tell the agent: -> "Run `gm-open` to start a session" - -- [ ] No red error toast appears -- [ ] No "invalid JSON" errors in the UI -- [ ] Session starts cleanly with session_id and agent displayed - -### Test 2: Error messages surface - -Tell the agent: -> "Create a scar with title 'test scar' and description 'testing error surface' but don't include severity" - -- [ ] Response contains `errors` array (not just `success: false`) -- [ ] Error message mentions missing severity -- [ ] You can read the error and understand what went wrong - -### Test 3: Session ID validation - -Tell the agent: -> "Close the session with session_id 'SESSION_AUTO'" - -- [ ] Returns validation error (not a confusing DB lookup failure) -- [ ] Error message says "Invalid session_id format" -- [ ] Error message includes a UUID example -- [ ] Error message suggests running session_start first - -### Test 4: Arbitrary project names - -Tell the agent: -> "Start a new session with project 'my-cool-project'" - -- [ ] Session starts successfully -- [ ] No enum rejection error -- [ ] Project shows as `my-cool-project` in the response - -### Test 5: Happy path end-to-end - -Run these in sequence: - -1. `gm-open` (start session) -2. `gitmem-r` with plan "verify launch readiness" (recall scars) -3. 
`gm-close` (close session) - -- [ ] All three complete without errors -- [ ] Session opens, scars surface (or "no scars found" message), session closes cleanly - ---- - -## Phase 2: npm Publish + Verification - -### Publish - -```bash -cd /path/to/gitmem - -# Version should already be 1.0.1 in package.json -npm run build -npm publish - -# Verify on npm -npm info gitmem-mcp version -# Expected: 1.0.1 -``` - -### Fresh install verification - -```bash -# Install fresh from npm (new terminal) -npx gitmem-mcp@1.0.1 --help 2>&1 | head -5 -``` - -### Point Claude Desktop at npm package - -Update `claude_desktop_config.json`: - -```json -{ - "mcpServers": { - "gitmem": { - "command": "npx", - "args": ["-y", "gitmem-mcp@1.0.1"] - } - } -} -``` - -Restart Claude Desktop. - -### Re-run Tests 1-5 from Phase 1 - -- [ ] Test 1: No stdout corruption -- [ ] Test 2: Error messages surface -- [ ] Test 3: Session ID validation -- [ ] Test 4: Arbitrary project names -- [ ] Test 5: Happy path end-to-end - ---- - -## Pass/Fail - -**All 5 tests must pass in both phases to ship.** - -| Phase | Test | Pass? | -|-------|------|-------| -| Local | 1. No stdout corruption | | -| Local | 2. Error messages surface | | -| Local | 3. Session ID validation | | -| Local | 4. Arbitrary project names | | -| Local | 5. Happy path e2e | | -| npm | 1. No stdout corruption | | -| npm | 2. Error messages surface | | -| npm | 3. Session ID validation | | -| npm | 4. Arbitrary project names | | -| npm | 5. 
Happy path e2e | | - ---- - -## Ship Blockers Fixed in v1.0.1 - -| Issue | Fix | Tests | -|-------|-----|-------| -| | 22 console.log replaced with console.error in check.ts | +2 regression tests | -| | errors[] added to create_learning and record_scar_usage results | +8 tests | -| | UUID/short-ID format validation in session_close | +10 tests | -| | Removed hardcoded project enum, free-form string | covered by schema tests | - -## Known Limitations (shipping as v1.0.2) - -- Multi-session concurrency (last-write-wins on active-sessions.json) diff --git a/docs/testing.md b/docs/testing.md deleted file mode 100644 index a7bad72..0000000 --- a/docs/testing.md +++ /dev/null @@ -1,549 +0,0 @@ -# GitMem Testing Guide - -> **Last Updated:** 2026-02-16 · **Test Totals:** 764 tests across 6 tiers - -## Test Pyramid Overview - -GitMem uses a 6-tier testing pyramid. Each tier adds cost/time but tests closer to the real user experience. - -| Tier | Command | Files | Tests | Speed | Cost | What it tests | -|------|---------|-------|-------|-------|------|---------------| -| **1 - Unit** | `npm run test:unit` | 34 | 597 | ~3s | Free | Schema validation, pure functions, golden regressions | -| **2 - Smoke** | `npm run test:smoke` | 2 | 9 | ~5s | Free | MCP server boot, tool registration, basic tool calls via stdio | -| **3 - Integration** | `npm run test:integration` | 5 | 63 | ~30s | Free (Docker) | Real PostgreSQL, session lifecycle, cache behavior, query plans | -| **4 - E2E** | `npm run test:e2e` | 6 | 68 | ~90s | Free (Docker for pro) | CLI install flow, hooks, free/pro tier MCP via stdio | -| **5 - User Journey** | `npm run test:e2e -- tests/e2e/user-journey.test.ts` | 1 | 6 | ~60s | API calls | Real Claude session via Agent SDK | -| **6 - Performance** | `npm run test:perf` | 4 | benchmarks | ~30s | Free | Cold start, recall latency, cache hit rate microbenchmarks | - -**Run all:** `npm run test:all` (runs tiers 1-4 + 6; excludes user-journey) - -**Before pushing:** Always 
run `npm run test:unit` at minimum. - -**Before shipping to npm:** Run tiers 1-5. Tier 5 (User Journey) is the most important gate — it spawns a real Claude session and verifies the full hook + MCP + agent behavior chain. - -### CI Pipeline - -Source: `.github/workflows/ci.yml` - -**Triggers:** Push to `main`, push of `v*` tags, PRs against `main`. - -#### Build job (matrix: Node 18, 20, 22) - -Each step does exactly one thing. No step re-runs another step's work. - -| Step | Command | What it does | -|------|---------|-------------| -| Type check | `npm run typecheck` | `tsc --noEmit` — verify types, no output | -| Build | `npm run build` | `tsc` — compile to `dist/` | -| Unit tests | `npm run test:unit` | 764 tests via vitest (one execution) | -| Smoke test | `npm run test:smoke:free` | 4 MCP integration tests (server boot, tools, session lifecycle) | - -#### Publish job (tag pushes only) - -Runs after all 3 build matrix jobs pass. Only fires on `v*` tag pushes. - -| Step | Command | What it does | -|------|---------|-------------| -| Install | `npm ci --legacy-peer-deps` | Clean install (legacy-peer-deps for zod@4 conflict) | -| Build | `npm run build` | `tsc` — compile to `dist/` | -| Publish | `npm publish` | Publish to npm via `NPM_TOKEN` secret. `prepublishOnly` runs `tsc` (compile only, no tests) | - -#### Release workflow - -```bash -# 1. Make changes, commit -# 2. Bump version -npm version patch # or: edit package.json + CHANGELOG.md manually -# 3. 
Tag and push -git tag v1.0.X -git push origin main --tags -# CI builds → tests → publishes automatically -``` - -#### What's NOT in CI - -| Test tier | Why not | How to run | -|-----------|---------|-----------| -| Integration (Tier 3) | Needs Docker | `npm run test:integration` locally | -| E2E pro (Tier 4) | Needs Docker | `npm run test:e2e` locally | -| User Journey (Tier 5) | Needs Claude API key, costs ~$0.30 | `npm run test:e2e -- tests/e2e/user-journey.test.ts` | -| Performance (Tier 6) | Benchmarks, not pass/fail | `npm run test:perf` | - -#### CI design principles - -- **`build` = compile only.** Never embed tests in the build script. CI has dedicated test steps. -- **`prepublishOnly` = compile only.** The publish job runs after all tests pass — no need to re-test. -- **`--legacy-peer-deps`** required because `@anthropic-ai/claude-agent-sdk` depends on `zod@^4.0.0` while gitmem uses `zod@3.x`. -- **Tests set `GITMEM_DIR`** to temp directories. Without this, tests write to `~/.gitmem/` (which may not exist in CI) or `process.cwd()/.gitmem/` (wrong path when `GITMEM_DIR` is set elsewhere). - ---- - -## Prerequisites - -### Docker (Tiers 3-4) - -Integration and pro E2E tests use [Testcontainers](https://node.testcontainers.org/) to spin up `pgvector/pgvector:pg16` PostgreSQL containers. - -**Requirements:** -- Docker daemon running and accessible -- `docker info` must succeed -- Tests skip gracefully when Docker is unavailable - -**Docker-in-Docker:** If running inside a container, mount the host Docker socket (`/var/run/docker.sock`) to enable Testcontainers. - -### Auth Schema Stub - -Plain pgvector PostgreSQL doesn't include Supabase's `auth` schema. 
All Docker-based tests must stub it before loading `schema/setup.sql`: - -```sql -CREATE SCHEMA IF NOT EXISTS auth; -CREATE OR REPLACE FUNCTION auth.role() RETURNS TEXT AS $$ - SELECT 'service_role'::TEXT; -$$ LANGUAGE sql; -``` - -This stub is applied in `tests/integration/setup.ts` (shared) and individually in `pro-fresh.test.ts` and `pro-mature.test.ts`. - -### Claude CLI (Tier 5) - -User Journey tests require the Claude CLI installed and authenticated. Detection: `claude --version`. - ---- - -## Tier 1 — Unit Tests (597 tests, 34 files) - -Pure unit tests with no external dependencies. Fast, deterministic, run everywhere. - -### Test Categories - -| Category | Files | What it covers | -|----------|-------|----------------| -| **Schemas** | 13 files (`tests/unit/schemas/`) | Zod schema validation for all tool inputs: recall, session-start, session-close, create-learning, create-decision, search, analyze, log, prepare-context, absorb-observations, record-scar-usage, transcript, common | -| **Services** | 11 files (`tests/unit/services/`) | Thread manager (dedup, lifecycle, suggestions, vitality, Supabase sync, triples), active sessions (locking, multi-session), file locks, gitmem-dir, timezone | -| **Tools** | 2 files (`tests/unit/tools/`) | absorb-observations, prepare-context | -| **Hooks** | 2 files (`tests/unit/hooks/`) | format-utils, quick-retrieve | -| **Diagnostics** | 4 files (`tests/unit/diagnostics/`) | anonymizer, channels, check-command, collector | -| **Golden Regressions** | 1 file (`tests/unit/golden-regressions.test.ts`) | 11 tests replaying specific historical bugs | -| **Standalone** | 3 files (`tests/variant-*.test.ts`) | Variant assignment, variant enforcement, variant missing issue ID (21 tests total) | - -### Configuration - -- **Config:** `vitest.config.ts` -- **Includes:** `tests/unit/**`, `tests/od-*.test.ts` -- **Excludes:** integration, e2e, smoke, performance - ---- - -## Tier 2 — Smoke Tests (9 tests, 2 files) - -Boot the MCP server 
via stdio transport and verify basic functionality. - -| Suite | Tests | What it covers | -|-------|-------|----------------| -| `smoke-free.test.ts` | 4 | Free tier server boot, tool list, basic recall | -| `smoke-pro.test.ts` | 5 | Pro tier server boot (skips without Supabase credentials) | - -### Configuration - -- **Config:** `vitest.smoke.config.ts` -- **Commands:** `npm run test:smoke:free` (always runs), `npm run test:smoke:pro` (skips without creds) - ---- - -## Tier 3 — Integration Tests (63 tests, 5 files) - -Tests against a real PostgreSQL database via Testcontainers. Catches issues mocks would miss: missing indexes, query plan regressions, schema drift. - -### Shared Setup (`tests/integration/setup.ts`) - -All integration tests share a single Testcontainers setup that: -1. Starts `pgvector/pgvector:pg16` container -2. Stubs `auth.role()` for Supabase compatibility -3. Loads `schema/setup.sql` -4. Sets `DATABASE_URL`, `SUPABASE_URL`, `GITMEM_TIER=pro` environment variables -5. 
Provides helpers: `truncateAllTables()`, `indexExists()`, `getQueryPlan()`, `analyzeQueryPlan()`, `generateRandomVector()`, `formatVector()` - -### Suites - -| Suite | Tests | What it covers | -|-------|-------|----------------| -| `fresh-install.test.ts` | ~12 | Empty database behavior, first session, first learning creation | -| `session-lifecycle.test.ts` | ~15 | Session create/close, concurrent sessions, close compliance | -| `cache-behavior.test.ts` | 9 | Cache file operations, TTL expiry, cache symmetry (decisions/wins), scar search caching | -| `query-plans.test.ts` | ~12 | Index usage verification (EXPLAIN), query performance at scale | -| `scale-profiles.test.ts` | ~15 | Behavior at 0, 15, 100, 500, 1000 scars | - -### Configuration - -- **Config:** `vitest.integration.config.ts` -- **Timeout:** 120s container startup, 30s per test -- **Skip:** Entire suite skips if Docker unavailable - ---- - -## Tier 4 — E2E Tests (68 tests, 6 files) - -Tests CLI commands and MCP protocol end-to-end. Pro tests spawn Testcontainers. 
- -### Suites - -#### `cli-fresh-install.test.ts` — 27 tests, free - -Tests the CLI commands a new user runs when installing gitmem: - -- **Init CLI**: `gitmem init` creates `.gitmem/`, starter scars, permissions in `.claude/settings.json` -- **Check CLI**: `gitmem check` health check passes on initialized project -- **Hooks CLI**: `gitmem install-hooks` writes project-level hooks, preserves existing permissions, `--force` overwrites, `gitmem uninstall-hooks` removes hooks cleanly -- **Hook Script Output**: Direct execution of `session-start.sh` and `session-close-check.sh` verifying protocol wording ("YOU (the agent) ANSWER"), no internal reference leaks -- **Output Sanitization**: All CLI commands checked for internal reference leaks; `CLAUDE.md.template` and `starter-scars.json` verified clean - -#### `free-tier.test.ts` — 15 tests, free - -Free tier MCP functionality via stdio transport: session lifecycle, recall, create_learning, create_decision, record_scar_usage, parameter validation (golden regression for 2026-02-03 crash). - -#### `pro-fresh.test.ts` — 11 tests, Docker required - -Pro tier with Supabase PostgreSQL (Testcontainers + pgvector, 15 starter scars): - -- Tool registration (core + pro tools) -- Recall with starter scars (finds deployment-related matches) -- Session lifecycle (start, close) -- Create learning and create decision -- Pro-only tools: `analyze`, `gitmem-cache-status` - -**Architectural note:** The MCP server communicates with Supabase via PostgREST (HTTP), not direct PostgreSQL connections. Tests verify MCP tool success, not direct database persistence. 
-
-#### `pro-mature.test.ts` — 7 tests, Docker required
-
-Pro tier at scale (1000 seeded scars):
-
-- Recall performance within baseline (2000ms × 1.5)
-- Cache hit rate (second recall faster)
-- Session start within baseline
-- Search within baseline
-- Sequential operation throughput (4 ops < 10s)
-- Data volume verification (1000 learnings with embeddings)
-
-#### `organic-discovery.test.ts` — 2 tests, API calls
-
-Multi-session organic adoption measurement. Tests whether agents discover and adopt gitmem with varying nudge configurations. Agent SDK-based, costs ~$0.30 per 3-session chain.
-
-#### `user-journey.test.ts` — 6 tests, API calls (Tier 5)
-
-See Tier 5 section below.
-
-### MCP Test Client (`tests/e2e/mcp-client.ts`)
-
-Shared test infrastructure for E2E tests. Spawns `dist/index.js` as a child process and connects via MCP SDK's `StdioClientTransport`.
-
-**Key exports:**
-
-| Export | Purpose |
-|--------|---------|
-| `createMcpClient(env)` | Spawn MCP server with custom env, return connected client |
-| `callTool(client, name, args)` | Call an MCP tool and return typed result |
-| `listTools(client)` | List registered tools |
-| `getToolResultText(result)` | Extract text from tool result |
-| `parseToolResult(result)` | Parse JSON from tool result (handles markdown code blocks) |
-| `isToolError(result)` | Check if result is an error |
-| `CORE_TOOLS` | Tools available in all tiers: recall, session_start, session_close, create_learning, create_decision, record_scar_usage, search, log |
-| `PRO_TOOLS` | Pro-tier tools: analyze, gitmem-cache-status, gitmem-cache-health, gitmem-cache-flush |
-| `DEV_TOOLS` | Dev-only tools: record_scar_usage_batch, save_transcript, get_transcript |
-| `EXPECTED_TOOL_COUNTS` | free: 55, pro: 67, dev: 73 |
-
-### Configuration
-
-- **Config:** `vitest.e2e.config.ts`
-- **Timeout:** 120-180s for container startup, 30s per test
-
----
-
-## Tier 5 — User Journey (6 tests, 1 file)
-
-**This is the most important pre-ship gate.** Spawns a real Claude session and verifies the full user experience.
-
-### `user-journey.test.ts` — 6 tests, ~60s, costs API calls
-
-Uses the **Claude Agent SDK** (`@anthropic-ai/claude-agent-sdk`) to spawn real Claude sessions against a test directory with gitmem installed.
-
-Key design decisions:
-- **Agent SDK, not subprocess**: Uses `query()` from the SDK — runs in-process, returns typed `SDKMessage` events. Much faster and more reliable than spawning `claude -p` as a subprocess.
-- **Haiku model**: Fast, cheap. Budget capped at $1/test.
-- **Thinking disabled**: No extended thinking needed for test prompts.
-- **`settingSources: ["project"]`**: Loads `.claude/settings.json` from the test directory (picks up installed hooks).
-- **`persistSession: false`**: No session files written to disk.
-- **PreToolUse hook observer**: Programmatic `HookCallback` that records every tool call the agent makes.
-
-What it verifies:
-1. **SessionStart hook fires** — hook_started + hook_response events, exit code 0
-2. **MCP tools registered** — init event lists 10+ `mcp__gitmem__*` tools, core tools present, server status "connected"
-3. **Agent calls session_start** — observed via PreToolUse hook callback
-4. **Agent calls recall** — observed via PreToolUse hook callback, session completes successfully
-5. **No internal references** — hook output and result text checked
-6. **Correct ceremony wording** — "YOU (the agent) ANSWER" and "session_start" in hook output
-
-Test setup creates a temp directory with:
-```
-/tmp/gitmem-journey-xxx/
-  .gitmem/                  # from `gitmem init`
-  .mcp.json                 # points to built dist/index.js
-  .claude/settings.json     # from `gitmem install-hooks` (hooks + permissions)
-  node_modules/gitmem-mcp/hooks -> /workspace/gitmem/hooks  # symlink
-```
-
----
-
-## Tier 6 — Performance Benchmarks (4 files)
-
-Vitest `bench()` microbenchmarks measuring operation latency with statistical rigor.
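Each benchmarked component is compared against the baseline latency listed for this tier, with a 1.5× alert margin. A minimal sketch of that pass/fail rule follows. `checkLatency` and `BASELINES_MS` are hypothetical names, and only part of the baselines table is copied. Which `bench()` statistic (mean, p99, and so on) gets compared is an assumption left to the caller.

```typescript
// Hypothetical sketch of the "baseline x 1.5" alert rule; not tests/performance/baselines.ts.
const ALERT_MULTIPLIER = 1.5;

// Subset of baseline latencies (ms) copied from the Tier 6 baselines table.
const BASELINES_MS = {
  session_start_total: 750,
  recall_with_scars: 2000,
  recall_empty: 500,
  scar_search_local: 100,
  cache_hit: 5,
} as const;

type Component = keyof typeof BASELINES_MS;

interface LatencyCheck {
  component: Component;
  measuredMs: number;
  thresholdMs: number;
  pass: boolean;
}

// A measurement fails only when it exceeds baseline * ALERT_MULTIPLIER.
function checkLatency(component: Component, measuredMs: number): LatencyCheck {
  const thresholdMs = BASELINES_MS[component] * ALERT_MULTIPLIER;
  return { component, measuredMs, thresholdMs, pass: measuredMs <= thresholdMs };
}

const ok = checkLatency("recall_with_scars", 2900);
const slow = checkLatency("cache_hit", 8);
console.log(ok.pass, ok.thresholdMs); // true 3000
console.log(slow.pass, slow.thresholdMs); // false 7.5
```

The margin absorbs run-to-run noise: a 2900ms recall stays within the 3000ms threshold. Fast paths get little absolute slack, though, so an 8ms cache hit already exceeds its 7.5ms limit.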
- -### Suites - -| Suite | What it benchmarks | -|-------|--------------------| -| `cold-start.bench.ts` | Cache initialization, first session start | -| `recall.bench.ts` | Local vector search at 15 and 1000 scars | -| `cache.bench.ts` | Cache key generation | -| `session-start.bench.ts` | Session start components | - -### Performance Baselines (`tests/performance/baselines.ts`) - -Target latencies derived from performance targets and production measurements. Tests fail if measurement exceeds baseline × 1.5 (alert threshold). - -| Component | Baseline (ms) | Source | -|-----------|--------------|--------| -| `session_start_total` | 750 | Lean start | -| `recall_with_scars` | 2000 | Production | -| `recall_empty` | 500 | Production | -| `scar_search_local` | 100 | Production | -| `scar_search_remote` | 2000 | Production | -| `session_close_total` | 1500 | Blocking path only | -| `create_learning` | 3000 | Production | -| `create_decision` | 3000 | Production | -| `cache_hit` | 5 | Production | -| `cache_key_generation` | 1 | Production | - -### Configuration - -- **Config:** `vitest.perf.config.ts` -- **Command:** `npm run test:perf` (runs `vitest bench`, not `vitest run`) -- **Output:** Results written to `tests/performance/results.json` -- **Pool:** Single fork for consistent measurements - ---- - -## Agent SDK Testing Pattern - -The `user-journey.test.ts` file establishes a reusable pattern for testing Claude CLI integrations from code: - -```typescript -import { query } from "@anthropic-ai/claude-agent-sdk"; -import type { SDKMessage, HookCallback, PreToolUseHookInput } from "@anthropic-ai/claude-agent-sdk"; - -// Collect observations -const toolCalls: string[] = []; -const hookObserver: HookCallback = async (input) => { - if (input.hook_event_name === "PreToolUse") { - toolCalls.push((input as PreToolUseHookInput).tool_name); - } - return {}; -}; - -// Run session -for await (const msg of query({ - prompt: "Do something", - options: { - cwd: 
"/path/to/project", - model: "haiku", - maxTurns: 5, - maxBudgetUsd: 1.0, - permissionMode: "bypassPermissions", - allowDangerouslySkipPermissions: true, - persistSession: false, - settingSources: ["project"], // loads .claude/settings.json hooks - thinking: { type: "disabled" }, - hooks: { - PreToolUse: [{ hooks: [hookObserver] }], - }, - }, -})) { - if (msg.type === "system" && msg.subtype === "init") { - // Access: msg.tools, msg.mcp_servers, msg.session_id - } - if (msg.type === "system" && msg.subtype === "hook_response") { - // Access: msg.hook_event, msg.exit_code, msg.stdout, msg.outcome - } - if (msg.type === "result") { - // Access: msg.subtype ("success"/"error"), msg.total_cost_usd - } -} - -// Assert on observations -expect(toolCalls).toContain("mcp__gitmem__session_start"); -``` - -### Key SDK Options for Testing - -| Option | Value | Why | -|--------|-------|-----| -| `model` | `"haiku"` | Fastest, cheapest | -| `maxTurns` | 2-5 | Prevent runaway | -| `maxBudgetUsd` | 1.0 | Hard cost cap | -| `permissionMode` | `"bypassPermissions"` | No interactive prompts | -| `allowDangerouslySkipPermissions` | `true` | Required with bypassPermissions | -| `persistSession` | `false` | No disk state | -| `settingSources` | `["project"]` | Load project hooks | -| `thinking` | `{ type: "disabled" }` | No extended thinking | - -### Why SDK over `claude -p` subprocess - -| | Agent SDK (`query()`) | Subprocess (`claude -p`) | -|---|---|---| -| Speed | ~10s per session | 60-120s+ per session | -| Events | Typed `SDKMessage` | Parse NDJSON strings | -| Tool observation | Programmatic `HookCallback` | Parse `tool_use` blocks from JSON | -| MCP config | Via project `.mcp.json` | Via project `.mcp.json` | -| Hook observation | `SDKHookResponseMessage` events | Parse `hook_response` from NDJSON | -| Process model | In-process (spawns CLI as child) | Shell subprocess via `execFile` | -| Environment | Clean (no env var inheritance issues) | Inherits parent env (e.g., 
`CLAUDE_MODEL`) | - ---- - -## Hook Tests (`hooks/tests/test-hooks.sh`) - -Bash test suite (28 tests) for hook scripts. Tests detection cascades, output format, environment variable handling. Run with: - -```bash -bash hooks/tests/test-hooks.sh -``` - ---- - -## Skip Conditions - -Tests skip gracefully when dependencies are missing: - -| Test | Skip condition | Detection | -|------|---------------|-----------| -| `user-journey.test.ts` | Claude CLI not installed | `claude --version` | -| `organic-discovery.test.ts` | Claude CLI not installed | `claude --version` | -| `pro-fresh.test.ts` | Docker not available | `docker info` | -| `pro-mature.test.ts` | Docker not available | `docker info` | -| `smoke-pro.test.ts` | No Supabase credentials | env check | -| All integration tests | Docker not available | `docker info` | - ---- - -## Tool Tier Gating - -The MCP server gates tools by tier. Tests verify correct gating via `EXPECTED_TOOL_COUNTS`: - -| Tier | Tool Count | Includes | -|------|-----------|----------| -| **free** | 55 | Core tools only | -| **pro** | 67 | + analyze (3), cache management (6), graph traverse (3) | -| **dev** | 73 | + batch operations (2), transcripts (4) | - -Source: `src/tools/definitions.ts` → `getRegisteredTools()`, `src/services/tier.ts` feature flags. - ---- - -## Mapping Changes to Test Tiers - -**The rule:** Test at the tier where your change first touches a real boundary. Every change gets Tier 1. Then add the tier that matches the highest boundary crossed. Don't skip tiers — if you need Tier 3, also run 1 and 2. - -### Decision Framework - -``` -Did you change... - ├─ a pure function or schema? → Tier 1 (always, non-negotiable) - ├─ how the MCP server responds? → + Tier 2 (5s, free) - ├─ a database query or migration? → + Tier 3 (30s, needs Docker) - ├─ a CLI command or hook script? → + Tier 4 (90s, needs Docker for pro) - ├─ what the agent experiences? → + Tier 5 (60s, costs ~$0.30) - └─ performance-sensitive code? 
→ + Tier 6 (30s, free) -``` - -### Boundary Reference - -| Your change touches... | Runtime boundary | Minimum tier | Example | -|------------------------|-----------------|--------------|---------| -| Schema validation, Zod rules | None — all in-memory | **Tier 1** | Adding `.max()` limits to schemas | -| Pure function logic (sorting, formatting, parsing) | None — all in-memory | **Tier 1** | Fixing `normalizeThreads` created_at preservation | -| Tool handler branching, response formatting | None — still pure functions | **Tier 1** | Changing how recall formats output | -| MCP server wiring (tool registration, error responses) | Process spawn + stdio | **Tier 2** | Changing error redaction in server.ts | -| Network calls (`fetch`, `AbortSignal.timeout`) | Network | **Tier 2** (mock) or **Tier 3** (real) | Adding timeouts to fetch calls | -| Database queries, RPC calls, migrations | Network + SQL | **Tier 3** | New Supabase RPC function | -| CLI commands (`gitmem init`, `gitmem check`) | Filesystem + process | **Tier 4** | Changing starter scar loading behavior | -| Hook scripts (SessionStart, SessionClose) | Filesystem + shell | **Tier 4** | Modifying hook output format | -| Agent behavior (does Claude call the right tools?) 
| Claude API + everything | **Tier 5** | Changing CLAUDE.md ceremony wording | -| Latency of hot paths | Time | **Tier 6** | Optimizing recall search | - -### Practical examples - -**"I added `.max()` to Zod schemas"** -- Boundary: none (pure validation) -- Run: Tier 1 unit tests for the schema -- Also run: Tier 2 smoke — confirms MCP returns clean errors on oversized input - -**"I added `AbortSignal.timeout()` to fetch calls"** -- Boundary: network -- Run: Tier 1 (build compiles) — but unit tests can't meaningfully verify timeout behavior -- Also run: Tier 2 smoke (server still boots), Tier 3 if available (real network calls timeout correctly) - -**"I changed how `loadStarterScars` timestamps entries"** -- Boundary: filesystem -- Run: Tier 1 (test the class method directly with tmpdir) -- Also run: Tier 4 (`cli-fresh-install.test.ts` tests the full `gitmem init` flow) - -**"I changed the session close ceremony wording"** -- Boundary: Claude API (agent must follow new instructions) -- Run: Tier 1 (if schema changed), Tier 4 (hook output format), Tier 5 (agent actually follows ceremony) - -### Pre-commit minimum - -| Situation | Run | -|-----------|-----| -| Any code change | `npm run test:unit` (Tier 1) — always, no exceptions | -| MCP server or tool changes | + `npm run test:smoke:free` (Tier 2) | -| Before pushing to GitHub | Tiers 1 + 2 minimum | -| Before npm publish | Tiers 1-5 (Tier 5 is the ship gate) | - ---- - -## Adding New Tests - -When adding new tests: - -1. **Schema validation / pure logic** (Tier 1): Add to `tests/unit/schemas/` or `tests/unit/services/` — fast, deterministic, no dependencies -2. **Database behavior** (Tier 3): Add to existing integration suite or create new file in `tests/integration/` — needs Docker -3. **Free tier CLI / hooks** (Tier 4): Add to `cli-fresh-install.test.ts` or `free-tier.test.ts` — fast, no API cost -4. **Pro tier MCP** (Tier 4): Add to `pro-fresh.test.ts` or `pro-mature.test.ts` — needs Docker -5. 
**User experience / agent behavior** (Tier 5): Add to `user-journey.test.ts` — uses Agent SDK, costs API calls -6. **Performance regression** (Tier 6): Add bench to `tests/performance/` — update baselines in `baselines.ts` -7. **Hook scripts**: Add to `hooks/tests/test-hooks.sh` — bash, no dependencies - -For user-journey tests, keep prompts simple and use `appendSystemPrompt` to constrain agent behavior. The agent is non-deterministic — test that tools are _called_, not that the agent says specific words. - ---- - -## MCP Protocol Compliance - -A dedicated compliance suite validates gitmem-mcp against the MCP protocol specification. See [`docs/compliance/mcp-protocol-compliance.md`](compliance/mcp-protocol-compliance.md) for the full report. - -**Latest result:** 2026-02-16 · v1.0.3 · **36/36 PASS** - -```bash -# Run compliance tests -GITMEM_TIER=free node tests/compliance/mcp-protocol-compliance.mjs -``` - -Tests cover: protocol handshake, tool listing, JSON Schema validation for all tools, tool execution, error handling (correct -32601 codes), and JSON-RPC 2.0 response format compliance. - ---- - -## Known Limitations - -- **Pro E2E tests verify MCP success, not DB persistence.** The MCP server uses Supabase PostgREST (HTTP), not direct PostgreSQL. Passing a `postgres://` URI as `SUPABASE_URL` causes the server to fall back to local `.gitmem/` storage. Full DB persistence verification would require a PostgREST layer on top of the test container. -- **Performance benchmarks require `vitest bench`**, not `vitest run`. The `npm run test:perf` script handles this, but running directly with `npx vitest run --config vitest.perf.config.ts` will fail. -- **`test:all` does not include user-journey or organic-discovery tests** (they cost API calls). diff --git a/docs/threads.md b/docs/threads.md deleted file mode 100644 index 062847a..0000000 --- a/docs/threads.md +++ /dev/null @@ -1,365 +0,0 @@ -# Threads - -Threads are persistent work items that carry across sessions. 
They track what's unresolved, what's blocked, and what needs follow-up — surviving session boundaries so nothing gets lost. - -## Why Threads Exist - -Sessions end, but work doesn't. Before threads, open items lived in `open_threads` as plain strings inside session records. They had no IDs, no lifecycle, no way to mark something as done. You'd see the same stale item surfaced session after session with no way to clear it. - -Threads give open items identity (`t-XXXXXXXX`), lifecycle status, vitality scoring, and a resolution trail. - -## Thread Lifecycle - -Threads progress through a 5-stage state machine based on vitality scoring and age: - -``` -create_thread / session_close payload - | - v - [ EMERGING ] ── first 24 hours, high visibility - | - v (age > 24h) - [ ACTIVE ] ── vitality > 0.5, actively referenced - | - v (vitality decays) - [ COOLING ] ── 0.2 <= vitality <= 0.5, fading from use - | - v (vitality < 0.2) - [ DORMANT ] ── vitality < 0.2, no recent touches - | - v (dormant 30+ days) - [ ARCHIVED ] ── auto-archived, hidden from session_start - -Any state ──(explicit resolve_thread)──> [ RESOLVED ] -``` - -### Creation - -Threads are created in three ways: - -1. **Explicitly** via `create_thread` — mid-session when you identify a new open item -2. **Implicitly** via `session_close` — when the closing payload includes `open_threads` -3. **Promoted** from a suggestion via `promote_suggestion` — when a recurring topic is confirmed - -New threads undergo **semantic deduplication** (Phase 3) before creation. If a thread with similar meaning already exists (cosine similarity > 0.85), the existing thread is returned instead. - -### Carry-Forward - -On `session_start`, threads are loaded from Supabase (source of truth) with fallback to session aggregation. 
The display now shows vitality info: - -``` -Open threads (3): - t-abc12345: Fix auth timeout [ACTIVE 0.82] (operational, 2d ago) - t-def67890: Improve test coverage [COOLING 0.35] (backlog, 12d ago) - t-ghi11111: New thread just created [EMERGING 0.95] (backlog, today) -``` - -### Resolution - -Threads are resolved via `resolve_thread`: -- **By ID** (preferred): `resolve_thread({ thread_id: "t-a1b2c3d4" })` -- **By text match** (fallback): `resolve_thread({ text_match: "package name" })` - -Resolution records a timestamp, the resolving session, and an optional note. Knowledge graph triples are written to track the resolution relationship. - -## Vitality Scoring - -Every thread has a vitality score (0.0 to 1.0) computed from two components: - -``` -vitality = 0.55 * recency + 0.45 * frequency -``` - -### Recency - -Exponential decay based on thread class half-life: - -``` -recency = e^(-ln(2) * days_since_touch / half_life) -``` - -| Thread Class | Half-Life | Use Case | -|-------------|-----------|----------| -| operational | 3 days | Deploys, fixes, incidents, blockers | -| backlog | 21 days | Research, long-running improvements | - -Thread class is auto-detected from keywords in the thread text. Keywords like "deploy", "fix", "debug", "hotfix", "urgent", "broken", "incident", "blocker" classify a thread as operational. - -### Frequency - -Log-scaled touch count normalized against thread age: - -``` -frequency = min(log(touch_count + 1) / log(days_alive + 1), 1.0) -``` - -### Status Thresholds - -| Vitality Score | Status | -|---------------|--------| -| > 0.5 | active | -| 0.2 - 0.5 | cooling | -| < 0.2 | dormant | - -Threads touched during a session have their `touch_count` incremented and `last_touched_at` refreshed, which revives decayed vitality. 
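The vitality rules above can be condensed into a small TypeScript sketch. It illustrates the documented formulas under stated assumptions and is not the code in `src/services/thread-vitality.ts`:

- The function names are hypothetical.
- Thread class detection is assumed to be a case-insensitive substring match on the listed keywords.
- Thread age is assumed to be clamped to at least one day. For a thread created today, `log(days_alive + 1)` is zero, so the documented formula would otherwise divide by zero.

```typescript
// Hypothetical sketch of the documented vitality formula; not gitmem's implementation.
type ThreadClass = "operational" | "backlog";
type VitalityStatus = "active" | "cooling" | "dormant";

// Half-lives (days) from the thread class table.
const HALF_LIFE_DAYS: Record<ThreadClass, number> = { operational: 3, backlog: 21 };

// Keywords the docs list as operational markers (substring matching is an assumption).
const OPERATIONAL_KEYWORDS = ["deploy", "fix", "debug", "hotfix", "urgent", "broken", "incident", "blocker"];

function detectThreadClass(text: string): ThreadClass {
  const lower = text.toLowerCase();
  return OPERATIONAL_KEYWORDS.some((kw) => lower.includes(kw)) ? "operational" : "backlog";
}

// recency = e^(-ln(2) * days_since_touch / half_life)
function recency(daysSinceTouch: number, threadClass: ThreadClass): number {
  return Math.exp((-Math.LN2 * daysSinceTouch) / HALF_LIFE_DAYS[threadClass]);
}

// frequency = min(log(touch_count + 1) / log(days_alive + 1), 1.0)
function frequency(touchCount: number, daysAlive: number): number {
  const age = Math.max(daysAlive, 1); // assumption: avoid dividing by log(1) = 0
  return Math.min(Math.log(touchCount + 1) / Math.log(age + 1), 1.0);
}

// Status thresholds: > 0.5 active, 0.2 to 0.5 cooling, < 0.2 dormant.
function statusFor(score: number): VitalityStatus {
  if (score > 0.5) return "active";
  if (score >= 0.2) return "cooling";
  return "dormant";
}

interface VitalityInput {
  text: string;
  daysSinceTouch: number;
  touchCount: number;
  daysAlive: number;
}

// vitality = 0.55 * recency + 0.45 * frequency
function computeVitality(input: VitalityInput) {
  const threadClass = detectThreadClass(input.text);
  const score =
    0.55 * recency(input.daysSinceTouch, threadClass) +
    0.45 * frequency(input.touchCount, input.daysAlive);
  return { threadClass, score, status: statusFor(score) };
}

// A backlog thread last touched 21 days ago (one half-life), touched 3 times over 30 days.
const example = computeVitality({ text: "Improve test coverage", daysSinceTouch: 21, touchCount: 3, daysAlive: 30 });
console.log(example.threadClass, example.score.toFixed(3), example.status); // backlog 0.457 cooling
```

A backlog thread one half-life past its last touch has a recency of exactly 0.5, so touch frequency decides its band. In the example, three touches over 30 days leave it in the cooling band.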
-
-## Lifecycle State Machine
-
-The lifecycle wraps vitality scoring with age-based and dormancy logic:
-
-| Transition | Condition |
-|-----------|-----------|
-| any → emerging | Thread age < 24 hours |
-| emerging → active | Thread age >= 24 hours, vitality > 0.5 |
-| active → cooling | Vitality drops to [0.2, 0.5] |
-| cooling → active | Touch refreshes vitality above 0.5 |
-| cooling → dormant | Vitality drops below 0.2 |
-| dormant → active | Touch refreshes vitality above 0.5 |
-| dormant → archived | Dormant for 30+ consecutive days |
-| any → resolved | Explicit `resolve_thread` call |
-
-**Terminal states:** Archived and resolved threads do not transition. To reopen an archived topic, create a new thread.
-
-**Dormancy tracking:** When a thread enters dormant status, a `dormant_since` timestamp is stored in the Supabase metadata column. This is cleared if the thread revives.
-
-**Auto-archival:** At every `session_start`, a fire-and-forget call archives threads that have been dormant for 30+ days.
-
-## Semantic Deduplication
-
-When `create_thread` is called, the new thread text is compared against all open threads using embedding cosine similarity before creation.
-
-| Threshold | Value | Meaning |
-|-----------|-------|---------|
-| `DEDUP_SIMILARITY_THRESHOLD` | 0.85 | Above this = duplicate |
-
-**Dedup methods** (in priority order):
-1. **Embedding-based** — cosine similarity of text embeddings (preferred, when Supabase available)
-2. **Text normalization fallback** — exact match after lowercasing, stripping punctuation, collapsing whitespace
-
-When a duplicate is detected:
-- The existing thread is returned instead of creating a new one
-- The existing thread is touched in Supabase to keep it vital
-- Response includes `deduplicated: true` with match details
-
-## Knowledge Graph Integration
-
-Thread creation and resolution generate knowledge graph triples linking threads to sessions and issues.
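A triple pairs a subject and an object through a named predicate. The sketch below builds thread triples as plain objects. It uses the three thread predicates from the predicates table and the documented `HALF_LIFE_PROCESS = 9999`. Several details are assumptions rather than the signatures of `writeTriplesForThreadCreation()` or `writeTriplesForThreadResolution()`:

- the `Triple` shape and the helper names;
- the use of bare IDs as node identifiers;
- emitting the issue link at creation time.

```typescript
// Hypothetical triple shape; gitmem's actual triple-writer types may differ.
type ThreadPredicate = "created_thread" | "resolves_thread" | "relates_to_thread";

interface Triple {
  subject: string;
  predicate: ThreadPredicate;
  object: string;
  halfLife: number;
}

// Documented value: thread triples use HALF_LIFE_PROCESS and effectively never decay.
const HALF_LIFE_PROCESS = 9999;

// Session --created_thread--> Thread, plus Thread --relates_to_thread--> Issue when an issue is given.
function threadCreationTriples(sessionId: string, threadId: string, linearIssue?: string): Triple[] {
  const triples: Triple[] = [
    { subject: sessionId, predicate: "created_thread", object: threadId, halfLife: HALF_LIFE_PROCESS },
  ];
  // Assumption: the issue link is emitted at creation time when an issue is supplied.
  if (linearIssue) {
    triples.push({ subject: threadId, predicate: "relates_to_thread", object: linearIssue, halfLife: HALF_LIFE_PROCESS });
  }
  return triples;
}

// Session --resolves_thread--> Thread
function threadResolutionTriples(sessionId: string, threadId: string): Triple[] {
  return [{ subject: sessionId, predicate: "resolves_thread", object: threadId, halfLife: HALF_LIFE_PROCESS }];
}

console.log(threadCreationTriples("session-uuid", "t-a1b2c3d4", "PROJ-123"));
```

The triples are plain data, so building them costs nothing. A caller can hand them to a fire-and-forget write without delaying the thread operation itself.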
- -### Predicates - -| Predicate | Subject | Object | When | -|-----------|---------|--------|------| -| `created_thread` | Session | Thread | Thread created | -| `resolves_thread` | Session | Thread | Thread resolved | -| `relates_to_thread` | Thread | Issue | Thread linked to Linear issue | - -Triples are written fire-and-forget via `writeTriplesForThreadCreation()` and `writeTriplesForThreadResolution()`. They use `HALF_LIFE_PROCESS = 9999` (never decay). - -### Graph Traversal - -The `graph_traverse` tool provides 4 query lenses: -- **connected_to(node)** — find all relationships for a thread, issue, or session -- **produced_by(agent)** — find all contributions by an agent or persona -- **provenance(node, depth)** — trace origin chain up to N hops -- **stats()** — predicate distribution, top subjects/objects/issues - -## Implicit Thread Detection - -At `session_close`, session embeddings are compared to detect recurring topics that should become threads. - -### Detection Algorithm - -1. Compare current session embedding against the last 20 sessions (30-day window) -2. Find sessions with cosine similarity >= 0.70 -3. 
If 3+ sessions cluster (current + 2 historical): - - Check if an open thread already covers the topic (similarity >= 0.80) → skip - - Check if a pending suggestion already matches (similarity >= 0.80) → add evidence - - Otherwise, create a new suggestion - -### Thresholds - -| Constant | Value | Purpose | -|----------|-------|---------| -| `SESSION_SIMILARITY_THRESHOLD` | 0.70 | Session-to-session clustering | -| `THREAD_MATCH_THRESHOLD` | 0.80 | Existing thread covers topic | -| `SUGGESTION_MATCH_THRESHOLD` | 0.80 | Matches existing suggestion | -| `MIN_EVIDENCE_SESSIONS` | 3 | Minimum sessions to trigger | - -### Suggestion Lifecycle - -Suggestions are stored in `.gitmem/suggested-threads.json` and surfaced at `session_start`: - -``` -Suggested threads (2) — recurring topics not yet tracked: - ts-a1b2c3d4: Recurring auth timeout pattern (3 sessions) - ts-e5f6g7h8: Build performance regression (4 sessions) - Use promote_suggestion or dismiss_suggestion to manage. -``` - -| Status | Meaning | -|--------|---------| -| pending | New suggestion, awaiting user action | -| promoted | Converted to a real thread via `promote_suggestion` | -| dismissed | Suppressed via `dismiss_suggestion` (3x = permanent) | - -## ThreadObject Schema - -```typescript -interface ThreadObject { - id: string; // "t-" + 8 hex chars (e.g., "t-a1b2c3d4") - text: string; // Description of the open item - status: ThreadStatus; // "open" | "resolved" - created_at: string; // ISO timestamp - resolved_at?: string; // ISO timestamp (set on resolution) - source_session?: string; // Session UUID that created this thread - resolved_by_session?: string; // Session UUID that resolved it - resolution_note?: string; // Brief explanation of resolution -} -``` - -**Supabase-native statuses** (`emerging|active|cooling|dormant|archived|resolved`) are display enrichments. 
The local `ThreadStatus` stays `"open"|"resolved"` for backward compatibility, with `mapStatusFromSupabase()` flattening all non-resolved to `"open"`. - -## MCP Tools - -### `create_thread` - -Create a new open thread. Runs semantic dedup check and writes knowledge graph triples. - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `text` | `string` | yes | Thread description | -| `linear_issue` | `string` | no | Associated Linear issue (e.g., PROJ-123) | - -Returns: `{ thread, deduplicated?, dedup_details?, performance }` - -### `resolve_thread` - -Mark a thread as resolved. Provide either `thread_id` or `text_match`. - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `thread_id` | `string` | -- | Exact thread ID (e.g., `"t-a1b2c3d4"`) | -| `text_match` | `string` | -- | Case-insensitive substring match | -| `resolution_note` | `string` | -- | Brief resolution explanation | - -Returns: `{ success, resolved_thread, performance }` - -### `list_threads` - -List threads with optional filtering. - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `status` | `"open" \| "resolved"` | `"open"` | Filter by status | -| `include_resolved` | `boolean` | `false` | Include recently resolved threads | -| `project` | `string` | -- | Project scope | - -Returns: `{ threads, total_open, total_resolved, performance }` - -### `cleanup_threads` - -Batch triage tool for thread health review. Groups threads by lifecycle status. - -| Parameter | Type | Default | Description | -|-----------|------|---------|-------------| -| `project` | `string` | -- | Project scope | -| `auto_archive` | `boolean` | `false` | Auto-archive threads dormant 30+ days | - -Returns: `{ summary, groups: { emerging, active, cooling, dormant }, archived_count, archived_ids, performance }` - -### `promote_suggestion` - -Convert a suggested thread into a real open thread. 
- -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `suggestion_id` | `string` | yes | Suggestion ID (e.g., `"ts-a1b2c3d4"`) | -| `project` | `string` | -- | Project scope | - -Returns: `{ thread, suggestion, performance }` - -### `dismiss_suggestion` - -Dismiss a suggested thread. Suggestions dismissed 3+ times are permanently suppressed. - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `suggestion_id` | `string` | yes | Suggestion ID | - -Returns: `{ suggestion, performance }` - -### `graph_traverse` - -Traverse the knowledge graph connecting threads, sessions, issues, and learnings. - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `lens` | `string` | yes | Query type: `connected_to`, `produced_by`, `provenance`, `stats` | -| `node` | `string` | -- | Node to query (e.g., thread ID, issue ID) | -| `predicate` | `string` | -- | Filter by predicate type | -| `depth` | `number` | -- | Max traversal depth (provenance lens) | - -## Storage - -### Local: `.gitmem/threads.json` - -The runtime cache. An array of `ThreadObject` values, updated on every `create_thread`, `resolve_thread`, and `session_start` (after aggregation merge). - -### Local: `.gitmem/suggested-threads.json` - -Pending thread suggestions from implicit detection. Array of `ThreadSuggestion` objects with embeddings and evidence session lists. - -### Remote: `threads` table (Supabase) - -Source of truth. Full table with columns for vitality scoring, lifecycle status, embeddings, metadata (including `dormant_since`), and knowledge graph relationships. - -### Remote: `sessions.open_threads` - -Legacy JSONB column on the sessions table. Written during `session_close`. Used as fallback when the `threads` table is unavailable. - -## Format Normalization - -Threads have passed through several format generations. 
The `normalizeThreads()` function handles all of them: - -| Format | Example | Handling | -|--------|---------|----------| -| Plain string | `"Fix the bug"` | Migrated to ThreadObject with generated ID | -| Full ThreadObject | `{ id, text, status }` | Passed through as-is | -| JSON string (text) | `'{"id":"t-abc","text":"...","status":"open"}'` | Parsed, used directly | -| JSON string (note) | `'{"id":"t-abc","note":"...","status":"open"}'` | Parsed, `note` mapped to `text` | -| Legacy format | `'{"item":"...","context":"..."}'` | `item` field extracted as text | - -## PROJECT STATE Convention - -Threads starting with `PROJECT STATE:` are treated specially: -- Skipped during aggregation (not shown in thread lists) -- Extracted separately by `session_start` for rapid project context - -Format: `PROJECT STATE: Project Name: PROJ-1done PROJ-2~note PROJ-3->next` - -## Implementation - -| File | Purpose | -|------|---------| -| `src/services/thread-manager.ts` | Core lifecycle: ID generation, normalization, aggregation, resolution, file I/O | -| `src/services/thread-vitality.ts` | Vitality scoring, lifecycle state machine, thread class detection | -| `src/services/thread-supabase.ts` | Supabase CRUD, vitality recomputation, dormant tracking, archival | -| `src/services/thread-dedup.ts` | Semantic deduplication via embedding cosine similarity | -| `src/services/thread-suggestions.ts` | Implicit thread detection, suggestion management | -| `src/services/triple-writer.ts` | Knowledge graph triple extraction for threads | -| `src/tools/create-thread.ts` | `create_thread` MCP tool (with dedup + triples) | -| `src/tools/resolve-thread.ts` | `resolve_thread` MCP tool (with triples) | -| `src/tools/list-threads.ts` | `list_threads` MCP tool | -| `src/tools/cleanup-threads.ts` | `cleanup_threads` MCP tool (batch triage) | -| `src/tools/promote-suggestion.ts` | `promote_suggestion` MCP tool | -| `src/tools/dismiss-suggestion.ts` | `dismiss_suggestion` MCP tool | -| 
`src/tools/graph-traverse.ts` | `graph_traverse` MCP tool (4 lenses) | -| `src/schemas/thread.ts` | Zod validation schemas | -| `src/types/index.ts` | TypeScript interfaces | -| `tests/unit/services/thread-vitality.test.ts` | Vitality scoring tests (17) | -| `tests/unit/services/thread-lifecycle.test.ts` | Lifecycle state machine tests (15) | -| `tests/unit/services/thread-manager.test.ts` | Thread manager tests (28) | -| `tests/unit/services/thread-supabase.test.ts` | Supabase integration tests (24) | -| `tests/unit/services/thread-dedup.test.ts` | Deduplication tests (13) | -| `tests/unit/services/thread-suggestions.test.ts` | Suggestion detection tests (13) | -| `tests/unit/services/thread-triples.test.ts` | Triple extraction tests (10) |