# ContextOS ("The Brain")

## Product Vision

To create an "Active Memory Operating System" that transforms AI interactions from stateless chats into continuous, evolving relationships.

One-Line Pitch: "The graph-based memory layer that prevents Groundhog Day in AI."

## Problem Statement

- Amnesia: AI agents forget context the moment a session closes.
- Retrieval Blindness: Standard RAG retrieves "keywords" but misses "relationships" (e.g., knowing that Project A is blocking Project B).
- Context Window Limits: You cannot stuff six months of history into every prompt.

## Solution Overview

A background service that processes chat logs into a knowledge graph. It extracts entities (people, projects) and relationships (decisions, blockers), and injects relevant context into future sessions before the user prompts.
## Key Features & Functional Requirements

### A. The "Episodic Ingestor" (The Debrief Agent)

Requirement: Process raw chat transcripts into structured data without user intervention.

Trigger: Runs 10 minutes after chat-session inactivity.

Logic: Uses a small, fast LLM (e.g., gpt-4o-mini) to extract:

- Facts: "User works at Acme Corp."
- Preferences: "User hates verbose code comments."
- Work Objects: "User created a 'Draft Audit Report'."
- Triples: (User) --[HAS_GOAL]--> (Launch AppFlow)
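The triple notation above maps naturally onto a small data structure. A minimal sketch (the `Triple` type and formatter are illustrative, not the real pipeline types — the extraction LLM would be asked to emit JSON matching this shape):

```python
from dataclasses import dataclass

@dataclass
class Triple:
    """One extracted relationship, e.g. (User) --[HAS_GOAL]--> (Launch AppFlow)."""
    subject: str
    predicate: str
    obj: str

def format_triple(t: Triple) -> str:
    """Render a triple in the arrow notation used above."""
    return f"({t.subject}) --[{t.predicate}]--> ({t.obj})"

print(format_triple(Triple("User", "HAS_GOAL", "Launch AppFlow")))
# (User) --[HAS_GOAL]--> (Launch AppFlow)
```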
### B. The "Property Graph" Storage

Requirement: Store both semantic vectors (for fuzzy search) and graph edges (for precise logic).

Stack: LlamaIndex PropertyGraphIndex backed by Neo4j.

Schema:

- Nodes: Person, Project, Event, Document.
- Edges: AUTHORED, MODIFIED, BLOCKED_BY, DECIDED_ON.
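One way to keep extraction output honest is to validate edges against the schema before they reach the database. A hypothetical guard (the `ALLOWED_EDGES` endpoints follow the edge table later in this document; treat it as a sketch, not the real server module):

```python
# Which edge types are legal between which node labels (subset of the schema).
ALLOWED_EDGES = {
    "AUTHORED":   [("Person", "Document")],
    "MODIFIED":   [("Person", "Document")],
    "BLOCKED_BY": [("Project", "Project"), ("Project", "Event")],
    "DECIDED_ON": [("Person", "Event")],
}

def edge_is_valid(edge: str, from_label: str, to_label: str) -> bool:
    """Reject edges the schema does not allow before upserting to Neo4j."""
    return (from_label, to_label) in ALLOWED_EDGES.get(edge, [])
```

For example, `edge_is_valid("AUTHORED", "Person", "Document")` passes, while `edge_is_valid("AUTHORED", "Project", "Document")` is rejected.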
### C. The "Context Injection" Middleware

Requirement: Proactively insert memory into the system prompt of AppFlow agents.

Logic:

1. User opens "ServiceNow Bot."
2. Middleware queries ContextOS: "What is the user's current IT context?"
3. Graph result: User --[REPORTED]--> (Ticket #123: Wifi Issue).
4. Injection: "System Note: The user recently reported Wifi issues (Ticket #123). Ask if this is resolved."
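The step from graph edge to system note can be sketched as a simple template. This is an illustrative rendering function, not the real middleware (the actual injection would pick a phrasing per edge type):

```python
def build_system_note(edge: str, target: str) -> str:
    """Render one graph edge as a system-prompt note (illustrative template)."""
    verb = edge.lower().replace("_", " ")  # e.g. "REPORTED" -> "reported"
    return f"System Note: The user recently {verb} {target}. Ask if this is resolved."

print(build_system_note("REPORTED", "Ticket #123 (Wifi Issue)"))
# System Note: The user recently reported Ticket #123 (Wifi Issue). Ask if this is resolved.
```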
### D. The "Morning Brief" (Diff Engine)

Requirement: Show the user what changed in their "world state."

Function: Compare Graph_State_T0 vs. Graph_State_T1.

Output: A natural-language summary: "Since yesterday, your 'Audit Project' moved from 'Planning' to 'Blocked'."
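The diff itself is straightforward once both snapshots are in hand. A minimal sketch over dict-shaped snapshots (the snapshot shape is an assumption; the real engine would diff graph query results):

```python
def diff_states(t0: dict, t1: dict) -> list[dict]:
    """Compare two entity snapshots and list changed fields."""
    changes = []
    for name, fields in t1.items():
        before = t0.get(name, {})
        for field, new in fields.items():
            old = before.get(field)
            if old is not None and old != new:
                changes.append({"entity": name, "field": field, "from": old, "to": new})
    return changes

t0 = {"Audit Project": {"status": "planning"}}
t1 = {"Audit Project": {"status": "blocked"}}
print(diff_states(t0, t1))
# [{'entity': 'Audit Project', 'field': 'status', 'from': 'planning', 'to': 'blocked'}]
```

The change records map directly onto the `changes` array in the diff API response below; the LLM then turns them into the one-line summary.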
## Technical Stack

- Orchestration: LlamaIndex (Python).
- Graph DB: Neo4j (AuraDB Free Tier for MVP).
- Vector DB: Pinecone or Weaviate (for node embeddings).
- LLM: OpenAI gpt-4o-mini (extraction) / gpt-4o (synthesis).
## Success Metrics (MVP)

- Recall Accuracy: The system retrieves the correct "Active Project" 95% of the time.
- Extraction Quality: The Debrief Agent correctly identifies 90% of user decisions from chat logs.
- Latency: Context retrieval adds < 500 ms to the start of a chat session.
Node schemas (TypeScript-style notation):

Person:

```ts
{
  id: UUID,
  name: string,
  role?: string,                 // "manager", "client", "vendor"
  company?: string,
  email?: string,
  relationship_strength: number, // 0.0-1.0, decays over time
  last_mentioned: Date,
  extraction_confidence: number
}
```

Project:

```ts
{
  id: UUID,
  name: string,
  status: "planning" | "active" | "blocked" | "completed" | "abandoned",
  priority?: "high" | "medium" | "low",
  deadline?: Date,
  blockers: UUID[],              // References to blocking nodes
  last_mentioned: Date,
  extraction_confidence: number
}
```

Event:

```ts
{
  id: UUID,
  type: "meeting" | "deadline" | "decision" | "milestone",
  description: string,
  date?: Date,
  participants: UUID[],          // Person references
  outcome?: string,
  extraction_confidence: number
}
```

Fact:

```ts
{
  id: UUID,
  subject_id: UUID,              // What this fact is about
  claim: string,                 // "User prefers dark mode"
  source_session_id: UUID,
  extracted_at: Date,
  extraction_confidence: number,
  superseded_by?: UUID           // If updated by a newer fact
}
```

Edge types:

| Edge | From → To | Properties |
|---|---|---|
| AUTHORED | Person → Document | { date, role: "primary" \| "contributor" } |
| BLOCKED_BY | Project → Project/Event | { reason, since: Date } |
| DECIDED_ON | Person → Event | { decision_type, outcome } |
| WORKS_ON | Person → Project | { role, since: Date } |
| RELATED_TO | Any → Any | { relationship_type, strength: 0.0-1.0 } |
| SUPERSEDES | Fact → Fact | { reason: "updated" \| "corrected" } |
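The Person schema says `relationship_strength` "decays over time" but does not specify the curve. One plausible choice is exponential decay with a tunable half-life; a sketch (the 30-day half-life is an assumption, not a spec value):

```python
def decayed_strength(strength: float, days_since_mention: float,
                     half_life_days: float = 30.0) -> float:
    """Exponential decay for relationship_strength (half-life is an assumption)."""
    return strength * 0.5 ** (days_since_mention / half_life_days)

# A 0.8-strength relationship unmentioned for one half-life drops to 0.4.
print(round(decayed_strength(0.8, 30.0), 3))  # 0.4
```

Recomputing on read (from `last_mentioned`) avoids a background job that rewrites every Person node.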
| Platform | Detection | Parsing Strategy |
|---|---|---|
| Claude.ai | URL contains `claude.ai/chat` | Parse `data-testid` conversation blocks |
| ChatGPT | URL contains `chat.openai.com` or `chatgpt.com` | Parse `data-message-author-role` attributes |
```javascript
// Content script injected into chat pages
let lastActivityTime = Date.now();

// Monitor for new messages
const observer = new MutationObserver(() => {
  lastActivityTime = Date.now();
});
observer.observe(document.body, { childList: true, subtree: true });

// Trigger debrief after 10 minutes of inactivity
setInterval(() => {
  const inactiveMinutes = (Date.now() - lastActivityTime) / 60000;
  if (inactiveMinutes >= 10) {
    triggerDebrief();
  }
}, 60000); // Check every minute
```

Extraction pipeline steps:

1. Scrape: Pull conversation HTML from the active tab
2. Parse: Extract user/assistant turns with timestamps
3. Chunk: Split into 4000-token windows
4. Extract: LLM identifies entities, facts, preferences
5. Dedupe: Match against existing graph nodes (embedding similarity > 0.9)
6. Store: Upsert nodes/edges to Neo4j
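The dedupe step hinges on cosine similarity between node embeddings. A self-contained sketch of that check (pure Python; the real pipeline would use the vector DB's nearest-neighbor search instead of pairwise comparison):

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def is_duplicate(new_emb: list[float], existing_emb: list[float],
                 threshold: float = 0.9) -> bool:
    """Dedupe rule from the pipeline: same entity if similarity > threshold."""
    return cosine(new_emb, existing_emb) > threshold

print(is_duplicate([1.0, 0.0], [1.0, 0.01]))  # True (near-identical directions)
```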
Two facts conflict when:

- They share the same `subject_id` (they are about the same entity), and
- Their claims are semantically contradictory (cosine similarity < 0.3 between claim embeddings)
```python
def resolve_conflict(existing_fact: Fact, new_fact: Fact) -> Fact:
    """
    Keep the fact with the higher confidence.
    On a tie, prefer the newer fact.
    """
    if new_fact.extraction_confidence > existing_fact.extraction_confidence:
        winner, loser = new_fact, existing_fact
    elif new_fact.extraction_confidence < existing_fact.extraction_confidence:
        winner, loser = existing_fact, new_fact
    else:
        # Tie: prefer newer
        winner, loser = new_fact, existing_fact

    # Don't delete the loser; mark it as superseded
    loser.superseded_by = winner.id
    winner.supersedes = loser.id
    return winner
```
```python
# Confidence thresholds
HIGH_CONFIDENCE = 0.8    # Explicit user statement
MEDIUM_CONFIDENCE = 0.6  # Inferred from context
LOW_CONFIDENCE = 0.4     # Ambiguous extraction
```

| Signal | Confidence Boost |
|---|---|
| User explicitly states fact | +0.3 |
| Fact repeated across sessions | +0.2 |
| Fact from recent session (< 7 days) | +0.1 |
| Fact contradicts user correction | -0.4 |
```xml
<context_injection>
  <active_projects>
    - {project_name} ({status}): {one_line_summary}
  </active_projects>
  <recent_decisions>
    - {date}: {decision_summary}
  </recent_decisions>
  <user_preferences>
    - {preference_category}: {preference_value}
  </user_preferences>
  <relevant_people>
    - {person_name} ({role}): Last discussed {days_ago} days ago
  </relevant_people>
</context_injection>
```

| Agent Type | Injected Context |
|---|---|
| General Assistant | Top 3 active projects, top 5 preferences |
| Code Assistant | Current project only, tech stack preferences |
| Meeting Assistant | Relevant people, recent decisions |
| Task Manager | All active projects with blockers |
- Maximum injection size: 500 tokens
- If exceeded: prioritize by `last_mentioned` recency
- Always include: active blockers and user corrections from the last 7 days
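The budget rules above can be sketched as a greedy selector. The item shape (`tokens`, `last_mentioned` as a timestamp, `is_blocker`) is illustrative; sorting blockers first approximates the "always include" rule:

```python
def select_for_injection(items: list[dict], budget: int = 500) -> list[dict]:
    """Fill the token budget: blockers first, then most recent last_mentioned."""
    ranked = sorted(items, key=lambda i: (not i.get("is_blocker", False),
                                          -i["last_mentioned"]))
    chosen, used = [], 0
    for item in ranked:
        if used + item["tokens"] <= budget:
            chosen.append(item)
            used += item["tokens"]
    return chosen

items = [
    {"name": "old fact", "last_mentioned": 1, "tokens": 300},
    {"name": "blocker", "last_mentioned": 2, "tokens": 250, "is_blocker": True},
    {"name": "new fact", "last_mentioned": 3, "tokens": 300},
]
print([i["name"] for i in select_for_injection(items)])  # ['blocker']
```

A production version would also reserve budget for recent user corrections before filling the rest.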
```
┌─────────────────────────────────────────────────────────────┐
│                     User's Machine                          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌────────────────────────────────┐  │
│  │ Browser Extension│    │ ContextOS Local Server         │  │
│  │ (Content Scripts)│    │ (Python/FastAPI)               │  │
│  │                  │    │                                │  │
│  │ • Scrapes chat   │───>│ • Extraction Pipeline          │  │
│  │ • Detects idle   │    │ • Graph Storage (Neo4j)        │  │
│  │ • Triggers sync  │<───│ • Vector Embeddings (Weaviate) │  │
│  │                  │    │ • Context Query API            │  │
│  └─────────────────┘    └────────────────────────────────┘  │
│                                      │                      │
│                                      ▼                      │
│                          ┌─────────────────┐                │
│                          │   OpenAI API    │                │
│                          │  (gpt-4o-mini)  │                │
│                          └─────────────────┘                │
└─────────────────────────────────────────────────────────────┘
```
Rationale:
- Local server preserves privacy (no cloud dependency for graph storage)
- Browser extension has direct DOM access for reliable scraping
- Separation allows each component to be updated independently
- Future: Option to run server in Docker for portability
- Extension → Server: HTTP POST to `localhost:8742`
- Server → Extension: Server-Sent Events for real-time updates
| Component | Technology | Rationale |
|---|---|---|
| Vector DB | Weaviate (local Docker) | Open-source, embedded mode, built-in vectorizer |
| Graph DB | Neo4j (AuraDB Free or local) | Best graph query language (Cypher) |
| Extraction LLM | OpenAI gpt-4o-mini | Fast, cheap, good at structured output |
| Synthesis LLM | OpenAI gpt-4o | Better reasoning for context relevance |
| Backend | Python 3.11 + FastAPI | Best LLM tooling ecosystem |
| Extension | Manifest V3 | Required for Chrome Web Store |
The term "AppFlow" in this document refers to any LLM-powered application that can receive injected context. Examples:
- Claude Code (via MCP server)
- Custom Streamlit apps
- API-based agents
Context injection works via a REST API that any application can query.
contextos/
├── extension/ # Chrome Extension
│ ├── manifest.json
│ ├── content_scripts/
│ │ ├── scraper.js # DOM scraping logic
│ │ ├── platforms/
│ │ │ ├── claude.js
│ │ │ └── chatgpt.js
│ │ └── idle_detector.js
│ ├── background/
│ │ └── service-worker.js # Extension coordination
│ └── popup/
│ ├── popup.html
│ └── popup.js
│
├── server/ # Python Backend
│ ├── src/
│ │ ├── main.py # FastAPI entry point
│ │ ├── config.py # Pydantic Settings
│ │ ├── api/
│ │ │ ├── routes.py # REST endpoints
│ │ │ └── schemas.py # Request/response models
│ │ ├── extraction/
│ │ │ ├── pipeline.py # Main extraction flow
│ │ │ ├── prompts.py # LLM prompt templates
│ │ │ └── parsers.py # Chat format parsers
│ │ ├── graph/
│ │ │ ├── neo4j_client.py # Neo4j connection
│ │ │ ├── schema.py # Cypher schema DDL
│ │ │ └── queries.py # Common graph queries
│ │ ├── vector/
│ │ │ ├── weaviate_client.py
│ │ │ └── embeddings.py # OpenAI embedding calls
│ │ └── injection/
│ │ ├── builder.py # Context assembly
│ │ └── templates.py # Injection formats
│ ├── tests/
│ │ ├── test_extraction.py
│ │ ├── test_graph.py
│ │ └── fixtures/
│ ├── requirements.txt
│ └── Dockerfile
│
├── docker-compose.yml # Neo4j + Weaviate + Server
├── .env.example
└── README.md
Base URL: `http://localhost:8742/api/v1`
Submit a chat transcript for processing.

Request:

```json
{
  "platform": "claude" | "chatgpt",
  "session_id": "uuid",
  "messages": [
    {
      "role": "user" | "assistant",
      "content": "string",
      "timestamp": "ISO8601"
    }
  ]
}
```

Response (202 Accepted):

```json
{
  "job_id": "uuid",
  "status": "queued",
  "estimated_seconds": 30
}
```

Check extraction job status.
Response (200):
```json
{
  "job_id": "uuid",
  "status": "completed" | "processing" | "failed",
  "extracted": {
    "facts": 12,
    "entities": 5,
    "relationships": 8
  },
  "errors": []
}
```

Retrieve context for injection into an agent.
Query Parameters:
- `agent_type`: "general" | "code" | "meeting" | "task" (required)
- `max_tokens`: integer (default 500)
- `project_filter`: string (optional; filter to a specific project)
Response (200):
```json
{
  "injection": "<context_injection>...</context_injection>",
  "token_count": 423,
  "sources": [
    { "type": "fact", "id": "uuid", "relevance": 0.92 },
    { "type": "project", "id": "uuid", "relevance": 0.88 }
  ]
}
```

Get changes since the last snapshot for the Morning Brief.
Query Parameters:
- `since`: ISO8601 timestamp (required)
Response (200):
```json
{
  "changes": [
    {
      "entity_type": "project",
      "entity_name": "Audit Project",
      "field": "status",
      "from": "planning",
      "to": "blocked",
      "changed_at": "ISO8601"
    }
  ],
  "summary": "Your 'Audit Project' moved from 'Planning' to 'Blocked'. 2 new facts extracted about you."
}
```

| HTTP | Code | Meaning |
|---|---|---|
| 400 | `invalid_platform` | Unsupported platform in request |
| 400 | `empty_transcript` | No messages in transcript |
| 404 | `job_not_found` | Invalid `job_id` |
| 500 | `extraction_failed` | LLM or DB error during processing |
| 503 | `neo4j_unavailable` | Graph DB connection failed |
```bash
# .env.example

# OpenAI (Required)
OPENAI_API_KEY=sk-...

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Weaviate
WEAVIATE_URL=http://localhost:8080

# Server
SERVER_PORT=8742
LOG_LEVEL=INFO

# Extension (for development)
EXTENSION_DEV_MODE=true
```

| Error | Detection | Recovery |
|---|---|---|
| OpenAI rate limit | 429 response | Exponential backoff, queue remaining chunks |
| OpenAI timeout | >30s response | Retry once, then mark job as partial |
| Malformed LLM output | JSON parse fails | Retry with stricter prompt, then skip chunk |
| Empty extraction | 0 entities found | Log warning, continue (some chats have no extractable content) |
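The "exponential backoff" recovery for rate limits can be sketched as a small retry wrapper. The exception type, retry count, and delays here are illustrative (a real client would catch the SDK's 429 error class):

```python
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry `call` on rate-limit-style failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for an OpenAI 429 error
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Injecting `sleep` makes the wrapper testable without real waiting; remaining chunks would stay queued behind the retrying call.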
| Error | Detection | Recovery |
|---|---|---|
| Neo4j connection failed | Connection timeout | Cache extractions locally, retry on reconnect |
| Weaviate unavailable | Health check fails | Disable deduplication, proceed with extraction |
| Duplicate node | Constraint violation | Merge with existing node |
| Error | Detection | Recovery |
|---|---|---|
| Server unreachable | Fetch fails | Show badge "Offline", queue transcripts locally |
| DOM structure changed | Selectors return null | Fall back to text extraction, log warning |
| Permission denied | Content script blocked | Show user notification with fix instructions |
- All data stored locally (user's machine only)
- No cloud sync by default
- Neo4j and Weaviate run in Docker with no external access
- Server binds to `127.0.0.1` only (not exposed to the network)
- No authentication for MVP (local-only access)
- CORS restricted to the extension origin
- Chat content is processed but not stored verbatim
- Only extracted facts/entities are retained
- User can delete all data via the `/api/v1/reset` endpoint
| Permission | Justification |
|---|---|
| `activeTab` | Read current chat page DOM |
| `storage` | Persist local queue when server unavailable |
| `host_permissions: claude.ai/*` | Content script injection |
| `host_permissions: chat.openai.com/*` | Content script injection |
| Module | Test Cases |
|---|---|
| `extraction/parsers.py` | Parse Claude format, parse ChatGPT format, handle malformed input |
| `extraction/pipeline.py` | Extract facts, extract relationships, handle empty input |
| `graph/queries.py` | Find related nodes, resolve conflicts, calculate recency |
| `injection/builder.py` | Token limiting, priority sorting, template rendering |
| Scenario | Validation |
|---|---|
| Full pipeline | Ingest sample transcript → verify graph populated correctly |
| Context retrieval | Populate graph → query context → verify relevant facts returned |
| Conflict resolution | Insert conflicting facts → verify winner selected correctly |
tests/fixtures/
├── claude_transcript_simple.json
├── chatgpt_transcript_long.json
├── expected_extraction.json
└── sample_graph_state.cypher
- Extension successfully scrapes Claude.ai and ChatGPT conversations
- Extraction pipeline populates Neo4j with facts, projects, and people
- `/context` endpoint returns relevant context in < 500 ms
- Morning Brief shows an accurate diff summary
| Metric | Target |
|---|---|
| Extraction precision | > 90% (facts match human labeling) |
| Context retrieval latency | < 500ms |
| Token efficiency | Injection uses < 80% of budget |
| False positive rate | < 5% (irrelevant context injected) |