
ContextOS ("The Brain")

1. Product Vision

To create an "Active Memory Operating System" that transforms AI interactions from stateless chats into continuous, evolving relationships.

One-Line Pitch: "The Graph-based Memory Layer that prevents Groundhog Day in AI."

2. Problem Statement

Amnesia: AI agents forget context the moment a session closes.

Retrieval Blindness: Standard RAG retrieves "keywords" but misses "relationships" (e.g., knowing that Project A is blocking Project B).

Context Window Limits: You cannot stuff 6 months of history into every prompt.

3. Solution Overview

A background service that processes chat logs into a Knowledge Graph. It extracts entities (People, Projects) and relationships (Decisions, Blockers) and injects relevant context into future sessions before the user prompts.

4. Key Features & Functional Requirements

A. The "Episodic Ingestor" (The Debrief Agent)

Requirement: Process raw chat transcripts into structured data without user intervention.

Trigger: Runs 10 minutes after chat-session inactivity.

Logic: Uses a small, fast LLM (e.g., gpt-4o-mini) to extract:

Facts: "User works at Acme Corp."

Preferences: "User hates verbose code comments."

Work Objects: "User created a 'Draft Audit Report'."

Triples: (User) --[HAS_GOAL]--> (Launch AppFlow)
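The Debrief Agent's output can be modeled as a small schema that the extraction LLM fills in. A minimal sketch using stdlib dataclasses (the real pipeline would more likely use Pydantic models for structured output; all names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Triple:
    subject: str      # e.g. "User"
    predicate: str    # e.g. "HAS_GOAL"
    obj: str          # e.g. "Launch AppFlow"

@dataclass
class DebriefResult:
    facts: list[str] = field(default_factory=list)         # "User works at Acme Corp."
    preferences: list[str] = field(default_factory=list)   # "User hates verbose code comments."
    work_objects: list[str] = field(default_factory=list)  # "User created a 'Draft Audit Report'."
    triples: list[Triple] = field(default_factory=list)
```

The extraction LLM is prompted to return JSON matching this shape, so each field maps one-to-one onto the categories above.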

B. The "Property Graph" Storage

Requirement: Store both semantic vectors (for fuzzy search) and graph edges (for precise logic).

Stack: LlamaIndex PropertyGraphIndex backed by Neo4j.

Schema:

Nodes: Person, Project, Event, Document.

Edges: AUTHORED, MODIFIED, BLOCKED_BY, DECIDED_ON.

C. The "Context Injection" Middleware

Requirement: Proactively insert memory into the system prompt of AppFlow agents.

Logic:

  1. User opens "ServiceNow Bot."
  2. Middleware queries ContextOS: "What is the user's current IT context?"
  3. Graph Result: (User) --[REPORTED]--> (Ticket #123: Wifi Issue).
  4. Injection: "System Note: The user recently reported Wifi issues (Ticket #123). Ask if this is resolved."
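The last two steps can be sketched as a pure function that turns graph triples into a system-prompt note (the relation names and phrasing are illustrative, not a fixed API):

```python
def format_system_note(triples: list[tuple[str, str, str]]) -> str:
    """Turn (subject, relation, object) triples from the graph into a system note."""
    notes = []
    for subject, relation, obj in triples:
        if relation == "REPORTED":
            notes.append(f"The user recently reported {obj}. Ask if this is resolved.")
        elif relation == "HAS_GOAL":
            notes.append(f"The user's current goal is: {obj}.")
    return "System Note: " + " ".join(notes) if notes else ""
```

The middleware prepends the returned string to the agent's system prompt; an empty string means nothing is injected.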

D. The "Morning Brief" (Diff Engine)

Requirement: Show the user what changed in their "World State."

Function: Compare Graph_State_T0 vs Graph_State_T1.

Output: A natural language summary: "Since yesterday, your 'Audit Project' moved from 'Planning' to 'Blocked'."
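A minimal sketch of the diff step, assuming each snapshot is a mapping from entity name to its current properties (the real engine would diff Neo4j snapshots; new entities would be reported separately):

```python
def diff_states(t0: dict, t1: dict) -> list[dict]:
    """Compare two world-state snapshots and list changed fields."""
    changes = []
    for name, new_props in t1.items():
        old_props = t0.get(name, {})
        for field_name, new_val in new_props.items():
            old_val = old_props.get(field_name)
            # Only report fields that existed before and changed value
            if old_val is not None and old_val != new_val:
                changes.append({"entity": name, "field": field_name,
                                "from": old_val, "to": new_val})
    return changes
```

The synthesis LLM then renders the change list into the natural-language summary shown above.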

5. Technical Stack

Orchestration: LlamaIndex (Python).

Graph DB: Neo4j (AuraDB Free Tier for MVP).

Vector DB: Pinecone or Weaviate (for the node embeddings).

LLM: OpenAI gpt-4o-mini (for extraction) / gpt-4o (for synthesis).

6. Success Metrics (MVP)

Recall Accuracy: The system retrieves the correct "Active Project" 95% of the time.

Extraction Quality: "Debrief Agent" correctly identifies 90% of user decisions from chat logs.

Latency: Context retrieval adds < 500ms to the start of a chat session.


7. Graph Schema (Complete)

Node Types & Properties

Person

{
  id: UUID,
  name: string,
  role?: string,              // "manager", "client", "vendor"
  company?: string,
  email?: string,
  relationship_strength: number,  // 0.0-1.0, decays over time
  last_mentioned: Date,
  extraction_confidence: number
}

Project

{
  id: UUID,
  name: string,
  status: "planning" | "active" | "blocked" | "completed" | "abandoned",
  priority?: "high" | "medium" | "low",
  deadline?: Date,
  blockers: UUID[],           // References to blocking nodes
  last_mentioned: Date,
  extraction_confidence: number
}

Event

{
  id: UUID,
  type: "meeting" | "deadline" | "decision" | "milestone",
  description: string,
  date?: Date,
  participants: UUID[],       // Person references
  outcome?: string,
  extraction_confidence: number
}

Fact

{
  id: UUID,
  subject_id: UUID,           // What this fact is about
  claim: string,              // "User prefers dark mode"
  source_session_id: UUID,
  extracted_at: Date,
  extraction_confidence: number,
  superseded_by?: UUID        // If updated by newer fact
}

Edge Types & Properties

| Edge | From → To | Properties |
|------|-----------|------------|
| AUTHORED | Person → Document | { date, role: "primary"\|"contributor" } |
| BLOCKED_BY | Project → Project/Event | { reason, since: Date } |
| DECIDED_ON | Person → Event | { decision_type, outcome } |
| WORKS_ON | Person → Project | { role, since: Date } |
| RELATED_TO | Any → Any | { relationship_type, strength: 0.0-1.0 } |
| SUPERSEDES | Fact → Fact | { reason: "updated"\|"corrected" } |
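The uniqueness constraints implied by the node schema can be applied once at startup. A sketch, assuming Neo4j 5 Cypher syntax and any session object exposing `.run` (e.g. a `neo4j.Session`):

```python
# Uniqueness constraints for each node type's id (Neo4j 5 Cypher syntax).
SCHEMA_DDL = [
    "CREATE CONSTRAINT person_id   IF NOT EXISTS FOR (n:Person)   REQUIRE n.id IS UNIQUE",
    "CREATE CONSTRAINT project_id  IF NOT EXISTS FOR (n:Project)  REQUIRE n.id IS UNIQUE",
    "CREATE CONSTRAINT event_id    IF NOT EXISTS FOR (n:Event)    REQUIRE n.id IS UNIQUE",
    "CREATE CONSTRAINT document_id IF NOT EXISTS FOR (n:Document) REQUIRE n.id IS UNIQUE",
    "CREATE CONSTRAINT fact_id     IF NOT EXISTS FOR (n:Fact)     REQUIRE n.id IS UNIQUE",
]

def apply_schema(session) -> None:
    """Idempotently apply the DDL; IF NOT EXISTS makes re-runs safe."""
    for statement in SCHEMA_DDL:
        session.run(statement)
```

Because each statement carries `IF NOT EXISTS`, `apply_schema` can run on every server start without error.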

8. Platform-Specific Ingestion

Supported Platforms (MVP)

| Platform | Detection | Parsing Strategy |
|----------|-----------|------------------|
| Claude.ai | URL contains claude.ai/chat | Parse data-testid conversation blocks |
| ChatGPT | URL contains chat.openai.com or chatgpt.com | Parse data-message-author-role attributes |

Inactivity Detection

// Content script injected into chat pages
let lastActivityTime = Date.now();

// Monitor for new messages
const observer = new MutationObserver(() => {
  lastActivityTime = Date.now();
});
observer.observe(document.body, { childList: true, subtree: true });

// Trigger debrief after 10 minutes of inactivity
setInterval(() => {
  const inactiveMinutes = (Date.now() - lastActivityTime) / 60000;
  if (inactiveMinutes >= 10) {
    triggerDebrief();
    lastActivityTime = Date.now(); // Reset so we don't re-trigger every minute
  }
}, 60000); // Check every minute

Extraction Pipeline

  1. Scrape: Pull conversation HTML from active tab
  2. Parse: Extract user/assistant turns with timestamps
  3. Chunk: Split into 4000-token windows
  4. Extract: LLM identifies entities, facts, preferences
  5. Dedupe: Match against existing graph nodes (embedding similarity > 0.9)
  6. Store: Upsert nodes/edges to Neo4j
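Step 5's deduplication can be sketched with plain cosine similarity over node embeddings (in the real pipeline Weaviate would perform this lookup):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def find_duplicate(new_emb: list[float], existing: dict,
                   threshold: float = 0.9):
    """Return the id of the best-matching existing node above the
    similarity threshold, or None if the entity is genuinely new."""
    best_id, best_sim = None, threshold
    for node_id, emb in existing.items():
        sim = cosine_similarity(new_emb, emb)
        if sim >= best_sim:
            best_id, best_sim = node_id, sim
    return best_id
```

If `find_duplicate` returns an id, step 6 merges into that node instead of creating a new one.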

9. Conflict Resolution (Confidence-Based)

Conflict Detection

Two facts conflict when:

  1. Same subject_id (about the same entity)
  2. Claims are semantically contradictory (heuristic: cosine similarity < 0.3 between claim embeddings)

Resolution Algorithm

def resolve_conflict(existing_fact: Fact, new_fact: Fact) -> Fact:
    """
    Keep the fact with higher confidence.
    If tie, prefer newer fact.
    """
    if new_fact.extraction_confidence > existing_fact.extraction_confidence:
        winner = new_fact
        loser = existing_fact
    elif new_fact.extraction_confidence < existing_fact.extraction_confidence:
        winner = existing_fact
        loser = new_fact
    else:
        # Tie: prefer newer
        winner = new_fact
        loser = existing_fact

    # Don't delete loser, mark as superseded
    loser.superseded_by = winner.id
    winner.supersedes = loser.id

    return winner

# Confidence thresholds
HIGH_CONFIDENCE = 0.8    # Explicit user statement
MEDIUM_CONFIDENCE = 0.6  # Inferred from context
LOW_CONFIDENCE = 0.4     # Ambiguous extraction

Confidence Scoring

| Signal | Confidence Boost |
|--------|------------------|
| User explicitly states fact | +0.3 |
| Fact repeated across sessions | +0.2 |
| Fact from recent session (< 7 days) | +0.1 |
| Fact contradicts user correction | -0.4 |
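The boosts above compose into a single clamped score; a sketch (the signal names are illustrative):

```python
def score_confidence(base: float, *, explicit: bool = False,
                     repeated: bool = False, recent: bool = False,
                     contradicts_correction: bool = False) -> float:
    """Apply the signal boosts to a base extraction confidence,
    clamped to the [0.0, 1.0] range used by the node schema."""
    score = base
    if explicit:
        score += 0.3
    if repeated:
        score += 0.2
    if recent:
        score += 0.1
    if contradicts_correction:
        score -= 0.4
    return max(0.0, min(1.0, score))
```

Clamping keeps the result a valid `extraction_confidence` regardless of how many signals fire.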

10. Context Injection Templates

Injection Format

<context_injection>
  <active_projects>
    - {project_name} ({status}): {one_line_summary}
  </active_projects>

  <recent_decisions>
    - {date}: {decision_summary}
  </recent_decisions>

  <user_preferences>
    - {preference_category}: {preference_value}
  </user_preferences>

  <relevant_people>
    - {person_name} ({role}): Last discussed {days_ago} days ago
  </relevant_people>
</context_injection>

Injection Rules

| Agent Type | Injected Context |
|------------|------------------|
| General Assistant | Top 3 active projects, top 5 preferences |
| Code Assistant | Current project only, tech stack preferences |
| Meeting Assistant | Relevant people, recent decisions |
| Task Manager | All active projects with blockers |

Token Budget

  • Maximum injection size: 500 tokens
  • If exceeds: Prioritize by last_mentioned recency
  • Always include: Active blockers, user corrections from last 7 days
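A sketch of the budget rule, assuming a rough 4-characters-per-token estimate and a `pinned` flag marking the always-include items (active blockers, recent corrections):

```python
def fit_to_budget(items: list[dict], max_tokens: int = 500) -> list[dict]:
    """Select items under the token budget: pinned items first,
    then the rest by last_mentioned recency. Each item has 'text',
    'last_mentioned' (sortable), and an optional 'pinned' flag."""
    def estimate(text: str) -> int:
        return max(1, len(text) // 4)  # rough chars-per-token heuristic

    pinned = [i for i in items if i.get("pinned")]
    rest = sorted((i for i in items if not i.get("pinned")),
                  key=lambda i: i["last_mentioned"], reverse=True)
    selected, used = [], 0
    for item in pinned + rest:
        cost = estimate(item["text"])
        if used + cost <= max_tokens:
            selected.append(item)
            used += cost
    return selected
```

A real implementation would use a proper tokenizer (e.g. tiktoken) instead of the character heuristic, but the prioritization logic is the same.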

11. Architecture Decision: Deployment Model

Chosen Architecture: Browser Extension + Local Server

┌─────────────────────────────────────────────────────────────┐
│                    User's Machine                            │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌────────────────────────────────┐ │
│  │ Browser Extension│    │ ContextOS Local Server         │ │
│  │ (Content Scripts)│    │ (Python/FastAPI)               │ │
│  │                  │    │                                │ │
│  │ • Scrapes chat   │───>│ • Extraction Pipeline          │ │
│  │ • Detects idle   │    │ • Graph Storage (Neo4j)        │ │
│  │ • Triggers sync  │<───│ • Vector Embeddings (Weaviate) │ │
│  │                  │    │ • Context Query API            │ │
│  └─────────────────┘    └────────────────────────────────┘ │
│                                    │                         │
│                                    ▼                         │
│                         ┌─────────────────┐                 │
│                         │ OpenAI API      │                 │
│                         │ (gpt-4o-mini)   │                 │
│                         └─────────────────┘                 │
└─────────────────────────────────────────────────────────────┘

Rationale:

  • Local server preserves privacy (no cloud dependency for graph storage)
  • Browser extension has direct DOM access for reliable scraping
  • Separation allows each component to be updated independently
  • Future: Option to run server in Docker for portability

Communication Protocol

  • Extension → Server: HTTP POST to localhost:8742
  • Server → Extension: Server-Sent Events for real-time updates

12. Tech Stack Decisions (Resolved)

| Component | Technology | Rationale |
|-----------|------------|-----------|
| Vector DB | Weaviate (local Docker) | Open-source, embedded mode, built-in vectorizer |
| Graph DB | Neo4j (AuraDB Free or local) | Best graph query language (Cypher) |
| Extraction LLM | OpenAI gpt-4o-mini | Fast, cheap, good at structured output |
| Synthesis LLM | OpenAI gpt-4o | Better reasoning for context relevance |
| Backend | Python 3.11 + FastAPI | Best LLM tooling ecosystem |
| Extension | Manifest V3 | Required for Chrome Web Store |

"AppFlow" Clarification

The term "AppFlow" in this document refers to any LLM-powered application that can receive injected context. Examples:

  • Claude Code (via MCP server)
  • Custom Streamlit apps
  • API-based agents

Context injection works via a REST API that any application can query.


13. Project File Structure

contextos/
├── extension/                    # Chrome Extension
│   ├── manifest.json
│   ├── content_scripts/
│   │   ├── scraper.js           # DOM scraping logic
│   │   ├── platforms/
│   │   │   ├── claude.js
│   │   │   └── chatgpt.js
│   │   └── idle_detector.js
│   ├── background/
│   │   └── service-worker.js    # Extension coordination
│   └── popup/
│       ├── popup.html
│       └── popup.js
│
├── server/                       # Python Backend
│   ├── src/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Pydantic Settings
│   │   ├── api/
│   │   │   ├── routes.py        # REST endpoints
│   │   │   └── schemas.py       # Request/response models
│   │   ├── extraction/
│   │   │   ├── pipeline.py      # Main extraction flow
│   │   │   ├── prompts.py       # LLM prompt templates
│   │   │   └── parsers.py       # Chat format parsers
│   │   ├── graph/
│   │   │   ├── neo4j_client.py  # Neo4j connection
│   │   │   ├── schema.py        # Cypher schema DDL
│   │   │   └── queries.py       # Common graph queries
│   │   ├── vector/
│   │   │   ├── weaviate_client.py
│   │   │   └── embeddings.py    # OpenAI embedding calls
│   │   └── injection/
│   │       ├── builder.py       # Context assembly
│   │       └── templates.py     # Injection formats
│   ├── tests/
│   │   ├── test_extraction.py
│   │   ├── test_graph.py
│   │   └── fixtures/
│   ├── requirements.txt
│   └── Dockerfile
│
├── docker-compose.yml           # Neo4j + Weaviate + Server
├── .env.example
└── README.md

14. API Contracts

Base URL

http://localhost:8742/api/v1

POST /ingest

Submit a chat transcript for processing.

Request:

{
  "platform": "claude" | "chatgpt",
  "session_id": "uuid",
  "messages": [
    {
      "role": "user" | "assistant",
      "content": "string",
      "timestamp": "ISO8601"
    }
  ]
}

Response (202 Accepted):

{
  "job_id": "uuid",
  "status": "queued",
  "estimated_seconds": 30
}
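Server-side validation of this payload maps directly onto the codes listed under Error Responses; a sketch (the `invalid_role` code is illustrative, not in the spec's table):

```python
def validate_ingest(payload: dict) -> list[str]:
    """Return the list of error codes for a POST /ingest body (empty = valid)."""
    errors = []
    if payload.get("platform") not in ("claude", "chatgpt"):
        errors.append("invalid_platform")
    messages = payload.get("messages") or []
    if not messages:
        errors.append("empty_transcript")
    if any(m.get("role") not in ("user", "assistant") for m in messages):
        errors.append("invalid_role")  # illustrative extra check
    return errors
```

The route handler returns 400 with the first error code when the list is non-empty, otherwise enqueues the job and responds 202.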

GET /ingest/{job_id}

Check extraction job status.

Response (200):

{
  "job_id": "uuid",
  "status": "completed" | "processing" | "failed",
  "extracted": {
    "facts": 12,
    "entities": 5,
    "relationships": 8
  },
  "errors": []
}

GET /context

Retrieve context for injection into an agent.

Query Parameters:

  • agent_type: "general" | "code" | "meeting" | "task" (required)
  • max_tokens: integer (default 500)
  • project_filter: string (optional, filter to specific project)

Response (200):

{
  "injection": "<context_injection>...</context_injection>",
  "token_count": 423,
  "sources": [
    { "type": "fact", "id": "uuid", "relevance": 0.92 },
    { "type": "project", "id": "uuid", "relevance": 0.88 }
  ]
}
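Any AppFlow agent can query this endpoint before its first turn; a client sketch using only the stdlib (`fetch_context` assumes the local server is running):

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8742/api/v1"

def context_url(agent_type: str, max_tokens: int = 500,
                project_filter: Optional[str] = None) -> str:
    """Build the GET /context URL from the documented query parameters."""
    params = {"agent_type": agent_type, "max_tokens": max_tokens}
    if project_filter:
        params["project_filter"] = project_filter
    return f"{BASE_URL}/context?{urlencode(params)}"

def fetch_context(agent_type: str, **kwargs) -> dict:
    """GET /context and return the parsed JSON response body."""
    with urllib.request.urlopen(context_url(agent_type, **kwargs)) as resp:
        return json.loads(resp.read())
```

The caller then prepends `response["injection"]` to its system prompt.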

GET /diff

Get changes since last snapshot for Morning Brief.

Query Parameters:

  • since: ISO8601 timestamp (required)

Response (200):

{
  "changes": [
    {
      "entity_type": "project",
      "entity_name": "Audit Project",
      "field": "status",
      "from": "planning",
      "to": "blocked",
      "changed_at": "ISO8601"
    }
  ],
  "summary": "Your 'Audit Project' moved from 'Planning' to 'Blocked'. 2 new facts extracted about you."
}

Error Responses

| HTTP | Code | Meaning |
|------|------|---------|
| 400 | invalid_platform | Unsupported platform in request |
| 400 | empty_transcript | No messages in transcript |
| 404 | job_not_found | Invalid job_id |
| 500 | extraction_failed | LLM or DB error during processing |
| 503 | neo4j_unavailable | Graph DB connection failed |

15. Environment Configuration

# .env.example

# OpenAI (Required)
OPENAI_API_KEY=sk-...

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Weaviate
WEAVIATE_URL=http://localhost:8080

# Server
SERVER_PORT=8742
LOG_LEVEL=INFO

# Extension (for development)
EXTENSION_DEV_MODE=true
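The spec's `config.py` uses Pydantic Settings; a dependency-free sketch of the same loading behavior over these variables:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    weaviate_url: str
    server_port: int
    log_level: str

def load_settings(env=os.environ) -> Settings:
    """Read configuration from the environment, with the .env.example defaults."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "password"),
        weaviate_url=env.get("WEAVIATE_URL", "http://localhost:8080"),
        server_port=int(env.get("SERVER_PORT", "8742")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing `env` explicitly makes the loader easy to unit-test; in production it defaults to `os.environ`.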

16. Error Handling Strategy

Extraction Errors

| Error | Detection | Recovery |
|-------|-----------|----------|
| OpenAI rate limit | 429 response | Exponential backoff, queue remaining chunks |
| OpenAI timeout | >30s response | Retry once, then mark job as partial |
| Malformed LLM output | JSON parse fails | Retry with stricter prompt, then skip chunk |
| Empty extraction | 0 entities found | Log warning, continue (some chats have no extractable content) |
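The rate-limit row can be sketched as a generic retry wrapper (`RetryableError` stands in for the OpenAI client's rate-limit/timeout exceptions; `sleep` is injected for testability):

```python
import random
import time

class RetryableError(Exception):
    """Stand-in for rate-limit / timeout exceptions from the OpenAI client."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Call `call()`, retrying on RetryableError with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except RetryableError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the job runner
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Each extraction chunk is wrapped in `with_backoff`, so a transient 429 delays the pipeline instead of failing the job.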

Storage Errors

| Error | Detection | Recovery |
|-------|-----------|----------|
| Neo4j connection failed | Connection timeout | Cache extractions locally, retry on reconnect |
| Weaviate unavailable | Health check fails | Disable deduplication, proceed with extraction |
| Duplicate node | Constraint violation | Merge with existing node |

Extension Errors

| Error | Detection | Recovery |
|-------|-----------|----------|
| Server unreachable | Fetch fails | Show badge "Offline", queue transcripts locally |
| DOM structure changed | Selectors return null | Fall back to text extraction, log warning |
| Permission denied | Content script blocked | Show user notification with fix instructions |

17. Security & Privacy

Data Storage

  • All data stored locally (user's machine only)
  • No cloud sync by default
  • Neo4j and Weaviate run in Docker with no external access

API Security (Local Server)

  • Server binds to 127.0.0.1 only (not exposed to network)
  • No authentication for MVP (local-only access)
  • CORS restricted to extension origin

Sensitive Data Handling

  • Chat content processed but not stored verbatim
  • Only extracted facts/entities retained
  • User can delete all data via /api/v1/reset endpoint

Extension Permissions

| Permission | Justification |
|------------|---------------|
| activeTab | Read current chat page DOM |
| storage | Persist local queue when server unavailable |
| host_permissions: claude.ai/* | Content script injection |
| host_permissions: chat.openai.com/* | Content script injection |

18. Testing Requirements

Unit Tests

| Module | Test Cases |
|--------|------------|
| extraction/parsers.py | Parse Claude format, parse ChatGPT format, handle malformed input |
| extraction/pipeline.py | Extract facts, extract relationships, handle empty input |
| graph/queries.py | Find related nodes, resolve conflicts, calculate recency |
| injection/builder.py | Token limiting, priority sorting, template rendering |

Integration Tests

| Scenario | Validation |
|----------|------------|
| Full pipeline | Ingest sample transcript → verify graph populated correctly |
| Context retrieval | Populate graph → query context → verify relevant facts returned |
| Conflict resolution | Insert conflicting facts → verify winner selected correctly |

Test Fixtures

tests/fixtures/
├── claude_transcript_simple.json
├── chatgpt_transcript_long.json
├── expected_extraction.json
└── sample_graph_state.cypher

19. Success Criteria

MVP Definition of Done

  • Extension successfully scrapes Claude.ai and ChatGPT conversations
  • Extraction pipeline populates Neo4j with facts, projects, people
  • /context endpoint returns relevant context in < 500ms
  • Morning Brief shows accurate diff summary

Quality Gates

| Metric | Target |
|--------|--------|
| Extraction precision | > 90% (facts match human labeling) |
| Context retrieval latency | < 500ms |
| Token efficiency | Injection uses < 80% of budget |
| False positive rate | < 5% (irrelevant context injected) |