An intelligent orchestrator that executes natural language queries across Gmail, Google Calendar, and Google Drive.

feat7/orchestrator

Google Workspace Orchestrator

Features

  • Intent Classification: LLM-powered parsing of natural language into structured intents with conversation context support
  • Multi-Service Orchestration: Execute queries across Gmail, Calendar, and Drive with DAG-based parallel execution
  • Hybrid Search: 3-way Reciprocal Rank Fusion combining BM25, vector similarity, and filtered vector search
  • Streaming Responses: Real-time Server-Sent Events (SSE) streaming for a conversational UX
  • Email Composition: LLM-powered email drafting with automatic recipient resolution
  • Background Sync: Celery workers for incremental Google API synchronization

Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.11+
  • OpenAI API key (or an Anthropic API key)

Setup

  1. Clone and configure:

    cp .env.example .env
    # Edit .env with your API keys
  2. Start services:

    docker-compose up -d db redis
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run migrations:

    alembic upgrade head
  5. Seed mock data:

    python -m scripts.seed_mock_data
  6. Start the API:

    uvicorn app.main:app --reload
  7. Open the UI:

    http://localhost:8000/
    

Architecture

flowchart LR
    Q[Query] --> IC[Intent Classifier]
    IC --> QP[Query Planner]
    QP --> O[Orchestrator]
    O --> GA[Gmail Agent]
    O --> CA[Calendar Agent]
    O --> DA[Drive Agent]
    GA & CA & DA --> HS[Hybrid Search]
    HS --> RS[Response Synthesizer]
    RS --> R[Streaming Response]
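The Orchestrator stage runs the planner's steps as a DAG, executing independent steps in parallel. Below is a minimal sketch of that idea with asyncio, assuming each step declares the names of the steps it depends on. The `Step` type, `execute_dag`, and the fake agent coroutines are hypothetical illustrations, not code from this repository.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class Step:
    # Hypothetical plan step: a name, a coroutine to run, and the steps it waits on.
    name: str
    run: Callable[[dict[str, object]], Awaitable[object]]
    depends_on: list[str] = field(default_factory=list)


async def execute_dag(steps: list[Step]) -> dict[str, object]:
    """Run steps in waves; every step whose dependencies are finished runs concurrently."""
    by_name = {s.name: s for s in steps}
    results: dict[str, object] = {}
    pending = set(by_name)
    while pending:
        # A step is ready once all of its dependencies have produced a result.
        ready = [n for n in pending if all(d in results for d in by_name[n].depends_on)]
        if not ready:
            raise ValueError(f"cycle or missing dependency among: {sorted(pending)}")
        # Ready steps see earlier results (read-only) and run in parallel.
        outputs = await asyncio.gather(*(by_name[n].run(results) for n in ready))
        results.update(zip(ready, outputs))
        pending.difference_update(ready)
    return results


# Hypothetical agents standing in for the Gmail and Calendar agents.
async def search_gmail(_):
    await asyncio.sleep(0.01)
    return ["email-1"]


async def search_calendar(_):
    await asyncio.sleep(0.01)
    return ["event-1"]


async def draft_email(results):
    return f"draft citing {results['search_gmail'][0]}"


plan = [
    Step("search_gmail", search_gmail),
    Step("search_calendar", search_calendar),
    Step("draft_email", draft_email, depends_on=["search_gmail", "search_calendar"]),
]
print(asyncio.run(execute_dag(plan))["draft_email"])  # draft citing email-1
```

Here the two searches run concurrently, and the draft step starts only after both finish. A wave-based scheduler is the simplest correct approach. A production orchestrator might instead start each step as soon as its own dependencies complete.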

Hybrid Search Architecture

We use 3-way Reciprocal Rank Fusion (RRF) to combine multiple search methods:

flowchart TB
    Q[User Query] --> BM25[BM25 Full-Text<br/>PostgreSQL ts_vector]
    Q --> E[Generate Embedding]
    E --> VEC[Vector Similarity<br/>pgvector cosine]
    E --> FIL[Filtered Vector<br/>+ metadata filters]
    BM25 --> RRF["RRF Fusion<br/>score = Σ 1/(k + rank)"]
    VEC --> RRF
    FIL --> RRF
    RRF --> BOOST[Filter Boost 1.5x]
    BOOST --> TOP[Top 10 Results]
Method           What It Does                                       Latency
BM25             PostgreSQL full-text search                        ~5 ms
Vector           Cosine similarity on embeddings                    ~20 ms
Filtered Vector  Cosine similarity + metadata filters (sender, date)  ~20 ms
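The fusion step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: each retriever is assumed to return a ranked list of document IDs, `k = 60` is the commonly cited RRF default, and the 1.5x filter boost is assumed to multiply the fused score of documents that match the query's metadata filters. The name `rrf_fuse` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def rrf_fuse(
    ranked_lists: list[list[str]],
    boosted_ids: Iterable[str] = (),
    k: int = 60,
    boost: float = 1.5,
    top_n: int = 10,
) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Assumed boost rule: documents matching the metadata filters get a multiplier.
    boosted = set(boosted_ids)
    for doc_id in scores:
        if doc_id in boosted:
            scores[doc_id] *= boost
    # Highest fused score first; keep only the top results.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


bm25 = ["e1", "e2", "e3"]
vector = ["e2", "e1", "e4"]
filtered = ["e2", "e4"]
print([doc for doc, _ in rrf_fuse([bm25, vector, filtered], boosted_ids={"e4"})])
# ['e2', 'e4', 'e1', 'e3']
```

Because RRF uses only ranks, it needs no score normalization across BM25 and cosine similarity, whose raw scores are on different scales. That is the usual reason to prefer it over a weighted sum of raw scores.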

See Design Documentation for detailed architecture and scaling strategy.

Sample Queries

Single Service:

  • "What's on my calendar next week?"
  • "Find emails from sarah@company.com about the budget"
  • "Show me PDFs in Drive from last month"

Multi-Service:

  • "Cancel my Turkish Airlines flight"
  • "Prepare for tomorrow's meeting with Acme Corp"
  • "Find events that conflict with my out-of-office doc"

Actions:

  • "Draft an email to John about the project update"
  • "Send it" (after reviewing the draft)
  • "Create a meeting with Sarah tomorrow at 2pm"

Project Structure

├── app/
│   ├── api/              # API routes and dependencies
│   ├── core/             # Core logic (intent, planner, orchestrator, synthesizer)
│   ├── agents/           # Service agents (gmail, gcal, gdrive)
│   ├── services/         # Google services, embedding, cache
│   ├── db/               # Database models and connection
│   ├── schemas/          # Pydantic schemas
│   ├── evaluation/       # Search quality benchmarks
│   └── static/           # Web UI
├── alembic/              # Database migrations
├── tests/                # Test suite
├── docs/                 # Documentation
└── scripts/              # Utility scripts

API Endpoints

Method  Endpoint                   Description
POST    /api/v1/query              Process natural language query
POST    /api/v1/intent             Classify intent only (no execution)
POST    /api/v1/query/stream       Streaming query response (SSE)
GET     /api/v1/health             Health check
POST    /api/v1/sync/trigger       Trigger data sync
GET     /api/v1/sync/status        Get sync status
GET     /api/v1/metrics/precision  Search quality benchmark
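The streaming endpoint uses Server-Sent Events. Below is a minimal standard-library client sketch. It assumes `/api/v1/query/stream` accepts the same `{"query": ...}` JSON body as `/api/v1/query` and emits standard `data:` lines. The payload inside each event is not documented here, so the parser yields raw event data strings. `parse_sse` and `stream_query` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Group SSE lines into events; a blank line dispatches the pending event."""
    event: dict[str, str] = {}
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:  # events without data are not dispatched
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "data":
            data.append(value)
        elif field in ("event", "id"):
            event[field] = value
    # Per the SSE spec, an event not terminated by a blank line is discarded.


def stream_query(query: str, base_url: str = "http://localhost:8000") -> Iterator[dict[str, str]]:
    """POST a query to the (assumed) streaming endpoint and yield parsed events."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/query/stream",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields raw byte lines as they arrive.
        yield from parse_sse(line.decode("utf-8") for line in resp)


# Usage against a running server:
# for event in stream_query("What's on my calendar next week?"):
#     print(event["data"])
```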

Example: Intent Classification

curl -X POST http://localhost:8000/api/v1/intent \
  -H "Content-Type: application/json" \
  -d '{"query": "Cancel my Turkish Airlines flight"}'

Response:

{
  "query": "Cancel my Turkish Airlines flight",
  "intent": {
    "services": ["gmail", "gcal"],
    "operation": "action",
    "steps": [
      {"step": "search_gmail", "params": {"search_query": "Turkish Airlines flight booking"}},
      {"step": "search_calendar", "params": {"search_query": "Turkish Airlines flight"}},
      {"step": "draft_email", "params": {"to": "support@turkishairlines.com", "subject": "Flight Cancellation Request"}}
    ],
    "confidence": 0.9
  },
  "latency_ms": 1850
}
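To call `/api/v1/intent` from Python and read the result into typed objects, you could use something like the following. This sketch uses only the standard library and assumes the response shape shown above. The `PlanStep` and `Intent` dataclasses and the `parse_intent`/`classify` functions are hypothetical, not the project's Pydantic schemas.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class PlanStep:
    step: str     # e.g. "search_gmail"
    params: dict  # step-specific parameters


@dataclass
class Intent:
    services: list[str]
    operation: str
    steps: list[PlanStep]
    confidence: float


def parse_intent(payload: dict) -> Intent:
    """Convert the JSON body of /api/v1/intent into typed objects."""
    raw = payload["intent"]
    return Intent(
        services=list(raw["services"]),
        operation=raw["operation"],
        steps=[PlanStep(s["step"], s.get("params", {})) for s in raw["steps"]],
        confidence=float(raw["confidence"]),
    )


def classify(query: str, base_url: str = "http://localhost:8000") -> Intent:
    """POST a query to the intent endpoint (no execution) and parse the reply."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/intent",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_intent(json.load(resp))


# Usage against a running server:
# intent = classify("Cancel my Turkish Airlines flight")
# print([s.step for s in intent.steps])
```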

Example: Full Query Execution

curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What important things to do this week?"}'

Response:

{
  "response": "Here's what's important this week:\n\n- Monday at 9:00 AM: Daily Standup\n- Monday at 11:00 AM: 1:1 with Manager\n- Tuesday at 10:00 AM: Acme Corp Partnership Meeting\n\nNo urgent emails found for this week.",
  "actions_taken": [
    {"step": "search_calendar", "success": true, "data": {"results": [...]}},
    {"step": "search_gmail", "success": true, "data": {"results": [...]}}
  ],
  "intent": {"services": ["gcal", "gmail"], "operation": "search", ...},
  "latency_ms": 3200
}
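A small client for the full query endpoint can report which steps succeeded. This sketch assumes the response fields shown above (`response`, `actions_taken`, `latency_ms`). The 30-second timeout is an arbitrary choice for multi-step queries, and `run_query` and `summarize_actions` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def run_query(query: str, base_url: str = "http://localhost:8000", timeout: float = 30.0) -> dict:
    """POST a natural language query; surface the server's error body on HTTP errors."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", "replace")
        raise RuntimeError(f"HTTP {err.code}: {body}") from err


def summarize_actions(result: dict) -> str:
    """One line per executed step (e.g. 'search_calendar: ok'), then the latency."""
    lines = [
        f"{a['step']}: {'ok' if a.get('success') else 'failed'}"
        for a in result.get("actions_taken", [])
    ]
    lines.append(f"latency: {result.get('latency_ms', '?')} ms")
    return "\n".join(lines)


# Usage against a running server:
# result = run_query("What important things to do this week?")
# print(result["response"])
# print(summarize_actions(result))
```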

See API Documentation for details and Sample Queries for more examples.

Configuration

Variable         Description                   Default
DATABASE_URL     PostgreSQL connection string  -
REDIS_URL        Redis connection string       -
LLM_PROVIDER     "openai" or "anthropic"       openai
OPENAI_API_KEY   OpenAI API key                -
USE_MOCK_GOOGLE  Use mock Google services      true
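The variables above could be loaded into a typed settings object as shown below. This standard-library sketch assumes the variables are already present in the process environment (for example, exported from `.env`). The defaults mirror the table. The application itself may use a different mechanism, and `Settings`, `load_settings`, and `_env_bool` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    llm_provider: str
    openai_api_key: Optional[str]
    use_mock_google: bool


def load_settings() -> Settings:
    """Read the README's variables; DATABASE_URL and REDIS_URL raise KeyError if unset."""
    provider = os.environ.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in {"openai", "anthropic"}:
        raise ValueError(f"LLM_PROVIDER must be 'openai' or 'anthropic', got {provider!r}")
    return Settings(
        database_url=os.environ["DATABASE_URL"],
        redis_url=os.environ["REDIS_URL"],
        llm_provider=provider,
        openai_api_key=os.environ.get("OPENAI_API_KEY"),
        use_mock_google=_env_bool("USE_MOCK_GOOGLE", True),
    )
```

Failing fast on missing connection strings and on an unknown provider surfaces misconfiguration at startup instead of on the first query.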

Testing

# Run tests
pytest

# Run with coverage
pytest --cov=app

# Run specific test file
pytest tests/test_intent.py

# Run search quality benchmark
pytest tests/test_precision.py
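The exact metric behind the search quality benchmark is not spelled out here. Assuming it measures precision@k (the fraction of the top k results that are labeled relevant), a minimal version might look like this. The function names and the labeled data are hypothetical.

```python
from statistics import mean


def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled by relevant IDs (missing results count as misses)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(doc_id in relevant for doc_id in retrieved[:k])
    return hits / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average precision@k over (retrieved, relevant) pairs, one pair per test query."""
    return mean(precision_at_k(retrieved, relevant, k) for retrieved, relevant in runs)


runs = [
    (["e1", "e2", "e3", "e4"], {"e1", "e3"}),  # 2 of 4 relevant
    (["e9", "e5"], {"e5"}),                    # 1 hit in 4 slots
]
print(mean_precision_at_k(runs, k=4))  # 0.375
```

Dividing by `k` rather than by the number of returned results penalizes a retriever that returns fewer than `k` items. Some benchmarks use the other convention, so this choice should match whatever the project's tests assert.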

Development

# Format code
black app tests

# Type checking
mypy app

# Lint
ruff check app

Technical Notes

  • Built without agent frameworks (LangChain, LlamaIndex) for full control
  • Uses pgvector for self-hosted vector search with IVFFlat indexes
  • Custom orchestration with DAG-based parallel execution
  • Hybrid search using Reciprocal Rank Fusion (RRF)
  • SSE streaming for real-time responses

Documentation

License

MIT
