Google Workspace Orchestrator

An intelligent orchestrator that executes natural language queries across Gmail, Google Calendar, and Google Drive.

Features

Intent Classification: LLM-powered parsing of natural language into structured intents with conversation context support
Multi-Service Orchestration: Execute queries across Gmail, Calendar, and Drive with DAG-based parallel execution
Hybrid Search: 3-way Reciprocal Rank Fusion combining BM25, vector similarity, and filtered search
Streaming Responses: Real-time SSE streaming for conversational UX
Email Composition: LLM-powered email drafting with automatic recipient resolution
Background Sync: Celery workers for incremental Google API synchronization

Quick Start

Prerequisites

Docker and Docker Compose
Python 3.11+
OpenAI API key (or Anthropic)

Setup

Clone the repository, then configure your environment:
```
cp .env.example .env
# Edit .env with your API keys
```

Start services:
```
docker-compose up -d db redis
```
Install dependencies:
```
pip install -r requirements.txt
```
Run migrations:
```
alembic upgrade head
```
Seed mock data:
```
python -m scripts.seed_mock_data
```
Start the API:
```
uvicorn app.main:app --reload
```
Open the UI:
```
http://localhost:8000/
```

Architecture

```mermaid
flowchart LR
    Q[Query] --> IC[Intent Classifier]
    IC --> QP[Query Planner]
    QP --> O[Orchestrator]
    O --> GA[Gmail Agent]
    O --> CA[Calendar Agent]
    O --> DA[Drive Agent]
    GA & CA & DA --> HS[Hybrid Search]
    HS --> RS[Response Synthesizer]
    RS --> R[Streaming Response]
```
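
As a rough illustration of the DAG-based parallel execution mentioned under Features, the sketch below runs every step whose dependencies are satisfied concurrently with `asyncio`. The plan layout, `run_step`, and `execute_plan` are hypothetical names chosen for this example and are not taken from the repository.

```python
import asyncio

# Hypothetical plan: each step maps to the steps it depends on.
PLAN = {
    "search_gmail": [],
    "search_calendar": [],
    "draft_email": ["search_gmail", "search_calendar"],
}

async def run_step(name: str, inputs: dict) -> str:
    """Stand-in for an agent call; a real agent would query Gmail, Calendar, or Drive."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"{name} done (inputs: {sorted(inputs)})"

async def execute_plan(plan: dict[str, list[str]]) -> dict[str, str]:
    """Run independent steps concurrently, wave by wave, until the plan is complete."""
    results: dict[str, str] = {}
    pending = dict(plan)
    while pending:
        # Steps whose dependencies have all finished can run in parallel.
        ready = [s for s, deps in pending.items() if all(d in results for d in deps)]
        if not ready:
            raise ValueError("cycle or missing dependency in plan")
        outputs = await asyncio.gather(
            *(run_step(s, {d: results[d] for d in pending[s]}) for s in ready)
        )
        for step, out in zip(ready, outputs):
            results[step] = out
            del pending[step]
    return results

if __name__ == "__main__":
    for step, out in asyncio.run(execute_plan(PLAN)).items():
        print(step, "->", out)
```

In this example the two searches run in the same wave, and `draft_email` starts only once both have returned.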

Hybrid Search Architecture

We use 3-way Reciprocal Rank Fusion (RRF) to combine multiple search methods:

```mermaid
flowchart TB
    Q[User Query] --> E[Generate Embedding]
    E --> BM25[BM25 Full-Text<br/>PostgreSQL tsvector]
    E --> VEC[Vector Similarity<br/>pgvector cosine]
    E --> FIL[Filtered Vector<br/>+ metadata filters]
    BM25 --> RRF["RRF Fusion<br/>score = Σ 1/(k + rank)"]
    VEC --> RRF
    FIL --> RRF
    RRF --> BOOST[Filter Boost 1.5x]
    BOOST --> TOP[Top 10 Results]
```

| Method | What It Does | Latency |
|---|---|---|
| BM25 | PostgreSQL full-text search | ~5ms |
| Vector | Cosine similarity on embeddings | ~20ms |
| Filtered | Vector + metadata (sender, date) | ~20ms |
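
The fusion step can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `k = 60` is the value commonly used for RRF and is an assumption here, as are the function name `rrf_fuse` and the way the 1.5x boost is applied to documents that matched the metadata filters.

```python
from collections import defaultdict

def rrf_fuse(rankings, filtered_ids=frozenset(), k=60, boost=1.5, top_n=10):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: ranked ID lists, best first (e.g. BM25, vector, filtered vector).
    filtered_ids: IDs that matched metadata filters; their fused score is boosted.
    k: smoothing constant (assumed 60); boost: assumed multiplier of 1.5.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] += 1.0 / (k + rank)
    for doc_id in filtered_ids:
        if doc_id in scores:
            scores[doc_id] *= boost
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Toy rankings standing in for the three search methods.
bm25 = ["a", "b", "c"]
vector = ["b", "a", "d"]
filtered = ["b", "d"]
for doc_id, score in rrf_fuse([bm25, vector, filtered], filtered_ids={"b", "d"}):
    print(f"{doc_id}: {score:.4f}")
```

Because RRF uses only ranks, the BM25 and cosine scores never need to be normalized against each other.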

See Design Documentation for detailed architecture and scaling strategy.

Sample Queries

Single Service:

"What's on my calendar next week?"
"Find emails from sarah@company.com about the budget"
"Show me PDFs in Drive from last month"

Multi-Service:

"Cancel my Turkish Airlines flight"
"Prepare for tomorrow's meeting with Acme Corp"
"Find events that conflict with my out-of-office doc"

Actions:

"Draft an email to John about the project update"
"Send it" (after reviewing draft)
"Create a meeting with Sarah tomorrow at 2pm"

Project Structure

```
├── app/
│   ├── api/              # API routes and dependencies
│   ├── core/             # Core logic (intent, planner, orchestrator, synthesizer)
│   ├── agents/           # Service agents (gmail, gcal, gdrive)
│   ├── services/         # Google services, embedding, cache
│   ├── db/               # Database models and connection
│   ├── schemas/          # Pydantic schemas
│   ├── evaluation/       # Search quality benchmarks
│   └── static/           # Web UI
├── alembic/              # Database migrations
├── tests/                # Test suite
├── docs/                 # Documentation
└── scripts/              # Utility scripts
```

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/query` | Process natural language query |
| POST | `/api/v1/intent` | Classify intent only (no execution) |
| POST | `/api/v1/query/stream` | Streaming query response (SSE) |
| GET | `/api/v1/health` | Health check |
| POST | `/api/v1/sync/trigger` | Trigger data sync |
| GET | `/api/v1/sync/status` | Get sync status |
| GET | `/api/v1/metrics/precision` | Search quality benchmark |
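
For the streaming endpoint, a client needs to split the Server-Sent Events stream into individual events. The sketch below is a generic SSE `data:` parser using only the standard library; the function name `iter_sse_data` and the sample payloads are made up for illustration, since the exact event format emitted by `/api/v1/query/stream` is not documented here.

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event from an iterable of text lines.

    An unterminated trailing event is dropped, as the SSE format specifies.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)

# Hypothetical stream contents; real payloads may be JSON rather than plain text.
sample = [
    "data: Here's what's\n",
    "\n",
    ": keep-alive\n",
    "data: important this week\n",
    "\n",
]
for chunk in iter_sse_data(sample):
    print(chunk)
```

Any line iterator can be fed to the parser, for example the decoded lines of a streamed HTTP response.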

Example: Intent Classification

```
curl -X POST http://localhost:8000/api/v1/intent \
  -H "Content-Type: application/json" \
  -d '{"query": "Cancel my Turkish Airlines flight"}'
```

Response:

```
{
  "query": "Cancel my Turkish Airlines flight",
  "intent": {
    "services": ["gmail", "gcal"],
    "operation": "action",
    "steps": [
      {"step": "search_gmail", "params": {"search_query": "Turkish Airlines flight booking"}},
      {"step": "search_calendar", "params": {"search_query": "Turkish Airlines flight"}},
      {"step": "draft_email", "params": {"to": "support@turkishairlines.com", "subject": "Flight Cancellation Request"}}
    ],
    "confidence": 0.9
  },
  "latency_ms": 1850
}
```
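
A client might turn this response into typed objects along the following lines. The sketch uses only the standard library; the `Step` and `Intent` dataclasses and `parse_intent` are hypothetical and do not mirror the project's Pydantic schemas. The sample body is a shortened copy of the response above.

```python
import json
from dataclasses import dataclass

@dataclass
class Step:
    step: str
    params: dict

@dataclass
class Intent:
    services: list
    operation: str
    steps: list
    confidence: float

def parse_intent(payload: str) -> Intent:
    """Build an Intent from the JSON body returned by /api/v1/intent."""
    raw = json.loads(payload)["intent"]
    return Intent(
        services=raw["services"],
        operation=raw["operation"],
        steps=[Step(s["step"], s.get("params", {})) for s in raw["steps"]],
        confidence=raw["confidence"],
    )

body = """
{
  "query": "Cancel my Turkish Airlines flight",
  "intent": {
    "services": ["gmail", "gcal"],
    "operation": "action",
    "steps": [
      {"step": "search_gmail", "params": {"search_query": "Turkish Airlines flight booking"}},
      {"step": "search_calendar", "params": {"search_query": "Turkish Airlines flight"}}
    ],
    "confidence": 0.9
  },
  "latency_ms": 1850
}
"""
intent = parse_intent(body)
for s in intent.steps:
    print(s.step, s.params)
```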

Example: Full Query Execution

```
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What important things to do this week?"}'
```

Response:

```
{
  "response": "Here's what's important this week:\n\n- Monday at 9:00 AM: Daily Standup\n- Monday at 11:00 AM: 1:1 with Manager\n- Tuesday at 10:00 AM: Acme Corp Partnership Meeting\n\nNo urgent emails found for this week.",
  "actions_taken": [
    {"step": "search_calendar", "success": true, "data": {"results": [...]}},
    {"step": "search_gmail", "success": true, "data": {"results": [...]}}
  ],
  "intent": {"services": ["gcal", "gmail"], "operation": "search", ...},
  "latency_ms": 3200
}
```

See API Documentation for details and Sample Queries for more examples.

Configuration

| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | - |
| `REDIS_URL` | Redis connection string | - |
| `LLM_PROVIDER` | "openai" or "anthropic" | openai |
| `OPENAI_API_KEY` | OpenAI API key | - |
| `USE_MOCK_GOOGLE` | Use mock Google services | true |

Testing

```
# Run tests
pytest

# Run with coverage
pytest --cov=app

# Run specific test file
pytest tests/test_intent.py

# Run search quality benchmark
pytest tests/test_precision.py
```

Development

```
# Format code
black app tests

# Type checking
mypy app

# Lint
ruff check app
```

Technical Notes

Built without agent frameworks (LangChain, LlamaIndex) for full control
Uses pgvector for self-hosted vector search with IVFFlat indexes
Custom orchestration with DAG-based parallel execution
Hybrid search using Reciprocal Rank Fusion (RRF)
SSE streaming for real-time responses

Documentation

System Design - Architecture, ER diagrams, and scaling
API Reference - Endpoint documentation
Sample Queries - Test cases with expected outputs

License

MIT
