Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -3,6 +3,7 @@
*.env

.vscode/
.claude/

credentials*.json
.run/
148 changes: 148 additions & 0 deletions AGENTS.md
@@ -0,0 +1,148 @@
# Compass Project — AI Agent Instructions

## Project Overview

Compass is an AI-powered chatbot that helps job-seekers discover and articulate their skills using the ESCO (European Skills, Competences, Qualifications and Occupations) taxonomy. Users describe their work experiences in a conversational interface, and the system maps those experiences to standardized occupations and skills.

> **Terminology note**: "Agent" in this codebase refers to a **Compass conversation agent** — a backend Python class that handles one phase of the user's chat conversation (e.g., welcome, experience collection, skills exploration, farewell). These are *not* AI coding agents. See the [backend instructions](copilot-instructions-backend.md) for the full agent architecture.
Copilot AI Feb 22, 2026

The link to the backend instructions is incorrect. The file is located at instructions/backend.instructions.md, not copilot-instructions-backend.md. This broken link will prevent users from navigating to the backend-specific instructions.

Suggested change
> **Terminology note**: "Agent" in this codebase refers to a **Compass conversation agent** — a backend Python class that handles one phase of the user's chat conversation (e.g., welcome, experience collection, skills exploration, farewell). These are *not* AI coding agents. See the [backend instructions](copilot-instructions-backend.md) for the full agent architecture.
> **Terminology note**: "Agent" in this codebase refers to a **Compass conversation agent** — a backend Python class that handles one phase of the user's chat conversation (e.g., welcome, experience collection, skills exploration, farewell). These are *not* AI coding agents. See the [backend instructions](instructions/backend.instructions.md) for the full agent architecture.

## Repository Structure

This is a monorepo with three main packages:

```
compass/
├── backend/ # Python/FastAPI REST API + multi-agent LLM system
├── frontend-new/ # React/TypeScript SPA (chat UI)
├── iac/ # Pulumi infrastructure-as-code (GCP)
└── .github/workflows # CI/CD pipelines
```
Comment on lines +11 to +19

Copilot AI Feb 22, 2026

The repo structure snippet shows a compass/ top-level directory and also lists .github/workflows alongside the three packages. In this repo the top-level directories are backend/, frontend-new/, and iac/ directly at the root, so this tree (and the “three main packages” wording) is currently misleading. Please update the diagram/text to match the actual repository root layout.

Path-specific instructions are automatically applied by Copilot when working in the relevant directories:
- [Backend instructions](backend/AGENTS.md) — applies to `backend/**`
- [Frontend instructions](frontend-new/AGENTS.md) — applies to `frontend-new/**`

## Domain Context

### What is ESCO?

ESCO (European Skills, Competences, Qualifications and Occupations) is a taxonomy developed by the European Commission that standardizes how occupations and skills are classified. It was chosen over alternatives like O*NET and ISCO because it offers:

- **Global breadth** with local adaptability (multi-language, region-specific skills)
- **Simpler skill descriptions** and "alternative labels" for occupations (e.g., "data engineer" as an alternative for "data scientist")
- **Soft skills coverage** ("attitudes and values") absent from other frameworks
- **Green and digital economy** skill frameworks built in
- **Frequent updates** and growing adoption, especially in Latin America

### Inclusive Livelihoods Taxonomy

Compass uses Tabiya's **Inclusive Livelihoods Taxonomy**, which extends ESCO to cover the full spectrum of economic activities — including informal and unpaid work that traditional frameworks exclude. It classifies work into **four categories**:

1. **Wage employment** — traditional salaried/hourly work
2. **Self-employment** — independent/freelance work
3. **Unpaid training** — internships, apprenticeships, volunteering
4. **Unseen/unpaid work** — caregiving, household management, community work

This equity focus is core to the product — Compass must recognize and validate skills from *all* types of work, not just formal employment.

### Target Users

- **Primary audience**: Job-seekers in emerging markets, particularly those with informal economy experience
- **Device context**: Mobile-first, optimized for mid-range smartphones (Samsung Galaxy A23 as reference device)
- **Language**: Moderate English proficiency expected; multi-language support is expanding
- **Accessibility**: 88.9% of testers found Compass easy to use — maintain this standard

### Product Mission

Compass helps users discover skills they already have but may not know how to articulate. It does *not* answer career questions directly — instead, it **guides users through structured conversation** to extract, classify, and present their skills in a standardized format useful for CVs, job matching, and career development.

---

## Tech Stack

| Layer | Technology |
| -------------- | ---------------------------------------------------------------- |
| Backend | Python 3.11+, FastAPI, Uvicorn, Poetry |
| LLM | Google Vertex AI (Gemini), structured output with Pydantic |
| Database | MongoDB (4 instances via Motor async driver) |
| Vector Search | MongoDB Atlas Search with Vertex AI embeddings |
| Frontend | React 18, TypeScript 5.4+, MUI 7, Webpack 5 |
| Auth | Firebase Authentication (email, Google OAuth, anonymous) |
| i18n           | i18next (backend + frontend), locales: en-GB, en-US, es-ES, etc. |
| Infra | GCP (Cloud Run, Cloud Storage, API Gateway), Pulumi, Docker |
| CI/CD | GitHub Actions |
| Error Tracking | Sentry (both backend and frontend) |
| Testing | pytest + in-memory MongoDB (backend), Jest + RTL (frontend) |

---

## Infrastructure (`iac/`)

### Pulumi Stacks

```
iac/
├── realm/ # GCP org root, projects, user groups
├── environment/ # Per-env GCP project creation, API enablement
├── auth/ # Identity Platform, Firebase, OAuth providers
├── backend/ # Cloud Run service + API Gateway
├── frontend/ # Cloud Storage bucket for static assets
├── common/ # Load balancer, SSL certificates, DNS records
├── dns/ # DNS zone management
├── aws-ns/ # AWS Route 53 name server delegation
├── lib/ # Shared utilities and types
└── scripts/ # Deployment orchestration (prepare.py, up.py)
```

### Deployment

- **Backend**: Docker image → GCP Artifact Registry → Cloud Run (port 8080, linux/amd64)
- **Frontend**: Build artifact (tar.gz) → GCP Artifact Registry → Cloud Storage bucket
- **DNS**: GCP Cloud DNS + AWS Route 53 for delegation

### Environment Hierarchy

- **Realm**: Top-level container (`compass-realm`) with org access
- **Environment naming**: `{realm}.{env}` (e.g., `compass.dev`, `compass.prod`)
- **Types**: `dev`, `test`, `prod` — separate GCP service accounts for lower vs production envs

---

## CI/CD (`.github/workflows/`)

### Pipeline Flow

1. **Every push**: Frontend CI (format, lint, compile, test, a11y) + Backend CI (bandit, pylint, pytest) run in parallel
2. **Main branch** with `[pulumi up]` in commit message: Build artifacts + deploy to dev
3. **Release creation**: Build artifacts + deploy to test, then production

### Key Workflows

| File | Purpose |
| ----------------- | -------------------------------------- |
| `main.yml` | Orchestrates all CI/CD jobs |
| `frontend-ci.yml` | Frontend checks, build, artifact upload |
| `backend-ci.yml` | Backend checks, Docker build & push |
| `config-ci.yml` | Template/config uploads |
| `deploy.yml` | Pulumi deployment to target env |

---

## Development Guidelines

### File Organization

- Tests alongside source files (`*_test.py`, `*.test.tsx`)
- No separate `tests/` directories
- Feature modules are self-contained with routes, services, models, and tests
Comment on lines +135 to +137

Copilot AI Feb 22, 2026

This says there are no separate tests/ directories, but the repo contains dedicated test folders (e.g. backend/evaluation_tests/, backend/smoke_test/, frontend-new/test/smoke/). Please adjust this guideline (e.g., clarify that unit tests live alongside source, with smoke/e2e/evaluation tests in dedicated dirs).

Suggested change
- Tests alongside source files (`*_test.py`, `*.test.tsx`)
- No separate `tests/` directories
- Feature modules are self-contained with routes, services, models, and tests
- Unit tests live alongside source files (`*_test.py`, `*.test.tsx`)
- Higher-level suites (smoke, e2e, evaluation) use dedicated test directories (e.g. `backend/evaluation_tests/`, `backend/smoke_test/`, `frontend-new/test/smoke/`)
- Feature modules are self-contained with routes, services, models, and unit tests

### Code Style

- **Backend**: Python type hints, Pydantic models, async/await, pylint + bandit
- **Frontend**: TypeScript strict mode, ESLint + Prettier, MUI styled components

### Environment Variables

- Backend: see `backend/.env.example`
- Frontend: see env vars loaded in `frontend-new/src/envService.ts`
- Infrastructure: see `iac/templates/env.template` for full reference
1 change: 1 addition & 0 deletions CLAUDE.md
174 changes: 174 additions & 0 deletions backend/AGENTS.md
@@ -0,0 +1,174 @@
# Compass Backend — AI Agent Instructions

## Entry Point & Server

- **`backend/app/server.py`** — FastAPI application with async lifespan management. Initializes 4 MongoDB connections in parallel, validates environment, sets up CORS and Brotli middleware, and exposes the conversation API.
Copilot AI Mar 2, 2026

backend/app/server.py does not initialize 4 MongoDB connections in parallel; it fetches 4 DB handles and then runs initialization for application/userdata/metrics plus taxonomy validation concurrently. Please adjust this sentence to reflect the actual startup behavior so the instructions remain accurate.

Suggested change
- **`backend/app/server.py`** — FastAPI application with async lifespan management. Initializes 4 MongoDB connections in parallel, validates environment, sets up CORS and Brotli middleware, and exposes the conversation API.
- **`backend/app/server.py`** — FastAPI application with async lifespan management. Fetches 4 MongoDB DB handles and then concurrently initializes the application/userdata/metrics databases and runs taxonomy validation, validates environment, sets up CORS and Brotli middleware, and exposes the conversation API.
- Runs on port 8080 via Uvicorn.
- Configuration is loaded from environment variables (see `backend/.env.example`).

## Multi-Agent Architecture

> **Terminology note**: "Agent" in this codebase refers to a **Compass conversation agent** — a backend Python class that handles one phase of the user's skills exploration conversation. These are *not* AI coding agents. Each agent wraps LLM calls with domain-specific prompts, manages conversation state, and produces structured responses. They live in `backend/app/agent/`.

### What is a Compass Agent?

Every agent extends the abstract base class `Agent` (`backend/app/agent/agent.py`) and implements a single method:

```python
async def execute(self, user_input: AgentInput, context: ConversationContext) -> AgentOutput
```

- **`AgentInput`** — The user's message text, a message ID, and a timestamp.
- **`ConversationContext`** — Full conversation history plus a summary of older turns.
- **`AgentOutput`** — The agent's response message, a `finished` flag (signals phase transition), and LLM usage stats.

There are two implementation patterns:
- **`SimpleLLMAgent`** — For stateless agents that make a single LLM call per turn (e.g., `FarewellAgent`, `QnaAgent`). Just provide system instructions.
- **`Agent` (direct)** — For complex agents with internal state, multiple LLM calls, or sub-agent orchestration (e.g., `WelcomeAgent`, `ExploreExperiencesAgentDirector`).

Stateful agents persist their state to MongoDB via a state object (e.g., `WelcomeAgentState`).
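As a rough illustration of the `execute` contract, here is a minimal, self-contained sketch. The dataclasses below are deliberately simplified stand-ins — the real `AgentInput`, `ConversationContext`, and `AgentOutput` classes in `backend/app/agent/` carry more fields — and `EchoFarewellAgent` is a hypothetical toy, not an actual agent in the codebase:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class AgentInput:
    message: str          # the user's message text
    message_id: str
    timestamp: float


@dataclass
class ConversationContext:
    history: list = field(default_factory=list)   # full turn history
    summary: str = ""                             # summary of older turns


@dataclass
class AgentOutput:
    message_for_user: str
    finished: bool        # True signals a phase transition to the Director


class Agent(ABC):
    @abstractmethod
    async def execute(self, user_input: AgentInput,
                      context: ConversationContext) -> AgentOutput:
        ...


class EchoFarewellAgent(Agent):
    """Toy stand-in for a SimpleLLMAgent: one 'LLM call' per turn, no state."""

    async def execute(self, user_input, context):
        reply = f"Thanks for chatting! You said: {user_input.message}"
        return AgentOutput(message_for_user=reply, finished=True)


out = asyncio.run(
    EchoFarewellAgent().execute(
        AgentInput("bye", "m1", 0.0),
        ConversationContext(),
    )
)
```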

### Agent Hierarchy

```
Agent Director (LLM-based router)
├── WelcomeAgent — Greets user, explains process, handles country selection
├── ExploreExperiencesAgentDirector (sub-director)
│ ├── CollectExperiencesAgent — Gathers work experience details
│ ├── SkillsExplorerAgent — Explores and validates identified skills
│ └── Experience Pipeline — LLM-driven skill linking & ranking
│ ├── ClusterResponsibilitiesTool
│ ├── InferOccupationTool
│ ├── SkillLinkingTool
│ └── PickTopSkillsTool
└── FarewellAgent — Concludes conversation, returns summary
```

### Conversation Phases & Routing

**Phases**: `INTRO → COUNSELING → CHECKOUT → ENDED`

The **Agent Director** (`agent_director/llm_agent_director.py`) selects the appropriate agent for each user message via an LLM router. The router produces structured output (`RouterModelResponse` with `reasoning` and `agent_type` fields) and falls back to a default agent per phase if the LLM fails.

When an agent returns `finished=True`, the Director transitions to the next phase. It can auto-advance through multiple phases in one turn by sending an artificial `"(silence)"` message to the next agent.
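The fallback behavior described above can be sketched roughly as follows. This is a hypothetical simplification — the real router in `agent_director/llm_agent_director.py` validates into a `RouterModelResponse` Pydantic model; here stdlib `json` stands in for that parsing, and the agent/phase tables are illustrative only:

```python
import json

# Illustrative phase-to-default mapping; the real configuration may differ.
DEFAULT_AGENT_BY_PHASE = {
    "INTRO": "WelcomeAgent",
    "COUNSELING": "ExploreExperiencesAgentDirector",
    "CHECKOUT": "FarewellAgent",
}
VALID_AGENTS = set(DEFAULT_AGENT_BY_PHASE.values()) | {"QnaAgent"}


def route(llm_text: str, phase: str) -> str:
    """Pick an agent from the router LLM's structured output.

    Falls back to the phase's default agent whenever the LLM response
    is malformed or names an unknown agent.
    """
    try:
        parsed = json.loads(llm_text)           # expects {"reasoning": ..., "agent_type": ...}
        agent = parsed["agent_type"]
        if agent in VALID_AGENTS:
            return agent
    except (json.JSONDecodeError, KeyError, TypeError):
        pass                                    # fall through to the default
    return DEFAULT_AGENT_BY_PHASE[phase]
```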

## LLM Strategy

The system uses LLMs for four distinct purposes — understand which one applies before modifying agent code:

1. **Conversational direction** — Generates guided questions to steer users through the skills exploration process. The LLM asks questions, it does *not* answer them.
2. **NLP tasks** — Clustering, entity extraction, and classification without model fine-tuning. Used in the skills identification pipeline.
3. **Explainability** — Chain-of-Thought reasoning traces outputs back to user inputs, making the system's decisions transparent.
4. **Taxonomy filtering** — Hybrid approach combining semantic vector search with LLM-based filtering to match user input against ESCO skills/occupations.

### Model Versions

Configured in `backend/app/agent/config.py`:

| Purpose | Model |
| -------------------- | ----------------------- |
| LLM (default/fast) | `gemini-2.5-flash-lite` |
| LLM (deep reasoning) | `gemini-2.5-flash` |
| LLM (ultra reasoning) | `gemini-2.5-pro` |
| Embeddings | `text-embedding-005` |

### Hallucination Prevention

When modifying agent prompts or LLM interactions, preserve these guardrails:

- **Task decomposition** — Each agent has a narrow, specific objective. Don't merge responsibilities.
- **State guardrails** — Favor rule-based logic over LLM decisions for control flow (e.g., phase transitions).
- **Guided outputs** — Use few-shot examples, JSON schemas with Pydantic validation, and semantic ordering to constrain LLM responses.
- **Taxonomy grounding** — All skill/occupation outputs must be linked to taxonomy entries. Never let the LLM invent skills or occupations.
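The taxonomy-grounding guardrail can be illustrated with a toy filter: any skill the LLM proposes must resolve to a real taxonomy entry, or it is dropped. The `TAXONOMY` dict and URIs below are entirely hypothetical — the real lookup goes through vector search against ESCO entries:

```python
# Hypothetical taxonomy entries for illustration only.
TAXONOMY = {
    "manage budgets": "esco:skill/S1.2.3",
    "team leadership": "esco:skill/S4.5.6",
}


def ground_skills(llm_skills: list) -> list:
    """Keep only skills that link to a real taxonomy entry."""
    grounded = []
    for skill in llm_skills:
        uri = TAXONOMY.get(skill.lower().strip())
        if uri is not None:               # invented skills are silently dropped
            grounded.append((skill, uri))
    return grounded
```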

## Skills Identification Pipeline

After gathering user experiences, the system processes data through a multi-stage pipeline in `backend/app/agent/`:

```
User experiences (raw text)
→ ClusterResponsibilitiesTool — Groups related responsibilities via LLM clustering
→ InferOccupationTool — Maps clusters to ESCO occupations via vector search + LLM filtering
→ SkillLinkingTool — Links occupations to specific ESCO skills
→ PickTopSkillsTool — Ranks and selects the user's top skills
```

The pipeline ensures outputs are **grounded in the taxonomy** — skills are never hallucinated, they are always linked to real ESCO entries via entity linking.
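The chaining of the four stages can be sketched as a sequence of function calls. This is a deliberately synchronous toy — the real tools are async, LLM-backed classes — and the lambdas below are fake stand-ins whose behavior is invented purely to make the data flow visible:

```python
def run_pipeline(responsibilities, cluster, infer_occupation, link_skills, pick_top):
    """Chain the four pipeline stages over raw responsibility strings."""
    clusters = cluster(responsibilities)                    # ClusterResponsibilitiesTool
    occupations = [infer_occupation(c) for c in clusters]   # InferOccupationTool
    skills = [s for occ in occupations for s in link_skills(occ)]  # SkillLinkingTool
    return pick_top(skills)                                 # PickTopSkillsTool


# Toy stand-ins so the end-to-end flow is visible:
top = run_pipeline(
    ["wrote reports", "wrote emails", "managed a team"],
    cluster=lambda rs: [[r for r in rs if "wrote" in r],
                        [r for r in rs if "managed" in r]],
    infer_occupation=lambda c: "office clerk" if "wrote" in c[0] else "manager",
    link_skills=lambda occ: {"office clerk": ["written communication"],
                             "manager": ["team leadership"]}[occ],
    pick_top=lambda skills: sorted(set(skills))[:5],
)
```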

## Conversation API

- **`backend/app/conversations/routes.py`** — Two endpoints:
- `POST /conversations/{session_id}/messages` — Send user message, get agent response
- `GET /conversations/{session_id}/messages` — Retrieve conversation history
- Max message length: 1000 characters
- Session ownership is verified per request
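The per-request checks can be sketched as plain logic, separate from FastAPI. The status codes and check order here are assumptions for illustration — only the 1000-character limit and the ownership check are stated above:

```python
MAX_MESSAGE_LENGTH = 1000  # from the constraint above


def check_request(session_owner: str, requester: str, message: str) -> int:
    """Toy sketch of the per-request validation: ownership, then length.

    Returns a hypothetical HTTP status code; the real endpoint raises
    FastAPI HTTPExceptions instead.
    """
    if session_owner != requester:
        return 403                        # not the session's owner
    if len(message) > MAX_MESSAGE_LENGTH:
        return 413                        # assumed code for an oversized message
    return 200
```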

## Database Architecture

Four separate MongoDB instances (connected via Motor async driver):

| Database | Purpose |
| ----------- | ------------------------------------------------ |
| Taxonomy | ESCO occupations, skills, embeddings for search |
| Application | Conversation state, session data, user reactions |
| Userdata | Encrypted PII, CV uploads |
| Metrics | Application state snapshots, analytics |

Connection management uses a singleton provider pattern (`CompassDBProvider`) with async locks.
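A minimal sketch of that singleton-with-async-lock pattern, assuming a much simpler shape than the real `CompassDBProvider` (plain `object()` instances stand in for Motor database handles):

```python
import asyncio


class DBProviderSketch:
    """Simplified singleton provider: the lock serializes first-time init
    so concurrent callers share one handle per database name."""

    _handles = None
    _lock = None

    @classmethod
    async def get_db(cls, name: str):
        if cls._lock is None:
            cls._lock = asyncio.Lock()        # created lazily, inside a running loop
        async with cls._lock:
            if cls._handles is None:
                cls._handles = {}             # stand-in for creating Motor clients
            if name not in cls._handles:
                cls._handles[name] = object() # stand-in database handle
        return cls._handles[name]


async def main():
    # Two concurrent requests for the same database must get the same handle.
    a, b = await asyncio.gather(
        DBProviderSketch.get_db("taxonomy"),
        DBProviderSketch.get_db("taxonomy"),
    )
    return a is b


same_handle = asyncio.run(main())
```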

## Vector Search & Embeddings

- **`backend/app/vector_search/`** — Template method pattern for occupation and skill search
- Embeds user input via Vertex AI (`text-embedding-005` model)
- Searches MongoDB Atlas vector indexes
- Async LRU cache for occupation-skill associations (up to ~223MB)

## LLM Integration

- **`backend/common_libs/llm/`** — Gemini generative model wrapper
- Structured output: LLM responses are parsed into Pydantic models
- Retry logic: 3 attempts with increasing temperature (0.1 → 1.0) and top-P variation
- JSON extraction from LLM text with validation
- Token usage tracking via `LLMStats`
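The retry strategy can be sketched as follows. The temperature schedule endpoints (0.1 → 1.0) come from the description above, but the middle value, the top-P handling (omitted here), and the plain `json.loads` parsing are simplifications — the real wrapper validates into Pydantic models:

```python
import json

TEMPERATURES = [0.1, 0.5, 1.0]   # 3 attempts, hotter each time (middle value assumed)


def generate_with_retries(call_llm):
    """Call the model up to 3 times, raising temperature until output parses."""
    last_error = None
    for temp in TEMPERATURES:
        raw = call_llm(temperature=temp)
        try:
            return json.loads(raw), temp     # parsed output + temperature that worked
        except json.JSONDecodeError as e:
            last_error = e                   # malformed JSON: retry hotter
    raise last_error


# Fake model that only emits valid JSON on the second attempt:
responses = iter(["not json at all", '{"skill": "welding"}'])
result, used_temp = generate_with_retries(lambda temperature: next(responses))
```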

## Authentication & Rate Limiting

- JWT-based (Firebase tokens) via `HTTPBearer`
- API key auth for search endpoints (header: `x-api-key`)
- Supported providers: anonymous, password, Google OAuth
- **Rate limit**: 2 requests per minute per API key by default (HTTP 429 when exceeded)

## Key Patterns

- **Dependency injection** via FastAPI's `Depends()`
- **Async-first**: all I/O is async (Motor, LLM calls, HTTP)
- **Repository pattern** for data access
- **Pydantic v2** for all data models and validation
- **Feature flags** via `BACKEND_FEATURES` environment variable (JSON)
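Reading the `BACKEND_FEATURES` variable might look roughly like this. The flag name and the fall-back-to-empty behavior are illustrative assumptions, not the actual parsing code:

```python
import json
import os


def load_feature_flags(env=None) -> dict:
    """Parse the BACKEND_FEATURES JSON env var, defaulting to no flags."""
    if env is None:
        env = os.environ
    raw = env.get("BACKEND_FEATURES", "{}")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}                 # assumed behavior: ignore malformed JSON


flags = load_feature_flags({"BACKEND_FEATURES": '{"new_pipeline": true}'})
```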

## Testing

```bash
poetry run pytest -m "not (evaluation_test or smoke_test)" # Unit/integration tests
poetry run pylint --recursive=y . # Linting
poetry run bandit -c bandit.yaml -r . # Security scanning
```

- Tests live alongside source: `*_test.py`
- In-memory MongoDB via `pymongo_inmemory`
- Async test support via `pytest-asyncio`

## Adding a New Agent

1. Create agent class in `backend/app/agent/` implementing the agent interface
2. Register it in the Agent Director's phase configuration
3. Define LLM prompt and response schema (Pydantic model)
4. Add routing rules in `_llm_router.py`
5. Write tests with mocked LLM responses

## Adding a New API Endpoint

1. Create route in appropriate module under `backend/app/`
2. Use FastAPI `Depends()` for auth and service injection
3. Define Pydantic request/response models
4. Add tests using in-memory MongoDB fixtures
1 change: 1 addition & 0 deletions backend/CLAUDE.md