MIST - Multi-modal Intelligent Service Technician

AI-powered automotive diagnostic system that maps fault codes and live OBD data to precise repair guide recommendations.

Demo: mist-expo.vercel.app/sign-in

Overview

MIST combines:

  • Multi-Modal Embeddings: Fault codes (text) + OBD sensor data (structured)
  • Conversational RAG: LLM-guided clarification for ambiguous cases
  • Self-Improving: Learns from feedback and repair outcomes
  • Knowledge Graph Integration: Leverages BMW diagnostic database relationships
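The first two bullets can be pictured with a toy sketch: fault-code text, structured OBD readings, and vehicle context flattened into one retrievable query string. This is illustrative only; MIST's actual multi-modal encoder embeds each modality separately, and the function and field names here are assumptions.

```python
def build_query_text(fault_codes, obd_data, vehicle_context):
    """Flatten the three input modalities into a single query string
    (a sketch; the real system embeds text and sensor data separately)."""
    parts = list(fault_codes)
    # Sort keys so the same inputs always yield the same query text.
    parts += [f"{k}={v}" for k, v in sorted(obd_data.items())]
    parts += [f"{k}: {v}" for k, v in sorted(vehicle_context.items())]
    return " | ".join(parts)
```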

Setup

Prerequisites

  • Python 3.12+ (recommended)
  • Virtual environment (recommended)
  • GPU recommended for training and embedding generation (optional)

Installation Steps

  1. Clone the repository (if not already done)

    cd mist
  2. Create and activate virtual environment

    python3.12 -m venv .venv
    source .venv/bin/activate  # On macOS/Linux
    # On Windows: .venv\Scripts\activate
  3. Install dependencies

    pip install --upgrade pip
    pip install -r requirements.txt
    pip install -e .  # Editable install: enables `mist-cli` and importable packages

    Optional — lint + git hooks (Ruff + Lefthook):

    pip install -e ".[dev]"
    lefthook install
  4. Set up environment variables

    Create a .env file in the project root (if .env.example exists, copy it):

    # If .env.example exists:
    cp .env.example .env
    
    # Edit .env and add your API keys:
    # OPENAI_API_KEY=your_openai_key
    # ANTHROPIC_API_KEY=your_anthropic_key
    # COHERE_API_KEY=your_cohere_key  # Optional, for re-ranking
    
    # ChromaDB Cloud (required for vector store, see step 7):
    # CHROMA_DB_API_KEY=your-chromadb-api-key
    # CHROMA_DB_TENANT=your-chromadb-tenant-id
  5. Run database migrations

    mist-cli migrate
    # Or: python scripts/run_migrations.py
  6. Build knowledge graph (one-time setup)

    mist-cli build-kg
    # Or: python scripts/build_knowledge_graph.py
  7. Configure ChromaDB Cloud (required for vector store)

    • Set in .env:
      CHROMA_DB_API_KEY=your-api-key
      CHROMA_DB_TENANT=your-tenant-id
    • See config/retrieval_config.yaml for collection and database settings.
  8. Index repair guides (one-time setup, may take a while)

    mist-cli index
    # Or: python -m src.cli.main index
    # Or: python scripts/index_repair_guides.py

    Use mist-cli (not mist) to avoid conflict with the npm package.
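Before running migrations or indexing, it can help to sanity-check that the variables from step 4 are actually set. A minimal stdlib sketch (the required-key list is an assumption based on the steps above):

```python
import os

# Keys the setup steps above ask for; adjust to match your .env.
REQUIRED_ENV = ["OPENAI_API_KEY", "CHROMA_DB_API_KEY", "CHROMA_DB_TENANT"]

def missing_env(required=REQUIRED_ENV):
    """Return the names from `required` that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```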

Training

After collecting feedback data through the API, you can train the embeddings to improve retrieval performance.

Training Embeddings

Train embeddings using contrastive learning on collected feedback:

# Basic training with default configs
mist-cli train

# With custom config files
mist-cli train --config config/training_config.yaml --embedding-config config/embedding_config.yaml

# Resume from checkpoint
mist-cli train --resume data/embeddings/checkpoints/checkpoint_epoch_5.pt

# Specify device (auto, cuda, or cpu)
mist-cli train --device cuda

# Set logging level
mist-cli train --log-level DEBUG

Training Configuration

Edit config/training_config.yaml to adjust:

  • Learning rate
  • Batch size
  • Number of epochs
  • Checkpoint frequency
  • Loss function parameters

The training script will:

  1. Load feedback data from the feedback database
  2. Create positive/negative pairs for contrastive learning
  3. Fine-tune the multi-modal encoder using InfoNCE loss
  4. Save checkpoints periodically
  5. Save the final trained model
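Step 3's InfoNCE objective can be sketched in plain Python for a single anchor with one positive and several negatives (pure stdlib; the real trainer works on batched tensors, and the temperature value is just a common default):

```python
import math

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE for one anchor: -log softmax over [positive] + negatives,
    using cosine similarity scaled by a temperature."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv)

    # Logit 0 is the positive pair; the rest are negatives.
    logits = [cos(anchor, positive) / temperature]
    logits += [cos(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

The loss shrinks as the anchor moves toward its positive and away from the negatives, which is exactly the signal used to fine-tune the encoder on feedback pairs.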

Demo Usage

Option 1: Using the Python API

from src.retrieval.conversational_rag import ConversationalRAG

# Initialize the RAG system
rag = ConversationalRAG()

# Query with fault codes and OBD data
response = rag.query(
    fault_codes=["P0300", "Random/Multiple Cylinder Misfire"],
    obd_data={
        "engine_rpm": 2500,
        "coolant_temp": 95,
        "throttle_position": 45,
        "maf_sensor": 12.5
    },
    vehicle_context={
        "model": "335i",
        "year": 2011,
        "engine": "N55"
    }
)

# Check if clarification is needed
if response['needs_clarification']:
    print("Clarification questions:")
    for question in response['clarification_questions']:
        print(f"  - {question}")
    
    # Provide responses and get refined recommendations
    refined = rag.clarify(
        session_id=response['session_id'],
        responses=["Yes, the check engine light is flashing", "No unusual sounds"]
    )
    recommendations = refined['recommendations']
else:
    recommendations = response['recommendations']

# Display recommendations
print(f"\nFound {len(recommendations)} recommendations:")
for rec in recommendations:
    print(f"\n{rec['title']}")
    print(f"  Score: {rec['score']:.3f}")
    print(f"  Procedure: {rec['procedure_name']}")

Option 2: Using the HTTP API

  1. Start the API server

    python -m src.api.server
    # Or with uvicorn:
    uvicorn src.api.server:app --host 0.0.0.0 --port 8000
  2. Query the API (using curl or any HTTP client)

    # Health check
    curl http://localhost:8000/health
    
    # Query with fault codes and OBD data
    curl -X POST http://localhost:8000/query \
      -H "Content-Type: application/json" \
      -d '{
        "fault_codes": ["P0300", "Random/Multiple Cylinder Misfire"],
        "obd_data": {
          "engine_rpm": 2500,
          "coolant_temp": 95,
          "throttle_position": 45
        },
        "vehicle_context": {
          "model": "335i",
          "year": 2011
        }
      }'

    Example response:

    {
      "recommendations": [
        {
          "id": "guide_123",
          "title": "Misfire Diagnosis - Cylinder 1",
          "procedure_name": "Misfire Detection",
          "procedure_id": "PROC_001",
          "score": 0.92,
          "text": "Check ignition coil and spark plug..."
        }
      ],
      "needs_clarification": false,
      "clarification_questions": null,
      "session_id": "session_abc123",
      "query_text": "Random/Multiple Cylinder Misfire P0300"
    }
  3. Handle clarification (if needed)

    curl -X POST http://localhost:8000/clarify \
      -H "Content-Type: application/json" \
      -d '{
        "session_id": "session_abc123",
        "responses": [
          "Yes, the check engine light is flashing",
          "No unusual sounds"
        ]
      }'
  4. Submit feedback

    # Submit rating
    curl -X POST http://localhost:8000/feedback/rating \
      -H "Content-Type: application/json" \
      -d '{
        "session_id": "session_abc123",
        "rating": 5,
        "selected_guide": "guide_123"
      }'
    
    # Submit repair outcome
    curl -X POST http://localhost:8000/feedback/outcome \
      -H "Content-Type: application/json" \
      -d '{
        "session_id": "session_abc123",
        "outcome": "success"
      }'
  5. View API documentation

    Once the server is running, visit http://localhost:8000/docs (interactive Swagger UI) or http://localhost:8000/redoc, FastAPI's auto-generated documentation pages.
Option 3: Using Python requests

import requests

BASE_URL = "http://localhost:8000"

# Query
response = requests.post(
    f"{BASE_URL}/query",
    json={
        "fault_codes": ["P0300"],
        "obd_data": {"engine_rpm": 2500, "coolant_temp": 95},
        "vehicle_context": {"model": "335i", "year": 2011}
    }
)
result = response.json()

print(f"Session ID: {result['session_id']}")
print(f"Recommendations: {len(result['recommendations'])}")

# Submit feedback
requests.post(
    f"{BASE_URL}/feedback/rating",
    json={
        "session_id": result['session_id'],
        "rating": 5,
        "selected_guide": result['recommendations'][0]['id']
    }
)

Project Structure

mist/
├── src/              # Source code
│   ├── embeddings/   # Multi-modal encoders
│   ├── retrieval/    # RAG components
│   ├── knowledge/    # Knowledge graph
│   ├── llm/          # LLM providers
│   ├── feedback/     # Feedback system
│   ├── learning/     # Self-improvement
│   └── api/          # FastAPI server
├── apps/mist-expo/   # Expo + Tamagui client (web, iOS, Android) — see apps/mist-expo/README.md
├── config/           # YAML configuration files
├── scripts/          # Utility scripts
├── data/             # Databases, vector store, checkpoints
├── tests/            # Test suite
└── docs/             # Documentation

Documentation

Topic-based context for humans and AI agents lives under docs/context/. Start at docs/context/README.md (navigation tree).

Configuration

Edit YAML files in config/:

  • embedding_config.yaml: Embedding model settings
  • retrieval_config.yaml: Retrieval parameters
  • llm_config.yaml: LLM provider settings
  • training_config.yaml: Training hyperparameters

Features

  • Multi-Stage Retrieval: Vector search → Re-ranking → KG filtering → Combined scoring
  • Conversational Clarification: LLM-guided questions for ambiguous cases
  • Self-Improving: Continuous learning from user feedback
  • Knowledge Graph: Exploits BMW diagnostic relationships for explainable recommendations
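The multi-stage pipeline in the first bullet can be sketched as a filter-then-blend over per-stage scores. The weights and field names below are illustrative assumptions, not MIST's actual configuration (see config/retrieval_config.yaml for the real settings):

```python
def combined_score(vector_score, rerank_score, kg_boost,
                   w_vec=0.4, w_rerank=0.5, w_kg=0.1):
    """Weighted blend of the three stage scores (weights are illustrative)."""
    return w_vec * vector_score + w_rerank * rerank_score + w_kg * kg_boost

def rank_candidates(candidates):
    """candidates: dicts with per-stage scores. KG filtering drops guides
    flagged incompatible with the vehicle; the rest are blended and sorted."""
    kept = [c for c in candidates if c.get("kg_compatible", True)]
    for c in kept:
        c["score"] = combined_score(c["vector"], c["rerank"],
                                    c.get("kg_boost", 0.0))
    return sorted(kept, key=lambda c: c["score"], reverse=True)
```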

Requirements

  • Python 3.12+ (recommended)
  • Virtual environment (recommended)
  • See requirements.txt for dependencies
  • GPU recommended for training and embedding generation (optional)

License

[Add license information]

About

AI automotive diagnostics platform using FastAPI, RAG, ChromaDB, GCP Cloud Run, OAuth, and feedback-driven retrieval.
