End-to-end experimentation environment for exploring chatbot conversation datasets, training intent classifiers, and visualizing performance. The project bundles a FastAPI service, Streamlit dashboard, data processing utilities, and monitoring hooks so teams can ingest conversation logs, iterate on models, and track outcomes in one place.
- Dataset ingestion – Load banking-focused datasets (BANKING77, Bitext, Schema-Guided, etc.), validate them, and persist summaries via the API.
- Training pipeline – Orchestrated workflow for preprocessing, splitting, training BERT-based intent classifiers, evaluating, and versioning artifacts.
- Experiment tracking – Store metrics, confusion matrices, and run metadata for later inspection in the dashboard.
- Interactive dashboard – Streamlit app for experiments, intent distributions, conversation flows, and sentiment trends with CSV/PDF exports.
- Monitoring & alerts – Request latency metrics, system health probes, and pluggable alert channels (log/webhook) for threshold breaches.
- Docker-native workflow – Dockerfiles and compose stack to run the API and dashboard together with minimal setup.
- `src/api`: FastAPI application exposing dataset, conversation, intent, and training endpoints with middleware for rate limiting and metrics.
- `src/services`: Core business logic (data preprocessing, validation, model training, performance analysis, backups, hyperparameter search).
- `src/models`: Shared data models, dataset abstractions, and the BERT-powered `IntentClassifier`.
- `dashboard/`: Streamlit front-end that consumes persisted analytics and experiment results.
- `src/repositories`: Persistence layer for datasets, conversations, experiments, and trained model artifacts.
- `src/monitoring`: System and request monitoring utilities plus alert channel integrations.
SQLite (default) stores metadata, while model artifacts and processed data land under ./models and ./data.
```
chatbotAnalyticsLab/
├── src/                 # Application source (API, services, utilities)
├── dashboard/           # Streamlit entry point
├── Dataset/             # Place raw datasets here (BANKING77, Bitext, etc.)
├── data/                # Generated processed data, caches, and SQLite DB volume
├── models/              # Saved model checkpoints and tokenizer assets
├── tests/               # Pytest suite (API, monitoring, persistence, backups)
├── requirements.txt     # Python dependencies
├── docker-compose.yml   # API + dashboard stack definition
└── Makefile             # Helper commands for Docker workflows
```
- Prerequisites
  - Python 3.10+
  - Optional: Docker 24+ if you plan to run the stack in containers
- Create a virtual environment

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate  # Windows: .venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Configure settings (optional)
  - Default values live in `src/config/settings.py` and `config.json`.
  - Override by exporting environment variables (e.g. `DATABASE_URL`, `DEFAULT_MODEL`, `DATASET_DIR`) before running services (see the example below).
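As a quick illustration of the override mechanism, the variables named above can be exported before launching the API or dashboard; the values shown here are placeholders, not project defaults:

```bash
# Placeholder values - substitute paths and model names for your environment
export DATABASE_URL="sqlite:///./data/analytics.db"
export DEFAULT_MODEL="bert-base-uncased"
export DATASET_DIR="./Dataset"
```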
Populate required directories and the SQLite schema, then verify configuration:
```bash
python -m src.main
```

Start the API server:

```bash
uvicorn src.api.app:app --reload --host 0.0.0.0 --port 8000
```

- Interactive docs: http://localhost:8000/docs
- Key routes: `/datasets/upload`, `/conversations/{id}`, `/intents/predict`, `/training/run`, `/training/hyperparameter-search`, `/health/live`
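As a quick smoke test, the liveness probe and prediction route can be exercised with `curl`; the prediction payload below is an assumed shape, so check the interactive docs at `/docs` for the exact request schema:

```bash
# Liveness probe
curl http://localhost:8000/health/live

# Intent prediction (request body is illustrative - see /docs for the real schema)
curl -X POST http://localhost:8000/intents/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I want to close my account"}'
```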
Ensure the API and database artifacts are accessible, then launch:
```bash
streamlit run dashboard/app.py
```

- Provides overviews, experiment history, intent distributions, conversation flow analytics, and export buttons (CSV/PDF).
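If the default port is already in use, Streamlit's standard CLI flags can relocate the dashboard; the port below is an arbitrary example:

```bash
streamlit run dashboard/app.py --server.port 8502 --server.headless true
```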
Build and run both services:
```bash
docker compose up --build
```

Volumes map `./data`, `./logs`, and `./models` into the containers for persistence. Adjust ports or resource limits in `docker-compose.yml`.
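For day-to-day use, the standard Compose lifecycle commands apply; nothing here is specific to this project:

```bash
# Start in the background, then follow logs
docker compose up --build -d
docker compose logs -f

# Stop and remove the containers (bind-mounted ./data, ./logs, ./models stay on the host)
docker compose down
```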
- Place raw datasets under `Dataset/` (e.g. `Dataset/BANKING77/`).
- Upload via API:

  ```bash
  curl -X POST http://localhost:8000/datasets/upload \
    -H "Content-Type: application/json" \
    -d '{"dataset_type": "banking77", "file_path": "Dataset/BANKING77", "preprocess": true}'
  ```
- Trigger training (runs synchronously by default):
  ```bash
  curl -X POST http://localhost:8000/training/run \
    -H "Content-Type: application/json" \
    -d '{"dataset_type": "banking77", "training_overrides": {"num_epochs": 3}}'
  ```
- Trained artifacts, evaluation reports, and run logs are stored under `./pipeline_runs/<model_id_timestamp>/` and registered through the experiment tracker.
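To glance at what a run produced without opening the dashboard, plain shell commands suffice; directory names follow the `<model_id_timestamp>` pattern noted above:

```bash
# Most recently modified runs first
ls -t pipeline_runs/

# Inspect the artifacts of one run (substitute a real directory name)
ls pipeline_runs/<model_id_timestamp>/
```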
- Request metrics and latencies are collected via middleware and stored under `app.state.metrics_collector`.
- `src/monitoring/system.py` samples CPU/memory; thresholds come from `settings.alerts`.
- Configure alert channels by setting `ALERT_CHANNELS` (comma-separated, e.g. `log,webhook:https://hooks.example.com`); see the example below.
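For example, to log alerts locally and also forward them to a webhook, export the variable before starting the API; the webhook URL is a placeholder:

```bash
export ALERT_CHANNELS="log,webhook:https://hooks.example.com"
uvicorn src.api.app:app --host 0.0.0.0 --port 8000
```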
```bash
pytest
```

The suite covers API integration, persistence, backup/restore logic, alerting, and performance helpers. Add new tests under `tests/`, mirroring the module namespace.
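Standard pytest selection flags work as usual; the keyword filter below is just an example:

```bash
# Run a focused subset quietly and stop at the first failure
pytest tests/ -k "backup" --maxfail=1 -q
```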
- `make build-api` / `make build-dashboard` – build the respective Docker images.
- `make compose-up` / `make compose-down` – manage the docker-compose stack.
- `make clean-images` / `make clean-volumes` – tidy up local Docker resources.
- Missing datasets: check that the folder structure matches your `DatasetType` value (`Dataset/banking77`, `Dataset/bitext`, etc.).
- Model downloads: set `MODEL_CACHE_DIR` to a writable location if running inside restricted environments.
- Long training runs: switch to async execution by posting `{"async_run": true}` to `/training/run` and monitor progress via experiment logs (see the sketch below).
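A minimal sketch of that asynchronous trigger, reusing the request body from the synchronous example above with the `async_run` flag added:

```bash
curl -X POST http://localhost:8000/training/run \
  -H "Content-Type: application/json" \
  -d '{"dataset_type": "banking77", "async_run": true}'
```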