Visualization demo of how search engines determine ad topics and audience segments.
Search UI that evolves into a word cloud and audience-segment view, backed by a Python machine-learning taxonomy service.
- Word cloud layout
- Full Python backend:
  - FastAPI API layer
  - Online-learning text classifier (hashing + SGD log-loss)
  - Event persistence in PostgreSQL (SQLAlchemy)
  - Taxonomy training corpus generation
  - Feedback + retraining endpoints
- Hidden global knowledge base (shared across all users):
  - global_feedback_events
  - global_conversion_affinity
  - global_model_registry
- Private per-user state (keyed by user_id, clearable):
  - user_search_events
  - user_feedback_events
  - user_category_totals
  - user_tag_totals
  - user_embedding
  - user_conversion_affinity
POST /history/clear only deletes user_* rows for the active user and does not touch global tables.
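A minimal sketch of the clear-history rule, using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table names come from the lists above; the (user_id, payload) columns and the create_schema / clear_user_history helpers are hypothetical, not the project's store.py API.

```python
import sqlite3

# Table names from the README; the (user_id, payload) layout is hypothetical.
USER_TABLES = (
    "user_search_events",
    "user_feedback_events",
    "user_category_totals",
    "user_tag_totals",
    "user_embedding",
    "user_conversion_affinity",
)
GLOBAL_TABLES = (
    "global_feedback_events",
    "global_conversion_affinity",
    "global_model_registry",
)


def create_schema(conn: sqlite3.Connection) -> None:
    """Create every table with the same hypothetical two-column layout."""
    for table in USER_TABLES + GLOBAL_TABLES:
        conn.execute(f"CREATE TABLE {table} (user_id TEXT, payload TEXT)")


def clear_user_history(conn: sqlite3.Connection, user_id: str) -> int:
    """Delete the active user's user_* rows; global_* tables are never touched."""
    deleted = 0
    with conn:  # commit all deletes together, or roll back on error
        for table in USER_TABLES:
            # Table names come from the fixed tuple above, never from user input.
            cur = conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
            deleted += cur.rowcount
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO user_search_events VALUES ('alice', 'running shoes')")
    conn.execute("INSERT INTO user_search_events VALUES ('bob', 'laptops')")
    conn.execute("INSERT INTO global_feedback_events VALUES ('alice', 'fitness')")
    print(clear_user_history(conn, "alice"))
```

Running the deletes inside one transaction means a partially cleared history is never committed.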
- Frontend:
  - index.html, styles.css, app.js
- Backend package:
  - backend/main.py – API endpoints
  - backend/taxonomy_data.py – taxonomy + seed corpus + tag expansion
  - backend/model_service.py – model training/inference/online updates
  - backend/store.py – persistent event and aggregate store
For production, use Entra ID / B2C issued JWTs and avoid X-User-Id fallback.
docker build -f backend/Dockerfile -t search-ad-learning-backend:latest .
docker run --rm -p 8001:8001 \
-e DATABASE_URL="postgresql+psycopg://<user>:<password>@<host>:5432/<db>" \
search-ad-learning-backend:latest
Endpoints:
- GET /health
- GET /taxonomy
- POST /search with { "query": "..." }
- POST /feedback with { "query": "...", "category": "...", "confidence": 1.0 }
- POST /retrain
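A hedged example of calling the two endpoints whose request bodies are shown above, using only the standard library. The base URL assumes the port published by the docker run command, the helper names are hypothetical, and sending X-User-Id mirrors the local-development fallback; production requests would carry a JWT instead.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the port published by `docker run -p 8001:8001`.
BASE_URL = "http://localhost:8001"


def build_request(path: str, payload: dict, user_id: Optional[str] = None) -> urllib.request.Request:
    """Build a JSON POST request; X-User-Id is the local-development identity header."""
    headers = {"Content-Type": "application/json"}
    if user_id is not None:
        headers["X-User-Id"] = user_id
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def search_request(query: str, user_id: Optional[str] = None) -> urllib.request.Request:
    """POST /search with { "query": ... }."""
    return build_request("/search", {"query": query}, user_id)


def feedback_request(query: str, category: str, confidence: float = 1.0,
                     user_id: Optional[str] = None) -> urllib.request.Request:
    """POST /feedback with { "query", "category", "confidence" }."""
    payload = {"query": query, "category": category, "confidence": confidence}
    return build_request("/feedback", payload, user_id)


def send(req: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send a request to a running backend and decode its JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send(search_request("running shoes", user_id="demo-user"))` posts a query once the container is running. The response shape is not documented here, so the sketch returns the decoded JSON as-is.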
This project demonstrates:
- Online learning with user feedback loops
- Per-user state isolation with shared global knowledge
- Full-stack ML system design (UI → API → model → persistence)
- Cloud-ready architecture (Azure target)
Search Ad Learning simulates how search intent signals evolve into audience segments and ad targeting signals.
High-level flow:
- User submits a search query
- Backend predicts category probabilities
- UI updates segment view + word cloud
- User provides feedback or conversion signals
- Model updates incrementally (online learning)
- State persists per-user while global signals accumulate
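The predict-then-update steps above can be sketched as an additive aggregate: each search adds its predicted category probabilities to a per-user running total (the role the user_category_totals table appears to play), and the segment view shows normalized shares. The additive rule, top_k, and the helper names are assumptions, not the project's documented aggregation.

```python
from collections import Counter


def update_category_totals(totals: Counter, probabilities: dict) -> Counter:
    """Add one search's predicted category probabilities to the running totals."""
    totals.update(probabilities)  # Counter.update adds values key by key
    return totals


def segment_shares(totals: Counter, top_k: int = 3) -> list:
    """Return up to top_k (category, share) pairs, largest accumulated mass first."""
    total_mass = sum(totals.values())
    if total_mass == 0:
        return []
    return [(category, mass / total_mass) for category, mass in totals.most_common(top_k)]


if __name__ == "__main__":
    totals = Counter()
    # Hypothetical model outputs for two searches by the same user.
    update_category_totals(totals, {"travel": 0.7, "finance": 0.3})
    update_category_totals(totals, {"travel": 0.5, "sports": 0.5})
    print(segment_shares(totals))
```

Because shares are normalized by accumulated mass, older searches keep their influence; a decay factor would be one way to let recent intent dominate.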
The system supports both:
- Backend ML mode (FastAPI + PostgreSQL)
- Local fallback learning (browser-only simulation)
Frontend files: index.html, styles.css, app.js
Responsibilities:
- Query submission
- Visualization (word cloud + segment bars)
- Feedback collection
- Conversion click simulation
- Local fallback learning
The frontend dynamically resolves backend URLs and, if the API is offline, gracefully degrades to local fallback learning.
Located in backend/:
- main.py – API routes
- model_service.py – Online learning classifier (hashing + SGD log-loss)
- taxonomy_data.py – Taxonomy definitions + seed corpus
- store.py – Persistence layer (SQLAlchemy)
Core API endpoints:
- GET /health
- POST /search
- POST /feedback
- POST /retrain
- POST /history/clear
- POST /conversion/click
Storage follows the global/per-user split described above: global_* tables accumulate shared learning across all users, user_* tables are isolated by user_id, and POST /history/clear only deletes the active user's user_* rows.
This separation simulates multi-tenant ad-learning systems.
The backend classifier uses:
- Hashing-based feature extraction
- SGD with log-loss (incremental updates)
- Online updates from feedback signals
- Confidence weighting for corrections
Learning sources:
- Initial taxonomy seed corpus
- Query text features
- Explicit user feedback
- Conversion click events
This allows the model to evolve without full retraining cycles.
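A dependency-free sketch of the technique named above, assuming nothing about the project's actual code: query tokens are hashed into a fixed-size feature space, and a multinomial logistic-regression model is updated one example at a time by SGD on log-loss, with the feedback confidence scaling each step. The backend may instead rely on library classes such as scikit-learn's HashingVectorizer and SGDClassifier; the class name, n_features, learning_rate, and the seed examples here are hypothetical.

```python
import hashlib
import math
import re


class OnlineHashingClassifier:
    """Multinomial logistic regression on hashed tokens, updated by SGD on log-loss."""

    def __init__(self, categories, n_features=2**18, learning_rate=0.5):
        self.categories = list(categories)
        self.n_features = n_features
        self.learning_rate = learning_rate
        # Sparse weights: feature index -> weight, one dict per category.
        self.weights = {c: {} for c in self.categories}
        self.bias = {c: 0.0 for c in self.categories}

    def _features(self, text):
        """Map lowercase word tokens to hashed indices with term counts (the hashing trick)."""
        counts = {}
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # blake2b is stable across processes, unlike Python's salted hash().
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            index = int.from_bytes(digest, "little") % self.n_features
            counts[index] = counts.get(index, 0.0) + 1.0
        return counts

    def _proba(self, x):
        scores = {
            c: self.bias[c] + sum(self.weights[c].get(i, 0.0) * v for i, v in x.items())
            for c in self.categories
        }
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}

    def predict_proba(self, text):
        """Return a probability for every category (softmax over linear scores)."""
        return self._proba(self._features(text))

    def partial_fit(self, text, category, confidence=1.0):
        """Take one SGD step on the log-loss of (text, category), scaled by confidence."""
        x = self._features(text)
        probs = self._proba(x)
        step = self.learning_rate * confidence
        for c in self.categories:
            # Gradient of -log p(category) with respect to the score of c.
            grad = probs[c] - (1.0 if c == category else 0.0)
            w = self.weights[c]
            for i, v in x.items():
                w[i] = w.get(i, 0.0) - step * grad * v
            self.bias[c] -= step * grad


if __name__ == "__main__":
    # Hypothetical seed corpus standing in for the taxonomy seed data.
    seed = [
        ("cheap flights to rome", "travel"),
        ("hotel deals in paris", "travel"),
        ("best running shoes", "sports"),
        ("marathon training plan", "sports"),
    ]
    model = OnlineHashingClassifier(["travel", "sports"])
    for _ in range(20):  # a few passes over the seed corpus
        for text, label in seed:
            model.partial_fit(text, label)
    # A user correction is one more weighted step, not a full retrain.
    model.partial_fit("rome marathon", "sports", confidence=0.8)
    print(model.predict_proba("flights to paris"))
```

Because hashing fixes the feature space up front, new query words need no vocabulary rebuild, which is what keeps per-feedback updates cheap.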
Azure reference architecture:
- Frontend: Azure Static Web Apps
- Backend: Azure Container Apps
- Database: Azure Database for PostgreSQL Flexible Server
- Secrets: Azure Key Vault
- Model artifacts: Azure Blob Storage
- Auth: Entra ID / B2C JWT validation
Local development uses X-User-Id header fallback.
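One way to enforce that split when deriving the user_id that keys user_* rows: accept a verified bearer token first, and honor X-User-Id only when a development flag allows it. verify_jwt is a hypothetical callable standing in for real Entra ID / B2C validation (signature, issuer, audience, expiry); the allow_header_fallback flag and reading the sub claim are assumptions.

```python
from typing import Callable, Mapping


class AuthError(Exception):
    """Raised when no acceptable user identity is present."""


def resolve_user_id(
    headers: Mapping[str, str],
    verify_jwt: Callable[[str], dict],
    allow_header_fallback: bool = False,
) -> str:
    """Return the user_id that keys this request's user_* rows."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    auth = normalized.get("authorization", "")
    if auth.lower().startswith("bearer "):
        # Hypothetical validator: expected to raise on an invalid token.
        claims = verify_jwt(auth[len("bearer "):].strip())
        subject = claims.get("sub")
        if not subject:
            raise AuthError("token has no subject claim")
        return subject
    if allow_header_fallback and normalized.get("x-user-id"):
        return normalized["x-user-id"]  # local development only
    raise AuthError("missing bearer token")
```

Defaulting allow_header_fallback to False means a production deployment has to opt in to the header, rather than opt out.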
GitHub Actions pipeline:
- Lint + test (planned expansion)
- Docker build
- Deploy to Azure Container Apps
- Static frontend deployment
Future improvements:
- Enforce test coverage thresholds
- Add model regression evaluation step
- Add security scanning (bandit / dependency audit)
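A model regression evaluation step needs metrics that can be compared across retrain cycles. The stdlib sketch below computes accuracy, macro-F1, and a confusion matrix from held-out labels and predictions, plus a hypothetical gate that fails when macro-F1 drops by more than a tolerance; the helper names and the 0.02 default are assumptions, not an existing pipeline step.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1; a label never seen or predicted scores 0."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def passes_regression_gate(candidate_f1, baseline_f1, tolerance=0.02):
    """Hypothetical CI gate: reject a retrained model whose macro-F1 regresses too far."""
    return candidate_f1 >= baseline_f1 - tolerance
```

Macro-F1 weights every category equally, so a retrain that quietly breaks a small category shows up even when overall accuracy barely moves.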
Planned improvements:
- Unit tests for model service
- API integration tests
- Persistence layer tests
- Failure case tests (invalid input, offline backend)
- Offline validation dataset
- Accuracy / F1 tracking
- Confusion matrix export
- Drift tracking across retrain cycles
- Retrieval quality metrics
- Signal weighting analysis
- Conversion lift tracking
Security:
- JWT verification support
- Per-user isolation
- No secrets in frontend
- Cloud secret management (Key Vault)
- Clear separation between local fallback and production mode
Future improvements:
- Rate limiting
- Feedback poisoning detection
- Structured audit logging
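Rate limiting is listed as a future improvement; one common technique is a per-user token bucket. This in-memory sketch keys buckets by user_id with hypothetical capacity and refill values; with several backend replicas, the bucket state would have to live in shared storage instead of process memory.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-user token bucket: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float = 10.0, refill_per_sec: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so tests can control time
        self._buckets: Dict[str, Tuple[float, float]] = {}  # user_id -> (tokens, last_seen)

    def allow(self, user_id: str) -> bool:
        """Spend one token for this user if available; return whether the request may proceed."""
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[user_id] = (tokens - 1.0, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

Keying by user_id ties the limit to the same identity that isolates per-user state, which also caps how fast one account can push feedback into the global tables.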
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
export DATABASE_URL="postgresql+psycopg://<user>:<password>@<host>:5432/<db>"
uvicorn backend.main:app --host 127.0.0.1 --port 8001 --reload
In a second terminal, serve the frontend:
python3 -m http.server 8000
Then open the page served on port 8000 in a browser.
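To confirm the backend is up before opening the UI, a small check against GET /health can help. The sketch assumes the 127.0.0.1:8001 address from the uvicorn command and treats HTTP 200 as healthy; the /health response body is not documented here, so it is not parsed, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def backend_is_healthy(base_url: str = "http://127.0.0.1:8001", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("backend up" if backend_is_healthy() else "backend offline")
```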
This project explores:
- Online ML in interactive systems
- Feedback-driven model updates
- Multi-tenant ML state isolation
- Bridging frontend UX with ML pipelines
It is intentionally designed as a clean, approachable entry point into applied ML system design.
License: MIT