Production-ready observability platform for Glean deployments with USE methodology monitoring and comprehensive cost analytics.
- Quick Deployment Guide - Deploy in under 10 minutes
- Complete Setup Guide - Detailed setup for local and production
- Testing Guide - Local testing instructions
- Migration Status - Track feature completeness
- Frontend: Next.js 15 (React) + TypeScript + Tailwind CSS + Recharts
- Backend: FastAPI (Python 3.13) + Google Cloud APIs
- Monitoring: Prometheus (GKE Managed) + Cloud Monitoring API
- Deployment: Vercel (Frontend) + Cloud Run (Backend)
- Cost: ~$5-20/month for typical usage
- Utilization: CPU %, Memory %, Pod Availability
- Saturation: CPU Throttling, Queue Backlogs
- Errors: Pod Restarts, Crash Loops
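As a rough illustration of how the three USE signals above could map to Prometheus queries, here is a minimal sketch. The metric names are the standard cAdvisor and kube-state-metrics ones; the label selectors, the time windows, and the `build_use_queries` helper are assumptions made for illustration and are not taken from the dashboard's source.

```python
# Hypothetical PromQL templates for the USE signals of one component.
# Label selectors and time windows are assumptions, not the dashboard's queries.
USE_QUERY_TEMPLATES = {
    # Utilization: CPU cores consumed by the component's pods
    "utilization_cpu": (
        'sum(rate(container_cpu_usage_seconds_total'
        '{{namespace="{ns}", pod=~"{pod}.*"}}[5m]))'
    ),
    # Saturation: fraction of CFS scheduling periods that were throttled
    "saturation_cpu_throttling": (
        'sum(rate(container_cpu_cfs_throttled_periods_total'
        '{{namespace="{ns}", pod=~"{pod}.*"}}[5m]))'
        ' / sum(rate(container_cpu_cfs_periods_total'
        '{{namespace="{ns}", pod=~"{pod}.*"}}[5m]))'
    ),
    # Errors: container restarts over the last hour
    "errors_pod_restarts": (
        'sum(increase(kube_pod_container_status_restarts_total'
        '{{namespace="{ns}", pod=~"{pod}.*"}}[1h]))'
    ),
}


def build_use_queries(namespace: str, pod_prefix: str) -> dict[str, str]:
    """Fill the templates for one component (namespace plus pod name prefix)."""
    return {
        name: template.format(ns=namespace, pod=pod_prefix)
        for name, template in USE_QUERY_TEMPLATES.items()
    }
```

Calling `build_use_queries("default", "crawler")` returns one query string per signal, which a Prometheus client could then send to the query API.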
- Real-time GKE cluster cost estimates
- Cloud SQL instance costs
- Total infrastructure spend
- Daily/Monthly projections
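The daily and monthly projections can be thought of as simple extrapolations of an hourly run rate. The sketch below shows one way to do that arithmetic; the `project_cost` function, the `CostProjection` type, and the flat 30-day month are assumptions for illustration, not the dashboard's actual cost estimator.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


@dataclass(frozen=True)
class CostProjection:
    hourly_usd: float
    daily_usd: float
    monthly_usd: float


def project_cost(gke_hourly_usd: float, cloud_sql_hourly_usd: float) -> CostProjection:
    """Extrapolate a combined hourly run rate to daily and monthly totals."""
    hourly = gke_hourly_usd + cloud_sql_hourly_usd
    daily = hourly * HOURS_PER_DAY
    monthly = daily * DAYS_PER_MONTH
    return CostProjection(hourly_usd=hourly, daily_usd=daily, monthly_usd=monthly)
```

For example, `project_cost(0.5, 0.25)` yields an hourly rate of 0.75, a daily projection of 18.0, and a monthly projection of 540.0 (all in USD).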
- Crawler (Content Ingestion)
- Datasource Events Handler
- Query Parser (NLP/ML)
- Query Engine
- Semantic Index (Qdrant Vector DB)
- Keyword Index (Cloud SQL)
- User Data Layer
- Load Balancer
┌─────────────────────────────────────┐
│          Next.js Frontend           │
│       (TypeScript + Tailwind)       │
│                                     │
│  - Dashboard UI                     │
│  - USE Metric Visualization         │
│  - Cost Charts                      │
│  - Component Health Cards           │
└──────────────┬──────────────────────┘
               │ REST API
               ▼
┌─────────────────────────────────────┐
│           FastAPI Backend           │
│              (Python)               │
│                                     │
│  - Prometheus Client                │
│  - GCP Metrics Client               │
│  - Cost Estimator                   │
│  - Config Management                │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│            GCP Services             │
│                                     │
│  - GKE Managed Prometheus           │
│  - Cloud Monitoring API             │
│  - Cloud SQL                        │
│  - Cloud Billing API                │
└─────────────────────────────────────┘
glean-observability-dashboard/
├── frontend/                 # Next.js application
│   ├── src/
│   │   ├── app/              # App router pages
│   │   ├── components/       # React components
│   │   ├── lib/              # Utilities
│   │   └── types/            # TypeScript types
│   └── package.json
│
├── backend/                  # FastAPI application
│   ├── app/
│   │   ├── api/              # API endpoints
│   │   ├── clients/          # GCP/Prometheus clients
│   │   ├── models/           # Data models
│   │   └── main.py           # FastAPI app
│   └── requirements.txt
│
└── README.md
For teammates: See SETUP_GUIDE.md for complete setup instructions with GCP authentication.
For deployment: See DEPLOYMENT.md for production deployment in 10 minutes.
# 1. Authenticate with GCP
gcloud auth application-default login
# 2. Start backend (Terminal 1)
cd backend && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt && uvicorn app.main:app --reload --port 8000
# 3. Start frontend (Terminal 2)
cd frontend && npm install && npm run dev
Open http://localhost:3000 and you're ready!
Click the deploy buttons at the top of this README for one-click deployment to Vercel and Cloud Run.
See DEPLOYMENT.md for step-by-step instructions.
Summary:
- Deploy backend to Cloud Run: `gcloud run deploy --source .`
- Deploy frontend to Vercel: `vercel --prod`
- Connect them with environment variables
- Share with your team!
Deployed URLs:
- Frontend: https://glean-observability-dashboard.vercel.app
- Backend: https://glean-observability-api-xxx.run.app
- ✅ Real-time Metrics: Auto-refresh with configurable intervals (30-300s)
- ✅ USE Methodology: Utilization, Saturation, Errors for all components
- ✅ Cost Analytics: GKE + Cloud SQL cost estimates and breakdowns
- ✅ Responsive Design: Mobile, tablet, and desktop optimized
- ✅ Time Range Selector: 1h, 6h, 24h, 7d, 30d historical data
- ✅ Component Cards: Expandable details with color-coded health status
- ✅ Interactive UI: Collapsible sections, hover states, loading indicators
- ✅ Professional Theme: Modern design with Tailwind CSS and Lucide icons
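The time range selector and the refresh interval bounds listed above both reduce to small lookups. The following is a minimal sketch of how the selector labels could be translated into the `lookback_hours` API parameter and how a requested refresh interval could be kept within 30-300 seconds. The function and constant names are hypothetical, and the real frontend may handle this differently.

```python
# Selector labels from the feature list, expressed in hours.
TIME_RANGE_HOURS = {
    "1h": 1,
    "6h": 6,
    "24h": 24,
    "7d": 7 * 24,
    "30d": 30 * 24,
}

# Refresh interval bounds from the feature list (seconds).
MIN_REFRESH_SECONDS = 30
MAX_REFRESH_SECONDS = 300


def lookback_hours(time_range: str) -> int:
    """Translate a selector label such as '7d' into a number of hours."""
    try:
        return TIME_RANGE_HOURS[time_range]
    except KeyError:
        raise ValueError(f"unsupported time range: {time_range!r}") from None


def clamp_refresh_interval(seconds: int) -> int:
    """Keep a requested refresh interval inside the supported 30-300s window."""
    return max(MIN_REFRESH_SECONDS, min(MAX_REFRESH_SECONDS, seconds))
```

Clamping rather than rejecting out-of-range values is one possible design choice; it keeps the UI responsive to bad input at the cost of silently adjusting it.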
- `GET /api/health` - Health check
- `GET /api/cluster/overview` - Cluster metrics (nodes, CPU, memory, pods)
- `GET /api/cluster/cost` - Cost estimates (GKE + Cloud SQL)
- `GET /api/components` - List all monitored components
- `GET /api/components/{name}` - Detailed USE metrics for a component
Query parameters:
- `project_id` (required) - GCP project ID
- `deployment_name` (optional) - Deployment identifier
- `lookback_hours` (optional) - Historical data window (default: 24)
Example request:
curl "http://localhost:8000/api/cluster/overview?project_id=glean-support-sandbox&deployment_name=support-sandbox"
Full API docs: http://localhost:8000/docs (FastAPI auto-generated)
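For scripted access, the same request URL can be assembled in Python. This is a minimal sketch using only the standard library; `build_overview_url` is a hypothetical helper, and it assumes the overview endpoint accepts the three documented query parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def build_overview_url(
    base_url: str,
    project_id: str,
    deployment_name: Optional[str] = None,
    lookback_hours: int = 24,
) -> str:
    """Build the cluster overview URL with properly encoded query parameters."""
    params = {"project_id": project_id, "lookback_hours": lookback_hours}
    if deployment_name:
        # deployment_name is optional, so only send it when provided
        params["deployment_name"] = deployment_name
    return f"{base_url.rstrip('/')}/api/cluster/overview?{urlencode(params)}"
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, once the backend is running locally.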
- Use `gcloud auth application-default login` (your Google account)
- No additional setup needed
- Backend: Service account with IAM roles:
  - roles/monitoring.viewer
  - roles/container.viewer
  - roles/cloudsql.viewer
- Frontend: Optional OAuth 2.0 or Vercel password protection
- CORS: Configured for Vercel domains
- Rate Limiting: Relies on Cloud Run's per-instance concurrency limit (80 concurrent requests by default)
See SETUP_GUIDE.md for detailed security configuration.
This dashboard implements Brendan Gregg's USE Method:
- For every resource, check:
  - Utilization: How busy is it?
  - Saturation: Is there queuing?
  - Errors: Are there failures?
- Color-coded thresholds:
  - 🟢 Green: Healthy
  - 🟡 Yellow: Warning
  - 🔴 Red: Critical
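The color coding can be illustrated with a small threshold check. The sketch below is only an assumption about how such a mapping might look: the 70% and 90% cut-offs and the `classify` helper are hypothetical, since this README does not state the actual thresholds.

```python
from enum import Enum


class Health(Enum):
    GREEN = "healthy"
    YELLOW = "warning"
    RED = "critical"


def classify(value_percent: float, warning: float = 70.0, critical: float = 90.0) -> Health:
    """Map a utilization-style percentage to a health color.

    The default thresholds are illustrative placeholders, not the
    dashboard's configured values.
    """
    if value_percent >= critical:
        return Health.RED
    if value_percent >= warning:
        return Health.YELLOW
    return Health.GREEN
```

With these placeholder thresholds, 50% CPU utilization would be green, 75% yellow, and 95% red.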
This is an internal Glean tool. For improvements or bug fixes:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes
- Test thoroughly (see TESTING_GUIDE.md)
- Commit: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Open a Pull Request
# Make changes to backend
cd backend && source venv/bin/activate
# Edit code, then restart: uvicorn app.main:app --reload --port 8000
# Make changes to frontend
cd frontend
# Edit code - Next.js hot-reloads automatically
"Failed to fetch data"
- Check backend is running: `curl http://localhost:8000/api/health`
- Verify GCP auth: `gcloud auth application-default print-access-token`
- Check browser console for CORS errors
"Permission denied"
- Run: `gcloud auth application-default login`
- Ensure you have GCP project access
"No metrics available"
- Verify GKE cluster has Managed Prometheus enabled
- Check project ID is correct
See SETUP_GUIDE.md for detailed troubleshooting.
Internal Glean tool - Not for external distribution.
- USE Method: Based on Brendan Gregg's USE methodology
- Built for: Glean infrastructure monitoring
- Powered by: Google Cloud Platform, Prometheus, FastAPI, Next.js
- Migration: Successfully migrated from Streamlit to Next.js for better UX
- Internal Slack: #glean-observability
- Issues: Open a GitHub issue
- Documentation: See linked guides above
- On-call: Check PagerDuty rotation
After deploying, you should see:
- ✅ Cluster overview with health status
- ✅ Cost estimates for GKE and Cloud SQL
- ✅ 8 component cards with USE metrics
- ✅ Auto-refresh working
- ✅ Time range selection functional
- ✅ All metrics color-coded (green/yellow/red)
Happy monitoring!