askscio/glean-observability-dashboard

📊 Glean Observability Dashboard

Production-ready observability platform for Glean deployments with USE methodology monitoring and comprehensive cost analytics.

Deploy to Vercel Deploy to Cloud Run



🚀 Tech Stack

  • Frontend: Next.js 15 (React) + TypeScript + Tailwind CSS + Recharts
  • Backend: FastAPI (Python 3.13) + Google Cloud APIs
  • Monitoring: Prometheus (GKE Managed) + Cloud Monitoring API
  • Deployment: Vercel (Frontend) + Cloud Run (Backend)
  • Cost: ~$5-20/month for typical usage

✨ Features

USE Methodology Monitoring

  • Utilization: CPU %, Memory %, Pod Availability
  • Saturation: CPU Throttling, Queue Backlogs
  • Errors: Pod Restarts, Crash Loops
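
The three USE signals above could map to PromQL roughly as follows. This is a minimal sketch, assuming the cluster exposes the standard cAdvisor and kube-state-metrics series; the function name `use_queries`, the `namespace`/`workload` label scheme, and the time windows are illustrative assumptions, not the dashboard's actual queries.

```python
def use_queries(namespace: str, workload: str, window: str = "5m") -> dict[str, str]:
    """Build illustrative PromQL expressions for one workload's USE signals."""
    # Hypothetical selector: assumes pods are named "<workload>-<suffix>".
    selector = f'namespace="{namespace}", pod=~"{workload}-.*"'
    return {
        # Utilization: CPU cores in use (divide by requests or limits for a percentage).
        "cpu_utilization": (
            f"sum(rate(container_cpu_usage_seconds_total{{{selector}}}[{window}]))"
        ),
        # Saturation: fraction of CFS scheduling periods in which the container was throttled.
        "cpu_throttling": (
            f"sum(rate(container_cpu_cfs_throttled_periods_total{{{selector}}}[{window}]))"
            f" / sum(rate(container_cpu_cfs_periods_total{{{selector}}}[{window}]))"
        ),
        # Errors: container restarts over the last hour.
        "pod_restarts": (
            f"sum(increase(kube_pod_container_status_restarts_total{{{selector}}}[1h]))"
        ),
    }


if __name__ == "__main__":
    for name, query in use_queries("glean", "crawler").items():
        print(f"{name}: {query}")
```

Keeping the expressions in one function makes it easy to reuse the same label selector for every signal of a component.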

Cost Analytics

  • Real-time GKE cluster cost estimates
  • Cloud SQL instance costs
  • Total infrastructure spend
  • Daily/Monthly projections
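
As a rough illustration of the daily and monthly projections, the sketch below scales a combined hourly rate. The hourly figures are made-up placeholders, and `project_cost` is a hypothetical helper; how the backend actually derives its estimates is not shown in this README.

```python
HOURS_PER_DAY = 24
HOURS_PER_MONTH = 730  # average month: 8760 hours per year / 12


def project_cost(hourly_rates: dict[str, float]) -> dict[str, float]:
    """Project daily and monthly spend from per-resource hourly rates (USD)."""
    hourly = sum(hourly_rates.values())
    return {
        "hourly": round(hourly, 2),
        "daily": round(hourly * HOURS_PER_DAY, 2),
        "monthly": round(hourly * HOURS_PER_MONTH, 2),
    }


if __name__ == "__main__":
    # Placeholder rates for illustration only, not real pricing.
    print(project_cost({"gke_nodes": 0.50, "cloud_sql": 0.25}))
```

A projection like this assumes constant usage; autoscaling clusters will drift from it.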

Components Monitored

  • 🔍 Crawler (Content Ingestion)
  • 📡 Datasource Events Handler
  • 🧠 Query Parser (NLP/ML)
  • ⚡ Query Engine
  • 🧬 Semantic Index (Qdrant Vector DB)
  • 🔎 Keyword Index (Cloud SQL)
  • 👥 User Data Layer
  • 🌐 Load Balancer

🏗️ Architecture

┌─────────────────────────────────────┐
│        Next.js Frontend             │
│     (TypeScript + Tailwind)         │
│                                     │
│  - Dashboard UI                     │
│  - USE Metric Visualization         │
│  - Cost Charts                      │
│  - Component Health Cards           │
└──────────────┬──────────────────────┘
               │ REST API
               ↓
┌─────────────────────────────────────┐
│       FastAPI Backend               │
│          (Python)                   │
│                                     │
│  - Prometheus Client                │
│  - GCP Metrics Client               │
│  - Cost Estimator                   │
│  - Config Management                │
└──────────────┬──────────────────────┘
               │
               ↓
┌─────────────────────────────────────┐
│         GCP Services                │
│                                     │
│  - GKE Managed Prometheus           │
│  - Cloud Monitoring API             │
│  - Cloud SQL                        │
│  - Cloud Billing API                │
└─────────────────────────────────────┘

📁 Project Structure

glean-observability-dashboard/
├── frontend/              # Next.js application
│   ├── src/
│   │   ├── app/          # App router pages
│   │   ├── components/   # React components
│   │   ├── lib/          # Utilities
│   │   └── types/        # TypeScript types
│   └── package.json
│
├── backend/               # FastAPI application
│   ├── app/
│   │   ├── api/          # API endpoints
│   │   ├── clients/      # GCP/Prometheus clients
│   │   ├── models/       # Data models
│   │   └── main.py       # FastAPI app
│   └── requirements.txt
│
└── README.md

🚀 Quick Start

For teammates: See SETUP_GUIDE.md for complete setup instructions with GCP authentication.

For deployment: See DEPLOYMENT.md for production deployment in 10 minutes.

Fastest Local Setup (3 commands)

# 1. Authenticate with GCP
gcloud auth application-default login

# 2. Start backend (Terminal 1)
cd backend && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt && uvicorn app.main:app --reload --port 8000

# 3. Start frontend (Terminal 2)
cd frontend && npm install && npm run dev

Open http://localhost:3000 and you're ready! 🎉


🌐 Production Deployment

Option 1: Automated (Recommended)

Click the deploy buttons at the top of this README for one-click deployment to Vercel and Cloud Run.

Option 2: Manual (Full Control)

See DEPLOYMENT.md for step-by-step instructions.

Summary:

  1. Deploy backend to Cloud Run: gcloud run deploy --source .
  2. Deploy frontend to Vercel: vercel --prod
  3. Connect them with environment variables
  4. Share with your team! 🎊

Deployed URLs:

  • Frontend: https://glean-observability-dashboard.vercel.app
  • Backend: https://glean-observability-api-xxx.run.app

🎨 UI Features

  • ✅ Real-time Metrics: Auto-refresh with configurable intervals (30-300s)
  • ✅ USE Methodology: Utilization, Saturation, Errors for all components
  • ✅ Cost Analytics: GKE + Cloud SQL cost estimates and breakdowns
  • ✅ Responsive Design: Mobile, tablet, and desktop optimized
  • ✅ Time Range Selector: 1h, 6h, 24h, 7d, 30d historical data
  • ✅ Component Cards: Expandable details with color-coded health status
  • ✅ Interactive UI: Collapsible sections, hover states, loading indicators
  • ✅ Professional Theme: Modern design with Tailwind CSS and Lucide icons

📊 API Endpoints

Core Endpoints

  • GET /api/health - Health check
  • GET /api/cluster/overview - Cluster metrics (nodes, CPU, memory, pods)
  • GET /api/cluster/cost - Cost estimates (GKE + Cloud SQL)
  • GET /api/components - List all monitored components
  • GET /api/components/{name} - Detailed USE metrics for a component

Query Parameters

  • project_id (required) - GCP project ID
  • deployment_name (optional) - Deployment identifier
  • lookback_hours (optional) - Historical data window (default: 24)

Example Request

curl "http://localhost:8000/api/cluster/overview?project_id=glean-support-sandbox&deployment_name=support-sandbox"

Full API docs: http://localhost:8000/docs (FastAPI auto-generated)
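
The same request can be made from Python using only the standard library. This is a sketch: the helper names `build_url` and `get_json` are illustrative, and it assumes the endpoints return JSON, which is FastAPI's default but is not stated explicitly here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # local backend from the Quick Start


def build_url(path, project_id, deployment_name=None, lookback_hours=None):
    """Assemble an endpoint URL from the documented query parameters."""
    params = {"project_id": project_id}  # required
    if deployment_name is not None:
        params["deployment_name"] = deployment_name
    if lookback_hours is not None:
        params["lookback_hours"] = lookback_hours
    return f"{BASE_URL}{path}?{urlencode(params)}"


def get_json(url, timeout=10.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = build_url(
        "/api/cluster/overview",
        project_id="glean-support-sandbox",
        deployment_name="support-sandbox",
    )
    print(url)
    print(get_json(url))  # requires the backend to be running
```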


🔒 Security & Authentication

For Local Development

  • Use gcloud auth application-default login (your Google account)
  • No additional setup needed

For Production

  • Backend: Service account with IAM roles:
    • roles/monitoring.viewer
    • roles/container.viewer
    • roles/cloudsql.viewer
  • Frontend: Optional OAuth 2.0 or Vercel password protection
  • CORS: Configured for Vercel domains
  • Request concurrency: Cloud Run caps concurrent requests per instance (80 by default). This bounds load per instance but is not per-client rate limiting.

See SETUP_GUIDE.md for detailed security configuration.
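
To illustrate what "configured for Vercel domains" can mean in practice, here is a hedged sketch of an origin allowlist. The regular expression and the `is_allowed_origin` helper are assumptions for illustration; the backend's real CORS settings live in its own configuration. FastAPI's `CORSMiddleware` accepts both an `allow_origins` list and an `allow_origin_regex` pattern, so a pattern like this one could be passed to it directly.

```python
import re

# Assumed pattern: the production host plus preview deployments that append a suffix.
ALLOWED_ORIGIN = re.compile(
    r"https://glean-observability-dashboard(-[a-z0-9-]+)?\.vercel\.app"
)
LOCAL_ORIGINS = {"http://localhost:3000"}  # Next.js dev server


def is_allowed_origin(origin: str) -> bool:
    """Return True if the Origin header value is on the allowlist."""
    # fullmatch prevents look-alike hosts such as "...vercel.app.evil.com".
    return origin in LOCAL_ORIGINS or ALLOWED_ORIGIN.fullmatch(origin) is not None


if __name__ == "__main__":
    print(is_allowed_origin("https://glean-observability-dashboard.vercel.app"))
    print(is_allowed_origin("https://example.com"))
```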


📈 Monitoring Best Practices

This dashboard implements Brendan Gregg's USE Method:

  1. For every resource, check:
    • Utilization: How busy is it?
    • Saturation: Is there queuing?
    • Errors: Are there failures?
  2. Color-coded thresholds:
    • 🟢 Green: Healthy
    • 🟡 Yellow: Warning
    • 🔴 Red: Critical
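
The color coding above amounts to comparing a metric against two cut-offs. The sketch below shows one way to express that; the 70/90 percent thresholds and the `health_color` name are hypothetical, since the actual threshold values are not listed in this README.

```python
def health_color(value: float, warning: float, critical: float) -> str:
    """Classify a metric where higher values are worse (e.g. CPU %)."""
    if value >= critical:
        return "red"
    if value >= warning:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Hypothetical thresholds: warn at 70%, critical at 90%.
    for cpu_percent in (45.0, 75.0, 95.0):
        print(cpu_percent, health_color(cpu_percent, warning=70.0, critical=90.0))
```

Metrics where lower is worse, such as pod availability, would need the comparisons inverted.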

🀝 Contributing

This is an internal Glean tool. For improvements or bug fixes:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Test thoroughly (see TESTING_GUIDE.md)
  5. Commit: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

Development Workflow

# Make changes to backend
cd backend && source venv/bin/activate
# Start the server once; --reload restarts it automatically when files change
uvicorn app.main:app --reload --port 8000

# Make changes to frontend
cd frontend
# Edit code - Next.js hot-reloads automatically

🐛 Troubleshooting

Common Issues

"Failed to fetch data"

  • Check backend is running: curl http://localhost:8000/api/health
  • Verify GCP auth: gcloud auth application-default print-access-token
  • Check browser console for CORS errors

"Permission denied"

  • Run: gcloud auth application-default login
  • Ensure you have GCP project access

"No metrics available"

  • Verify GKE cluster has Managed Prometheus enabled
  • Check project ID is correct

See SETUP_GUIDE.md for detailed troubleshooting.
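
The first check for "Failed to fetch data" can be scripted. This is a minimal sketch using the documented `/api/health` endpoint; `check_backend` and its `opener` parameter are illustrative names, and it only tests reachability and HTTP status, not the response body.

```python
from urllib.request import urlopen


def check_backend(base_url="http://localhost:8000", timeout=5.0, opener=urlopen):
    """Return (ok, message) describing whether the health endpoint responds."""
    url = f"{base_url}/api/health"
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return False, f"backend unreachable at {url}: {exc}"
    if status != 200:
        return False, f"unexpected status {status} from {url}"
    return True, f"backend healthy at {url}"


if __name__ == "__main__":
    ok, message = check_backend()
    print("OK" if ok else "FAIL", "-", message)
```

The `opener` argument exists only so the function can be exercised without a live server.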


📝 License

Internal Glean tool - Not for external distribution.


🙏 Acknowledgments

  • USE Method: Based on Brendan Gregg's USE methodology
  • Built for: Glean infrastructure monitoring
  • Powered by: Google Cloud Platform, Prometheus, FastAPI, Next.js
  • Migration: Successfully migrated from Streamlit to Next.js for better UX

📞 Support

  • Internal Slack: #glean-observability
  • Issues: Open a GitHub issue
  • Documentation: See linked guides above
  • On-call: Check PagerDuty rotation

🎉 Success Metrics

After deploying, you should see:

  • ✅ Cluster overview with health status
  • ✅ Cost estimates for GKE and Cloud SQL
  • ✅ 8 component cards with USE metrics
  • ✅ Auto-refresh working
  • ✅ Time range selection functional
  • ✅ All metrics color-coded (green/yellow/red)

Happy monitoring! 📊🚀
