feat: Automatic Cluster Count Selection (Elbow Method / Silhouette Score) #23

Merged: aGallea merged 9 commits into master from auto-cluster-suggestion on Feb 26, 2026
@aGallea (Owner) commented Feb 26, 2026

Summary

Closes #9

Adds a "Suggest" button next to the cluster count slider that estimates the optimal number of clusters using the elbow method (inertia curve) and silhouette score analysis across k=2..30.
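As a rough illustration of the analysis described above, here is a dependency-free sketch. It is not the code from this PR: the real `suggest_optimal_clusters()` signature and return shape are not shown here, so the parameters `k_min`, `k_max`, `max_samples`, the `(step, total)` callback arguments, and the returned tuple are all assumptions, and the tiny `kmeans` below (deterministic farthest-point initialization plus Lloyd iterations) stands in for whatever KMeans implementation the project uses.

```python
import math
import random
from typing import Callable, Optional

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iters: int = 50) -> tuple[list[int], float]:
    """Toy Lloyd's algorithm. Returns (labels, inertia)."""
    # Assumption: deterministic farthest-point init, chosen to keep the sketch reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Inertia: sum of squared distances from each point to its assigned center.
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette_score(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette coefficient, O(n^2); needs at least two non-empty clusters."""
    clusters: dict[int, list[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in idx) / len(idx)
            for lab, idx in clusters.items()
            if lab != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def suggest_optimal_clusters(
    points: list[Point],
    k_min: int = 2,
    k_max: int = 30,
    max_samples: int = 5000,
    on_progress: Optional[Callable[[int, int], None]] = None,
    seed: int = 0,
) -> tuple[int, list[dict]]:
    """Return (best_k, per-k results); best_k has the highest silhouette score."""
    if len(points) > max_samples:
        # Subsample large datasets so the analysis stays tractable.
        points = random.Random(seed).sample(points, max_samples)
    k_max = min(k_max, len(points) - 1)  # silhouette is defined for 2 <= k <= n - 1
    ks = range(k_min, k_max + 1)
    if k_min < 2 or len(ks) == 0:
        raise ValueError("need k_min >= 2 and at least k_min + 1 points")
    results = []
    for step, k in enumerate(ks, start=1):
        labels, inertia = kmeans(points, k)
        results.append({"k": k, "inertia": inertia, "silhouette": silhouette_score(points, labels)})
        if on_progress is not None:
            on_progress(step, len(ks))
    best = max(results, key=lambda r: r["silhouette"])
    return best["k"], results
```

The per-k `inertia` values are what an elbow chart would plot, while the selection itself relies only on the silhouette score, matching the description in this PR. The pure-Python silhouette is quadratic in the number of points, so it is only meant for reading, not for 5000-point samples.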

Changes

Backend

  • scatter_plot.py: Added suggest_optimal_clusters() and load_chromadb_embeddings() functions. Runs KMeans for each k, computes inertia and silhouette scores, returns the k with the highest silhouette score. Supports on_progress callback for real-time progress reporting. Samples up to 5000 points for large datasets.
  • server/routes/plot.py: Added async POST /api/plot/suggest-clusters endpoint (returns job_id immediately) and GET /api/plot/suggest-clusters/{job_id} polling endpoint with progress phases (loading_embeddings, analyzing k=N/M). Reuses existing TaskRegistry pattern.
  • server/models.py: Added SuggestClustersRequest, SuggestClustersResponse, and SuggestClustersStatusResponse Pydantic models.
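The submit-then-poll flow behind the two endpoints can be sketched without any web framework. This is an illustration under assumptions, not the project's code: `JobRegistry`, `submit`, `status`, `wait_for`, and `fake_analysis` are hypothetical names (the existing `TaskRegistry` is not shown in this PR), the status values are invented, and the `analyzing k=N/M` string is read here as "step N of M".

```python
import threading
import time
import uuid
from typing import Any, Callable


class JobRegistry:
    """Hypothetical in-memory job store; stands in for the project's TaskRegistry."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, work: Callable[[Callable[[str], None]], Any]) -> str:
        """Start `work` in a background thread and return its job id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "phase": "queued", "result": None, "error": None}

        def set_phase(phase: str) -> None:
            with self._lock:
                self._jobs[job_id]["phase"] = phase

        def run() -> None:
            try:
                update = {"status": "completed", "result": work(set_phase)}
            except Exception as exc:  # report the failure to pollers instead of losing it
                update = {"status": "failed", "error": str(exc)}
            with self._lock:
                self._jobs[job_id].update(update)

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def status(self, job_id: str) -> dict[str, Any]:
        """Snapshot of the job state; raises KeyError for an unknown id."""
        with self._lock:
            return dict(self._jobs[job_id])


def fake_analysis(set_phase: Callable[[str], None]) -> dict[str, int]:
    """Placeholder workload that only reports phases; the result value is made up."""
    set_phase("loading_embeddings")
    ks = range(2, 6)
    for step, _k in enumerate(ks, start=1):
        set_phase(f"analyzing k={step}/{len(ks)}")
    return {"suggested_k": 3}


def wait_for(registry: JobRegistry, job_id: str, interval: float = 0.01, timeout: float = 5.0) -> dict[str, Any]:
    """Client-side polling loop: re-check the status until the job leaves 'running'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        snapshot = registry.status(job_id)
        if snapshot["status"] != "running":
            return snapshot
        time.sleep(interval)
    raise TimeoutError(job_id)
```

In this sketch, the POST handler would correspond to `submit` (returning only the job id) and the GET handler to `status`; a client such as the frontend would then repeat the GET until the status is no longer `running`, which is what `wait_for` imitates.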

Frontend

  • ClusterSuggestion.tsx: New component with "Suggest" button, SVG dual-axis chart (inertia line + silhouette bars), progress indicator with phase messages, and "Apply" button to set the slider value.
  • PlotControls.tsx: Integrated ClusterSuggestion component next to the cluster count slider.
  • types/index.ts and api/plot.ts: Added TypeScript types and API functions for the async polling pattern.

Docs

  • README.md: Added cluster suggestion feature to the Features list and Plot page description.

Testing

  • 195 tests passing, 98.38% coverage
  • New tests for suggest_optimal_clusters() (progress callback, edge cases, monotonic inertia, silhouette range)
  • New tests for async endpoint (job creation, polling, completion, failure, custom k range)
  • Lint (ruff), formatting, type checking (mypy), and the frontend build all pass cleanly

@github-actions

Tests Report 📄

Tests Succeeded ✅

JUnit Details

| Total Tests | Failures | Errors | Skipped | Time ⏳ |
|---|---|---|---|---|
| 195 | 0 | 0 | 0 | 24.58s |

Coverage Details (98% >= 90%) ✅

Diff Cover Details

| File | Covered Lines | Missing Lines |
|---|---|---|
| embedding_cluster/scatter_plot.py | 75/76 (99%) | 109 |
| embedding_cluster/server/models.py | 25/25 (100%) | |
| embedding_cluster/server/routes/plot.py | 78/80 (98%) | 86, 123 |
| Total | 178/181 (98%) | |

@aGallea aGallea merged commit 0f6f4ad into master Feb 26, 2026
3 checks passed