feat: Automatic Cluster Count Selection (Elbow Method / Silhouette Score) #23
Merged
…beddings functions Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Tests Report 📄 Tests Success ✅ JUnit Details
Coverage Details (98% >= 90%) ✅ Diff Cover Details
Summary
Closes #9
Adds a "Suggest" button next to the cluster count slider that automatically analyzes the optimal number of clusters using the elbow method (inertia curve) and silhouette score analysis across k=2..30.
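The selection rule described above (run k-means for each k, record inertia and silhouette, pick the k with the highest silhouette) can be sketched without dependencies as follows. This is an illustrative sketch only, not the PR's code: `suggest_k`, `kmeans`, `silhouette`, and `farthest_point_init` are hypothetical names, the deterministic farthest-point initialization is a substitution chosen to keep the example reproducible, and the `on_progress(step, total)` signature is an assumption.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm. Returns (labels, inertia)."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def suggest_k(points, k_min=2, k_max=30, max_points=5000, on_progress=None, seed=0):
    """Score each k and return the one with the highest silhouette."""
    if len(points) > max_points:  # subsample large datasets
        points = random.Random(seed).sample(points, max_points)
    ks = list(range(k_min, min(k_max, len(points) - 1) + 1))
    inertia, sil = {}, {}
    for step, k in enumerate(ks, start=1):
        labels, inertia[k] = kmeans(points, k)
        sil[k] = silhouette(points, labels)
        if on_progress:
            on_progress(step, len(ks))
    return {"best_k": max(sil, key=sil.get), "inertia": inertia, "silhouette": sil}
```

Inertia is returned alongside the silhouette scores so a caller can still draw the elbow curve, while the suggestion itself depends only on the silhouette maximum.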
Changes
Backend
scatter_plot.py: Addedsuggest_optimal_clusters()andload_chromadb_embeddings()functions. Runs KMeans for each k, computes inertia and silhouette scores, returns the k with the highest silhouette score. Supportson_progresscallback for real-time progress reporting. Samples up to 5000 points for large datasets.server/routes/plot.py: Added asyncPOST /api/plot/suggest-clustersendpoint (returnsjob_idimmediately) andGET /api/plot/suggest-clusters/{job_id}polling endpoint with progress phases (loading_embeddings,analyzing k=N/M). Reuses existingTaskRegistrypattern.server/models.py: AddedSuggestClustersRequest,SuggestClustersResponse, andSuggestClustersStatusResponsePydantic models.Frontend
ClusterSuggestion.tsx: New component with "Suggest" button, SVG dual-axis chart (inertia line + silhouette bars), progress indicator with phase messages, and "Apply" button to set the slider value.PlotControls.tsx: IntegratedClusterSuggestioncomponent next to the cluster count slider.types/index.tsandapi/plot.ts: Added TypeScript types and API functions for the async polling pattern.Docs
README.md: Added cluster suggestion feature to the Features list and Plot page description.Testing
suggest_optimal_clusters()(progress callback, edge cases, monotonic inertia, silhouette range)
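For readers unfamiliar with the submit-then-poll pattern used by the two endpoints, here is a minimal in-process sketch using only the standard library. It is not the project's `TaskRegistry`: `JobRegistry`, `submit`, `status`, and `poll_until_done` are hypothetical names, and the status fields (`status`, `phase`, `result`, `error`) are assumptions made for illustration.

```python
import threading
import time
import uuid


class JobRegistry:
    """Hypothetical in-memory job store (stand-in for a task registry)."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, work):
        """Start `work(report)` in a background thread and return a job id
        immediately, as the POST endpoint is described as doing."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "phase": "queued",
                                  "result": None, "error": None}

        def update(**fields):
            with self._lock:
                self._jobs[job_id].update(fields)

        def run():
            try:
                result = work(lambda phase: update(phase=phase))
                update(status="done", result=result)
            except Exception as exc:  # surface failures to the poller
                update(status="failed", error=str(exc))

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def status(self, job_id):
        """Return a snapshot of the job, or None for an unknown id
        (what a GET polling endpoint would serialize)."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job else None


def poll_until_done(registry, job_id, timeout=5.0, interval=0.01):
    """Client-side loop: poll until the job leaves the 'running' state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        snapshot = registry.status(job_id)
        if snapshot is None or snapshot["status"] != "running":
            return snapshot
        time.sleep(interval)
    return registry.status(job_id)
```

A worker would call `report("loading_embeddings")` and then `report(f"analyzing k={i}/{total}")` as it proceeds, so each poll sees the latest phase string. Returning the job id before any work starts keeps the request short even when the analysis covers many values of k.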