Problem
Article visualization colors cluster into only 2-3 color variants (purple/red and green/orange) instead of spreading across the full color spectrum.
Current Implementation
The compute_content_hue() function in embeddings/embeddings.py derives hue by:
- Computing the mean embedding vector for the article
- Hashing it with SHA-256
- Using hash_int % 360 to get a hue
This produces deterministic but poorly distributed colors when articles are semantically similar.
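For reference, here is a minimal sketch of the hash-based steps listed above. It is a hypothetical reconstruction, not the code in embeddings/embeddings.py. The function name `compute_content_hue_hash`, the input shape (one row per chunk embedding), and hashing the raw float32 bytes of the mean vector are all assumptions. The sketch makes one property visible: the hue depends only on the exact bytes of the mean vector, so two nearly identical embeddings get unrelated hues. Semantic relationships between articles play no part in the result.

```python
import hashlib

import numpy as np


def compute_content_hue_hash(chunk_embeddings: np.ndarray) -> float:
    """Hypothetical sketch of the current hash-based hue.

    Assumes chunk_embeddings has shape (n_chunks, dim) and that the mean
    vector is hashed as raw float32 bytes (both are assumptions).
    """
    # Mean embedding for the article, shape (dim,)
    mean_vec = np.asarray(chunk_embeddings, dtype=np.float32).mean(axis=0)
    # SHA-256 over the raw bytes of the mean vector
    digest = hashlib.sha256(mean_vec.tobytes()).hexdigest()
    hash_int = int(digest, 16)
    # Map the hash onto the color wheel; nearby vectors get unrelated hues
    return float(hash_int % 360)


chunks = np.arange(12, dtype=np.float32).reshape(3, 4)
print(compute_content_hue_hash(chunks))
```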
Proposed Solution
Use UMAP to reduce the mean embedding to 1D, then map that value to hue (0-360). This approach:
- Spreads articles along their primary semantic axis
- Similar articles get nearby (but distinct) colors
- Very different articles get contrasting colors
- Scales well as more articles are added
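One detail affects the "contrasting colors" goal. A 1D UMAP coordinate lies on a line, but hue is an angle on a circle. If the coordinate is scaled across the full 0-360 range, the two most distant articles land on hue 0 and hue 360, which are the same color. The small helper below measures angular hue distance so this effect can be checked. The helper name `hue_distance` is hypothetical, and the 300-degree span is an illustrative choice, not part of the proposal.

```python
def hue_distance(h1: float, h2: float) -> float:
    """Angular distance between two hues on the 360-degree color wheel."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


# Full-wheel scaling: the extremes of the 1D axis collapse to one color.
print(hue_distance(0.0, 360.0))  # 0.0
# A 300-degree span keeps the extremes apart (60 degrees here).
print(hue_distance(0.0, 300.0))  # 60.0
# A 180-degree span maximizes contrast between the extremes but uses half the wheel.
print(hue_distance(0.0, 180.0))  # 180.0
```

Any span below 360 trades some contrast between the two ends of the axis for more distinct colors in the middle, so the span is worth tuning against real article data.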
Implementation
Modify compute_content_hue() to:
- Use UMAP with n_components=1 on the mean embedding
- Normalize the resulting value to the 0-360 range
- Return as hue
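UMAP learns its projection from a set of points and cannot be fit on a single article's vector. This likely means processing all articles together for optimal spread, or using a pre-fitted UMAP model to place new articles.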