Design a subsystem that continuously identifies and groups the top 20 most frequent queries—merging near‑duplicates into representative clusters—and caches their full results. By serving these hot queries directly from cache, we eliminate redundant latency overhead. This cache layer should be lightweight, fault‑tolerant, and fully compatible with the existing RAG pipeline, automatically refreshing its entries as query patterns evolve and renew after a fixed period of time.