
Caching of frequent user queries for immediate response #3

@beingamanforever

Description


Design a subsystem that continuously identifies the 20 most frequent user queries, merges near-duplicate phrasings into representative clusters, and caches the full result for each cluster. Serving these hot queries directly from the cache avoids re-running the RAG pipeline for them and removes the latency of that repeated work. The cache layer should be lightweight, fault-tolerant, and fully compatible with the existing RAG pipeline. It should refresh its entries automatically as query patterns change and expire each entry after a fixed period of time.
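A minimal single-process sketch of this behaviour follows. It rests on several assumptions that the issue does not state. Near-duplicates are detected with token-set Jaccard similarity, though a production system might compare query embeddings instead. `run_pipeline` is a hypothetical stand-in for the existing RAG pipeline call, and the names `HotQueryCache`, `query`, `normalize` and `jaccard` are illustrative only. Frequencies are tracked per cluster with `collections.Counter`. Only the top `top_k` clusters (default 20) are cached, and each entry expires after `ttl_seconds`.

```python
import re
import time
from collections import Counter
from typing import Callable, Dict, Set, Tuple


def normalize(query: str) -> frozenset:
    """Reduce a query to a set of lowercase alphanumeric tokens (assumed near-duplicate key)."""
    return frozenset(re.findall(r"[a-z0-9]+", query.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Token-set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class HotQueryCache:
    """Hypothetical sketch: count query clusters, cache results only for the top-k, expire after a TTL."""

    def __init__(self, run_pipeline: Callable[[str], str], top_k: int = 20,
                 ttl_seconds: float = 3600.0, similarity: float = 0.8,
                 clock: Callable[[], float] = time.monotonic):
        self.run_pipeline = run_pipeline  # stand-in for the existing RAG pipeline
        self.top_k = top_k
        self.ttl = ttl_seconds
        self.similarity = similarity      # Jaccard threshold for merging into a cluster
        self.clock = clock                # injectable so expiry can be tested
        self.counts: Counter = Counter()  # cluster representative -> hit count
        self.cache: Dict[frozenset, Tuple[str, float]] = {}  # representative -> (result, expiry)

    def _cluster(self, tokens: frozenset) -> frozenset:
        # Linear scan over known representatives: fine for a sketch, too slow at scale.
        for rep in self.counts:
            if jaccard(tokens, rep) >= self.similarity:
                return rep
        return tokens  # no close match, so this query starts a new cluster

    def _hot_set(self) -> Set[frozenset]:
        # Ties keep first-seen order, because Counter.most_common is stable.
        return {rep for rep, _ in self.counts.most_common(self.top_k)}

    def query(self, text: str) -> str:
        rep = self._cluster(normalize(text))
        self.counts[rep] += 1
        now = self.clock()
        entry = self.cache.get(rep)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: the pipeline is skipped entirely
        result = self.run_pipeline(text)  # miss or expired; exceptions propagate uncached
        hot = self._hot_set()
        if rep in hot:
            self.cache[rep] = (result, now + self.ttl)
        # On each miss, drop cached clusters that have fallen out of the top-k.
        for stale in [r for r in self.cache if r not in hot]:
            del self.cache[stale]
        return result
```

A cached answer is reused for every phrasing in its cluster, so the `similarity` threshold trades hit rate against the risk of serving an answer that was computed for a slightly different question. Counts grow without bound in this sketch. A real deployment would likely decay or window them so that the top-20 set follows recent query patterns.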

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
