Summary
Add an optional multi-query capability that lets the service accept and execute multiple queries in parallel, retrieving and composing results using a composite retriever implemented in a langgraph subgraph. The feature must be toggleable and configurable via the Helm chart (default: disabled), align with the repo’s dependency-injection pattern, and allow administrators to specify how many queries may be served concurrently.
Motivation
- Improve throughput and user experience for workflows that benefit from multiple simultaneous query formulations (e.g., multiple paraphrases, different retrieval heuristics, or multi-prompt strategies).
- Offload per-query orchestration into an explicit langgraph subgraph to keep application code simple and to leverage graph-based composition and observability.
- Provide operator control via Helm so multi-query can be enabled only where cluster resources and DB load can support it.
Goals / Requirements
- Optional feature: multi-query must be disabled by default and enabled via a Helm values flag.
- Langgraph subgraph: place the multiquery orchestration (fan-out, per-query retrieval, merging) inside a dedicated langgraph subgraph (e.g., "multiquery" or "composite-retriever") rather than scattering logic in the API service.
- Configurable number of queries: allow operators to decide the maximum number of queries to run per request (e.g., 1..N).
- Parallel processing: the subgraph executes queries in parallel (async tasks or a bounded worker pool), and per-request timeouts keep API-level endpoints responsive.
- Composite retriever: within the subgraph provide a composite retriever node that merges results from each sub-retriever, handles deduplication and ranking, and returns a single candidate set to the downstream pipeline.
- DI compliance: register retriever, multi-query service, and configuration via the project’s DI container (existing admin dependency container referenced in the repo), ensuring unit-testable abstractions and easy overrides.
- Backwards compatible API: existing single-query behavior must be unchanged when the feature is disabled. If the API expands (e.g., accept multiple queries in one call), document the new payload and provide fallbacks.
Design (high level)
- Langgraph subgraph (recommended name: "multiquery-subgraph")
  - Input port: list/array of queries + per-query parameters (optional).
  - Fan-out node: instantiates N query nodes (N = configured number, bounded per request).
  - Retriever node(s): each query node invokes the configured retriever (vector DB calls); existing retriever implementations can be reused via DI.
  - Merge node: composite retriever merges results, deduplicates, and sorts by relevance/scoring; optionally re-ranks with an LLM.
  - Output: unified candidate list + metadata (per-query stats, latencies).
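The merge node's deduplicate-and-rank behavior could look roughly like the sketch below. It is illustrative only: the `Candidate` type, the `doc_id` key, and the "keep the highest score" rule are assumptions, and it presumes scores from different queries are comparable, which may not hold for every retriever.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    doc_id: str    # hypothetical stable document identifier
    score: float   # retriever relevance score, higher is better
    source_queries: list[str] = field(default_factory=list)  # per-query provenance


def merge_candidates(per_query: dict[str, list[Candidate]], top_k: int) -> list[Candidate]:
    """Deduplicate by doc_id, keep the best score, and record which queries hit."""
    merged: dict[str, Candidate] = {}
    for query, candidates in per_query.items():
        for cand in candidates:
            existing = merged.get(cand.doc_id)
            if existing is None:
                merged[cand.doc_id] = Candidate(cand.doc_id, cand.score, [query])
            else:
                existing.score = max(existing.score, cand.score)
                if query not in existing.source_queries:
                    existing.source_queries.append(query)
    # Sort by score descending; break ties on doc_id so the order is deterministic.
    ranked = sorted(merged.values(), key=lambda c: (-c.score, c.doc_id))
    return ranked[:top_k]
```

An optional LLM re-rank step would run on the returned list; it is left out here because its interface is not specified in this issue.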
- API & Request model
  - New endpoint or optional field on existing query endpoint:
    - Option A: existing endpoint accepts "queries": string[] (when missing, fall back to single-query behavior).
    - Option B: new endpoint POST /v1/query/multi with payload { queries: string[], options?: {...} }.
  - Validate the number of queries against the configured maxQueries.
  - When multi-query is disabled in Helm, the API rejects array payloads (or internally runs only the first query).
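A minimal sketch of the validation rules for Option A, written framework-agnostically. The function name, the exception type, and the existing single-query field name `"query"` are assumptions for illustration. The sketch takes the "reject array payloads" branch when the feature is disabled, which is only one of the two behaviors mentioned above.

```python
class QueryValidationError(ValueError):
    """Raised when a request payload violates the multi-query rules."""


def resolve_queries(payload: dict, enabled: bool, max_queries: int) -> list[str]:
    """Return the list of queries to run for one request payload."""
    queries = payload.get("queries")
    if queries is None:
        # No array supplied: keep the existing single-query behavior.
        # The "query" field name is an assumption about the current API.
        single = payload.get("query")
        if not isinstance(single, str) or not single.strip():
            raise QueryValidationError("'query' must be a non-empty string")
        return [single]
    if not enabled:
        raise QueryValidationError("multi-query is disabled for this deployment")
    if not isinstance(queries, list) or not queries:
        raise QueryValidationError("'queries' must be a non-empty list")
    if len(queries) > max_queries:
        raise QueryValidationError(f"at most {max_queries} queries are allowed per request")
    if not all(isinstance(q, str) and q.strip() for q in queries):
        raise QueryValidationError("every query must be a non-empty string")
    return list(queries)
```

In an HTTP handler, a `QueryValidationError` would typically be mapped to a 4xx response; the exact status code is a design decision left open here.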
- Helm / configuration (values.yaml additions)
  - Add a top-level block, e.g.:

        multiQuery:
          enabled: false
          maxQueries: 3              # maximum queries allowed per request
          perRequestTimeoutSec: 10   # timeout applied to the multiquery orchestration
          parallelism: 3             # how many queries to run in parallel (<= maxQueries)
          langgraph:
            subgraphName: "multiquery-subgraph"
            image: ""                # optional override for a dedicated subgraph image
          retriever:
            topK_per_query: 10

  - Ensure values are documented in chart README and defaults keep the feature off.
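On the application side, the values above could be mapped onto a validated settings object, for example as sketched below. How the Helm values reach the service (environment variables, a mounted config file, etc.) is not specified in this issue, so the sketch simply takes an already-parsed dict. It also assumes `retriever` is nested under `multiQuery`. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiQueryConfig:
    enabled: bool = False
    max_queries: int = 3
    per_request_timeout_sec: float = 10.0
    parallelism: int = 3
    top_k_per_query: int = 10

    def __post_init__(self) -> None:
        # Fail fast at startup on inconsistent operator settings.
        if self.max_queries < 1:
            raise ValueError("maxQueries must be >= 1")
        if not 1 <= self.parallelism <= self.max_queries:
            raise ValueError("parallelism must be between 1 and maxQueries")
        if self.per_request_timeout_sec <= 0:
            raise ValueError("perRequestTimeoutSec must be positive")


def config_from_values(values: dict) -> MultiQueryConfig:
    """Build the config from a parsed values dict, falling back to the defaults."""
    block = values.get("multiQuery", {})
    retriever = block.get("retriever", {})
    return MultiQueryConfig(
        enabled=bool(block.get("enabled", False)),
        max_queries=int(block.get("maxQueries", 3)),
        per_request_timeout_sec=float(block.get("perRequestTimeoutSec", 10)),
        parallelism=int(block.get("parallelism", 3)),
        top_k_per_query=int(retriever.get("topK_per_query", 10)),
    )
```

Validating `parallelism <= maxQueries` at load time surfaces a misconfigured chart as a startup error rather than as surprising runtime behavior.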
- DI / implementation strategy
  - Define interfaces (abstractions) for:
    - IMultiQueryService: orchestrates queries and invokes the langgraph subgraph.
    - ICompositeRetriever: merging/dedup behavior.
  - Update the existing dependency container (admin dependency container referenced in repository PRs/issues) to register MultiQueryService and CompositeRetriever only when multiQuery.enabled is true.
  - Keep concrete implementations behind interfaces to respect existing DI patterns and allow for test doubles in unit tests.
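The interface split and conditional registration might be sketched as follows. The repository's actual DI container is not shown in this issue, so a plain dict stands in for it; `build_registry`, the method names `merge` and `run`, and the use of bare document ids are all placeholders.

```python
import asyncio
from typing import Protocol


class ICompositeRetriever(Protocol):
    def merge(self, per_query: dict[str, list[str]]) -> list[str]: ...


class IMultiQueryService(Protocol):
    async def run(self, queries: list[str]) -> list[str]: ...


class CompositeRetriever:
    def merge(self, per_query: dict[str, list[str]]) -> list[str]:
        # Order-preserving deduplication across all per-query result lists.
        seen: set[str] = set()
        merged: list[str] = []
        for results in per_query.values():
            for doc_id in results:
                if doc_id not in seen:
                    seen.add(doc_id)
                    merged.append(doc_id)
        return merged


class MultiQueryService:
    def __init__(self, retrieve, composite: ICompositeRetriever) -> None:
        self._retrieve = retrieve    # async callable: query -> list of doc ids
        self._composite = composite

    async def run(self, queries: list[str]) -> list[str]:
        results = await asyncio.gather(*(self._retrieve(q) for q in queries))
        return self._composite.merge(dict(zip(queries, results)))


def build_registry(enabled: bool, retrieve) -> dict[str, object]:
    """Stand-in for the DI container: register multi-query parts only when enabled."""
    registry: dict[str, object] = {}
    if enabled:
        composite = CompositeRetriever()
        registry["composite_retriever"] = composite
        registry["multi_query_service"] = MultiQueryService(retrieve, composite)
    return registry
```

Because `MultiQueryService` depends only on the `ICompositeRetriever` protocol and an injected `retrieve` callable, unit tests can pass in fakes without touching the vector DB.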
- Concurrency & safety concerns
  - Use async/await with asyncio.gather, guarded by a bounded semaphore, to limit per-request concurrency.
  - Use per-query timeouts and a global orchestration timeout to avoid resource exhaustion.
  - Rate-limit/queue if necessary at ingress or via Helm-configured concurrency throttles.
  - Ensure the vector DB client (e.g., Qdrant) is safe under concurrent use: reuse pooled connections rather than creating many clients.
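The concurrency limits above can be sketched with the standard library alone. The function and parameter names below are hypothetical, and `retrieve` stands for whatever async retriever call the service already has. Returning an empty list for a timed-out query is one possible degradation policy, not a requirement of this issue.

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    queries: list[str],
    retrieve: Callable[[str], Awaitable[list[str]]],
    parallelism: int,
    per_query_timeout: float,
    overall_timeout: float,
) -> dict[str, list[str]]:
    """Run retrievals concurrently, never more than `parallelism` at a time."""
    semaphore = asyncio.Semaphore(parallelism)

    async def run_one(query: str) -> list[str]:
        async with semaphore:
            try:
                # The per-query clock starts once a semaphore slot is acquired.
                return await asyncio.wait_for(retrieve(query), per_query_timeout)
            except asyncio.TimeoutError:
                return []  # degrade: a slow query contributes no candidates

    # The outer timeout bounds the whole orchestration, including queue time.
    results = await asyncio.wait_for(
        asyncio.gather(*(run_one(q) for q in queries)),
        overall_timeout,
    )
    return dict(zip(queries, results))
```

If the overall timeout fires, `asyncio.wait_for` cancels the gathered tasks and raises `asyncio.TimeoutError` to the caller, which can then map it to an error response.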
Implementation steps (suggested)
- Add Helm chart values (values.yaml + README) with defaults to disable multi-query.
- Create the langgraph subgraph definition (subgraph files, nodes) for multiquery orchestration and composite retriever logic.
- Add server-side wiring:
  - New service implementation that calls the langgraph subgraph or, if lightweight, orchestrates calls to the retrievers directly and uses the CompositeRetriever implementation.
  - Register the service and retriever in the project DI container only when enabled.
- Add API changes:
  - Accept a list of queries and validate the count.
  - Preserve existing single-query semantics if "queries" is not provided or the feature is disabled.
- Implement unit tests:
  - MultiQueryService unit tests with mocks for retriever and langgraph client.
  - Composite retriever merging/dedup tests.
- Add integration tests:
  - Local Docker Compose or k3d+tilt test that runs the langgraph subgraph, vector DB, and API to validate the end-to-end flow.
- Document the feature in README and Helm chart values document.
Acceptance criteria
- Helm values contain a clear toggle and configuration for multi-query (enabled: false by default).
- Langgraph subgraph implementing parallel query orchestration and composite retriever exists in repo (or clearly documented if external).
- API accepts multiple queries when feature is enabled (or provides new endpoint), with validation and a configured maximum.
- The multi-query behavior processes queries in parallel and returns a single merged ranked list.
- Implementation registers services via the repository’s DI container and keeps retriever abstractions testable.
- Unit and integration tests exist demonstrating basic parallel multi-query execution and merging.
- Documentation and example values.yaml are included.
Edge cases and notes
- Decide whether each "query" should use identical retriever config or allow per-query retriever options.
- Consider how to deduplicate identical or overlapping documents and how to surface per-query provenance in the response (optional).
- Add observability: per-query latencies, counts, and errors.
Additional notes for implementation
- For the rephrasing step, leverage the existing rephrasing chain with its prompt template. For each incoming request, use both the original question and its rephrased form(s) as queries to the multi-query system. This ensures the retrieval pipeline benefits from rephrasings while also including the original phrasing.
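As a rough illustration of combining the original question with its rephrasings, the helper below builds the query list while dropping near-identical duplicates and respecting the configured maximum. The rephrasing chain itself is not invoked here (its interface is not described in this issue). The helper name and the whitespace/case normalization rule are assumptions.

```python
def build_query_list(original: str, rephrasings: list[str], max_queries: int) -> list[str]:
    """Original question first, then distinct rephrasings, capped at max_queries."""
    queries: list[str] = []
    seen: set[str] = set()
    for text in [original, *rephrasings]:
        if len(queries) >= max_queries:
            break
        # Treat queries that differ only in case or whitespace as duplicates.
        normalized = " ".join(text.split()).casefold()
        if normalized and normalized not in seen:
            seen.add(normalized)
            queries.append(text.strip())
    return queries
```

Placing the original question first means it is always retained when the cap truncates the list.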
Suggested labels