-
Notifications
You must be signed in to change notification settings - Fork 6
ADR to introduce Builder and Strategy patterns to OpenSearch query construction #941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
JPrevost
wants to merge
1
commit into
main
Choose a base branch
from
adr-opensearch-refactor
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
155 changes: 155 additions & 0 deletions
155
...s/0005-refactor-opensearch-query-building-with-builder-and-strategy-patterns.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,155 @@ | ||
| # 5. Refactor OpenSearch query building with Builder and Strategy patterns | ||
|
|
||
| Date: 2026-02-02 | ||
|
|
||
| ## Status | ||
|
|
||
| Accepted | ||
|
|
||
| ## Context | ||
|
|
||
| The `Opensearch` model (`app/models/opensearch.rb`) has grown into a large, monolithic class (≈360+ lines). | ||
|
|
||
| The model: | ||
|
|
||
| - Builds the entire OpenSearch request body (query, aggregations, sort, highlight) in one place. | ||
| - Mixes multiple concerns: | ||
| - Lexical query construction (multi_match, single-field matches, nested matches). | ||
| - Geographic constraints (geodistance, bounding box). | ||
| - Aggregation filters (contributors, formats, languages, access rights, etc.). | ||
| - Top-level request assembly and client invocation. | ||
| - Is already difficult to understand and extend safely. We know we will soon add **semantic** and **hybrid** (lexical + semantic) search behaviors, which will further increase complexity. | ||
|
|
||
| Current state: | ||
|
|
||
| - The GraphQL `search` field (`app/graphql/types/query_type.rb`) is the main entry point and calls `Opensearch.new.search(...)`. | ||
| - All lexical and geo query construction is hardcoded inside `Opensearch#query`, `#matches`, `#multisearch`, and various helper methods. | ||
| - Filter logic (`filters`, `filter_field_by_value`, etc.) is also embedded in the same class. | ||
|
|
||
| We want to: | ||
|
|
||
| - Make the query-building logic easier to understand and test in isolation. | ||
| - Prepare for additional query modes (semantic, hybrid). | ||
| - Keep the **external API and behavior unchanged** for now (same GraphQL schema, same OpenSearch request shape). | ||
|
|
||
| ## Decision | ||
|
|
||
| We will refactor the `Opensearch` model to introduce: | ||
|
|
||
| 1. **A query strategy abstraction (Strategy pattern)** | ||
| - Define a simple query-strategy contract, e.g. a module `Opensearch::QueryStrategy` with a single method: | ||
| - `build(params, fulltext) # => Hash` | ||
| - `Opensearch.search` will: | ||
| - Select a query strategy (for now, always the lexical strategy). | ||
| - Delegate query construction to `strategy.build(params, fulltext)` inside `build_query`. | ||
| - Future query modes (semantic, hybrid) will be modeled as additional strategies that implement the same interface. | ||
| - NOTE: **we could choose to defer implementation of the Strategy abstraction** and focus solely on the Builders at this time. It is relatively simple, this is not a requirement to gain the core benefits of this refactor. | ||
|
|
||
| 2. **A lexical query builder as the first strategy (Builder + Strategy)** | ||
| - Create `Opensearch::LexicalQueryBuilder` (e.g. `app/models/opensearch/lexical_query_builder.rb`), which: | ||
| - Implements the `QueryStrategy` interface. | ||
| - Encapsulates the current “lexical” query behavior: | ||
| - bool query structure (must/should/filter). | ||
| - `multisearch` (prefix/term boosts for title, contributors, etc.). | ||
| - `matches` for main `q`. | ||
| - Single-field and nested matches (citation, title, contributors, subjects, etc.). | ||
| - Geographic clauses (geodistance, bounding box) as part of the overall bool query. | ||
| - Uses `FilterBuilder` (see next item) for the filter portion of the bool query. | ||
| - **`Opensearch` becomes an orchestrator**: | ||
| - It builds the top-level request: `from`, `size`, `query` (via strategy), `aggregations`, `sort`, and optional `highlight`. | ||
| - It no longer encodes the detailed lexical/geo/field logic directly. | ||
|
|
||
| 3. **A dedicated filter builder (Builder pattern)** | ||
| - Create `Opensearch::FilterBuilder` (e.g. `app/models/opensearch/filter_builder.rb`). | ||
| - Move all aggregation/filter-related responsibilities out of `Opensearch`: | ||
| - `filters` | ||
| - `filter_field_by_value` | ||
| - `filter_sources` / `source_array` | ||
| - `filter_access_to_files` / `access_to_files_array` | ||
| - `FilterBuilder` will: | ||
| - Accept the search params. | ||
| - Return the array of filter clauses used in the OpenSearch query. | ||
| - This builder is anticipated to be reusable in future strategies which is why it is pulled out of `LexicalQueryBuilder`. | ||
|
|
||
| 4. **A dedicated sort builder** | ||
| - While sort is currently extremely simple, we'll take this opportunity to move it out of OpenSearch to simplify the OpenSearch class. | ||
| - This will prepare us for additional sort algorithms, but mostly this move is to simplify the main OpenSearch class. | ||
|
|
||
| 5. **No external API or behavior changes** | ||
| - The signature of `Opensearch#search` remains the same. | ||
| - The GraphQL schema and `construct_query` behavior remain unchanged. | ||
| - The shape of the OpenSearch request (including query, filters, aggregations, sort, highlight) is preserved. | ||
| - Existing tests (including VCR-based tests) should continue to pass unchanged. | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### Positive | ||
|
|
||
| - **Improved maintainability and readability** | ||
| - `Opensearch` is smaller and focused on orchestration and top-level request assembly. | ||
| - Lexical query details and filter details are encapsulated in dedicated classes. | ||
|
|
||
| - **Better testability** | ||
| - `FilterBuilder` and `LexicalQueryBuilder` can be unit-tested in isolation: | ||
| - Given specific params, they produce well-defined hash structures. | ||
| - We can more easily assert on the shape of queries and filters without coupling to client calls. | ||
|
|
||
| - **Clear extension point for future features** | ||
| - Semantic and hybrid search can be added later by: | ||
| - Implementing new query strategies (e.g. `SemanticQueryStrategy`, `HybridQueryStrategy`) that conform to the same `build(params, fulltext)` interface. | ||
| - Reusing `FilterBuilder` for aggregation filters and geo constraints. | ||
| - Adding a small “strategy selection” step in `Opensearch` (e.g. based on a `search_mode` param). | ||
| - This prepares us to focus solely on the semantic/hybrid work when we get to that rather than trying to introduce a refactor with a new feature. | ||
|
|
||
| - **Easier incremental refactors** | ||
| - Additional internal refactoring (e.g. introducing a `SearchParams` value object or specialized helpers for geo) can be done within builders/strategies without touching the external API. | ||
|
|
||
| ### Negative / Trade-offs | ||
|
|
||
| - **More classes and indirection** | ||
| - Adding `FilterBuilder`, `QueryStrategy`, and `LexicalQueryBuilder` increases the number of moving parts. | ||
| - Developers must follow the new abstractions to trace how a query is built. | ||
|
|
||
| - **Short-term refactor cost** | ||
| - Code must be carefully moved to avoid breaking existing behavior and VCR tests. | ||
| - There is some overhead in introducing new tests for `FilterBuilder` and `LexicalQueryBuilder`. | ||
|
|
||
| ### Out of scope | ||
|
|
||
| - **Semantic and hybrid search are not implemented in this decision** | ||
| - No k-NN/vector fields, Neural Search configuration, or hybrid score combination are added here. | ||
| - No new GraphQL arguments (e.g. `searchMode`) are introduced at this time. | ||
| - Index mappings, ingest pipelines, and model selection remain unchanged. | ||
|
|
||
| These will be covered by future ADR(s) once we are ready to add semantic and hybrid query capabilities on top of the refactored structure. | ||
|
|
||
| ## Notes | ||
|
|
||
| A simplified view of the target architecture (including future state of Semantic/Hybrid builders): | ||
|
|
||
| ```mermaid | ||
| flowchart LR | ||
| subgraph api [API layer] | ||
| GraphQL[query_type search] | ||
| end | ||
| subgraph opensearch [Opensearch#search] | ||
| build[build_query] | ||
| strategy_sel[Strategy selection] | ||
| end | ||
| subgraph strategies [Query strategies] | ||
| QueryStrategy[QueryStrategy interface] | ||
| Lexical[LexicalQueryBuilder] | ||
| Filters[FilterBuilder] | ||
| Semantic[SementicQueryBuilder] | ||
| Hybrid[HybridQueryBuilder] | ||
| end | ||
| GraphQL --> build | ||
| build --> strategy_sel | ||
| strategy_sel --> QueryStrategy | ||
| QueryStrategy -->|implemented by| Lexical | ||
| QueryStrategy -->|implemented by| Semantic | ||
| QueryStrategy -->|implemented by| Hybrid | ||
| Lexical --> Filters | ||
| build --> Aggregations | ||
| build --> Sort | ||
| ``` | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, possibly, move to stubs/mocks...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I think I'd like this to work as-is with no test changes to prove it works. If we then move to mocks/stubs I'm open to it... but keeping tests stable during a refactor would be ideal if we can swing it.