[DOC-9267] Guidance for Search Service Sizing #4078
base: release/8.0
@@ -534,6 +534,280 @@ NOTE: The storage engine used in the sizing calculation corresponds to the stora
| Nitro
|===
== Sizing Search Service Nodes

Search Service nodes manage Search indexes and serve your Search queries.
A basic Search index is an inverted index: a list of all the unique terms that appear in the documents on your cluster, where each term maps to a postings list of the documents that contain it.
Depending on your data, these structures can make a Search index larger than your original dataset, although term deduplication and compressed postings lists can also make it smaller.
Member comment: This is a possibility, it depends very much on the data though. In many situations it could be smaller too, because we de-dup terms/prefixes/sub-strings/suffixes in the index, and the inverted-index/postings-lists are compressed bitmaps alongside a map for doc keys. Will defer to your judgement on how to frame all of this. :)
Specific options in your Search index configuration can also increase its size, such as *Store*, *Include in _all field*, and *Include Term Vectors*.
For more information about what options can increase index size and storage requirements, see xref:search:child-field-options-reference.adoc[].

In general, when sizing nodes for a deployment that uses the Search Service, you need to determine the number of vCPUs and the amount of RAM that will support your workload.

=== Calculating Node Requirements

To size the Search Service nodes in your cluster, you need the following information:
* The number of documents you need to include in your Search index or indexes.
* The average size of the documents that need to be included in your Search index, in KB.
If you can, also note the number of indexed fields in your documents and their sizes, as these drive the index footprint more directly than overall document size.
Member comment: Not accurate enough. It's not the average size of the docs that we need, but the number of the fields and their sizes that'll drive footprint.
* A sample document or documents that show the structure of your data.
* The specific queries per second (QPS) target you need from the Search Service.

You should also consider your replication, recovery, and high-availability needs.
Member comment: recovery/high-availability* needs
With all this information, you can work with Couchbase Support to get the most accurate sizing for your Search workload.

If you want to try sizing your cluster yourself, you can use the following guidelines to estimate your <<search-vcpus,>> and <<search-ram,>>, using averages and estimates from other Search deployments.

For the best results with Search node sizing, contact Couchbase Support.
[#search-vcpus]
==== vCPUs

A heavy QPS workload requires more vCPUs.
If your workload requires a high QPS, vCPU count is the most important part of your sizing for the Search Service.

For example, if your target QPS is 30,000 and your queries are less complex, divide your total QPS target by 200 to get your required vCPUs:

[stem]
++++
30{,}000_{\mathrm{QPS}} \div 200_{\mathrm{Mid}} = 150_{\mathrm{vCPUs}}
++++
|
||
| The formula gives a target of 150 vCPUs for a mid range workload with a less complex query. | ||
|
|
||
| If your queries were more complex, but the QPS target was the same, the calculation changes to use a value of 150 and a result of 200 vCPUs: | ||
|
|
||
| [stem] | ||
| ++++ | ||
| 30,0000_{\mathrm{QPS}} \div 150_{\mathrm{Low}} = 200_{\mathrm{vCPUs}} | ||
| ++++ | ||
|
|
||
| You can then divide your result by the vCPU configuration you want to use to calculate the number of nodes you need: | ||
|
|
||
| [stem] | ||
| ++++ | ||
| \lceil 150_{\mathrm{vCPUs}} \div 32_{\mathrm{vCPUs Per Node}} \rceil = 5_{\mathrm{Nodes}} | ||
| ++++ | ||
|
|
||
| Based on the formula, if you wanted to use nodes with 32 vCPUs and reach a target QPS of 30,000 with less complex queries, you would need 5 nodes in your deployment. | ||
|
|
||
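The vCPU and node calculations above can be sketched as a small helper. This is an illustrative sketch, not Couchbase tooling; it assumes the 200 (less complex) and 150 (more complex) QPS-per-vCPU divisors from this section:

```python
import math

def required_vcpus(qps_target: int, qps_per_vcpu: int) -> int:
    """Divide the QPS target by the per-vCPU throughput:
    roughly 200 for less complex queries, 150 for more complex ones."""
    return math.ceil(qps_target / qps_per_vcpu)

def required_nodes(total_vcpus: int, vcpus_per_node: int) -> int:
    """Round up so the full vCPU requirement fits on whole nodes."""
    return math.ceil(total_vcpus / vcpus_per_node)

# 30,000 QPS with less complex queries, on 32-vCPU nodes
vcpus = required_vcpus(30_000, 200)   # 150 vCPUs
nodes = required_nodes(vcpus, 32)     # 5 nodes
```

Because both results round up, a slightly higher QPS target can tip you into an extra node.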
[#search-ram]
==== RAM

In general, you should allocate 65% of the RAM on any node in your cluster where you want to run the Search Service.
A Search node needs more RAM if you:

* Are xref:search:child-field-options-reference.adoc#store[storing field values] or xref:search:child-field-options-reference.adoc#doc-values[using doc values].
* Have xref:search:customize-index.adoc#analyzers[analyzed text fields].
* Want to use more complex queries than xref:search:search-request-params.adoc#analytic-queries[keyword matches].

To calculate a more precise estimate of the required RAM for the Search Service, you need to:

. <<index-bytes,>>
. <<index-gb,>>
. <<add-replicas,>>
. <<total-ram,>>
[#index-bytes]
===== Calculate Your Per Doc Index Bytes

Use the following formula first to calculate the number of index bytes per document in your Search index:

[latexmath]
++++
\begin{equation}
\begin{split}
\text{Per Doc Index Bytes} = \bigl( ( W \cdot 1024 \cdot f_{\text{text}} \cdot m_{\text{text}} ) + ( W \cdot 1024 \cdot f_{\text{kw}} \cdot m_{\text{kw}} ) + B \bigr) \times (1 + D)
\end{split}
\end{equation}
++++

You need to know the following variables for the formula:
[cols="1,2"]
|====
|Variable |Description

| stem:[W]
| The average size of your JSON documents, in KB.

| stem:[f_{\text{text}}]
a| A measure of the analyzed text in your JSON documents.

You can omit this value if you're primarily using keyword searches and do not have longer-form text fields that require an xref:search:customize-index.adoc#analyzers[analyzer].

You can use the following value ranges based on the kind of analyzed text you have in your index:

* *Product descriptions, titles and body snippets, support ticket descriptions*: `0.10-0.20`
* *Long note fields, email bodies, articles, knowledge-base content*: `0.20-0.40`
* *Log files, message streams, event payloads with large message fields*: `0.40-0.70`

If you're not sure how the size and complexity of the text fields in your documents match the example ranges, use a value of `0.25` for a rough estimate.

To get the most accurate values for stem:[f_{\text{text}}] and your RAM sizing calculations, contact Couchbase Support.
| stem:[m_{\text{text}}]
a| A multiplier for calculating how the bytes in your documents translate into your Search index for analyzed text fields.

For a good planning range, try a value between `0.12` and `0.35`, increasing based on the complexity of your analyzed text fields.

To get the most accurate values for stem:[m_{\text{text}}] and your RAM sizing calculations, contact Couchbase Support.
| stem:[f_{\text{kw}}]
a| A measure of the keywords in your JSON documents.

For a keyword search use case or a filter-heavy workload, a good planning value is `0.10`.

To get the most accurate values for stem:[f_{\text{kw}}] and your RAM sizing calculations, contact Couchbase Support.
| stem:[m_{\text{kw}}]
a| A multiplier for calculating how the bytes in your documents translate into your Search index for keywords.

For a good planning range, try a value between `0.10` and `0.18`.

To get the most accurate values for stem:[m_{\text{kw}}] and your RAM sizing calculations, contact Couchbase Support.
| stem:[B]
a| The number of bytes needed to store field values for your documents, if xref:search:child-field-options-reference.adoc#store[store] is enabled for a child field mapping.

If you're not storing any field values in your Search index, set this value to `0`.

| stem:[D]
a| The additional overhead from adding xref:search:child-field-options-reference.adoc#doc-values[doc values] to your Search index through a child field mapping.

Use a value between `0` and `1`.
If you're not using doc values in your Search index, set this value to `0`.
|====
[#index-gb]
===== Calculate Your Total Index GB

After you have calculated your stem:[\text{Per Doc Index Bytes}], calculate the total GB needed for your Search index, where:

* stem:[N] is the total number of JSON documents you want to include in your Search index.
* stem:[S] is a measure of your system overhead.
For a rough estimate, use a value of `0.10`.

Use the following formula:

[latexmath]
++++
\begin{equation}
\begin{split}
\text{Total Index GB} =
\frac{(N \times \text{Per Doc Index Bytes})}{10^{9}} \times (1 + S)
\end{split}
\end{equation}
++++
[#add-replicas]
===== Add Your Replication Factor

If you want to add replicas to your Search index, you need to factor them into your stem:[\text{Total Index GB}].

Use the following formula:

[latexmath]
++++
\begin{equation}
\begin{split}
\text{Total Index GB With Replicas} = \text{Total Index GB} \times (\text{Number Of Replicas} + 1)
\end{split}
\end{equation}
++++
[#total-ram]
===== Calculate Your Total Required RAM

Finally, calculate the total RAM the Search Service requires with the following formula, then divide the result across your Search nodes to get the RAM needed on each node:

[latexmath]
++++
\begin{equation}
\begin{split}
\text{Total Search RAM} = \text{Total Index GB With Replicas} \times 0.65
\end{split}
\end{equation}
++++
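The three index-size steps chain together. The following sketch uses the figures from the high-QPS example later in this section as a check; the function names are illustrative, not Couchbase tooling:

```python
def total_index_gb(n_docs: int, per_doc_bytes: float,
                   overhead: float = 0.10) -> float:
    """Total Index GB = (N x Per Doc Index Bytes) / 10^9 x (1 + S)."""
    return (n_docs * per_doc_bytes) / 1e9 * (1 + overhead)

def with_replicas(index_gb: float, num_replicas: int) -> float:
    """Total Index GB With Replicas = Total Index GB x (replicas + 1)."""
    return index_gb * (num_replicas + 1)

def search_ram_gb(index_gb_with_replicas: float) -> float:
    """RAM for the Search deployment: 65% of the replicated index size."""
    return index_gb_with_replicas * 0.65

# 194M docs at 258.05 bytes each, 10% overhead, 1 replica
gb = with_replicas(total_index_gb(194_000_000, 258.05), 1)   # ~110.1 GB
```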
[#search-examples]
=== Search Node Sizing Examples

You'll get the most accurate results by going through sizing with Couchbase Support, but you can use the following examples to estimate sizing for a Search workload:

* <<high-qps,>>
* <<low-qps,>>
[#high-qps]
==== High QPS and Keyword-Only Searches

The following sizing scenario assumes a high QPS target, a CPU-bound configuration, and a keyword-only workload with a compact Search index.

This example uses the following variables:

|====
|Number of Documents |Per Doc Index Bytes |QPS Target |System Overhead |Replica Factor

|194,000,000
|258.05
|87,000
|0.10
|2 (1 replica + 1)
|====
Based on these variables, the required vCPUs could be either:

* stem:[580], using a divisor of `150` in the vCPU calculation (more complex queries).
* stem:[435], using a divisor of `200` in the vCPU calculation (less complex queries).

The Total Index GB With Replicas is stem:[110.13 \text{ GB}].

The vCPUs matter the most in this workload.
To get a higher QPS for each vCPU (435 vCPUs), you could try a configuration of:

. 14 nodes with 32 vCPUs and 128{nbsp}GB of RAM
. 7 nodes with 64 vCPUs and 256{nbsp}GB of RAM

Otherwise, for a lower QPS for each vCPU (580 vCPUs), you could try a configuration of:

. 19 nodes with 32 vCPUs and 128{nbsp}GB of RAM
. 10 nodes with 64 vCPUs and 256{nbsp}GB of RAM
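The headline numbers in this scenario can be reproduced with the formulas from earlier in this section. This is a quick arithmetic check, not official sizing tooling:

```python
import math

qps_target = 87_000
n_docs = 194_000_000
per_doc_bytes = 258.05

# vCPU targets for the two query-complexity divisors
vcpus_complex = math.ceil(qps_target / 150)   # 580 vCPUs
vcpus_simple = math.ceil(qps_target / 200)    # 435 vCPUs

# Index size: 10% system overhead, then 1 replica + 1
total_gb = n_docs * per_doc_bytes / 1e9 * 1.10
total_gb_replicas = total_gb * 2              # ~110.1 GB
```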
[#low-qps]
==== Lower QPS with Higher Storage and a Larger Index

The following sizing scenario assumes a comparatively lower QPS target, a storage-bound configuration, and a larger Search index.

This example uses the following variables:
[cols="1,2,1,1,1"]
|====
|Number of Documents |Per Doc Index Bytes |QPS Target |System Overhead |Replica Factor

|500,000,000
|344.86 (for faceting, sorting, and more complex queries)
|12,000
|0.10
|2 (1 replica + 1)
|====
Based on these variables, the required vCPUs would be stem:[60], using a divisor of `200` in the vCPU calculation.

If you wanted to use nodes with 32 vCPUs, you would need 2 nodes.

The Total Index GB With Replicas is stem:[379.34 \text{ GB}].

Each of the 2 nodes would need stem:[379.34 \text{ GB} \times 0.65 \div 2 = 123.28 \text{ GB}] of RAM.

As a result, the best configuration for this workload would be 2 nodes with 32 vCPUs and 128{nbsp}GB of RAM.
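As with the previous scenario, these figures follow from the section's formulas. A quick arithmetic check (illustrative only):

```python
import math

qps_target = 12_000
n_docs = 500_000_000
per_doc_bytes = 344.86

vcpus = math.ceil(qps_target / 200)   # 60 vCPUs
nodes = math.ceil(vcpus / 32)         # 2 nodes of 32 vCPUs each

# 10% system overhead, 1 replica + 1, then 65% RAM split across nodes
total_gb_replicas = n_docs * per_doc_bytes / 1e9 * 1.10 * 2   # ~379.3 GB
ram_per_node = total_gb_replicas * 0.65 / nodes               # ~123.3 GB
```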
== Sizing Query Service Nodes

A node that runs the Query Service executes queries for your application needs.
@@ -0,0 +1,29 @@
sources:
  docs-devex:
    branches: DOC-9267-fts-sizing

  docs-analytics:
    branches: release/8.0

  couchbase-cli:
    branches: morpheus
    startPaths: docs/

  backup:
    branches: morpheus
    startPaths: docs/

  #analytics:
  #  url: ../../docs-includes/docs-analytics
  #  branches: HEAD

  cb-swagger:
    url: https://github.com/couchbaselabs/cb-swagger
    branches: release/8.0
    start_path: docs

  # Minimal SDK build
  docs-sdk-common:
    branches: [release/8.0]
  docs-sdk-java:
    branches: [3.8-api]
Member comment: It'd be good to use the nomenclature "inverted index" here, which is the primary data structure within the search index that indicates the list of the documents where a term appears.