
[feature](be) Add SNII inverted index storage format#64909

Open
airborne12 wants to merge 17 commits into apache:master from airborne12:snii

@airborne12 airborne12 commented Jun 27, 2026

Member

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary:
This PR integrates the SNII inverted index storage format into Doris as a separate storage-format path. SNII index files are written and read through the SNII implementation instead of falling through to the existing inverted-index storage stack, while analyzer usage remains shared where needed.

The change adds SNII-specific handling in index file reader/writer paths, preserves Doris IO context during SNII reads, bounds expanding prefix/regexp/wildcard/phrase-prefix queries before materializing all matching terms, and skips old inverted-index compaction/drop-index handling for SNII files. Unsupported SNII paths are rejected explicitly, including BKD-backed non-string indexes, ANN indexes, and BUILD INDEX.

The PR also adds inverted-index IO profile metrics for comparing real remote index scan behavior, then optimizes SNII high-df term and phrase hot paths found during cloud_sim E2E benchmarking. CPU profiles showed phrase execution dominated by docid conjunction, PFOR PRX decode, selective PRX CSR compaction, and exact two-term phrase verification overhead, not remote I/O.

The optimization:

  • streams eligible term/prefix/regexp/wildcard results directly into Roaring
  • emits dense docid windows as ranges
  • carries PRX doc ordinal context through phrase execution
  • builds selected PRX count ranges during PFOR decode
  • skips redundant final-candidate filtering
  • adds sparse galloping docid conjunction paths
  • caps reserve sizes to the maximum possible match count
  • uses a chunk-level merge verifier for non-repeated two-term phrases
  • adds low-bit PFOR unpack fast paths
  • uses bounded-span bitset/rank intersection for phrase candidate ordinal mapping
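
As an illustration of emitting dense docid windows as ranges, the following minimal sketch coalesces strictly increasing docids into half-open ranges that could then be handed to a bitmap's range-insert call. `DocRange` and `coalesce_ranges` are hypothetical names, not identifiers from this PR, and the real SNII code may detect dense windows differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open range [begin, end) of docids.
struct DocRange {
    uint64_t begin;
    uint64_t end;
};

// Coalesces strictly increasing docids into maximal runs of consecutive values.
// Assumption: the input is sorted and contains no duplicates.
inline std::vector<DocRange> coalesce_ranges(const std::vector<uint32_t>& sorted_docids) {
    std::vector<DocRange> out;
    for (uint32_t d : sorted_docids) {
        assert(out.empty() || out.back().end <= d); // strictly increasing input
        if (!out.empty() && out.back().end == d) {
            ++out.back().end; // d extends the current run
        } else {
            out.push_back({d, static_cast<uint64_t>(d) + 1}); // start a new run
        }
    }
    return out;
}
```

A dense window of N consecutive docids becomes one range instead of N single-value insertions, which is where the saving would come from.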

Hotspot analysis for MATCH_PHRASE 'failed order' on the 10B cloud_sim dataset:

  • Before the final CPU work, pprof showed intersect_window_candidates_with_ordinals 20.2%, pfor_decode 16.3%, selected PFOR range building 11.5%, and PRX position materialization 11.5%.
  • After low-bit PFOR unpack fast paths, pfor_decode dropped to 6.6%, while docid conjunction became the dominant remaining self-CPU at 20.1%.
  • After bounded-span bitset/rank intersection, docid conjunction self-CPU dropped to 12.9%. IO bytes and serial read rounds were unchanged, confirming this was CPU-side algorithmic work rather than cache or remote-IO variance.

Representative cloud_sim cold benchmark on the 10B textbench dataset:

  • OP_term_high_df:
    • V3: wall 3113.411 ms / CPU 22.27 s / HWM 6464 MB / read 819.39 MB / serial rounds 209.764K
    • SNII: wall 2963.013 ms / CPU 19.62 s / HWM 3563 MB / read 579.32 MB / serial rounds 1.664K
  • MATCH_PHRASE 'failed order':
    • V3: wall 6434.967 ms / CPU 69.63 s / HWM 5561 MB / read 3.05 GB / serial rounds 799.466K
    • SNII baseline: wall 5480.060 ms / CPU 48.07 s / scanner 45.607 s / read 2.92 GB / remote 2.64 GB / serial rounds 2.272K
    • SNII after final CPU optimization: wall 5305.507 ms / CPU 41.68 s / scanner 39.231 s / read 2.92 GB / remote 2.64 GB / serial rounds 2.272K
  • MATCH_PHRASE_PREFIX 'failed ord':
    • V3: wall 7231.285 ms / CPU 82.25 s / HWM 5541 MB / read 3.07 GB / serial rounds 803.965K
    • SNII baseline: wall 5688.971 ms / CPU 49.95 s / scanner 47.404 s / read 3.02 GB / remote 2.69 GB / serial rounds 2.353K
    • SNII after final CPU optimization: wall 5189.219 ms / CPU 43.57 s / scanner 41.221 s / read 3.02 GB / remote 2.69 GB / serial rounds 2.353K

Release note

Add SNII inverted index storage format and reject unsupported BKD, ANN, and BUILD INDEX operations for SNII.

Check List (For Author)

  • Test

    • Regression test
      • ./bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii -forceGenOut
      • ./bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii
    • Unit Test
      • ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest,org.apache.doris.alter.IndexChangeJobTest
      • ./run-be-ut.sh --run --filter='SniiPhraseQueryTest.*:SniiTermQueryTest.*:SniiPrxPodTest.*:SniiPforTest.*'
    • Manual test (add detailed scripts or steps below)
      • ./build.sh --be --fe -j 192
      • ./build.sh --be -j 192
      • ./build-support/clang-format.sh be/src/storage/index/snii/core/src/encoding/pfor.cpp be/src/storage/index/snii/core/src/query/docid_conjunction.cpp be/test/storage/index/snii_query_test.cpp
      • ./build-support/check-format.sh be/src/storage/index/snii/core/src/encoding/pfor.cpp be/src/storage/index/snii/core/src/query/docid_conjunction.cpp be/test/storage/index/snii_query_test.cpp
      • git diff --check
      • build-support/run-clang-tidy.sh --build-dir be/build_Release
      • Deployed Release BE to /mnt/disk1/jiangkai/cloud_sim/jiangkai_test, verified current-user MS/recycler/FE/BE processes, and verified SHOW BACKENDS Alive=true.
      • Ran support_phrase MATCH_PHRASE and MATCH_PHRASE_PREFIX smoke queries against textbench_10b_perf.otel10b_phrase40_snii.
      • Ran cloud_sim cold E2E benchmarks and pprof analysis for OP_term_high_df, PH5_phrase_failed_order, and PP5_phrase_prefix_failed.
      • Final PH5/PP5 run: /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_cpu_opt_final_verified.
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Doris only routed inverted index files through the existing V1/V2 storage implementations. This change adds SNII as an independent inverted index storage format, copies the SNII core reader/writer/query implementation into BE, and branches the Doris index file reader/writer paths so SNII reads, writes, queries, and null bitmap handling go through SNII code. SNII reuses Doris analyzer integration only; it does not route SNII storage through the existing CLucene directory/compound reader paths. SNII currently supports string and array string inverted indexes, while numeric/BKD indexes are rejected for this format until BKD support is implemented.

### Release note

Add SNII as an inverted index storage format for string inverted indexes. BKD indexes are not supported with SNII yet.

### Check List (For Author)

- Test: Build
    - `./build.sh --be`
    - `./build.sh --fe`
    - `build-support/clang-format.sh`
    - `build-support/check-format.sh`
    - `build-support/run-clang-tidy.sh --build-dir be/build_Release` attempted; it failed because clang-tidy could not resolve `stddef.h` in this toolchain and also reported pre-existing unrelated diagnostics.
- Behavior changed: Yes. Tables using `inverted_index_storage_format=SNII` route string inverted index storage/query/null handling through SNII and reject BKD indexes.
- Does this need documentation: Yes. No doc PR yet.
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Add a focused regression case for the SNII inverted index storage format. The test creates a string inverted index with inverted_index_storage_format=SNII, verifies MATCH_ANY, MATCH_ALL, MATCH_PHRASE, and NULL bitmap behavior, and validates that numeric/BKD inverted index creation is rejected for SNII.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - bash /mnt/disk1/jiangkai/cloud_sim/bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii -genOut
    - bash /mnt/disk1/jiangkai/cloud_sim/bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: This change completes the SNII inverted index storage-format split by routing SNII reads and writes through the SNII implementation, preserving Doris IO context during SNII file reads, bounding expanding queries before materializing all prefix terms, and rejecting unsupported SNII operations such as BKD-backed indexes, ANN indexes, and BUILD INDEX. It also avoids applying old CLucene index-compaction/drop-index paths to SNII files and adds focused FE and regression coverage for unsupported paths.
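
To illustrate what bounding an expanding query before materializing all prefix terms can look like, here is a minimal sketch over a sorted in-memory term list. `expand_prefix_bounded` and `max_expansions` are hypothetical names, and whether the actual SNII code rejects, truncates, or falls back when the bound is hit is not stated here, so the boolean return is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects terms starting with `prefix` from a sorted dictionary, but stops as
// soon as more than `max_expansions` terms would match. Returns false in that
// case so the caller can decide how to handle the oversized expansion.
inline bool expand_prefix_bounded(const std::vector<std::string>& sorted_terms,
                                  const std::string& prefix, size_t max_expansions,
                                  std::vector<std::string>* out) {
    assert(out != nullptr);
    out->clear();
    // Terms sharing a prefix are contiguous in sorted order.
    auto it = std::lower_bound(sorted_terms.begin(), sorted_terms.end(), prefix);
    for (; it != sorted_terms.end(); ++it) {
        if (it->compare(0, prefix.size(), prefix) != 0) break; // left the prefix run
        if (out->size() == max_expansions) return false;       // bound exceeded
        out->push_back(*it);
    }
    return true;
}
```

The point of the sketch is that the bound is checked while walking the dictionary, so memory use stays proportional to the limit rather than to the number of matching terms.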

### Release note

SNII inverted index storage format rejects unsupported BKD, ANN, and BUILD INDEX operations.

### Check List (For Author)

- Test:
    - Build: ./build.sh --be --fe -j 192
    - Build: ./build.sh --be -j 192
    - Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest,org.apache.doris.alter.IndexChangeJobTest
    - Regression test: ./bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii -forceGenOut; ./bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii
    - Format: ./build-support/clang-format.sh; ./build-support/check-format.sh; git diff --check
    - Static Analysis: ./build-support/run-clang-tidy.sh and ./build-support/run-clang-tidy.sh --build-dir be/build_Release attempted; failed because clang-tidy could not resolve system stddef.h and also reported existing large-function/NOLINT diagnostics outside the safe scope of this SNII integration.
- Behavior changed: Yes. SNII explicitly rejects unsupported BKD/ANN/BUILD INDEX paths instead of falling through to non-SNII index handling.
- Does this need documentation: No
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: SNII performance validation in cloud mode needs comparable IO observability against the existing CLucene/V3 inverted index path. Before this change, SNII opened remote index files without the same file-cache options as V3 and only part of the IO context reached SNII/CLucene readers, so query profiles could not compare logical requested index bytes, physical index reads, and serial read rounds. This change routes SNII index file opens through Doris file-cache options, propagates copied inverted-index IO context through SNII reads, records request/read bytes and read round counters for both SNII and CLucene index readers, and exposes those counters in the file-cache profile reporter.

### Release note

SNII and V3 inverted index scans now expose additional IO profile counters for request bytes, physical read bytes, range read count, and serial read rounds.

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --clean --run --filter='DorisSniiFileReaderTest.*:DorisFSDirectoryTest.FSIndexInputReadInternalRecordsIndexIOStatsAndContext:FileCacheProfileReporterTest.*' -j "192"
    - Unit Test: ./run-be-ut.sh --run --filter='DorisSniiFileReaderTest.*:DorisFSDirectoryTest.FSIndexInputReadInternalRecordsIndexIOStatsAndContext:FileCacheProfileReporterTest.*' -j "192"
    - Format: build-support/clang-format.sh; build-support/check-format.sh; git diff --check
    - Static Analysis: build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN attempted; failed because clang-tidy could not resolve system stddef.h and also reported existing large-function/C-header/NOLINT diagnostics outside this change. Clear new SNII adapter style warnings were fixed.
- Behavior changed: Yes. SNII remote index file reads now use the same Doris file-cache reader options as V3 when file cache is enabled, and both SNII/V3 report additional profile counters.
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: SNII phrase and phrase-prefix queries spent most CPU time re-scanning phrase candidate docids for every PRX chunk and allocating per-doc expected tail position vectors. On the 10B TextBench cloud benchmark, MATCH_PHRASE 'failed order' took 253.0s wall / 3942.9 CPU-s and MATCH_PHRASE_PREFIX 'failed ord' took 438.3s wall / 6939.5 CPU-s before this optimization. The fix starts PRX candidate filtering from the chunk's first docid, keeps an all-selected fast path, stores expected tail positions in flat CSR-style arrays, and uses a single exact-term fast path for phrase-prefix expected tail positions. The same benchmark now runs in about 3.8s / 59.3 CPU-s for phrase and 3.9s / 61-62 CPU-s for phrase-prefix.
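
The flat CSR-style layout mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the PR's data structure: `TailPositionsCsr` and its members are hypothetical names. The idea is that one offsets array and one positions array replace a separate heap-allocated vector per document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CSR-style container: positions of all docs live in one flat
// array, and offsets[i]..offsets[i + 1] delimits the slice for doc ordinal i.
struct TailPositionsCsr {
    std::vector<uint32_t> offsets {0}; // always num_docs + 1 entries
    std::vector<uint32_t> positions;   // concatenated per-doc positions

    // Appends one document's positions; no per-doc allocation is retained.
    void append_doc(const std::vector<uint32_t>& doc_positions) {
        positions.insert(positions.end(), doc_positions.begin(), doc_positions.end());
        offsets.push_back(static_cast<uint32_t>(positions.size()));
    }

    size_t num_docs() const { return offsets.size() - 1; }

    // Returns [begin, end) pointers into the flat array for one doc ordinal.
    std::pair<const uint32_t*, const uint32_t*> doc_slice(size_t doc) const {
        assert(doc < num_docs());
        return {positions.data() + offsets[doc], positions.data() + offsets[doc + 1]};
    }
};
```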

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter=SniiPhraseQueryTest.*
    - Manual test: release BE build and cloud_sim E2E TextBench phrase benchmark
    - Static check: git diff --check; build-support/run-clang-tidy.sh --build-dir be/build_Release passed on the changed lines in phrase_query.cpp, while the new BE UT source is blocked by a local libstdc++ _POSIX_SEM_VALUE_MAX toolchain header error.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: SNII MATCH_PHRASE in cloud mode spent most CPU in selective PRX/PFOR decode, candidate ordinal materialization, and generic two-term position checks. This change removes per-doc ordinal Status overhead, uses selected PRX ranges to compact dense PFOR decodes, decodes sparse PFOR runs through a stack buffer, and adds a two-term phrase merge path. In cloud_sim PH5 on textbench_10b_perf.otel10b_phrase40_snii, BE CPU is now about 47.67s on average with SQL and inverted-index query cache disabled, down from roughly 55-59s observed before these optimizations.

### Release note

None

### Check List (For Author)

- Test: Unit Test and Manual test
    - Unit Test: ./run-be-ut.sh --run --filter=SniiPhraseQueryTest.*:SniiPrxPodTest.*
    - Build: ./build.sh --be -j 192
    - Manual test: deployed BE to cloud_sim and ran PH5 benchmark under /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_cpu_opt_final_refactor_nocache
    - Static check: git diff --check; build-support/run-clang-tidy.sh --build-dir be/build_Release attempted but failed because the local clang/GCC sysroot cannot resolve stddef.h
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#64909

Problem Summary: SNII phrase and phrase-prefix queries spent most CPU in docid intersection, selective PRX/PFOR decode, and position verification. This change avoids generating PRX ordinals for full on-disk windows so the reader can use the full CSR path, folds selective PRX count validation into selected-range construction, removes hot-loop overflow helper calls from two-term phrase matching, and routes single-tail phrase-prefix queries through the streaming phrase path to avoid materializing all expected tail positions. On the 10B cloud_sim PP5 cold query, SNII BE CPU improved from 72.20s to 63.29s and HWM dropped from 20.26GB to 7.08GB.

### Release note

None

### Check List (For Author)

- Test: Unit Test and Manual test
    - Unit Test: ./run-be-ut.sh --run --filter=SniiPhraseQueryTest.*:SniiPrxPodTest.*
    - Static check: build-support/check-format.sh; git diff --check
    - Manual test: ./build.sh --be -j 192
    - Manual test: cloud_sim cold query benchmark for PH5/PP5 via /mnt/disk15/jiangkai/textbench/cold_query_bench.sh
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: SNII high-df term and phrase queries spent avoidable CPU in vector-to-Roaring materialization, dense docid expansion, selected PRX range construction, repeated final-candidate filtering, and sorted docid conjunction. The CPU profile showed phrase execution dominated by docid conjunction, PFOR PRX decode, and selective PRX CSR compaction instead of remote I/O. This change streams eligible term, prefix, regexp, and wildcard query results directly into Roaring, emits dense docid windows as ranges, carries PRX doc ordinal context through phrase execution, builds selected PRX count ranges during PFOR decode, skips redundant final-candidate filtering, and adds sparse galloping paths for docid conjunction. It also caps conjunction reserve sizes to the maximum possible match count and refactors the SNII reader query dispatch to keep the storage reader control flow smaller.
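
A sparse galloping conjunction with a capped reserve can be sketched as below. This is a generic textbook form of the technique, assuming both inputs are sorted and duplicate-free. `gallop_lower_bound` and `intersect_sparse` are hypothetical names, not the functions in docid_conjunction.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exponential ("galloping") search: returns the first index >= from whose
// value is >= target, probing at distances 1, 2, 4, ... before a binary search.
inline size_t gallop_lower_bound(const std::vector<uint32_t>& v, size_t from, uint32_t target) {
    assert(from <= v.size());
    size_t step = 1;
    size_t lo = from;
    size_t hi = from;
    while (hi < v.size() && v[hi] < target) {
        lo = hi + 1; // everything up to hi is known to be < target
        hi += step;
        step *= 2;
    }
    hi = std::min(hi, v.size());
    return static_cast<size_t>(std::lower_bound(v.begin() + lo, v.begin() + hi, target) -
                               v.begin());
}

// Intersects a short sorted list with a much longer sorted list.
inline std::vector<uint32_t> intersect_sparse(const std::vector<uint32_t>& small,
                                              const std::vector<uint32_t>& large) {
    std::vector<uint32_t> out;
    // The result can never exceed the shorter input, so cap the reserve there.
    out.reserve(std::min(small.size(), large.size()));
    size_t pos = 0;
    for (uint32_t doc : small) {
        pos = gallop_lower_bound(large, pos, doc);
        if (pos == large.size()) break;
        if (large[pos] == doc) {
            out.push_back(doc);
            ++pos;
        }
    }
    return out;
}
```

Galloping pays off when one side is much sparser than the other, because each lookup costs time logarithmic in the distance skipped rather than linear.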

### Release note

None

### Check List (For Author)

- Test: Unit Test / Manual test
    - Unit Test: ./run-be-ut.sh --run --filter=SniiPhraseQueryTest.*:SniiTermQueryTest.*:SniiPrxPodTest.*
    - Manual test: ./build.sh --be -j 192
    - Manual test: deployed BE to cloud_sim and ran support_phrase MATCH_PHRASE smoke query
    - Manual test: cloud_sim cold benchmark for OP_term_high_df, PH5_phrase_failed_order, and PP5_phrase_prefix_failed
    - Static check: git diff --check; build-support/check-format.sh
    - Static check: clang-tidy was attempted with build-support/run-clang-tidy.sh --build-dir be/build_Release, but this environment failed before useful changed-line validation with missing stddef.h/toolchain include errors and pre-existing full-file warnings.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: SNII phrase queries on the 10B cloud benchmark spent most CPU in PRX position decode, posting cursor iteration, and docid conjunction. This change adds a two-term phrase streaming path, uses an adjacent-pair prefilter for multi-term phrase verification, reuses candidate ranges during docid conjunction, and adds low bit-width PFOR unpack paths for common PRX count and delta widths. A near-dense ordinal shortcut was tested and removed because it regressed the PH5 and PP5 phrase cases. In the cloud_sim 10B cold benchmark, PH5 improved from 6181 ms wall time and 60.22 s BE CPU to 5794 ms wall time and 55.43 s BE CPU; PP5 improved from 6211 ms wall time and 59.82 s BE CPU to 6038 ms wall time and 56.26 s BE CPU. PH5 pprof shows pfor_decode self CPU reduced to about 11.4% after the low bit-width fast paths.
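
The idea behind a low bit-width unpack fast path can be sketched for width 3. The bit layout (values packed LSB-first) and the names `pack_bits`, `unpack_generic`, and `unpack8_width3` are assumptions for illustration; the on-disk SNII PFOR layout is not described here and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference packer: writes each value's low `width` bits, LSB-first.
inline std::vector<uint8_t> pack_bits(const std::vector<uint32_t>& values, uint32_t width) {
    std::vector<uint8_t> out((values.size() * width + 7) / 8, 0);
    uint64_t bit_pos = 0;
    for (uint32_t v : values) {
        for (uint32_t b = 0; b < width; ++b, ++bit_pos) {
            if ((v >> b) & 1u) out[bit_pos >> 3] |= static_cast<uint8_t>(1u << (bit_pos & 7));
        }
    }
    return out;
}

// Generic path: one bit at a time, works for any width but is slow.
inline void unpack_generic(const uint8_t* in, uint32_t width, size_t count, uint32_t* out) {
    assert(width >= 1 && width <= 32);
    uint64_t bit_pos = 0;
    for (size_t i = 0; i < count; ++i) {
        uint32_t v = 0;
        for (uint32_t b = 0; b < width; ++b, ++bit_pos) {
            uint32_t bit = (in[bit_pos >> 3] >> (bit_pos & 7)) & 1u;
            v |= bit << b;
        }
        out[i] = v;
    }
}

// Fast path for width 3: eight values occupy exactly three bytes, so load them
// as one 24-bit word and extract with constant shifts and a constant mask.
inline void unpack8_width3(const uint8_t* in, uint32_t* out) {
    const uint32_t w = static_cast<uint32_t>(in[0]) | (static_cast<uint32_t>(in[1]) << 8) |
                       (static_cast<uint32_t>(in[2]) << 16);
    for (int i = 0; i < 8; ++i) {
        out[i] = (w >> (3 * i)) & 7u;
    }
}
```

With the width fixed at compile time, the loop has a constant trip count and constant shifts, which a compiler can fully unroll. That is the kind of saving a width-specialized path targets.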

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter='SniiPforTest.*:SniiPhraseQueryTest.*:SniiTermQueryTest.*:SniiPrxPodTest.*'
    - Manual test: ./build.sh --be -j 192
    - Manual test: cloud_sim PH5/PP5 cold benchmark at /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_cpu_final_pfor_next_cold
    - Manual test: cloud_sim PH5 pprof at /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_final_pfor_next_pprof
    - Static Check: build-support/clang-format.sh, build-support/check-format.sh, and build-support/run-clang-tidy.sh --build-dir be/build_Release
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: SNII phrase queries on the 10B cloud benchmark still spent the largest CPU share in docid/ordinal intersection before PRX verification. The hot path had to intersect dense or near-dense candidate windows with term docids and produce PRX doc ordinals. This change adds a candidate-span fast path that directly accepts all term docids when candidates continuously cover the term span, adds dense/near-dense ordinal mapping for term spans with few missing docs, and fixes mixed dense/non-dense window output ordering by batching window work and flushing in original order. In cloud_sim PH5/PP5 cold benchmark, PH5 improved from 5794 ms wall and 55.43 s BE CPU to 5702 ms wall and 53.55 s BE CPU; PP5 kept similar wall time and reduced BE CPU from 56.26 s to 55.21 s. The inverted index read bytes, remote bytes, cache bytes, serial rounds, and IO counts stayed unchanged, so the improvement is from CPU-side algorithm work rather than reduced remote reads. A bulk dense append variant was tested and reverted because it regressed PH5/PP5.

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter='SniiPforTest.*:SniiPhraseQueryTest.*:SniiTermQueryTest.*:SniiPrxPodTest.*'
    - Manual test: ./build.sh --be -j 192
    - Manual test: cloud_sim PH5/PP5 cold benchmark at /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_final_cover_span_refactor_cold
    - Manual test: cloud_sim PH5 pprof at /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_final_cover_span_refactor_pprof
    - Static Check: build-support/clang-format.sh, build-support/check-format.sh, git diff --check
    - Static Check: build-support/run-clang-tidy.sh --build-dir be/build_Release failed because the local ldb clang-tidy toolchain could not find stddef.h while parsing system headers
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: SNII phrase verification still spent CPU in per-candidate cursor/status handling for exact two-term phrases. Profiles showed PostingCursor::next and positions handling in the hot path while IO metrics stayed unchanged. This change reuses a shared PRX chunk decoder and adds a two-term chunk merge path for non-repeated phrases so overlapping chunks decode once and docids are verified with a linear merge. Repeated-term phrases still use the existing streaming path, and multi-term streaming is split into smaller helpers. On 10B textbench cold cloud_sim runs, PH5 CPU dropped from 53.55s to 48.07s and PP5 CPU dropped from 55.21s to 49.95s with identical IO bytes.
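
A linear-merge verifier for a non-repeated two-term phrase can be sketched as follows, assuming decoded postings are available as sorted (docid, positions) entries. `DocPositions`, `has_adjacent`, and `match_two_term_phrase` are hypothetical names; the real path works on decoded PRX chunks rather than materialized vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded posting entry: one docid with its sorted positions.
struct DocPositions {
    uint32_t docid;
    std::vector<uint32_t> positions;
};

// True if some position p in `a` has p + 1 in `b`; both lists are sorted.
inline bool has_adjacent(const std::vector<uint32_t>& a, const std::vector<uint32_t>& b) {
    size_t i = 0;
    size_t j = 0;
    while (i < a.size() && j < b.size()) {
        const uint64_t want = static_cast<uint64_t>(a[i]) + 1; // avoids uint32 overflow
        if (b[j] < want) {
            ++j;
        } else if (b[j] > want) {
            ++i;
        } else {
            return true;
        }
    }
    return false;
}

// Walks both posting lists once; docids present in both are position-checked.
inline std::vector<uint32_t> match_two_term_phrase(const std::vector<DocPositions>& first,
                                                   const std::vector<DocPositions>& second) {
    std::vector<uint32_t> out;
    size_t i = 0;
    size_t j = 0;
    while (i < first.size() && j < second.size()) {
        if (first[i].docid < second[j].docid) {
            ++i;
        } else if (first[i].docid > second[j].docid) {
            ++j;
        } else {
            if (has_adjacent(first[i].positions, second[j].positions)) {
                out.push_back(first[i].docid);
            }
            ++i;
            ++j;
        }
    }
    return out;
}
```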

### Release note

None

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter='SniiPhraseQueryTest.*:SniiTermQueryTest.*:SniiPrxPodTest.*:SniiPforTest.*'
    - Manual test: ./build.sh --be -j 192
    - Manual test: cloud_sim BE redeploy and smoke MATCH_PHRASE / MATCH_PHRASE_PREFIX
    - Benchmark: textbench 10B cold PH5/PP5 comparison
    - Static check: build-support/clang-format.sh, build-support/check-format.sh, git diff --check; build-support/run-clang-tidy.sh --build-dir be/build_Release attempted but blocked by local clang-tidy system header resolution where stddef.h is not found.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#64909

Problem Summary: SNII phrase queries over high-df terms were CPU-bound in PFOR unpacking and docid conjunction ordinal mapping. PH5/PP5 profiling on the 10B cloud_sim dataset showed pfor_decode and intersect_window_candidate_range_with_ordinals as top self CPU consumers while remote bytes and serial read rounds stayed fixed. This change adds low-bit PFOR unpack fast paths for common widths 3/5/6/7 and a bounded-span bitset/rank intersection path that preserves PRX doc ordinals for 16K-doc windows. The optimized path keeps the on-disk format unchanged and reduces CPU in the cold cloud_sim phrase benchmark: PH5 BE CPU 48.07s -> 41.68s, PP5 BE CPU 49.95s -> 43.57s.
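
One way to picture a bounded-span bitset/rank intersection that preserves doc ordinals is the sketch below. It assumes the term docids of a window are sorted, unique, and span at most 16K docids, per the window size mentioned above. `Match`, `popcount64`, and `intersect_with_ordinals` are hypothetical names, and the actual implementation in docid_conjunction.cpp may be organized differently.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kMaxSpan = 16384; // assumed window bound (16K docids)

struct Match {
    uint32_t docid;
    uint32_t ordinal; // index of docid within the term's docid list
};

inline uint32_t popcount64(uint64_t x) {
    return static_cast<uint32_t>(std::bitset<64>(x).count());
}

inline std::vector<Match> intersect_with_ordinals(const std::vector<uint32_t>& term_docids,
                                                  const std::vector<uint32_t>& candidates) {
    std::vector<Match> out;
    if (term_docids.empty()) return out;
    const uint32_t base = term_docids.front();
    const uint32_t span = term_docids.back() - base + 1;
    assert(span <= kMaxSpan);
    const size_t words = (span + 63) / 64;

    // Bitset of term docids relative to base.
    std::vector<uint64_t> bits(words, 0);
    for (uint32_t d : term_docids) {
        const uint32_t r = d - base;
        bits[r >> 6] |= 1ull << (r & 63);
    }
    // rank[w] = number of set bits in all words before word w.
    std::vector<uint32_t> rank(words, 0);
    for (size_t w = 1; w < words; ++w) rank[w] = rank[w - 1] + popcount64(bits[w - 1]);

    out.reserve(std::min(term_docids.size(), candidates.size()));
    for (uint32_t c : candidates) {
        if (c < base || c - base >= span) continue; // outside the term span
        const uint32_t r = c - base;
        const uint64_t word = bits[r >> 6];
        const uint64_t bit = 1ull << (r & 63);
        if (word & bit) {
            // Ordinal = set bits strictly below this bit across the whole span.
            out.push_back({c, rank[r >> 6] + popcount64(word & (bit - 1))});
        }
    }
    return out;
}
```

Each candidate is resolved with one bit test and one popcount, without a search, which is the property that makes the bounded span attractive for dense windows.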

### Release note

None

### Check List (For Author)

- Test: Unit Test / Manual test
    - build-support/clang-format.sh be/src/storage/index/snii/core/src/encoding/pfor.cpp be/src/storage/index/snii/core/src/query/docid_conjunction.cpp be/test/storage/index/snii_query_test.cpp
    - build-support/check-format.sh be/src/storage/index/snii/core/src/encoding/pfor.cpp be/src/storage/index/snii/core/src/query/docid_conjunction.cpp be/test/storage/index/snii_query_test.cpp
    - git diff --check
    - build-support/run-clang-tidy.sh --build-dir be/build_Release
    - ./run-be-ut.sh --run --filter='SniiPhraseQueryTest.*:SniiTermQueryTest.*:SniiPrxPodTest.*:SniiPforTest.*'
    - ./build.sh --be -j 192
    - cloud_sim deploy/start BE and PH5/PP5 cold phrase benchmark under /mnt/disk15/jiangkai/textbench/runs/20260628_phrase_cpu_opt_final_verified
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: SNII logical index readers were reopened on every searcher-cache miss path, so logical-index metadata and resident small headers were repeatedly loaded during query execution. This also made profile data unable to distinguish whether remote file-cache amplification came from segment metadata, dictionary, BSBF, posting, norms, or null bitmap reads. This change stores opened SNII logical index readers in the existing inverted-index searcher cache, keeps the owning IndexFileReader alive for the cached reader lifetime, registers logical section ranges before opening the reader, and propagates SNII section type through IOContext so profile counters can report physical remote bytes and file-cache block behavior by section.

### Release note

Add SNII inverted-index searcher cache support and SNII section-level file-cache profile counters.

### Check List (For Author)

- Test: Manual test
    - Ran build-support/run_clang_format.py on changed BE files
    - Ran git diff --check and git diff --cached --check
    - Ran ninja -C be/build_Release -j192 doris_be
    - Deployed Release BE to cloud_sim and verified current-user MS/recycler/FE/BE processes plus SHOW BACKENDS Alive=true
    - Ran cloud_sim E2E SNII query/profile probes for cold Q4 and warm follow-up queries
- Behavior changed: Yes. SNII format now reuses the inverted-index searcher cache for logical readers and exposes section-level profile counters.
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: SNII metadata reads were routed through Doris file cache even when the logical request was an exact small metadata/header read. In cloud mode this made logical metadata requests appear as large physical remote reads and wrote metadata blocks into file cache. The SNII existence probe also opened a full logical index reader, which could read per-index metadata and small headers outside the searcher-cache lookup path. This change honors IOContext::read_file_cache=false in the cached remote reader, uses direct exact remote reads for SNII metadata scopes, opens SNII logical readers from owned per-index metadata bytes, and makes SNII index existence checks use the cached tail directory without materializing a LogicalIndexReader. LogicalIndexReader keeps per-index meta and small BSBF headers resident so searcher-cache hits avoid reopening metadata.

### Release note

None

### Check List (For Author)

- Test: Unit Test / Manual test
    - Unit Test: ./run-be-ut.sh --run --filter='SniiSegmentReaderTest.IndexExistsUsesCachedTailDirectory:SniiSegmentReaderTest.NonResidentBsbfCachesHeaderAndProbesBodyBlock:SniiSegmentReaderTest.OpenDoesNotReadWholeTailMetaRegion:BlockFileCacheTest.ReadFileCacheFalseReadsExactRemoteBytesAndDoesNotPopulateCache' -j "192"
    - Manual test: ninja -C be/build_Release -j"192" doris_be
    - Manual test: deployed release BE to cloud_sim and ran SNII cold/warm benchmark probes under /mnt/disk15/jiangkai/textbench/runs/20260628_index_exists_*
- Behavior changed: Yes. SNII metadata reads now bypass file cache when requested and SNII existence checks no longer open logical readers.
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: SNII phrase-prefix with multiple tail expansions built a full phrase execution state for each tail term. That read complete tail docid and position postings before intersecting with the expected documents produced by the exact phrase prefix. The change reuses the docid conjunction planner with the expected docids as the initial candidate set, so tail expansions only fetch docid and PRX windows that can contain expected documents. The shared conjunction loop now covers both normal conjunctions and candidate-filtered conjunctions to avoid behavior drift.

### Release note

None

### Check List (For Author)

- Test: Unit Test / Manual test
    - Unit Test: ./run-be-ut.sh --run --filter='SniiPhraseQueryTest.MultiTailPhrasePrefixFiltersTailPrxByExpectedDocs:SniiPhraseQueryTest.WindowedPhrasePrefixQueryKeepsCorrectCandidateOrdinals:SniiPhraseQueryTest.SingleTailPhrasePrefixUsesStreamingPhrasePath:SniiPhraseQueryTest.MultiTermPhraseUsesPairPrefilter' -j "$(nproc)"
    - Manual test: ./build.sh --be -j "$(nproc)"; redeployed BE to cloud_sim; ran SNII cold PP4_phrase_prefix_payment and PP5_phrase_prefix_failed
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: SNII two-term phrase queries still had to read and verify positional postings even when the query only needed an adjacent token pair. This was much slower than expected for common phrase predicates and made cold remote index scans expensive. This change writes filtered hidden phrase-bigram terms for SNII indexes with positions, uses them as a docid-only fast path for exact two-term MATCH_PHRASE, and filters hidden internal terms from prefix/wildcard/regexp expansion. Non-indexable phrase terms fall back to the existing positional path to preserve correctness. The SNII has_null path also reads per-index metadata through the resident segment reader instead of opening a logical index reader.
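
The hidden phrase-bigram idea can be illustrated with a toy in-memory index. Everything here is hypothetical: the marker byte, the term encoding, and the `TinyIndex` class are invented for the sketch, and the filtering rule that decides which bigrams SNII actually writes is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical marker byte reserving a term namespace for hidden bigrams.
constexpr char kHiddenMarker = '\x01';

inline std::string make_bigram_term(const std::string& a, const std::string& b) {
    return std::string(1, kHiddenMarker) + a + ' ' + b;
}

inline bool is_hidden_term(const std::string& term) {
    return !term.empty() && term[0] == kHiddenMarker;
}

class TinyIndex {
public:
    // Indexes every token and, for each adjacent pair, a hidden bigram term.
    void add_document(uint32_t docid, const std::vector<std::string>& tokens) {
        for (size_t i = 0; i < tokens.size(); ++i) {
            assert(!is_hidden_term(tokens[i])); // user tokens must not use the marker
            postings_[tokens[i]].insert(docid);
            if (i + 1 < tokens.size()) {
                postings_[make_bigram_term(tokens[i], tokens[i + 1])].insert(docid);
            }
        }
    }

    // Docid-only answer for an exact two-term phrase; no positions are read.
    std::set<uint32_t> two_term_phrase(const std::string& a, const std::string& b) const {
        auto it = postings_.find(make_bigram_term(a, b));
        return it == postings_.end() ? std::set<uint32_t>{} : it->second;
    }

    // Prefix expansion that never exposes hidden internal terms.
    std::vector<std::string> expand_prefix(const std::string& prefix) const {
        std::vector<std::string> out;
        for (auto it = postings_.lower_bound(prefix); it != postings_.end(); ++it) {
            if (it->first.compare(0, prefix.size(), prefix) != 0) break;
            if (!is_hidden_term(it->first)) out.push_back(it->first);
        }
        return out;
    }

private:
    std::map<std::string, std::set<uint32_t>> postings_;
};
```

The sketch shows why expansion queries need the filter: hidden terms share the dictionary with user terms, so an unfiltered scan would return them as matches.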

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test
    - BE unit test: ./run-be-ut.sh --run --filter='SniiSegmentReaderTest.*:SniiPhraseQueryTest.*:SniiTermQueryTest.*:InvertedIndexFileReaderTest.TestHasNullSnii*' -j $(nproc)
    - Build: ./build.sh --be -j $(nproc)
    - Regression test: ./bootstrap.sh run -- --run -d inverted_index_p0/storage_format -s test_storage_format_snii
    - Manual test: cloud_sim 10B phrase SNII load, full compaction, and cold query/profile benchmark against V3
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Replace local `(void)` unused-result suppressions in SNII core with explicit parsing behavior. The posting replay path now uses a named skip helper when consuming docid-delta varints, and format readers explicitly reject unsupported tail format versions or reserved tail-meta flags instead of parsing fields and suppressing unused-variable warnings.
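
A named skip helper of this kind might look like the sketch below. It assumes a LEB128-style varint (7 payload bits per byte, high bit as continuation flag), which is a common encoding but is not confirmed here as SNII's. `skip_varint` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Advances *cursor past one varint without decoding its value.
// Returns false and leaves *cursor unchanged if the input is truncated or the
// varint is longer than 10 bytes (the maximum for a 64-bit value).
inline bool skip_varint(const uint8_t** cursor, const uint8_t* end) {
    assert(cursor != nullptr && *cursor <= end);
    const uint8_t* p = *cursor;
    for (int i = 0; i < 10 && p < end; ++i) {
        if ((*p++ & 0x80) == 0) { // last byte has the continuation bit clear
            *cursor = p;
            return true;
        }
    }
    return false;
}
```

Compared with decoding into a variable and discarding it with `(void)`, a helper like this states the intent directly and still reports malformed input to the caller.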

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-be-ut.sh --run --filter="SniiSegmentReaderTest.*" -j "$(nproc)"
- Behavior changed: No
- Does this need documentation: No