feat(storage): optimize glob func #2776
Open
baojun-zhang wants to merge 1 commit into
Collaborator
Author
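Performance test (Glob)
Prerequisite: import 100,000 file resources to prepare the glob test data.
Test command
python3 perf/s3/glob/load_test_glob_100.py \
  --account-id 100k \
  --user-id 100k \
  --api-key 'M...0Yg' \
  --uri viking://resources \
  --pattern "*.yaml" \
  --iterations 1
Results summary (100 runs)
One-sentence conclusion: at the same result size (
Raw output
Before optimization
After optimization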
Collaborator
Author
PS |
Description
This PR lands the RAGFS-backed glob implementation for OpenViking, and completes the S3-side pagination optimization in this change set.
The new flow keeps the existing OpenViking visibility semantics in Python while moving candidate enumeration and glob pagination into Rust. For S3, glob_directory now uses scan-state pagination on top of ListObjectsV2, stops early once the requested page is filled, and keeps opaque continuation tokens scoped to the original query.
Related Issue
N/A
Type of Change
Changes Made
glob_directory contract with GlobEntry/GlobPage, plus the Python binding and client plumbing needed for VikingFS.glob().
PurePath.match()-compatible semantics.
ListObjectsV2, added scoped opaque continuation tokens, removed shadow/legacy rollout guidance from the design doc, and fixed S3 fetch size handling so the internal scan batch stays large enough for sparse-match workloads.
Testing
Checklist
Screenshots (if applicable)
N/A
Additional Notes
ListObjectsV2 scanning for generic glob patterns. This PR fixes the regression where the internal S3 scan batch was tied to the outward page size, but it does not introduce an index-based lookup layer.
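To illustrate the pagination behavior described above, here is a minimal Python sketch, not the code in this PR. It assumes a sorted in-memory key list in place of S3, and every name in it (`list_objects`, `glob_page`, `encode_token`, `decode_token`, `scan_batch`) is hypothetical. The token layout is also an assumption, chosen only to show one way of scoping a token to its query. The sketch shows three ideas: the internal scan batch is independent of the caller's page size, the scan stops as soon as the page is full, and a continuation token is rejected when replayed against a different prefix or pattern.

```python
import base64
import hashlib
import json
from pathlib import PurePosixPath


def list_objects(keys, prefix, start_after, max_keys):
    """Hypothetical stand-in for one ListObjectsV2 call (keys must be sorted)."""
    candidates = [k for k in keys if k.startswith(prefix) and k > start_after]
    return candidates[:max_keys]


def _scope(prefix, pattern):
    # Fingerprint of the query, so a token cannot be reused for another query.
    return hashlib.sha256(f"{prefix}\0{pattern}".encode()).hexdigest()[:16]


def encode_token(prefix, pattern, last_key):
    # Assumed token layout: query fingerprint plus the last key examined.
    payload = {"scope": _scope(prefix, pattern), "after": last_key}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def decode_token(token, prefix, pattern):
    payload = json.loads(base64.urlsafe_b64decode(token.encode()))
    if payload["scope"] != _scope(prefix, pattern):
        raise ValueError("continuation token does not belong to this query")
    return payload["after"]


def glob_page(keys, prefix, pattern, page_size, token=None, scan_batch=1000):
    """Return (matches, next_token); next_token is None when the scan is done."""
    after = decode_token(token, prefix, pattern) if token else ""
    matches = []
    while True:
        # The batch size is independent of page_size, so sparse matches
        # do not force one listing call per returned entry.
        batch = list_objects(keys, prefix, after, scan_batch)
        if not batch:
            return matches, None
        for key in batch:
            after = key  # scan state: the last key examined so far
            if PurePosixPath(key).match(pattern):
                matches.append(key)
                if len(matches) == page_size:
                    # Stop early and hand back a token scoped to this query.
                    return matches, encode_token(prefix, pattern, after)
```

A caller would keep passing the returned token back with the same prefix and pattern until it comes back as `None`. Because the token records the last key examined rather than an offset, resuming costs one listing call starting after that key. A page that fills exactly on the final match still returns a token, and the follow-up call then returns an empty page.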