
Design for integrating LanceDB into Hyrax #683

Merged: gitosaurus merged 17 commits into main from dtj-lance-design on Feb 12, 2026

@gitosaurus (Contributor)

Using this mechanism to gather comments on a proposed design for integrating LanceDB into Hyrax.

@gitosaurus gitosaurus marked this pull request as ready for review February 6, 2026 23:05
@codecov

codecov Bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.26%. Comparing base (19fb5f9) to head (85c518d).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #683   +/-   ##
=======================================
  Coverage   63.26%   63.26%           
=======================================
  Files          59       59           
  Lines        5771     5771           
=======================================
  Hits         3651     3651           
  Misses       2120     2120           

Copilot AI (Contributor) left a comment

Pull request overview

Adds a design document describing a proposed, stepwise plan for integrating LanceDB/Lance as a new storage backend for Hyrax “result datasets” (inference/UMAP/test/engine outputs), including writer/reader abstractions, verb wiring, migration, deprecation, and testing strategy.

Changes:

  • Introduces a new design doc laying out incremental steps for Lance-backed result writing/reading (ResultDatasetWriter / ResultDataset).
  • Proposes config changes (results.storage_format), verb wiring via factories, and an .npy → Lance migration verb.
  • Captures risks (performance, API churn, visualize incompatibility, disk layout conflicts) and test/benchmark plans.
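As a rough illustration of the writer/reader split named in the overview, here is one shape such a pair could take. This is a hypothetical sketch, not the interface from the design doc or from Hyrax: the class names `InMemoryResultWriter` and `InMemoryResultDataset`, the methods `write_batch`, `commit`, and `get_by_id`, and the in-memory list standing in for a Lance table are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Row = Tuple[str, List[float]]  # (object id, result vector)


class InMemoryResultWriter:
    """Hypothetical writer: buffers rows and publishes them only on commit().

    A plain list stands in for the Lance table; a real backend would append
    Arrow record batches instead.
    """

    def __init__(self) -> None:
        self._pending: List[Row] = []
        self._committed: List[Row] = []

    def write_batch(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None:
        # Reject mismatched batches before touching the buffer.
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        self._pending.extend((oid, list(vec)) for oid, vec in zip(ids, vectors))

    def commit(self) -> "InMemoryResultDataset":
        # Move pending rows into the committed set and hand back a snapshot view.
        self._committed.extend(self._pending)
        self._pending.clear()
        return InMemoryResultDataset(self._committed)


class InMemoryResultDataset:
    """Hypothetical read-only view with access by position or by object id."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)  # copy, so later commits do not change this view
        self._index: Dict[str, int] = {oid: n for n, (oid, _) in enumerate(self._rows)}

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, n: int) -> List[float]:
        return self._rows[n][1]

    def get_by_id(self, oid: str) -> List[float]:
        return self._rows[self._index[oid]][1]

    def ids(self) -> Iterator[str]:
        return (oid for oid, _ in self._rows)


writer = InMemoryResultWriter()
writer.write_batch(["a", "b"], [[0.1, 0.2], [0.3, 0.4]])
dataset = writer.commit()
```

Returning a separate read-only view from `commit()` keeps readers from seeing half-written batches. Whether the actual design separates writing and reading in this way is determined by the design doc, not by this sketch.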

Comment thread lance_design.md Outdated
Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
Comment thread lance_design.md Outdated
@github-actions

github-actions Bot commented Feb 6, 2026

| Before [19fb5f9] | After [a13f0e7] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 101±0.5μs | 104±0.9μs | 1.03 | data_request_benchmarks.DatasetRequestBenchmarks.time_request_all_data |
| 9.44±0.2ms | 9.63±0.06ms | 1.02 | vector_db_benchmarks.VectorDBSearchBenchmarks.time_search_by_vector_many_shards(128, 'chromadb') |
| 6.53±0s | 6.57±0.01s | 1.01 | data_cache_benchmarks.DataCacheBenchmarks.time_preload_cache_hsc1k |
| 16.944909319024536 | 17.169437703760522 | 1.01 | data_cache_benchmarks.DataCacheBenchmarks.track_cache_hsc1k_hyrax_size_undercount |
| 6.59±0.04s | 6.65±0.03s | 1.01 | vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(2048, 'qdrant') |
| 9.30±0.2ms | 9.43±0.2ms | 1.01 | vector_db_benchmarks.VectorDBSearchBenchmarks.time_search_by_vector_many_shards(64, 'chromadb') |
| 1.75±0.01s | 1.75±0.03s | 1 | benchmarks.time_database_connection_help |
| 1.75±0.02s | 1.75±0.01s | 1 | benchmarks.time_prepare_help |
| 1.74±0.01s | 1.74±0.03s | 1 | benchmarks.time_train_help |
| 1.76±0.03s | 1.76±0.02s | 1 | benchmarks.time_visualize_help |

@drewoldag (Collaborator) left a comment

Left a handful of comments. Overall I really like that approach. Were you BMAD'ing to produce this?

Comment thread specs/lance_db_spec.md
Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
Comment thread lance_design.md Outdated
Comment thread lance_design.md Outdated
Comment thread lance_design.md Outdated
Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
@gitosaurus gitosaurus mentioned this pull request Feb 10, 2026
4 tasks
Copilot AI (Contributor) commented Feb 11, 2026

@gitosaurus I've opened a new pull request, #691, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI (Contributor) commented Feb 11, 2026

@gitosaurus I've opened a new pull request, #692, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 19 comments.

Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
Comment thread lance_design.md Outdated
Comment thread lance_design.md Outdated
Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
Comment thread specs/lance_db_spec.md
Comment thread specs/lance_db_spec.md
Comment thread specs/lance_db_spec.md
Comment thread lance_design.md Outdated
Copilot AI (Contributor) commented Feb 12, 2026

@gitosaurus I've opened a new pull request, #696, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI (Contributor) commented Feb 12, 2026

@gitosaurus I've opened a new pull request, #697, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI (Contributor) commented Feb 12, 2026

@gitosaurus I've opened a new pull request, #698, to work on those changes. Once the pull request is ready, I'll request review from you.

gitosaurus and others added 5 commits February 11, 2026 17:03
* Use dedicated lance_db subdirectory for LanceDB storage

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
* Update lance_design.md terminology to match actual codebase

Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
gitosaurus and others added 8 commits February 11, 2026 17:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI and others added 4 commits February 11, 2026 17:12
* Simplify writer factory to always use ResultDatasetWriter without config option

Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>

---------

Co-authored-by: Derek T. Jones <dtj@mac.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Address feedback on lines 104-106: clarify that tensor metadata (shape, dtype) is stored in Arrow table schema's custom metadata dictionary, show proper JSON serialization using json.dumps(), and explain the serialization/deserialization process with concrete code examples.

Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>

---------

Co-authored-by: Derek T. Jones <dtj@mac.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
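The commit message above describes keeping tensor shape and dtype in the Arrow table schema's custom metadata, serialized with `json.dumps()`. The following stdlib-only sketch shows that round trip. The metadata key `b"hyrax.tensor"` and the helper names `encode_tensor_meta`, `decode_tensor_meta`, and `unflatten` are hypothetical. A plain dict of bytes stands in for the schema metadata, because Arrow stores schema metadata as byte-string key/value pairs.

```python
import json
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical metadata key; Arrow schema metadata keys and values are bytes.
TENSOR_META_KEY = b"hyrax.tensor"


def encode_tensor_meta(shape: Sequence[int], dtype: str) -> Dict[bytes, bytes]:
    """Serialize shape/dtype to JSON and wrap it in a bytes->bytes metadata dict."""
    payload = {"shape": list(shape), "dtype": dtype}
    return {TENSOR_META_KEY: json.dumps(payload).encode("utf-8")}


def decode_tensor_meta(metadata: Dict[bytes, bytes]) -> Tuple[Tuple[int, ...], str]:
    """Inverse of encode_tensor_meta: recover (shape, dtype) from schema metadata."""
    payload = json.loads(metadata[TENSOR_META_KEY].decode("utf-8"))
    return tuple(payload["shape"]), payload["dtype"]


def unflatten(flat: List[float], shape: Tuple[int, ...]) -> list:
    """Rebuild nested lists from a row stored flat, using the saved shape."""
    if shape and math.prod(shape) != len(flat):
        raise ValueError(f"{len(flat)} values do not fit shape {shape}")
    if len(shape) <= 1:
        return list(flat)
    if shape[0] == 0:
        return []
    step = len(flat) // shape[0]  # number of values in each outermost slice
    return [unflatten(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


meta = encode_tensor_meta((2, 3), "float32")
shape, dtype = decode_tensor_meta(meta)
rows = unflatten([0, 1, 2, 3, 4, 5], shape)
```

With pyarrow, a dict like the one returned by `encode_tensor_meta` could be passed as `pa.schema(fields, metadata=...)` or attached with `Schema.with_metadata(...)`. `Schema.metadata` returns a bytes-keyed mapping that can be decoded the same way.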
gitosaurus added a commit that referenced this pull request Feb 12, 2026
@drewoldag (Collaborator) left a comment

This looks good to me. We've reviewed it a couple of times, good to go.

@gitosaurus gitosaurus merged commit ba1430a into main Feb 12, 2026
9 checks passed
@gitosaurus gitosaurus deleted the dtj-lance-design branch February 12, 2026 23:44

4 participants