Design for integrating LanceDB into Hyrax #683
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff
            main      #683      +/-
Coverage    63.26%    63.26%
Files       59        59
Lines       5771      5771
Hits        3651      3651
Misses      2120      2120
Pull request overview
Adds a design document describing a proposed, stepwise plan for integrating LanceDB/Lance as a new storage backend for Hyrax “result datasets” (inference/UMAP/test/engine outputs), including writer/reader abstractions, verb wiring, migration, deprecation, and testing strategy.
Changes:
- Introduces a new design doc laying out incremental steps for Lance-backed result writing/reading (ResultDatasetWriter/ResultDataset).
- Proposes config changes (results.storage_format), verb wiring via factories, and an .npy → Lance migration verb.
- Captures risks (performance, API churn, visualize incompatibility, disk layout conflicts) and test/benchmark plans.
drewoldag left a comment
Left a handful of comments. Overall I really like that approach. Were you BMAD'ing to produce this?
@gitosaurus I've opened a new pull request, #691, to work on those changes. Once the pull request is ready, I'll request review from you.
9a784c1 to 17cbe9b
@gitosaurus I've opened a new pull request, #692, to work on those changes. Once the pull request is ready, I'll request review from you.
@gitosaurus I've opened a new pull request, #696, to work on those changes. Once the pull request is ready, I'll request review from you.
@gitosaurus I've opened a new pull request, #697, to work on those changes. Once the pull request is ready, I'll request review from you.
@gitosaurus I've opened a new pull request, #698, to work on those changes. Once the pull request is ready, I'll request review from you.
* Use dedicated lance_db subdirectory for LanceDB storage
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
* Update lance_design.md terminology to match actual codebase
  Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
89d8e82 to cadb17c
* Simplify writer factory to always use ResultDatasetWriter without config option
  Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
  Co-authored-by: Derek T. Jones <dtj@mac.com>
  Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Address feedback on lines 104-106: clarify that tensor metadata (shape, dtype) is stored in the Arrow table schema's custom metadata dictionary, show proper JSON serialization using json.dumps(), and explain the serialization/deserialization process with concrete code examples.
  Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
  Co-authored-by: Derek T. Jones <dtj@mac.com>
  Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
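For readers following the commit above, here is a minimal sketch of the round trip it describes: tensor shape and dtype serialized with json.dumps() into a bytes-to-bytes mapping, which is the form Arrow schema metadata takes. The key name `tensor_metadata` and both helper functions are hypothetical, chosen for illustration; the design doc's actual names are not shown in this conversation. The sketch uses only the standard library so that it runs without Arrow installed.

```python
import json

# Hypothetical key name; the design doc's real key is not shown in this thread.
TENSOR_META_KEY = b"tensor_metadata"


def encode_tensor_metadata(shape, dtype):
    """Serialize shape and dtype into a bytes-to-bytes mapping.

    Arrow schema metadata stores keys and values as bytes, so the JSON
    string is encoded to UTF-8 before being placed in the mapping.
    """
    payload = json.dumps({"shape": list(shape), "dtype": str(dtype)})
    return {TENSOR_META_KEY: payload.encode("utf-8")}


def decode_tensor_metadata(metadata):
    """Recover (shape, dtype) from a mapping built by encode_tensor_metadata."""
    payload = json.loads(metadata[TENSOR_META_KEY].decode("utf-8"))
    # JSON has no tuple type, so the shape list is converted back to a tuple.
    return tuple(payload["shape"]), payload["dtype"]


# Round trip for an assumed 3x224x224 float32 tensor.
meta = encode_tensor_metadata((3, 224, 224), "float32")
shape, dtype = decode_tensor_metadata(meta)
```

With pyarrow, a mapping like `meta` could be attached through `schema.with_metadata(meta)` and read back from `table.schema.metadata`; whether Hyrax does exactly this is an assumption based on the commit message alone.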
drewoldag left a comment
This looks good to me. We've reviewed it a couple of times, good to go.
Using this pull request to gather comments on a proposed design for integrating LanceDB into Hyrax.