fix: replace hardcoded "images" dataset check with generic corruption handling #40
Merged
gordonmurray merged 1 commit into `lance-format:main` on Apr 7, 2026
The `/datasets/{name}/rows` endpoint had a hardcoded branch that forced a
schema-only "corrupted_but_readable_schema" response whenever the dataset
was named `images`, regardless of whether the data was actually corrupted.
Any healthy dataset sharing that name was incorrectly shown as corrupted,
and any corrupted dataset with a different name got no special handling.

Remove the name-based check and rely on the existing exception handler
around the read path. Any dataset that fails to read (corruption, format
error, unreadable bytes) now falls back to the same informational single-row
response, matching the graceful-degradation behavior already documented for
the endpoint. Healthy datasets named `images` are read normally.

Also drop the log level for the fallback from `error` to `warning`, since
graceful degradation is an expected path rather than an error condition.

Fixes lance-format#19
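The read path after this change can be sketched as below. This is a minimal, hypothetical sketch rather than the viewer's actual code: it assumes a Python backend, and the names `get_rows`, `read_page`, `healthy_reader`, `broken_reader`, the logger name, the default `limit`, and the exact values in the fallback row are all illustrative. Only the single-row fallback, its `error`/`dataset`/`details` keys, and the `warning` log level come from this PR.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical logger name; the real module's logger may differ.
logger = logging.getLogger("dataset_rows")

Row = Dict[str, Any]
# Hypothetical reader: returns (page_of_rows, total_row_count) or raises if unreadable.
PageReader = Callable[[str, int, int], Tuple[List[Row], int]]


def get_rows(name: str, read_page: PageReader, limit: int = 50, offset: int = 0) -> Dict[str, Any]:
    """Sketch of /datasets/{name}/rows after the fix: nothing branches on `name`."""
    try:
        rows, total = read_page(name, limit, offset)
        return {"rows": rows, "total": total, "limit": limit, "offset": offset}
    except Exception as exc:  # corruption, format error, unreadable bytes, ...
        # Graceful degradation is expected, so this logs at WARNING rather than ERROR.
        logger.warning("Could not read dataset %r, returning info row: %s", name, exc)
        # Assumed envelope: one informational row in place of real data.
        info = {"error": "dataset could not be read", "dataset": name, "details": str(exc)}
        return {"rows": [info], "total": 1, "limit": limit, "offset": offset}


# Illustrative stand-in readers with hypothetical data.
def healthy_reader(name: str, limit: int, offset: int) -> Tuple[List[Row], int]:
    data = [{"id": i, "label": f"img{i}"} for i in range(5)]
    return data[offset:offset + limit], len(data)


def broken_reader(name: str, limit: int, offset: int) -> Tuple[List[Row], int]:
    raise OSError("unreadable bytes")


if __name__ == "__main__":
    print(get_rows("images", healthy_reader, limit=3))  # three real rows, total 5
    print(get_rows("images", broken_reader)["rows"][0]["dataset"])  # images
```

The point of the shape is that the dataset name never appears in a condition: a healthy dataset called `images` takes the `try` branch like any other, and an unreadable dataset under any name takes the `except` branch.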
Branch updated from 21231d9 to a516b7e.
Fixes #19
Problem
`/datasets/{name}/rows` had a hardcoded branch that forced a schema-only `corrupted_but_readable_schema` response whenever the dataset was named `images`, regardless of whether the data was actually corrupted. Two failure modes fell out of this:
- Any healthy dataset named `images` was incorrectly surfaced as corrupted.
- Any corrupted dataset with a different name got no special handling.
Change
Remove the name-based check and rely on the existing `except` around the read path. Any dataset that fails to read (corruption, format error, unreadable bytes) now falls back to the same informational single-row response that was already documented as the graceful-degradation path. Healthy datasets named `images` are read normally.
Also drop the fallback log level from `error` to `warning`, since graceful degradation is an expected path rather than an error condition.
Verification
Smoke-tested locally against a temp LanceDB directory containing two tables:
- `normal` (10 rows, with a vector column): read normally; returns rows and totals as expected.
- `images` (5 rows): previously this would return the hardcoded `corrupted_but_readable_schema` single-row response. With this change it now returns the real data:

```json
{
  "rows": [
    {"id": 0, "label": "img0"},
    {"id": 1, "label": "img1"},
    {"id": 2, "label": "img2"}
  ],
  "total": 5,
  "limit": 3,
  "offset": 0
}
```

The fallback path still triggers on any read failure and returns the existing `error`/`dataset`/`details` informational row, so the graceful-degradation contract is unchanged for actually-corrupted datasets.
Notes
Datasets named `images` now return their real rows instead of the synthetic schema-info row.
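Because the name `images` no longer signals the fallback, a client that needs to tell real rows from the informational row has to inspect the response itself. A minimal sketch, assuming the fallback arrives as a single row whose keys are exactly `error`, `dataset`, and `details`; the helper name `is_fallback_response` and the envelope around `rows` are assumptions, not part of the viewer's documented API.

```python
from typing import Any, Dict

# Keys of the informational fallback row, as named in this PR's verification notes.
FALLBACK_KEYS = {"error", "dataset", "details"}


def is_fallback_response(resp: Dict[str, Any]) -> bool:
    """Hypothetical client-side check: True if resp is the single info row, not data."""
    rows = resp.get("rows", [])
    # Exact key match, so a real table that merely has an `error` column is not misread.
    return len(rows) == 1 and set(rows[0]) == FALLBACK_KEYS


page = {"rows": [{"id": 0, "label": "img0"}], "total": 5, "limit": 1, "offset": 0}
fallback = {"rows": [{"error": "read failed", "dataset": "images", "details": "..."}], "total": 1}
print(is_fallback_response(page))      # False
print(is_fallback_response(fallback))  # True
```

This is a heuristic: a genuine one-row table whose only columns are `error`, `dataset`, and `details` would be misclassified, so an explicit marker field in the fallback response would be more robust if the API ever adds one.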