Support the Lance data format #103

@yaogang2060

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

https://github.com/modelscope/twinkle/blob/main/src/twinkle/dataset/base.py#L131
If this line is changed to call load_dataset(dataset_id), Lance-format data can be loaded, provided the latest version of datasets is installed.

How to Reproduce

Install Lance and upgrade datasets:
uv pip install pylance
uv pip install --upgrade datasets

Create a Lance dataset:
`import lance
import pyarrow as pa

table = pa.Table.from_pylist([
    {"name": "Alice", "age": 20},
    {"name": "Bob", "age": 30},
])
ds = lance.write_dataset(table, "./alice_and_bob.lance")
`

dataset_path = "./alice_and_bob.lance"
dataset = Dataset(DatasetMeta(dataset_id=dataset_path))
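To rule out a broken dataset before looking at the loader, it may help to confirm that the Lance directory reads back on its own. The sketch below is not part of twinkle: `lance_roundtrip` is a hypothetical helper, it writes to a temporary directory rather than `./alice_and_bob.lance`, and it returns `None` when pylance or pyarrow is not installed. It only exercises the Lance side and does not call `load_dataset`.

```python
import importlib.util
import tempfile
from pathlib import Path


def lance_roundtrip(root):
    """Write the two-row table from the steps above and read it back with lance.

    Hypothetical helper for this issue. Returns the rows as a list of dicts,
    or None when pylance/pyarrow are not installed in this environment.
    """
    if (importlib.util.find_spec("lance") is None
            or importlib.util.find_spec("pyarrow") is None):
        return None

    import lance
    import pyarrow as pa

    path = str(Path(root) / "alice_and_bob.lance")
    table = pa.Table.from_pylist([
        {"name": "Alice", "age": 20},
        {"name": "Bob", "age": 30},
    ])
    lance.write_dataset(table, path)

    # lance.dataset opens the directory written above; to_table materializes
    # it as a pyarrow Table, which to_pylist converts back to plain dicts.
    return lance.dataset(path).to_table().to_pylist()


with tempfile.TemporaryDirectory() as tmp:
    rows = lance_roundtrip(tmp)

print(rows)
```

If this prints both rows but the `Dataset(DatasetMeta(...))` step still fails, the problem is more likely in how twinkle picks a loader for the path than in the Lance data itself.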

Additional Information

No response

Labels

bug (Something isn't working)
