deps(deps): bump datasets from 4.4.2 to 4.7.0 #47

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/datasets-4.7.0

Conversation

dependabot Bot commented on behalf of github Mar 16, 2026

Bumps datasets from 4.4.2 to 4.7.0.

Release notes

Sourced from datasets's releases.

4.7.0

Datasets Features

  • Add Json() type by @​lhoestq in huggingface/datasets#8027
    • JSON Lines files that contain arbitrary JSON objects, such as tool calling datasets, are now supported. When a field or subfield contains mixed types (e.g. a mix of str/int/float/dict/list, or dictionaries with arbitrary keys), the Json() type is used to store data that would normally not be supported in Arrow/Parquet.
    • Use the Json() type in Features() for any dataset. It is supported in any function that accepts features=, such as load_dataset(), .map(), .cast(), .from_dict() and .from_list().
    • Use on_mixed_types="use_json" to automatically set the Json() type on mixed types in .from_dict(), .from_list() and .map().

Examples:

You can use on_mixed_types="use_json" or specify features= with a Json() type:

>>> ds = Dataset.from_dict({"a": [0, "foo", {"subfield": "bar"}]})
Traceback (most recent call last):
  ...
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert 'foo' with type str: tried to convert to int64
>>> features = Features({"a": Json()})
>>> ds = Dataset.from_dict({"a": [0, "foo", {"subfield": "bar"}]}, features=features)
>>> ds.features
{'a': Json()}
>>> list(ds["a"])
[0, "foo", {"subfield": "bar"}]

This is also useful for lists of dictionaries with arbitrary keys and values, to avoid filling missing fields with None:

>>> ds = Dataset.from_dict({"a": [[{"b": 0}, {"c": 0}]]})
>>> ds.features
{'a': List({'b': Value('int64'), 'c': Value('int64')})}
>>> list(ds["a"])
[[{'b': 0, 'c': None}, {'b': None, 'c': 0}]]  # missing fields are filled with None
>>> features = Features({"a": List(Json())})
>>> ds = Dataset.from_dict({"a": [[{"b": 0}, {"c": 0}]]}, features=features)
>>> ds.features
{'a': List(Json())}
>>> list(ds["a"])
[[{'b': 0}, {'c': 0}]]  # OK
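
The two behaviours above can be illustrated without the library. The following is a conceptual sketch using only the standard library, not how datasets implements Json(). The helper names (pad_to_shared_schema, to_json_cells, from_json_cells) are hypothetical. It assumes that one way to hold heterogeneous values in a single-typed column is to serialize each value to a JSON string, and it contrasts that with padding every dictionary to a shared set of keys.

```python
import json


def pad_to_shared_schema(dicts):
    """Give every dict the union of all keys, filling gaps with None.

    This imitates what a fixed struct schema does to heterogeneous dicts.
    """
    keys = []
    for d in dicts:
        for key in d:
            if key not in keys:
                keys.append(key)
    return [{key: d.get(key) for key in keys} for d in dicts]


def to_json_cells(values):
    """Serialize each value to a JSON string so the column has one type (str)."""
    return [json.dumps(value) for value in values]


def from_json_cells(cells):
    """Decode JSON strings back into the original Python values."""
    return [json.loads(cell) for cell in cells]


# Dictionaries with different keys: a shared schema pads them with None,
# while a JSON round trip returns them unchanged.
tool_args = [{"b": 0}, {"c": 0}]
padded = pad_to_shared_schema(tool_args)
restored = from_json_cells(to_json_cells(tool_args))

# Mixed scalar and dict values also survive the JSON round trip.
mixed = [0, "foo", {"subfield": "bar"}]
restored_mixed = from_json_cells(to_json_cells(mixed))
```

Here padded contains the None-filled dictionaries, while restored and restored_mixed equal their inputs. Avoiding the None padding while keeping a single column type is the trade-off the Json() type is described as addressing.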

Another example with tool calling data and the on_mixed_types="use_json" argument (useful to not have to specify features= manually):

>>> messages = [
...     {"role": "user", "content": "Turn on the living room lights and play my electronic music playlist."},
...     {"role": "assistant", "tool_calls": [
...         {"type": "function", "function": {

... (truncated)

Commits
  • ac9c452 release: 4.7.0 (#8058)
  • bd4fb05 Limit dataset listing to first 20 entries in readme (#8057)
  • 4de29bf Fix unstable tokenizer fingerprinting (enables map cache reuse) (#7982)
  • fdd8a65 fix: handle nested null types in feature alignment for multi-proc map (#8047)
  • 0751557 fix(iterable_dataset): preserve features when chaining filter() on typed Iter...
  • 1bd0a5c Don't extract bad files (#8056)
  • 6ef54e7 Fix silent data loss in push_to_hub when num_proc > num_shards (#8044)
  • 38511fc Use num_examples instead of len(self) for iterable_dataset's SplitInfo (#8041)
  • c410be5 Fix non-deterministic by sorting metadata extensions (#8034) (#8039)
  • 70f7474 Fix typos in iterable_dataset.py (#8049)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [datasets](https://github.com/huggingface/datasets) from 4.4.2 to 4.7.0.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@4.4.2...4.7.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-version: 4.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
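
The update-type value in the metadata above follows from comparing the old and new version numbers. As a rough sketch (not Dependabot's own logic), the classification can be reproduced as below. The function name classify_update is hypothetical, and the code assumes both versions are plain numeric MAJOR.MINOR.PATCH strings with no pre-release or build suffixes.

```python
def classify_update(old_version, new_version):
    """Label a version change as a semver major, minor, or patch update.

    Assumes plain numeric 'MAJOR.MINOR.PATCH' strings (an illustration only).
    """
    old = tuple(int(part) for part in old_version.split("."))
    new = tuple(int(part) for part in new_version.split("."))
    if new[0] != old[0]:
        return "version-update:semver-major"
    if new[1] != old[1]:
        return "version-update:semver-minor"
    return "version-update:semver-patch"


# The bump in this PR changes only the minor (and patch) component.
update_type = classify_update("4.4.2", "4.7.0")
```

For this PR, update_type matches the semver-minor value recorded in the commit message.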
dependabot Bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels Mar 16, 2026

dependabot Bot commented on behalf of github Mar 23, 2026

Superseded by #49.

dependabot Bot closed this Mar 23, 2026
dependabot Bot deleted the dependabot/pip/datasets-4.7.0 branch March 23, 2026 11:06
