Add support for pandas 3.0 by hagenw · Pull Request #500 · audeering/audformat

hagenw · 2026-01-27T12:57:27Z

Closes #487

Timedelta/datetime updates

pandas changed the default unit of timedelta and datetime entries from nanoseconds to a resolution that matches the given input precision.

We update the code here to ensure we always get nanosecond resolution as before.
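As a rough illustration of the idea (not the code from this PR), nanosecond resolution can be enforced by converting the result explicitly with `as_unit("ns")`, which is available in pandas >=2.0. The helper name `to_timedelta_ns` is hypothetical and exists only in this sketch.

```python
import pandas as pd


def to_timedelta_ns(values, unit="s"):
    """Hypothetical helper: convert values to a nanosecond TimedeltaIndex."""
    index = pd.to_timedelta(values, unit=unit)
    # Convert explicitly, so the result does not depend on
    # the resolution pandas infers from the input.
    return index.as_unit("ns")


starts = to_timedelta_ns([0.0, 1.5])
dates = pd.to_datetime(["2026-01-27"]).as_unit("ns")

print(starts.dtype)
print(dates.dtype)
```

The same pattern applies to datetime data, as the last lines show.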

String updates

pandas introduces a new default string type (str/<StringDtype(na_value=nan)>), which replaces object as default.
Unfortunately, the new string type is different from string/<StringDtype(na_value=<NA>)> as it uses a different value to represent missing values.

We update the code here to ensure we get the same results with pandas <3.0 and pandas >=3.0:

* We continue to use string/<StringDtype(na_value=<NA>)> for the string scheme
* We continue to use object for schemes with string labels (which are represented by a categorical data type)
* Fix audformat.utils.set_index_dtypes() to be able to change between all available string types
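The first two points can be sketched with plain pandas, assuming that requesting the dtypes explicitly is enough to get version-independent results. The variable names are illustrative only and do not come from audformat.

```python
import pandas as pd

# String values: request "string" explicitly,
# so missing values are represented by pd.NA.
values = pd.Series(["a", None], dtype="string")

# String labels: build the categorical dtype from an object index,
# so the categories do not pick up the new default string type.
categories = pd.Index(["a", "b"], dtype=object)
labels = pd.Series(["a", "b", "a"], dtype=pd.CategoricalDtype(categories))

print(values.dtype)
print(labels.dtype.categories.dtype)
```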

Examples of changed string behavior

Output of print(obj.dtype)

Command	pandas 2.3.3	pandas 3.0.0
pd.Series([])	object	object
pd.Series(["a"])	object	str
pd.Series(["a", pd.NA])	object	str
pd.Series(["a", np.nan])	object	str
pd.Series(["a"], dtype="string")	string	string
pd.Series(["a"], dtype=str)	object	str
pd.Series(["a"], dtype="str")	object	str

Output of obj.dtype

Command	pandas 2.3.3	pandas 3.0.0
pd.Series([])	`dtype('O')`	`dtype('O')`
pd.Series(["a"])	`dtype('O')`	`<StringDtype(na_value=nan)>`
pd.Series(["a", pd.NA])	`dtype('O')`	`<StringDtype(na_value=nan)>`
pd.Series(["a", np.nan])	`dtype('O')`	`<StringDtype(na_value=nan)>`
pd.Series(["a"], dtype="string")	`string[python]`	`<StringDtype(na_value=<NA>)>`
pd.Series(["a"], dtype=str)	`dtype('O')`	`<StringDtype(na_value=nan)>`
pd.Series(["a"], dtype="str")	`dtype('O')`	`<StringDtype(na_value=nan)>`
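Since both string types are instances of `pd.StringDtype`, one way to tell them apart is to inspect the `na_value` attribute of the dtype. The function below is a hypothetical sketch of such a check, not the helper used in `set_index_dtypes()`.

```python
import pandas as pd


def describe_string_dtype(dtype):
    """Hypothetical helper: name the string dtype variant of a dtype."""
    if not isinstance(dtype, pd.StringDtype):
        return "not a StringDtype"
    if dtype.na_value is pd.NA:
        return "StringDtype(na_value=<NA>)"
    # The remaining variant uses nan for missing values
    return "StringDtype(na_value=nan)"


print(describe_string_dtype(pd.Series(["a"], dtype="string").dtype))
print(describe_string_dtype(pd.Series(["a"], dtype=object).dtype))
# Result depends on the installed pandas version, see the tables above
print(describe_string_dtype(pd.Series(["a"]).dtype))
```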

* pandas 3.0: segmented_index() and set_index_dtypes() (#490)
  * Add failing test
  * Make test pandas 3.0.0 compatible
  * Fix set_index_dtypes() for pandas 3.0
  * Add comment
  * Fix doctests
  * Update segmented_index()
  * Use segmented_index in test
  * Add test for segmented_index
* Avoid warning in testing.add_table() (#491)
* pandas 3.0: fix utils.hash() (#492)
  * pandas 3.0: fix utils.hash()
  * Fix comment
  * Remove unneeded code
  * Add more tests
  * Preserve ordered setting
  * Update comment
* Fix categorical dtype with Database.get() (#493)
  * Fix categorical dtype with Database.get()
  * Update tests
  * Add additional test
  * Improve code
  * Clean up comment
  * We converted to categorical data
  * Simplify test
  * Simplify string test
* Require timedelta64[ns] in assert_index() (#494)
  * Require timedelta64[ns] in assert_index()
  * Add tests for mixed cases
* pandas 3.0: fix doctests output

sourcery-ai · 2026-01-27T12:57:33Z

Reviewer's Guide

Adds pandas 3.0 compatibility by enforcing nanosecond-resolution datetime/timedelta dtypes, normalizing string and categorical dtypes (especially for schemes and indices), making hashing/index utilities robust to new pandas string behavior, and relaxing the pandas upper bound; tests and docs are updated accordingly.

Updated class diagram for categorical and scheme dtype handling

classDiagram
    class Scheme {
        +labels
        +dtype
        +to_pandas_dtype() pd_dtype
    }

    class CommonModule {
        +to_categorical_dtype(labels) CategoricalDtype
        +to_pandas_dtype(dtype) pandas_dtype
    }

    class Table {
        +_pyarrow_convert_dtypes(df) DataFrame
    }

    class Database {
        +schemes
    }

    class Column {
        +scheme_id
    }

    Database --> Scheme : contains
    Table --> Database : references
    Table --> Column : contains
    Column --> Scheme : uses_scheme_id

    Scheme ..> CommonModule : uses_to_categorical_dtype
    Table ..> CommonModule : uses_to_categorical_dtype

File-Level Changes

Change	Details	Files
Normalize string, categorical, and index dtypes to behave consistently across pandas <3.0 and >=3.0, including schemes and segmented indices.	Introduce common.to_categorical_dtype() and reuse it from Scheme.to_pandas_dtype() and Table._pyarrow_convert_dtypes() to build categorical dtypes with stable category dtypes (ints as nullable int64, strings as object). Adjust segmented index helpers and assertions to enforce the file level as pandas string dtype and the start/end levels as timedelta64[ns] (including via to_timedelta helpers and assert_index checks). Update set_index_dtypes() to compare dtypes via a helper that distinguishes different StringDtype na_value variants, cast timedelta levels explicitly to requested units, and add extensive tests around string/StringDtype/NA vs nan variants on Index and MultiIndex. Update schemes and database get/append logic so string-based schemes use object-backed categoricals, normalize mixed string/object categorical category dtypes to object when aggregating, and ensure error messages stringify dtypes for robustness across pandas versions.	`audformat/core/common.py` `audformat/core/index.py` `audformat/core/scheme.py` `audformat/core/database.py` `audformat/core/table.py` `audformat/core/testing.py` `tests/test_index.py` `tests/test_scheme.py` `tests/test_database_get.py` `tests/test_misc_table.py` `tests/test_table.py` `tests/test_utils.py` `tests/test_utils_concat.py` `tests/test_column.py`
Make hashing of pandas objects stable across pandas 3.0 string/categorical changes and pyarrow inference differences, especially for empty frames and string/categorical columns.	In utils.hash(), normalize string-typed columns to object dtype before conversion to pyarrow, and normalize categorical columns whose categories are string-like to use object-backed categories. Build an explicit pyarrow schema for empty DataFrames where needed so object columns map to string rather than null, and fall back to normal from_pandas for non-empty frames. Extend hash tests to cover string vs object dtypes (including filewise indices and categorical data) and ensure the resulting hashes are identical across dtype variants.	`audformat/core/utils.py` `tests/test_utils.py`
Align tests, docs, and misc-table utilities with explicit dtypes and new pandas string defaults, and relax the pandas version constraint.	Update many tests to construct Index/MultiIndex objects with explicit dtype (object, string, Int64, timedelta64[ns], datetime64[ns]) so expectations are stable under pandas 3.0, and adjust expected error messages or xfails where dtype representations changed. Update misc-table creation/extension tests and examples to use explicit string index dtypes, remove cases that relied on implicit str->object behavior, and ensure drop/extend/pick operations respect index dtypes. Fix read_csv and other helpers to normalize empty DataFrame column dtypes under pandas 3.0 (e.g., explicitly casting columns/index names to string where pandas changed defaults). Change the development dependency on pandas in pyproject.toml to allow pandas >=2.0.0 with no <3.0 upper bound, and refresh doc examples where necessary.	`tests/test_misc_table.py` `tests/test_table.py` `tests/test_index.py` `tests/test_utils.py` `tests/test_utils_concat.py` `tests/test_scheme.py` `tests/test_database_get.py` `tests/test_column.py` `audformat/core/utils.py` `audformat/core/index.py` `pyproject.toml` `docs/data-misc-tables.rst`
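The normalization step described for utils.hash() can be illustrated with a simplified sketch. It is an assumption-laden stand-in: it uses `pd.util.hash_pandas_object` and `hashlib` instead of the pyarrow conversion mentioned above, and the names `normalize_strings` and `frame_digest` are hypothetical.

```python
import hashlib

import pandas as pd


def normalize_strings(df):
    """Hypothetical helper: cast string-typed columns to object."""
    out = df.copy()
    for column in out.columns:
        if isinstance(out[column].dtype, pd.StringDtype):
            out[column] = out[column].astype(object)
    return out


def frame_digest(df):
    """Hypothetical helper: digest that ignores string vs. object dtype."""
    row_hashes = pd.util.hash_pandas_object(normalize_strings(df), index=True)
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


as_object = pd.DataFrame({"label": pd.Series(["a", "b"], dtype=object)})
as_string = pd.DataFrame({"label": pd.Series(["a", "b"], dtype="string")})

print(frame_digest(as_object) == frame_digest(as_string))
```

Normalizing before hashing means the digest depends on the stored values only, not on which string representation pandas picked for the column.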

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#487	Ensure audformat’s timedelta and segmented index handling (including utilities like segmented_index/to_segmented_index) remains correct and stable under pandas 3.0.0’s new datetime/timedelta resolution inference, preserving the expected nanosecond precision.	✅
#487	Update audformat to be fully compatible with pandas 3.0.0 overall (e.g., new default string dtype and related categorical/index behavior) so that the package works correctly with pandas 3.x.	✅

Possibly linked issues

Pandas 3.0.0 breaking changes #487: They both address pandas 3.0’s timedelta precision change breaking segmented indexes; PR restores ns-resolution and compatibility.

codecov · 2026-01-27T12:59:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (56a8268) to head (0e7089d).

Additional details and impacted files

Files with missing lines	Coverage Δ
audformat/core/common.py	`100.0% <100.0%> (ø)`
audformat/core/database.py	`100.0% <100.0%> (ø)`
audformat/core/index.py	`100.0% <100.0%> (ø)`
audformat/core/scheme.py	`100.0% <100.0%> (ø)`
audformat/core/table.py	`100.0% <100.0%> (ø)`
audformat/core/testing.py	`100.0% <100.0%> (ø)`
audformat/core/utils.py	`100.0% <100.0%> (ø)`

* Simplify creation of segmented index * Fix set_index_dtypes() * Revert "Simplify creation of segmented index" This reverts commit 73ff082. * Clean up comment * Fix dtype normalization * Revert "Revert "Simplify creation of segmented index"" This reverts commit 6f51a35. * Add tests for empty string index

hagenw · 2026-02-02T11:28:28Z

audformat/core/table.py

                and df[column_id].dtype == "object"
            ):
-                df[column_id] = df[column_id].astype("string", copy=False)
+                df[column_id] = df[column_id].astype("string")


This avoids a FutureWarning as copy will be removed from astype().

hagenw added 12 commits January 23, 2026 11:31

CI: run tests on dev branch

3c88176

Fix segmented_index() and set_index_dtypes() (#490)

f732b5d

* Add failing test * Make test pandas 3.0.0 compatible * Fix set_index_dtypes() for pandas 3.0 * Add comment * Fix doctests * Update segmented_index() * Use segmented_index in test * Add test for segmented_index

Avoid warning in testing.add_table() (#491)

8ce1358

Fix utils.hash() to return old value (#492)

f915774

* pandas 3.0: fix utils.hash() * Fix comment * Remove unneeded code * Add more tests * Preserve ordered setting * Update comment

Fix categorical dtype with Database.get() (#493)

9bff331

* Fix categorical dtype with Database.get() * Update tests * Add additional test * Improve code * Clean up comment * We converted to categorical data * Simplify test * Simplify string test

Require timedelta64[ns] in assert_index() (#494)

5c9b7c7

* Require timedelta64[ns] in assert_index() * Add tests for mixed cases

TST: fix misc table tests (#496)

639d29d

TST: fix remaining tests (#497)

591af86

* Update test_utils.py * Update test_misc_table * Set index dtypes directly * Fix test_table * Update to_timedelta in index.py * Fix conversion to timedelta in testing.py * Update test_utils_concat.py * Add comment * Update to_timedelta()

DOC: show again full table output (#498)

f0a4f35

Remove deprecated copy from astype() (#499)

5990e02

TST: enable pandas 3.0 in tests

0ea46b8

hagenw added 6 commits January 27, 2026 16:32

Fix error message for non-matching categories

d18c884

Improve comments

51ea81e

Ensure object dtype for string categories (#501)

0e048e8

* Ensure object dtype for string categories * Adjust tests * Better label type detection * Fix linter

TST: additional dtype tests for schemes (#502)

74f1c93

* Add tests for expected categorical dtype * Add tests for expected scheme dtypes * Fix test * Fix test * Use explicit StringDtype

Simplify checking for string dtype (#503)

6e48937

* Simplify checking for string dtype * Improve variable names

hagenw added 3 commits February 2, 2026 12:30

Require object in doctest for categorical data

2ea579c

Extend tests for segmented_index() dtype

12da5d9

Remove dev from CI

4c8b72d

hagenw marked this pull request as ready for review February 2, 2026 11:39

hagenw added 2 commits February 2, 2026 13:06

Add more tests for empty objects

026e650

Fix hash() for empty dataframes

95ce635

hagenw self-assigned this Feb 2, 2026

hagenw requested a review from frankenjoe February 2, 2026 12:17

Always adjust time when converting from pyarrow

0e7089d

hagenw mentioned this pull request Feb 5, 2026

Add support for pandas 3.0 audeering/audb#542

Draft

hagenw wants to merge 24 commits into main from dev
