* Add failing test * Make test pandas 3.0.0 compatible * Fix set_index_dtypes() for pandas 3.0 * Add comment * Fix doctests * Update segmented_index() * Use segmented_index in test * Add test for segmented_index
* pandas 3.0: fix utils.hash() * Fix comment * Remove unneeded code * Add more tests * Preserve ordered setting * Update comment
* Fix categorical dtype with Database.get() * Update tests * Add additional test * Improve code * Clean up comment * We converted to categorical data * Simplify test * Simplify string test
* Require timedelta64[ns] in assert_index() * Add tests for mixed cases
* pandas 3.0: segmented_index() and set_index_dtypes() (#490) * Add failing test * Make test pandas 3.0.0 compatible * Fix set_index_dtypes() for pandas 3.0 * Add comment * Fix doctests * Update segmented_index() * Use segmented_index in test * Add test for segmented_index * Avoid warning in testing.add_table() (#491) * pandas 3.0: fix utils.hash() (#492) * pandas 3.0: fix utils.hash() * Fix comment * Remove unneeded code * Add more tests * Preserve ordered setting * Update comment * Fix categorical dtype with Database.get() (#493) * Fix categorical dtype with Database.get() * Update tests * Add additional test * Improve code * Clean up comment * We converted to categorical data * Simplify test * Simplify string test * Require timedelta64[ns] in assert_index() (#494) * Require timedelta64[ns] in assert_index() * Add tests for mixed cases * pandas 3.0: fix doctests output
* Update test_utils.py * Update test_misc_table * Set index dtypes directly * Fix test_table * Update to_timedelta in index.py * Fix conversion to timedelta in testing.py * Update test_utils_concat.py * Add comment * Update to_timedelta()
Contributor
Reviewer's Guide
Adds pandas 3.0 compatibility by enforcing nanosecond-resolution datetime/timedelta dtypes, normalizing string and categorical dtypes (especially for schemes and indices), making hashing/index utilities robust to new pandas string behavior, and relaxing the pandas upper bound; tests and docs are updated accordingly.
Updated class diagram for categorical and scheme dtype handling
classDiagram
class Scheme {
+labels
+dtype
+to_pandas_dtype() pd_dtype
}
class CommonModule {
+to_categorical_dtype(labels) CategoricalDtype
+to_pandas_dtype(dtype) pandas_dtype
}
class Table {
+_pyarrow_convert_dtypes(df) DataFrame
}
class Database {
+schemes
}
class Column {
+scheme_id
}
Database --> Scheme : contains
Table --> Database : references
Table --> Column : contains
Column --> Scheme : uses_scheme_id
Scheme ..> CommonModule : uses_to_categorical_dtype
Table ..> CommonModule : uses_to_categorical_dtype
Codecov Report
✅ All modified and coverable lines are covered by tests.
* Ensure object dtype for string categories * Adjust tests * Better label type detection * Fix linter
* Add tests for expected categorical dtype * Add tests for expected scheme dtypes * Fix test * Fix test * Use explicit StringDtype
* Simplify checking for string dtype * Improve variable names
* Simplify creation of segmented index * Fix set_index_dtypes() * Revert "Simplify creation of segmented index" This reverts commit 73ff082. * Clean up comment * Fix dtype normalization * Revert "Revert "Simplify creation of segmented index"" This reverts commit 6f51a35. * Add tests for empty string index
hagenw
commented
Feb 2, 2026
      and df[column_id].dtype == "object"
  ):
-     df[column_id] = df[column_id].astype("string", copy=False)
+     df[column_id] = df[column_id].astype("string")
Member
Author
This avoids a FutureWarning as copy will be removed from astype().
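For illustration, a minimal sketch of the pattern from the diff above. The frame and the column name are made up for this example and are not taken from the audformat code base.

```python
import pandas as pd

# Hypothetical example frame; "speaker" is an illustrative column name
df = pd.DataFrame({"speaker": pd.Series(["a", "b", None], dtype="object")})
column_id = "speaker"

if df[column_id].dtype == "object":
    # astype() is called with the target dtype only, no copy keyword
    df[column_id] = df[column_id].astype("string")

print(df[column_id].dtype)
```

The conversion itself is unchanged; only the keyword argument is dropped.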
Closes #487
Timedelta/datetime updates
pandas changed the default unit of timedelta and datetime entries from nanoseconds to a resolution that matches the precision of the given input. We update the code here to ensure we always get nanosecond resolution, as before.
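One way to pin the resolution, shown here as a sketch with made-up input values, is an explicit cast to the nanosecond dtype. This is not necessarily how the changed audformat functions do it.

```python
import pandas as pd

# Hypothetical inputs given with second precision
starts = pd.to_timedelta([0.0, 1.5], unit="s")
dates = pd.to_datetime(["2026-02-02 10:00:00"])

# Cast explicitly so the resolution does not depend on the pandas version
starts_ns = starts.astype("timedelta64[ns]")
dates_ns = dates.astype("datetime64[ns]")

print(starts_ns.dtype, dates_ns.dtype)
```

With the explicit cast, the resulting dtype no longer depends on the default pandas infers from the input.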
String updates
pandas introduces a new default string type (str / <StringDtype(na_value=nan)>), which replaces object as the default. Unfortunately, the new string type differs from string / <StringDtype(na_value=<NA>)>, as it uses a different value to represent missing values.
We update the code here to ensure we get the same results with pandas <3.0 and pandas >=3.0:
* string / <StringDtype(na_value=<NA>)> for the string scheme
* object for schemes with string labels (which are represented by a categorical data type)
* audformat.utils.set_index_dtypes() can change between all available string types
Examples of changed string behavior
Output of print(obj.dtype)
Output of obj.dtype
dtype('O') | dtype('O')
dtype('O') | <StringDtype(na_value=nan)>
dtype('O') | <StringDtype(na_value=nan)>
dtype('O') | <StringDtype(na_value=nan)>
string[python] | <StringDtype(na_value=<NA>)>
dtype('O') | <StringDtype(na_value=nan)>
dtype('O') | <StringDtype(na_value=nan)>
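The two dtype choices described in this pull request can be requested explicitly, so they do not depend on what pandas infers by default. The following is a sketch in plain pandas with made-up labels; it does not reproduce the audformat implementation.

```python
import pandas as pd

labels = ["happy", "sad"]  # hypothetical scheme labels

# Categorical dtype whose categories are forced to object dtype
categories = pd.Index(labels, dtype="object")
cat_dtype = pd.CategoricalDtype(categories=categories, ordered=False)
values = pd.Series(["happy", None, "sad"], dtype=cat_dtype)

# String dtype requested by its "string" alias, which uses <NA> for missing values
strings = pd.Series(["happy", None], dtype="string")

print(values.dtype.categories.dtype)
print(strings.dtype)
```

Passing dtype="object" for the categories and dtype="string" for the series leaves nothing to inference.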