Add a new optional argument, index_name, to DataFrame Collections.
When this argument is provided, the DataFrame index is materialized as a new column with the given name.
Behavior:
- If index_name is set, a new column is created in the DataFrame collection containing the index values.
- The original DataFrame index remains unchanged.
- The new column may participate in uniqueness constraints if its values are unique.
Validations:
- Type validation: Valid index types are str, range (the default), and numbers (integers or floats).
- Name conflict: index_name must not conflict with any existing column in the DataFrame.
- Uniqueness: The index values may need to be checked for uniqueness (this validation may be required if custom indexes are allowed).
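The validations above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name validate_index_name and its require_unique flag are hypothetical and not part of PyDough, and the type check is read as applying to the DataFrame index rather than to the index_name string.

```python
import pandas as pd
from pandas.api import types as pdt


def validate_index_name(df: pd.DataFrame, index_name: str, require_unique: bool = False) -> bool:
    """Hypothetical helper sketching the three validations.

    Returns True if the index values are unique, i.e. the new column
    could be listed in unique_column_names.
    """
    # Name conflict: the new column must not collide with an existing one.
    if index_name in df.columns:
        raise ValueError(f"index_name {index_name!r} conflicts with an existing column")

    # Type validation: accept a RangeIndex (the default), a numeric index
    # (integers or floats), or an index whose values are all strings.
    index = df.index
    valid_type = (
        isinstance(index, pd.RangeIndex)
        or pdt.is_integer_dtype(index.dtype)
        or pdt.is_float_dtype(index.dtype)
        or all(isinstance(value, str) for value in index)
    )
    if not valid_type:
        raise TypeError(f"unsupported index type: {index.dtype}")

    # Uniqueness: only enforced when the caller asks for it (assumption).
    if require_unique and not index.is_unique:
        raise ValueError("index values are not unique")
    return bool(index.is_unique)
```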
Note: This new column can be added to the list of unique_column_names, provided its values are unique.
Example:
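import pandas as pd
import pydough

# Six rows with the default range index 0..5
df = pd.DataFrame({
    "A": [1, 2, 3] * 2,
    "B": ["A", "B"] * 3,
}, index=range(6))
# Proposed usage: materialize the index as column "C"
pydough.dataframe_collection(
    name="my_df",
    dataframe=df,
    unique_column_names=["C", ["A", "B"]],
    index_name="C"
)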
The DataFrame collection would be created with a new column called "C" that contains [0, 1, 2, 3, 4, 5].
Result:
C A B
0 1 A
1 2 B
2 3 A
3 1 B
4 2 A
5 3 B
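For illustration, the same result can be reproduced in plain pandas. This is a minimal sketch and not the PyDough implementation: the helper materialize_index is hypothetical, and it works on a copy so the original DataFrame and its index stay unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3] * 2,
    "B": ["A", "B"] * 3,
}, index=range(6))


def materialize_index(df: pd.DataFrame, index_name: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the index added as the first column."""
    out = df.copy()
    # Insert the index values as a regular column at position 0.
    out.insert(0, index_name, out.index.to_numpy())
    return out


result = materialize_index(df, "C")

# Both entries of unique_column_names=["C", ["A", "B"]] hold for this data.
c_is_unique = result["C"].is_unique
ab_is_unique = not result.duplicated(subset=["A", "B"]).any()
```

Here both "C" and the pair ["A", "B"] identify each row, which is why the example can list them in unique_column_names.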
Unique columns validation: Allow filter_columns to include at least one entry from unique_column_names instead of requiring all of them.
Example:
The unique columns are ["column1", ["column2", "column3"]], but filter_columns may include only column1, or column2 and column3 without column1.
This also requires smarter validation of the unique columns: if a unique entry is composed of more than one column, all of its columns must be included in filter_columns, if provided. Following the last example, if column1 is not included, filter_columns must include both column2 AND column3.
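The rule described above might be checked along these lines. This is a sketch under assumptions: the function name covers_unique_key is hypothetical, and a missing filter_columns (None) is treated as "nothing to validate".

```python
from typing import Optional, Sequence, Union

UniqueKey = Union[str, Sequence[str]]


def covers_unique_key(
    unique_column_names: Sequence[UniqueKey],
    filter_columns: Optional[Sequence[str]],
) -> bool:
    """Hypothetical check: True if filter_columns fully contains at least one unique key."""
    if filter_columns is None:
        # filter_columns was not provided, so there is nothing to validate.
        return True
    selected = set(filter_columns)
    for key in unique_column_names:
        # A key is either a single column name or a list of column names.
        key_columns = [key] if isinstance(key, str) else list(key)
        # A composite key only counts if ALL of its columns are present.
        if all(column in selected for column in key_columns):
            return True
    return False


unique = ["column1", ["column2", "column3"]]
only_column1 = covers_unique_key(unique, ["column1"])
composite_pair = covers_unique_key(unique, ["column2", "column3"])
partial_pair = covers_unique_key(unique, ["column2"])
```

With these inputs, only_column1 and composite_pair are accepted, while partial_pair is rejected because column3 is missing from the composite key.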