Skip to content

Saving dtypes as metadata for roundtrip consistency#262

Open
yfukai wants to merge 13 commits intoroyerlab:mainfrom
yfukai:from_other_roundtrip
Open

Saving dtypes as metadata for roundtrip consistency#262
yfukai wants to merge 13 commits intoroyerlab:mainfrom
yfukai:from_other_roundtrip

Conversation

@yfukai
Copy link
Contributor

@yfukai yfukai commented Feb 17, 2026

This is a follow-up on the dtype, assuming #260 has been merged.
In this PR, the graphs store the dtypes of the fields as metadata, enabling robust recovery of the original dtypes during round-trip from_other conversions.

@yfukai yfukai marked this pull request as draft February 17, 2026 06:42
@yfukai
Copy link
Contributor Author

yfukai commented Feb 17, 2026

Notes:

  • I should also store the default values. Serializing the polars dataframes is useful to store the values?
  • The roles of AttrSchema overlap with the current metadata. This could be SQLGraph-specific machinery.

@yfukai
Copy link
Contributor Author

yfukai commented Feb 18, 2026

Both tasks are done. I'll review the code for SQLgraph and mark this open for review.

@codecov-commenter
Copy link

codecov-commenter commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 86.20690% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.76%. Comparing base (e4bb183) to head (f99e20f).

Files with missing lines Patch % Lines
src/tracksdata/utils/_dtypes.py 71.83% 11 Missing and 9 partials ⚠️
src/tracksdata/graph/_sql_graph.py 95.50% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #262      +/-   ##
==========================================
- Coverage   87.92%   87.76%   -0.16%     
==========================================
  Files          56       56              
  Lines        4471     4610     +139     
  Branches      789      810      +21     
==========================================
+ Hits         3931     4046     +115     
- Misses        344      360      +16     
- Partials      196      204       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoOkuma
Copy link
Member

JoOkuma commented Feb 27, 2026

@yfukai sorry for the delay.
I'll review this once #260 is merged. Everything looks good so far.

Copy link
Member

@JoOkuma JoOkuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR @yfukai,

I appreciate all the tests you're adding.
I left a few minor comments.

Comment on lines +594 to +597
ordered_keys = [key for key in preferred_order if key in schemas]
ordered_keys.extend(key for key in table_class.__table__.columns.keys() if key not in ordered_keys)
ordered_keys.extend(key for key in schemas if key not in ordered_keys)
return {key: schemas[key] for key in ordered_keys}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this implementation? This way, we can avoid checking if it's in ordered_keys, which is a list, and it takes a bit longer to check if the column is there.

Suggested change
ordered_keys = [key for key in preferred_order if key in schemas]
ordered_keys.extend(key for key in table_class.__table__.columns.keys() if key not in ordered_keys)
ordered_keys.extend(key for key in schemas if key not in ordered_keys)
return {key: schemas[key] for key in ordered_keys}
result = {}
# return dictionary in preferred order
for source in (
preferred_order,
table_class.__table__.columns.keys(),
schemas,
):
for key in source:
if key in schemas:
result.setdefault(key, schemas[key])
return result

Comment on lines +1370 to +1371
else:
nodes_df = nodes_df.select([pl.col(c) for c in self._node_attr_schemas() if c in nodes_df.columns])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yfukai , it's not clear to me why this is necessary.
Is it because it might have private attributes?
Could you include a comment on this and the equivalent edge_attr_code?

Comment on lines +1735 to +1751
edge_schemas = self.__edge_attr_schemas
# Process arguments and create validated schema
schema = process_attr_key_args(key_or_schema, dtype, default_value, self.__edge_attr_schemas)

# Store schema
self.__edge_attr_schemas[schema.key] = schema
schema = process_attr_key_args(key_or_schema, dtype, default_value, edge_schemas)

# Add column to database
self._add_new_column(self.Edge, schema)
edge_schemas[schema.key] = schema
self.__edge_attr_schemas = edge_schemas

def remove_edge_attr_key(self, key: str) -> None:
if key not in self.edge_attr_keys():
raise ValueError(f"Edge attribute key {key} does not exist")

edge_schemas = self.__edge_attr_schemas
self._drop_column(self.Edge, key)
self.__edge_attr_schemas.pop(key, None)
edge_schemas.pop(key, None)
self.__edge_attr_schemas = edge_schemas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit lost at first here, but this seems required because self.__edge_attr_schemas has a setter and getter.

Do you think it would be worth being more explicit than my original implementation with @property and .setter? We could address this in another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants