Skip to content

Update explain to Include Singular, CROSS and User Collections#501

Open
hadia206 wants to merge 12 commits intomainfrom
Hadia/update_explain
Open

Update explain to Include Singular, CROSS and User Collections#501
hadia206 wants to merge 12 commits intomainfrom
Hadia/update_explain

Conversation

@hadia206
Copy link
Contributor

@hadia206 hadia206 commented Mar 9, 2026

Extends pydough.explain() so it can explain

  • Singular
  • CROSS and
  • user-generated collections (range_collection, dataframe_collection)

Extends pydough.expalin_term()

  • Singular
  • CROSS
  • UDF

@hadia206 hadia206 changed the title Update explain to include Singular/CROSS and user collections Update explain to Include Singular/CROSS and User Collections Mar 9, 2026
@hadia206 hadia206 marked this pull request as ready for review March 10, 2026 01:10
@hadia206 hadia206 requested review from a team, john-sanchez31, juankx-bodo and knassre-bodo and removed request for a team March 10, 2026 01:10
Copy link
Contributor

@john-sanchez31 john-sanchez31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hadia206 hadia206 changed the title Update explain to Include Singular/CROSS and User Collections Update explain to Include Singular, CROSS and User Collections Mar 12, 2026
Copy link
Contributor

@knassre-bodo knassre-bodo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things should be added in my opinion, but so far looks great @hadia206 :)

Comment on lines +282 to +300
if root is not None:
qualified_node = qualify_node(node, session)
else:
# If the root is None, it means that the node was an expression
# without information about its context.
lines.append(
f"Cannot call pydough.explain on {display_raw(node)}.\n"
"Did you mean to use pydough.explain_term?"
)
# No root in the tree (e.g. UnqualifiedGeneratedCollection, or a
# bare expression like LOWER(first_name + last_name)). Try to
# qualify anyway for generated collections. If it still fails,
# raise an exception.
try:
qualified_node = qualify_node(node, session)
except Exception:
lines.append(
f"Cannot call pydough.explain on {display_raw(node)}.\n"
"Did you mean to use pydough.explain_term?"
)
return "\n".join(lines)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we jus got rid of if root is not None and always did the try-except?

└─── Singular

This node applies the SINGULAR operator, asserting that the preceding collection is singular (1-to-1) with respect to the parent context.
Collection made singular: TPCH.nations.region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, can we say which sub-collection is made singular with regards to the parent?

E.g. a test like this (for explain_term, using unqualified_term_exploration_test_data)

richest_customer = nations.customers.WHERE(RANKING(by=account_balance.DESC(), per="nations") == 1).SINGULAR()
result = regions.CALCULATE(region_name=name, rc_name=richest_customer.name)

Where the tuple of terms is regions vs richest_customer, so the explain would show us how the parent context is TPCH.regions, and the child being declared singular with regards to that context is nations.customers.WHERE(RANKING(by=account_balance.DESC(), per="nations") == 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Let me know if it looks correct. See region_richest_customer_term_impl

Comment on lines +1254 to +1260
cross_impl,
"""
PyDough collection representing the following logic:
──┬─ TPCH
└─┬─ TableCollection[nations]
└─┬─ TPCH
└─── TableCollection[regions]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, can do we do an explain_term test where one argument is a collection and the other is CROSS(collection)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +363 to +381
if isinstance(prop, CartesianProductMetadata):
child_name = prop.child_collection.name
left_desc = (
qualified_node.preceding_context.to_string()
if qualified_node.preceding_context is not None
else collection_name
)
lines.append(
"This node is a CROSS join: every row of the left "
"collection is paired with every row of the right "
"collection."
)
lines.append(f"Left (parent): {left_desc}")
lines.append(f"Right (child): {child_name}")
lines.append(
f"Metadata: {collection_name}.{property_name} -> {child_name}. "
f"Call pydough.explain(graph['{collection_name}']['{property_name}']) "
"to learn more."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't necessarily like this way of handling CartesianProductMetadata; what was wrong with using the same message as other sub-collection types, then relying on pydough.explain on the metadata to explain the distinction between it being a simple join vs general join vs cross join?

Copy link
Contributor Author

@hadia206 hadia206 Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I'll revert that. I started with wrong CROSS test and went rabbit hole on that.

Comment on lines +1387 to +1402
PyDough collection representing the following logic:
──┬─ TPCH
└─── DataframeCollection(name='df_coll', shape=(1, 1), columns=['id'])

This node accesses user-generated collection 'df_coll'.
Columns: id
Unique columns: id

The following terms will be included in the result if this collection is executed:
id

The collection has access to the following expressions:
id

Call pydough.explain_term(collection, term) to learn more about any of these
expressions or collections that the collection has access to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the verbose version also showed the DataFrame in question? E.g. include df.to_string() in the verbose message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added it.

"Did you mean to use pydough.explain_term?"
)
# No root in the tree (e.g. UnqualifiedGeneratedCollection, or a
# bare expression like LOWER(first_name + last_name)). Try to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare expressions shouldn't use explain, they should use explain_term since things like LOWER(first_name) can only be explained within a context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to do that because I was hitting a problem with if isinstance(qualified_node, PyDoughExpressionQDAG): being an issue with collection.
I updated the code to handle it differently. See if that's okay now

Comment on lines +1246 to +1247
Call pydough.explain(collection, verbose=True) for more details.
""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized another thing we should be testing: explain_term where the term is a function call to a UDF defined in the metadata. Let's add tests for this, and also update term.py to add another special case of ExpressionFunctionCall to account for what this special UDF looks like (which includes explaining what it does, accounting for alias vs macro, regular vs aggregation vs window functions, any type specs) + we can use a description of the function from the metadata (if it exists) if needed.

@hadia206 hadia206 requested a review from knassre-bodo March 16, 2026 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants