Update explain to Include Singular, CROSS and User Collections#501
Update explain to Include Singular, CROSS and User Collections#501
explain to Include Singular, CROSS and User Collections#501Conversation
explain to Include Singular/CROSS and User Collections
explain to Include Singular/CROSS and User Collectionsexplain to Include Singular, CROSS and User Collections
knassre-bodo
left a comment
There was a problem hiding this comment.
A few things should be added in my opinion, but so far looks great @hadia206 :)
pydough/exploration/explain.py
Outdated
| if root is not None: | ||
| qualified_node = qualify_node(node, session) | ||
| else: | ||
| # If the root is None, it means that the node was an expression | ||
| # without information about its context. | ||
| lines.append( | ||
| f"Cannot call pydough.explain on {display_raw(node)}.\n" | ||
| "Did you mean to use pydough.explain_term?" | ||
| ) | ||
| # No root in the tree (e.g. UnqualifiedGeneratedCollection, or a | ||
| # bare expression like LOWER(first_name + last_name)). Try to | ||
| # qualify anyway for generated collections. If it still fails, | ||
| # raise an exception. | ||
| try: | ||
| qualified_node = qualify_node(node, session) | ||
| except Exception: | ||
| lines.append( | ||
| f"Cannot call pydough.explain on {display_raw(node)}.\n" | ||
| "Did you mean to use pydough.explain_term?" | ||
| ) | ||
| return "\n".join(lines) |
There was a problem hiding this comment.
What if we jus got rid of if root is not None and always did the try-except?
| └─── Singular | ||
|
|
||
| This node applies the SINGULAR operator, asserting that the preceding collection is singular (1-to-1) with respect to the parent context. | ||
| Collection made singular: TPCH.nations.region |
There was a problem hiding this comment.
If possible, can we say which sub-collection is made singular with regards to the parent?
E.g. a test like this (for explain_term, using unqualified_term_exploration_test_data)
richest_customer = nations.customers.WHERE(RANKING(by=account_balance.DESC(), per="nations") == 1).SINGULAR()
result = regions.CALCULATE(region_name=name, rc_name=richest_customer.name)
Where the tuple of terms is regions vs richest_customer, so the explain would show us how the parent context is TPCH.regions, and the child being declared singular with regards to that context is nations.customers.WHERE(RANKING(by=account_balance.DESC(), per="nations") == 1)
There was a problem hiding this comment.
Added. Let me know if it looks correct. See region_richest_customer_term_impl
| cross_impl, | ||
| """ | ||
| PyDough collection representing the following logic: | ||
| ──┬─ TPCH | ||
| └─┬─ TableCollection[nations] | ||
| └─┬─ TPCH | ||
| └─── TableCollection[regions] |
There was a problem hiding this comment.
Similarly, can do we do an explain_term test where one argument is a collection and the other is CROSS(collection)
pydough/exploration/explain.py
Outdated
| if isinstance(prop, CartesianProductMetadata): | ||
| child_name = prop.child_collection.name | ||
| left_desc = ( | ||
| qualified_node.preceding_context.to_string() | ||
| if qualified_node.preceding_context is not None | ||
| else collection_name | ||
| ) | ||
| lines.append( | ||
| "This node is a CROSS join: every row of the left " | ||
| "collection is paired with every row of the right " | ||
| "collection." | ||
| ) | ||
| lines.append(f"Left (parent): {left_desc}") | ||
| lines.append(f"Right (child): {child_name}") | ||
| lines.append( | ||
| f"Metadata: {collection_name}.{property_name} -> {child_name}. " | ||
| f"Call pydough.explain(graph['{collection_name}']['{property_name}']) " | ||
| "to learn more." | ||
| ) |
There was a problem hiding this comment.
I don't necessarily like this way of handling CartesianProductMetadata; what was wrong with using the same message as other sub-collection types, then relying on pydough.explain on the metadata to explain the distinction between it being a simple join vs general join vs cross join?
There was a problem hiding this comment.
ok. I'll revert that. I started with wrong CROSS test and went rabbit hole on that.
| PyDough collection representing the following logic: | ||
| ──┬─ TPCH | ||
| └─── DataframeCollection(name='df_coll', shape=(1, 1), columns=['id']) | ||
|
|
||
| This node accesses user-generated collection 'df_coll'. | ||
| Columns: id | ||
| Unique columns: id | ||
|
|
||
| The following terms will be included in the result if this collection is executed: | ||
| id | ||
|
|
||
| The collection has access to the following expressions: | ||
| id | ||
|
|
||
| Call pydough.explain_term(collection, term) to learn more about any of these | ||
| expressions or collections that the collection has access to. |
There was a problem hiding this comment.
What if the verbose version also showed the DataFrame in question? E.g. include df.to_string() in the verbose message.
There was a problem hiding this comment.
Good point. Added it.
pydough/exploration/explain.py
Outdated
| "Did you mean to use pydough.explain_term?" | ||
| ) | ||
| # No root in the tree (e.g. UnqualifiedGeneratedCollection, or a | ||
| # bare expression like LOWER(first_name + last_name)). Try to |
There was a problem hiding this comment.
Bare expressions shouldn't use explain, they should use explain_term since things like LOWER(first_name) can only be explained within a context.
There was a problem hiding this comment.
I had to do that because I was hitting a problem with if isinstance(qualified_node, PyDoughExpressionQDAG): being an issue with collection.
I updated the code to handle it differently. See if that's okay now
| Call pydough.explain(collection, verbose=True) for more details. | ||
| """, |
There was a problem hiding this comment.
I just realized another thing we should be testing: explain_term where the term is a function call to a UDF defined in the metadata. Let's add tests for this, and also update term.py to add another special case of ExpressionFunctionCall to account for what this special UDF looks like (which includes explaining what it does, accounting for alias vs macro, regular vs aggregation vs window functions, any type specs) + we can use a description of the function from the metadata (if it exists) if needed.
…update_explain
Extends
pydough.explain()so it can explainExtends
pydough.expalin_term()