Skip to content

Support NaN values in UpliftTree and UpliftRandomForest#860

Open
aman-coder03 wants to merge 6 commits intouber:masterfrom
aman-coder03:feature/handle-nan-uplift-trees
Open

Support NaN values in UpliftTree and UpliftRandomForest#860
aman-coder03 wants to merge 6 commits intouber:masterfrom
aman-coder03:feature/handle-nan-uplift-trees

Conversation

@aman-coder03
Copy link

This PR adds native support for missing values (NaNs) in UpliftTree and
UpliftRandomForest.

Each candidate split evaluates both possible NaN routing directions
(left/right) and learns the optimal routing per node, similar to
scikit-learn’s decision tree behavior.

The learned NaN routing is stored in each tree node and applied
consistently during training, pruning, filling, and prediction.

This resolves #802.

@CLAassistant
Copy link

CLAassistant commented Dec 29, 2025

CLA assistant check
All committers have signed the CLA.

@aman-coder03
Copy link
Author

The Read the Docs build failure appears to be due to Cython extensions requiring a compiler in the RTD environment.
No documentation-related files were modified in this PR.
Happy to adjust the RTD config if you’d like me to handle this

Copy link
Collaborator

@jeongyoonlee jeongyoonlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @aman-coder03. I left a few comments. Can you address them? Also, please add test code to tests/test_uplift_trees.py accordingly. Thanks!

is_split_by_gt = False

for value in lsUnique:
len_X_l = group_counts_by_divide(columnValues, value, is_split_by_gt, treatment_idx, y, left_count_arr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this code block. Is there any reason you removed it?

n_reg
)

early_stopping_flag = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this code block for early stopping.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN handling is needed here in the percentile calculation.

@aman-coder03
Copy link
Author

@jeongyoonlee i will look into this!

@aman-coder03 aman-coder03 force-pushed the feature/handle-nan-uplift-trees branch from e13b20b to bd23f63 Compare February 3, 2026 15:03
@aman-coder03
Copy link
Author

hey @jeongyoonlee i have implemented the changes, please have a look!

@jeongyoonlee
Copy link
Collaborator

@aman-coder03, I see some code blocks necessary (e.g., code for the CTS evaluation criteria) are removed without any comments. Also, some tests are failing.

@aman-coder03
Copy link
Author

Thanks for pointing that out. I will restore the original CTS and evaluation blocks and limit changes strictly to NaN routing logic to preserve original behavior @jeongyoonlee

@aman-coder03
Copy link
Author

@jeongyoonlee can you please have a look at the code changes now

@aman-coder03 aman-coder03 force-pushed the feature/handle-nan-uplift-trees branch from 97c291e to 4163bf5 Compare February 11, 2026 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for Inputs with NaN in Uplift Trees/Uplift RandomForest

3 participants