chore(examples): Use QuantileDMatrix for histogram tree method in XGBoost example #3376
sunalawa wants to merge 1 commit into kubeflow:master
Conversation
…oost example

Replace DMatrix with QuantileDMatrix in the distributed XGBoost training example when using the histogram tree method. This reduces memory usage and aligns with XGBoost best practices for distributed workloads.

Fixes kubeflow#3300

Signed-off-by: Suyash Nalawade <sunalawa@redhat.com>
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.
[APPROVAL NOTIFIER] This PR is NOT APPROVED.

This pull request has been approved by: The full list of commands accepted by this bot can be found here.

Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
Pull request overview
Updates the distributed XGBoost training example notebook to follow XGBoost best practices for histogram-based training by switching to QuantileDMatrix, reducing memory overhead in distributed runs.
Changes:
- Replace `xgb.DMatrix` with `xgb.QuantileDMatrix` for the training and validation datasets (with `ref=dtrain` for validation).
- Explicitly set `tree_method` to `"hist"` to match `QuantileDMatrix`'s intended usage.
Local testing:
1. make test
2. make test-integration
3. make test-e2e
What this PR does / why we need it:
Updates the distributed XGBoost example to use QuantileDMatrix instead of DMatrix.
This reduces memory usage and follows XGBoost best practices for distributed training workloads.
Which issue(s) this PR fixes:
Fixes #3300
Checklist: