chore(trainer): add data and model initializers guide #414
1Ayush-Petwal wants to merge 2 commits into kubeflow:main
[APPROVALNOTIFIER] This PR is NOT APPROVED
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Feel free to ask questions in the comments if you need any help or clarification!
Pull request overview
Adds end-user documentation for Kubeflow Trainer “initializers” (dataset/model prefetch steps) and links it into the training docs and API reference to address #413.
Changes:
- Adds a new “Data and Model Initializers” guide page with examples and container-backend configuration/debugging.
- Extends the Trainer API reference to include initializer classes.
- Links the new guide from the training landing page and the main docs toctree.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/source/train/initializers.rst | New guide explaining how to use dataset/model initializers, config knobs, and log debugging. |
| docs/source/train/index.rst | Adds a grid card linking to the new initializers guide. |
| docs/source/train/api.rst | Adds autoclass entries for initializer-related types in the API reference. |
| docs/source/index.rst | Adds the new initializers page to the main Trainer toctree. |
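For readers who have not opened the new guide, the workflow it documents (prefetch a dataset and a model, train, then read the job logs) might look roughly like the sketch below. It is not taken from the guide. It assumes the class names listed in this PR are importable from `kubeflow.trainer`, that `TrainerClient.train()` accepts `trainer` and `initializer` arguments, and that `storage_uri` takes an `hf://` URI; the repository placeholders are hypothetical. Check all of these against the installed SDK version.

```python
def submit_with_initializers(client=None) -> str:
    """Submit a job whose dataset and model are prefetched by initializers.

    Sketch only: names and signatures below are assumptions to verify
    against the installed Kubeflow SDK.
    """
    # Imported lazily so this module still loads where the SDK is absent.
    from kubeflow.trainer import (
        CustomTrainer,
        HuggingFaceDatasetInitializer,
        HuggingFaceModelInitializer,
        Initializer,
        TrainerClient,
    )

    def train_func():
        # Runs inside the training step; the model initializer is expected
        # to have populated /workspace/model (path named in this PR).
        import os

        print(os.listdir("/workspace/model"))

    # Pass a client configured for the backend you use; default otherwise.
    client = client or TrainerClient()

    job_name = client.train(
        trainer=CustomTrainer(func=train_func),
        initializer=Initializer(
            # Hypothetical placeholders: substitute real repositories.
            dataset=HuggingFaceDatasetInitializer(storage_uri="hf://<dataset-repo>"),
            model=HuggingFaceModelInitializer(storage_uri="hf://<model-repo>"),
        ),
    )

    # Stream logs for debugging; see the guide for selecting initializer steps.
    for line in client.get_job_logs(job_name):
        print(line)
    return job_name
```

The client is a parameter so the same function can be called with a `TrainerClient` built for the container backend, which is the backend this guide targets.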
49f82b1 to 0abd70e
dbcb539 to 921027e
Add docs/source/train/initializers.rst covering dataset and model initializers for the container backend (added in kubeflow#188, parallelised in kubeflow#313). Includes per-type code examples, combined usage, ContainerBackendConfig options, and debugging via get_job_logs().

Signed-off-by: Ayush Petwal <ayushpetwal.0105@gmail.com>
- Fix model output path from /workspace/model-weights to /workspace/model to match the MODEL_PATH constant in constants.py
- Clarify DataCacheInitializer is Kubernetes-only in the backend note and annotate the Available Initializers table row accordingly
- Add DataCacheInitializer usage example with required fields (storage_uri, metadata_loc, num_data_nodes) and backend constraint note

Signed-off-by: 1Ayush-Petwal <ayushpetwal.0105@gmail.com>
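Since the path fix above changes where training code should look for the prefetched model, a small check at the start of a training function can catch a mismatch early. The following is a minimal stdlib-only sketch; the helper name `list_initialized_files` is hypothetical, and only the `/workspace/model` path comes from the commit message.

```python
from pathlib import Path

# Path named in the commit message above (MODEL_PATH in constants.py).
MODEL_PATH = Path("/workspace/model")


def list_initialized_files(root: Path = MODEL_PATH) -> list[str]:
    """Return the files under root, relative to it, in sorted order.

    Raises FileNotFoundError if the directory is missing, which usually
    means the initializer did not run or wrote to a different path.
    """
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found; did the model initializer run?")
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Calling `list_initialized_files()` as the first line of a training function turns a wrong path into an immediate, readable error instead of a failure deep inside model loading.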
921027e to f82ac9b
/ok-to-test
What this PR does / why we need it:
- docs/source/train/initializers.rst: new guide covering all 5 initializer types (HuggingFaceDatasetInitializer, S3DatasetInitializer, DataCacheInitializer, HuggingFaceModelInitializer, S3ModelInitializer), combined usage, ContainerBackendConfig image/timeout options, and debugging via get_job_logs()
- autoclass entries for all 6 exported initializer types in docs/source/train/api.rst
- docs/source/train/index.rst
- train/initializers to the Trainer toctree in docs/source/index.rst

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #413
Checklist: