chore(trainer): add data and model initializers guide #414

Open
1Ayush-Petwal wants to merge 2 commits into kubeflow:main from 1Ayush-Petwal:docs/initializers-guide

Conversation

@1Ayush-Petwal

What this PR does / why we need it:

  • Add docs/source/train/initializers.rst — new guide covering all 5 initializer
    types (HuggingFaceDatasetInitializer, S3DatasetInitializer,
    DataCacheInitializer, HuggingFaceModelInitializer, S3ModelInitializer),
    combined usage, ContainerBackendConfig image/timeout options, and debugging
    via get_job_logs()
  • Add autoclass entries for all 6 exported initializer types in docs/source/train/api.rst
  • Add grid card linking to the new page in docs/source/train/index.rst
  • Add train/initializers to the Trainer toctree in docs/source/index.rst

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #413

Checklist:

  • Docs included if any changes are user facing

Copilot AI review requested due to automatic review settings March 21, 2026 22:21
@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions
Contributor

🎉 Welcome to the Kubeflow SDK! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards
  • Our team will review your PR soon! cc @kubeflow/kubeflow-sdk-team

Join the community:

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Contributor

Copilot AI left a comment

Pull request overview

Adds end-user documentation for Kubeflow Trainer “initializers” (dataset/model prefetch steps) and links it into the training docs and API reference to address #413.

Changes:

  • Adds a new “Data and Model Initializers” guide page with examples and container-backend configuration/debugging.
  • Extends the Trainer API reference to include initializer classes.
  • Links the new guide from the training landing page and the main docs toctree.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| docs/source/train/initializers.rst | New guide explaining how to use dataset/model initializers, config knobs, and log debugging. |
| docs/source/train/index.rst | Adds a grid card linking to the new initializers guide. |
| docs/source/train/api.rst | Adds autoclass entries for initializer-related types in the API reference. |
| docs/source/index.rst | Adds the new initializers page to the main Trainer toctree. |

1Ayush-Petwal force-pushed the docs/initializers-guide branch from 49f82b1 to 0abd70e on March 21, 2026 22:26
1Ayush-Petwal changed the title from "docs(trainer): add data and model initializers guide" to "chore(trainer): add data and model initializers guide" on Mar 21, 2026
1Ayush-Petwal force-pushed the docs/initializers-guide branch from dbcb539 to 921027e on March 23, 2026 17:01

Add docs/source/train/initializers.rst covering dataset and model
initializers for the container backend (added in kubeflow#188, parallelised
in kubeflow#313). Includes per-type code examples, combined usage, ContainerBackendConfig
options, and debugging via get_job_logs().

Signed-off-by: Ayush Petwal <ayushpetwal.0105@gmail.com>

- Fix model output path from /workspace/model-weights to /workspace/model
  to match the MODEL_PATH constant in constants.py
- Clarify DataCacheInitializer is Kubernetes-only in the backend note
  and annotate the Available Initializers table row accordingly
- Add DataCacheInitializer usage example with required fields
  (storage_uri, metadata_loc, num_data_nodes) and backend constraint note

Signed-off-by: 1Ayush-Petwal <ayushpetwal.0105@gmail.com>

1Ayush-Petwal force-pushed the docs/initializers-guide branch from 921027e to f82ac9b on March 28, 2026 22:12

@Fiona-Waters
Contributor

/ok-to-test
