199 changes: 199 additions & 0 deletions docs/source/train/options.rst
@@ -1,6 +1,205 @@
Options Reference
=================

Options let you customize how a TrainJob is created and executed. Pass them as a list to the ``options`` parameter of the
:py:meth:`kubeflow.trainer.TrainerClient.train` method.
Copilot AI Mar 29, 2026

There should be a blank line between the paragraph ending with the :py:meth: role and the .. code-block:: directive; without it Sphinx can emit an "Explicit markup" warning and render incorrectly.

Suggested change
:py:meth:`kubeflow.trainer.TrainerClient.train` method.
:py:meth:`kubeflow.trainer.TrainerClient.train` method.

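The blank-line rule raised in this comment can be checked mechanically. The following is a minimal sketch, not part of the PR or of Sphinx: `find_code_block_spacing_issues` is a hypothetical helper that applies a simple heuristic (a code-block directive should be preceded by a blank line, and followed by either a blank line or an indented `:option:` line). It does not parse reST and may miss or misreport some cases.

```python
import re

# Matches the common code directives; other directives are deliberately ignored
# because some of them (e.g. admonitions) may carry content on the next line.
CODE_DIRECTIVE = re.compile(r"^\s*\.\. (code-block|sourcecode|code)::")


def find_code_block_spacing_issues(text):
    """Return (line_number, message) pairs for suspicious directive spacing."""
    lines = text.splitlines()
    issues = []
    for i, line in enumerate(lines):
        if not CODE_DIRECTIVE.match(line):
            continue
        # A directive glued to the preceding paragraph is read as paragraph text.
        if i > 0 and lines[i - 1].strip():
            issues.append((i + 1, "no blank line before directive"))
        # After the directive, only a blank line or an option line is expected.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following.strip() and not following.lstrip().startswith(":"):
            issues.append((i + 1, "no blank line after directive"))
    return issues


if __name__ == "__main__":
    sample = "Pass them to the method.\n.. code-block:: python\nprint(1)\n"
    for line_number, message in find_code_block_spacing_issues(sample):
        print(f"line {line_number}: {message}")
```

A check like this could run over `docs/source/train/options.rst` before building the docs; a real Sphinx build with warnings enabled remains the authoritative check.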
.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import Name, Labels, Annotations

    trainer_client = TrainerClient()
    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
        options=[
            Name("my-train-job"),
            Labels({"team": "ml", "env": "prod"}),
            Annotations({"note": "experiment-42"}),
        ],
    )

.. note::
    Not all options work with every backend. Each option documents
    which backends it supports. An unsupported option will raise a
    `ValueError` at runtime.
Copilot AI Mar 29, 2026

ValueError is currently written with single backticks; in this doc it should be marked as code or a Python class reference (e.g., ``ValueError`` or :class:`ValueError`) to avoid mis-rendering.

Suggested change
`ValueError` at runtime.
``ValueError`` at runtime.

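How an unsupported option surfaces as a `ValueError` can be illustrated without the SDK. This is a sketch under stated assumptions: `FakeName`, `FakeLabels`, `validate_options`, and the backend names `"kubernetes"` and `"local"` are hypothetical stand-ins chosen for illustration. They are not the kubeflow classes, and the real validation logic may differ.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class FakeName:
    """Hypothetical stand-in for an option accepted by every backend."""

    name: str
    supported_backends: ClassVar[tuple] = ("kubernetes", "local")


@dataclass
class FakeLabels:
    """Hypothetical stand-in for a Kubernetes-only option."""

    labels: dict
    supported_backends: ClassVar[tuple] = ("kubernetes",)


def validate_options(options, backend):
    """Raise ValueError for the first option the backend does not support."""
    for option in options:
        if backend not in option.supported_backends:
            raise ValueError(
                f"{type(option).__name__} is not supported on the {backend!r} backend"
            )
    return [type(option).__name__ for option in options]


if __name__ == "__main__":
    options = [FakeName("my-train-job"), FakeLabels({"team": "ml"})]
    print(validate_options(options, "kubernetes"))
    try:
        validate_options(options, "local")
    except ValueError as err:
        # The caller decides whether to drop the option or switch backends.
        print(f"rejected: {err}")
```

In real code the same `try`/`except ValueError` pattern would wrap the call that submits the job, assuming the error is raised there as the note describes.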

----

Usage Guide
-----------

Name
----

Set a custom name for the TrainJob resource. Works with all backends.

.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import Name

    trainer_client = TrainerClient()

    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
        options=[Name("my-custom-job")],
    )

Labels
------

Add labels to the TrainJob resource metadata (``metadata.labels``). Only supported on the **Kubernetes backend**.

.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import Labels

    trainer_client = TrainerClient()

    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
        options=[Labels({"team": "ml-platform", "version": "v2"})],
    )

Annotations
-----------

Add annotations to the TrainJob resource metadata(``metadata.annotations``). Only supported on the Kubernetes backend.
Copilot AI Mar 29, 2026

The Annotations description is missing a space before the inline literal ("metadata(``metadata.annotations``)"); add the space so the text reads "metadata (``metadata.annotations``)".

Suggested change
Add annotations to the TrainJob resource metadata(``metadata.annotations``). Only supported on the Kubernetes backend.
Add annotations to the TrainJob resource metadata (``metadata.annotations``). Only supported on the Kubernetes backend.


.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import Annotations

    trainer_client = TrainerClient()

    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
        options=[Annotations({"owner": "alice", "ticket": "JIRA-42"})],
    )

TrainerCommand
--------------

Override the trainer container command (``spec.trainer.command``).
Can Only be used with ''CustomTrainerContainer'' not with ''CustomTrainer''' or ''BuiltinTrainer''.
Copilot AI Mar 29, 2026

The TrainerCommand restrictions line uses incorrect reST markup (''...'' and mismatched quotes) and inconsistent capitalization ("Can Only"); use consistent inline-literal/class markup (e.g., ``CustomTrainerContainer``, ``CustomTrainer``, ``BuiltinTrainer``) and "can only".

Suggested change
Can Only be used with ''CustomTrainerContainer'' not with ''CustomTrainer''' or ''BuiltinTrainer''.
Can only be used with ``CustomTrainerContainer`` and not with ``CustomTrainer`` or ``BuiltinTrainer``.


.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainerContainer
    from kubeflow.trainer.options import TrainerCommand

    trainer_client = TrainerClient()

    job_id = trainer_client.train(
        trainer=CustomTrainerContainer(image="my-image:latest"),
        options=[TrainerCommand(["python", "train.py", "--epochs", "10"])],
    )

TrainerArgs
-----------

Append extra arguments to the trainer container command.

.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import TrainerArgs

    trainer_client = TrainerClient()

    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
Comment on lines +102 to +112
Copilot AI Mar 29, 2026

The TrainerArgs section is inaccurate: the option overrides .spec.trainer.args and is only valid with CustomTrainerContainer on the Kubernetes backend (per the implementation), but the text implies it appends args generically.

Suggested change
Append extra arguments to the trainer container command.
.. code-block:: python
from kubeflow.trainer import TrainerClient, CustomTrainer
from kubeflow.trainer.options import TrainerArgs
trainer_client = TrainerClient()
job_id = trainer_client.train(
trainer=CustomTrainer(func=train_function),
Override the trainer container arguments (``spec.trainer.args``). Only supported on the **Kubernetes backend** with ``CustomTrainerContainer``.
.. code-block:: python
from kubeflow.trainer import TrainerClient, CustomTrainerContainer
from kubeflow.trainer.options import TrainerArgs
trainer_client = TrainerClient()
job_id = trainer_client.train(
trainer=CustomTrainerContainer(image="my-image:latest"),

Comment on lines +106 to +112
Copilot AI Mar 29, 2026

The TrainerArgs example uses CustomTrainer (func-based), but TrainerArgs raises ValueError unless the trainer is a CustomTrainerContainer; update the example to use CustomTrainerContainer (and show args consistent with that).

Suggested change
from kubeflow.trainer import TrainerClient, CustomTrainer
from kubeflow.trainer.options import TrainerArgs
trainer_client = TrainerClient()
job_id = trainer_client.train(
trainer=CustomTrainer(func=train_function),
from kubeflow.trainer import TrainerClient, CustomTrainerContainer
from kubeflow.trainer.options import TrainerArgs
trainer_client = TrainerClient()
job_id = trainer_client.train(
trainer=CustomTrainerContainer(image="my-image:latest"),

        options=[TrainerArgs(["--lr", "0.001", "--batch-size", "32"])],
    )

RuntimePatch
--------------

Apply structured patches to the TrainJob (``spec.runtimePatches``) Use this for advanced Kubernetes-level customisation such as adding init containers, volumes, or tolerations. Only supported on the ''Kubernetes backend''.
Copilot AI Mar 29, 2026

The RuntimePatch description runs two sentences together and uses inconsistent quoting (''Kubernetes backend''); add a period after the spec.runtimePatches clause and use consistent markup (e.g., Kubernetes backend or KubernetesBackend).

Suggested change
Apply structured patches to the TrainJob (``spec.runtimePatches``) Use this for advanced Kubernetes-level customisation such as adding init containers, volumes, or tolerations. Only supported on the ''Kubernetes backend''.
Apply structured patches to the TrainJob (``spec.runtimePatches``). Use this for advanced Kubernetes-level customisation such as adding init containers, volumes, or tolerations. Only supported on the **Kubernetes backend**.


.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import (
        RuntimePatch,
        TrainingRuntimeSpecPatch,
        JobSetTemplatePatch,
        JobSetSpecPatch,
        ReplicatedJobPatch,
        JobTemplatePatch,
        JobSpecPatch,
        PodTemplatePatch,
        PodSpecPatch,
        ContainerPatch,
    )

    trainer_client = TrainerClient()

    patch = RuntimePatch(
Comment on lines +81 to +139
Copilot AI Mar 27, 2026

The “TrainerCommand” section text says it overrides .spec.trainer.command and only works with CustomTrainerContainer, but the example below shows a RuntimePatch applied to a CustomTrainer; either update the example to demonstrate TrainerCommand with CustomTrainerContainer or move/rename this section to document RuntimePatch instead.

        training_runtime_spec=TrainingRuntimeSpecPatch(
            template=JobSetTemplatePatch(
                spec=JobSetSpecPatch(
                    replicated_jobs=[
                        ReplicatedJobPatch(
                            name="node",
                            template=JobTemplatePatch(
                                spec=JobSpecPatch(
                                    template=PodTemplatePatch(
                                        spec=PodSpecPatch(
                                            containers=[
                                                ContainerPatch(
                                                    name="trainer",
                                                    env=[{
                                                        "name": "MY_VAR",
                                                        "value": "hello",
                                                    }],
                                                )
                                            ]
                                        )
                                    )
                                )
                            ),
                        )
                    ]
                )
            )
        )
    )

    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
        options=[patch],
    )

----

Combining Multiple Options
--------------------------

You can pass multiple options together in a single list:

.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer.options import Name, Labels, Annotations

    trainer_client = TrainerClient()

    job_id = trainer_client.train(
        trainer=CustomTrainer(func=train_function),
        options=[
            Name("experiment-001"),
            Labels({"project": "llm-finetune"}),
            Annotations({"author": "alice"}),
        ],
    )

----

API Reference
-------------

.. autoclass:: kubeflow.trainer.options.Name
    :members:
    :show-inheritance: