chore(docs): update AGENTS.md with OptimizerClient API reference and new SDK features by kanagaabishek · Pull Request #442 · kubeflow/sdk

kanagaabishek · 2026-03-31T06:18:30Z

What this PR does / why we need it

AGENTS.md listed the optimizer/ and hub/ directories in the repository map but provided
no documentation on how to actually use OptimizerClient or ModelRegistryClient. Contributors
and AI agents had no reference for method signatures, parameters, or usage patterns for these
clients. This PR closes that gap by adding a Client API Reference section.

Added:

TrainerClient method table with correct train() signature (runtime, initializer, trainer, options)
Documents the get_job_logs() step parameter, which defaults to NODE-0
Documents the correct wait_for_job_status() timeout defaults (600s for Trainer, 3600s for Optimizer)
OptimizerClient method table covering all 8 public methods with descriptions
ModelRegistryClient section with all 9 methods (register, update ×3, get ×3, list ×2)
Working quick-start examples for all three clients verified against source code

Which issue(s) this PR fixes

Fixes #189

Note: Issue #189 was closed by another PR before this one was opened.
This PR adds sections that were missing from that PR: ModelRegistryClient
(all 9 methods), TrainerClient backend options table, and correct method
signatures verified against source. Happy to scope down or close if
maintainers prefer — just let me know.

Checklist

Docs included if any changes are user facing

google-oss-prow · 2026-03-31T06:18:36Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kramaranya for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2026-03-31T06:18:38Z

🎉 Welcome to the Kubeflow SDK! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards
Our team will review your PR soon! cc @kubeflow/kubeflow-sdk-team

Join the community:

Slack: Join our #kubeflow-ml-experience and #kubeflow-trainer Slack channels
Meetings: Attend the Kubeflow SDK and ML Experience bi-weekly meetings

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Copilot

Pull request overview

Adds a “Client API Reference” section to AGENTS.md to document how contributors/agents should use the SDK’s main client entry points (Trainer, Optimizer, Model Registry) and their key methods.

Changes:

Documented TrainerClient backend options and key methods, plus a quick-start snippet.
Added an OptimizerClient method reference and a quick-start snippet for Katib-based HPO.
Added a ModelRegistryClient method reference and initialization/usage examples.

Copilot · 2026-03-31T06:22:36Z

AGENTS.md

+| `get_runtime_packages(runtime)` | Get packages available in a runtime. |
+| `train(runtime, initializer, trainer, options)` | Create a TrainJob. `trainer` accepts `CustomTrainer`, `CustomTrainerContainer`, or `BuiltinTrainer`. Defaults to `torch-distributed` runtime. Returns job name. |
+| `get_job(name)` | Get a TrainJob object by name. |
+| `get_job_logs(name, step, follow)` | Get logs from a specific step. `step` defaults to `NODE-0`. Use `follow=True` for realtime streaming. |


TrainerClient.get_job_logs() defaults step to node-0 (lowercase) via constants.NODE + "-0", so documenting the default as NODE-0 will cause copy/paste usage to fail.

Suggested change

-| `get_job_logs(name, step, follow)` | Get logs from a specific step. `step` defaults to `NODE-0`. Use `follow=True` for realtime streaming. |
+| `get_job_logs(name, step, follow)` | Get logs from a specific step. `step` defaults to `node-0`. Use `follow=True` for realtime streaming. |
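The case mismatch is easy to reproduce without a cluster. The sketch below is illustrative only: `NODE` is a hypothetical stand-in for `constants.NODE` (its value is inferred from the review comment, not copied from the SDK), and `find_step` is a made-up helper that assumes step names are matched by exact string comparison.

```python
# Hypothetical stand-in for kubeflow's constants.NODE; the value is inferred
# from the review comment above, not copied from the SDK source.
NODE = "node"
DEFAULT_STEP = NODE + "-0"  # evaluates to "node-0"


def find_step(steps: list[str], step: str = DEFAULT_STEP) -> str:
    """Return the matching step name, or raise if there is no exact match."""
    if step not in steps:
        raise ValueError(f"step {step!r} not found; available: {steps}")
    return step


available = ["node-0", "node-1"]  # illustrative step names
print(find_step(available))       # uses the lowercase default

try:
    find_step(available, step="NODE-0")  # uppercase, as originally documented
except ValueError as err:
    print(err)
```

Under that exact-match assumption, a reader who copies `NODE-0` from the table gets an error, while the lowercase default resolves.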

Copilot · 2026-03-31T06:22:37Z

AGENTS.md

+**Quick example:**
+```python
+from kubeflow.trainer import TrainerClient
+from kubeflow.trainer.types import CustomTrainer


The quick-start imports CustomTrainer from kubeflow.trainer.types, but that package doesn't export CustomTrainer (it's re-exported from kubeflow.trainer), so this snippet will raise ImportError as written.

Suggested change

-from kubeflow.trainer.types import CustomTrainer
+from kubeflow.trainer import CustomTrainer
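An import path like this can be checked against whatever SDK version is installed before it goes into the docs. A minimal sketch, where `exports` is a hypothetical helper (not part of the SDK) and the two module paths are the ones named in the comment above:

```python
import importlib


def exports(module_name: str, symbol: str):
    """Return True/False if the module imports, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, symbol)


# Results depend on the installed SDK version; None means the package is absent.
for path in ("kubeflow.trainer", "kubeflow.trainer.types"):
    print(path, exports(path, "CustomTrainer"))
```

The helper only inspects module attributes, so it reports what `from <module> import <symbol>` would find for a class or function without running any client code.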

Copilot · 2026-03-31T06:22:37Z

AGENTS.md

+    import torch
+    # your training code here
+
+job_name = client.train(trainer=CustomTrainer(func=train_fn, num_workers=2))


CustomTrainer does not accept num_workers; the public dataclass field is num_nodes, so the example call to CustomTrainer(...) will fail unless updated.

Suggested change

-job_name = client.train(trainer=CustomTrainer(func=train_fn, num_workers=2))
+job_name = client.train(trainer=CustomTrainer(func=train_fn, num_nodes=2))
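The failure mode is the standard dataclass one: the generated `__init__` rejects any keyword that is not a declared field. The sketch below uses `TrainerSketch`, a hypothetical stand-in with only the two fields mentioned in this thread; the real `CustomTrainer` has more fields, and the default shown here is illustrative.

```python
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class TrainerSketch:
    """Hypothetical stand-in for CustomTrainer, not the SDK class."""
    func: Callable
    num_nodes: int = 1  # illustrative default, not taken from the SDK


def train_fn():
    pass  # training code would go here


# Field names are the only keywords the generated __init__ accepts.
print([f.name for f in fields(TrainerSketch)])

try:
    TrainerSketch(func=train_fn, num_workers=2)  # not a declared field
except TypeError as err:
    print(err)
```

Listing `fields(...)` on the real class is a quick way to confirm parameter names before writing an example.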

Copilot · 2026-03-31T06:22:37Z

AGENTS.md

+from kubeflow.optimizer import OptimizerClient
+from kubeflow.optimizer.types import Objective, Search
+
+client = OptimizerClient()
+


The Optimizer quick-start imports Objective and Search from kubeflow.optimizer.types, but kubeflow/optimizer/types/__init__.py doesn't export those symbols; they are re-exported from kubeflow.optimizer (or import from the specific type modules).

Suggested change

-from kubeflow.optimizer import OptimizerClient
-from kubeflow.optimizer.types import Objective, Search
+from kubeflow.optimizer import OptimizerClient, Objective, Search

 client = OptimizerClient()

… features

Closes kubeflow#189

- Add TrainerClient method table with correct train() signature (runtime/initializer/trainer/options)
- Document get_job_logs() step parameter defaulting to NODE-0
- Document wait_for_job_status() with correct timeout defaults (600s Trainer, 3600s Optimizer)
- Add OptimizerClient method table with all 8 public methods
- Add ModelRegistryClient section with all 9 methods (register, update, get, list)
- Add quick-start examples for all three clients

Signed-off-by: kanagaabishek <abishek2981@gmail.com>

kanagaabishek · 2026-03-31T07:01:31Z

Addressed all Copilot review feedback:

Fixed step default from NODE-0 to node-0 (matches constants.NODE + "-0")
Fixed CustomTrainer import to kubeflow.trainer (not kubeflow.trainer.types)
Fixed num_workers to num_nodes in CustomTrainer example
Fixed Optimizer imports to single line from kubeflow.optimizer
Added DCO signoff to commit
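Import-path mistakes like the ones fixed above can be caught mechanically. A minimal sketch, assuming the docs put examples in python-tagged fenced blocks: it pulls every `from X import Y` pair out of those blocks with `ast`, so each pair can then be checked against the installed package. The helper name `doc_imports` and the sample text are illustrative, not part of the repository.

```python
import ast
import re

TICKS = "`" * 3  # fence marker, built indirectly to keep this block readable
FENCE = re.compile(TICKS + r"python\n(.*?)" + TICKS, re.DOTALL)


def doc_imports(markdown: str) -> list[tuple[str, str]]:
    """Collect (module, name) pairs from `from X import Y` in python fences."""
    pairs = []
    for block in FENCE.findall(markdown):
        for node in ast.walk(ast.parse(block)):
            if isinstance(node, ast.ImportFrom) and node.module:
                pairs.extend((node.module, alias.name) for alias in node.names)
    return pairs


# Sample text using the corrected import lines from this thread.
sample = (
    TICKS + "python\n"
    "from kubeflow.trainer import TrainerClient, CustomTrainer\n"
    "from kubeflow.optimizer import OptimizerClient, Objective, Search\n"
    + TICKS + "\n"
)
print(doc_imports(sample))
```

The sketch only parses the snippets and never executes them, so it is safe to run on documentation whose examples would otherwise need a cluster.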

Copilot AI review requested due to automatic review settings March 31, 2026 06:18

google-oss-prow bot requested review from andreyvelich, astefanutti and szaher March 31, 2026 06:18

google-oss-prow bot added the size/L label Mar 31, 2026

Copilot started reviewing on behalf of kanagaabishek March 31, 2026 06:19

Copilot AI reviewed Mar 31, 2026

kanagaabishek force-pushed the docs/fix-issue-189-optimizer-docs branch from 77f1c51 to a8be545 Compare March 31, 2026 06:26

kanagaabishek changed the title ~~docs: update AGENTS.md with OptimizerClient API reference and new SDK…~~ chore(docs): update AGENTS.md with OptimizerClient API reference and new SDK features Mar 31, 2026

kanagaabishek force-pushed the docs/fix-issue-189-optimizer-docs branch from a8be545 to 741fd31 Compare March 31, 2026 06:41

kanagaabishek wants to merge 1 commit into kubeflow:main from kanagaabishek:docs/fix-issue-189-optimizer-docs

2 participants