chore(docs): update AGENTS.md with OptimizerClient API reference and new SDK features #442
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification!
Pull request overview
Adds a “Client API Reference” section to AGENTS.md to document how contributors/agents should use the SDK’s main client entry points (Trainer, Optimizer, Model Registry) and their key methods.
Changes:
- Documented `TrainerClient` backend options and key methods, plus a quick-start snippet.
- Added an `OptimizerClient` method reference and a quick-start snippet for Katib-based HPO.
- Added a `ModelRegistryClient` method reference and initialization/usage examples.
AGENTS.md (outdated)
| | `get_runtime_packages(runtime)` | Get packages available in a runtime. |
| | `train(runtime, initializer, trainer, options)` | Create a TrainJob. `trainer` accepts `CustomTrainer`, `CustomTrainerContainer`, or `BuiltinTrainer`. Defaults to `torch-distributed` runtime. Returns job name. |
| | `get_job(name)` | Get a TrainJob object by name. |
| | `get_job_logs(name, step, follow)` | Get logs from a specific step. `step` defaults to `NODE-0`. Use `follow=True` for realtime streaming. |
`TrainerClient.get_job_logs()` defaults `step` to `node-0` (lowercase) via `constants.NODE + "-0"`, so documenting the default as `NODE-0` will cause copy/paste usage to fail.
Suggested change:
- | `get_job_logs(name, step, follow)` | Get logs from a specific step. `step` defaults to `NODE-0`. Use `follow=True` for realtime streaming. |
+ | `get_job_logs(name, step, follow)` | Get logs from a specific step. `step` defaults to `node-0`. Use `follow=True` for realtime streaming. |
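To keep the method table in sync with the code, a documented default can be read from the function signature rather than typed by hand. A minimal sketch, assuming the default is declared in the signature; if the SDK fills it in from `None` inside the method body, this check sees `None` instead. The `get_job_logs` below is a hypothetical stand-in with the documented signature, not the SDK method, and `NODE = "node"` is an assumption based on the review's description of `constants.NODE + "-0"`.

```python
import inspect

# Stand-in for the SDK's constants.NODE; the value "node" is assumed from the
# review comment, which says the default step is built as constants.NODE + "-0".
NODE = "node"


def get_job_logs(name: str, step: str = NODE + "-0", follow: bool = False):
    """Hypothetical stand-in with the signature documented in AGENTS.md."""
    raise NotImplementedError


def signature_default(func, param: str):
    """Return the default value declared for `param` in func's signature."""
    default = inspect.signature(func).parameters[param].default
    if default is inspect.Parameter.empty:
        raise ValueError(f"{param!r} has no default in {func.__name__}")
    return default


print(signature_default(get_job_logs, "step"))  # node-0
```

Against the installed SDK, `TrainerClient.get_job_logs` could be passed in place of the stand-in; the extra `self` parameter does not affect the lookup by name.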
AGENTS.md (outdated)
| **Quick example:**
| ```python
| from kubeflow.trainer import TrainerClient
| from kubeflow.trainer.types import CustomTrainer
The quick-start imports `CustomTrainer` from `kubeflow.trainer.types`, but that package doesn't export `CustomTrainer` (it's re-exported from `kubeflow.trainer`), so this snippet will raise `ImportError` as written.
Suggested change:
- from kubeflow.trainer.types import CustomTrainer
+ from kubeflow.trainer import CustomTrainer
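Import paths like these can be checked mechanically before they land in AGENTS.md. A minimal sketch using only the standard library; `resolves` is a hypothetical helper, not part of the SDK. It reports whether `from module import name` would succeed, covering both attributes exported by the module and submodules of a package.

```python
import importlib


def resolves(module: str, name: str) -> bool:
    """Return True if `from module import name` would succeed."""
    try:
        mod = importlib.import_module(module)
    except ImportError:
        return False
    if hasattr(mod, name):
        return True
    # `from pkg import sub` also succeeds when `sub` is a submodule
    # that has not been imported yet.
    try:
        importlib.import_module(f"{module}.{name}")
    except ImportError:
        return False
    return True


print(resolves("collections", "OrderedDict"))  # True
print(resolves("collections", "CustomTrainer"))  # False
```

With the SDK installed, the review implies that `resolves("kubeflow.trainer", "CustomTrainer")` should be true while `resolves("kubeflow.trainer.types", "CustomTrainer")` is false; the same check applies to `Objective` and `Search` under `kubeflow.optimizer`.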
AGENTS.md (outdated)
| import torch
| # your training code here
|
| job_name = client.train(trainer=CustomTrainer(func=train_fn, num_workers=2))
`CustomTrainer` does not accept `num_workers`; the public dataclass field is `num_nodes`, so the example call to `CustomTrainer(...)` will fail unless updated.
Suggested change:
- job_name = client.train(trainer=CustomTrainer(func=train_fn, num_workers=2))
+ job_name = client.train(trainer=CustomTrainer(func=train_fn, num_nodes=2))
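A dataclass rejects keyword arguments it has no field for, so `num_workers=2` fails with `TypeError` at construction. Examples can be screened for this by comparing their keywords against `dataclasses.fields()`. A minimal sketch; `CustomTrainerStub` is a hypothetical stand-in that models only the `func` and `num_nodes` fields named in the review, and `unknown_kwargs` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass, fields
from typing import Callable, Optional


@dataclass
class CustomTrainerStub:
    """Hypothetical stand-in for kubeflow.trainer.CustomTrainer.

    Only the two fields named in the review are modeled here.
    """

    func: Callable
    num_nodes: Optional[int] = None


def unknown_kwargs(cls, **kwargs) -> list[str]:
    """Return keyword names that are not fields of the dataclass `cls`."""
    valid = {f.name for f in fields(cls)}
    return sorted(k for k in kwargs if k not in valid)


def train_fn():
    pass


print(unknown_kwargs(CustomTrainerStub, func=train_fn, num_workers=2))  # ['num_workers']
print(unknown_kwargs(CustomTrainerStub, func=train_fn, num_nodes=2))  # []
```

Since the review describes `CustomTrainer` as a dataclass, the same helper should accept the real class in place of the stub.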
AGENTS.md (outdated)
| from kubeflow.optimizer import OptimizerClient
| from kubeflow.optimizer.types import Objective, Search
|
| client = OptimizerClient()
The Optimizer quick-start imports `Objective` and `Search` from `kubeflow.optimizer.types`, but `kubeflow/optimizer/types/__init__.py` doesn't export those symbols; they are re-exported from `kubeflow.optimizer` (or can be imported from the specific type modules).
Suggested change:
- from kubeflow.optimizer import OptimizerClient
- from kubeflow.optimizer.types import Objective, Search
- client = OptimizerClient()
+ from kubeflow.optimizer import OptimizerClient, Objective, Search
+ client = OptimizerClient()
77f1c51 to a8be545
… features
Closes kubeflow#189
- Add `TrainerClient` method table with correct `train()` signature (`runtime`/`initializer`/`trainer`/`options`)
- Document `get_job_logs()` `step` parameter defaulting to `NODE-0`
- Document `wait_for_job_status()` with correct timeout defaults (600s Trainer, 3600s Optimizer)
- Add `OptimizerClient` method table with all 8 public methods
- Add `ModelRegistryClient` section with all 9 methods (register, update, get, list)
- Add quick-start examples for all three clients
Signed-off-by: kanagaabishek <abishek2981@gmail.com>
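The commit message cites default timeouts for `wait_for_job_status()` (600s for Trainer, 3600s for Optimizer). A wait with a timeout of this kind is commonly a poll-until-done loop. Below is a generic sketch of that pattern, not the SDK's implementation; `wait_for_status`, `get_status`, and `polling_interval` are hypothetical names, and the status strings are illustrative only.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    targets: Iterable[str],
    timeout: float = 600,  # mirrors the Trainer default stated in this PR
    polling_interval: float = 2,
) -> str:
    """Poll get_status() until it returns a status in `targets`.

    Raises TimeoutError once `timeout` seconds have elapsed without a match.
    """
    wanted = set(targets)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in wanted:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"status {status!r} not in {sorted(wanted)} after {timeout}s"
            )
        # Sleep between polls, but never past the deadline.
        time.sleep(min(polling_interval, max(0.0, deadline - time.monotonic())))


# Illustrative status sequence standing in for a job's reported status.
statuses = iter(["Created", "Running", "Complete"])
print(wait_for_status(lambda: next(statuses), {"Complete"}, timeout=5, polling_interval=0.01))  # Complete
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the wall clock is adjusted while the loop is waiting.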
a8be545 to 741fd31
Addressed all Copilot review feedback:
What this PR does / why we need it
AGENTS.md listed the `optimizer/` and `hub/` directories in the repository map but provided no documentation on how to use `OptimizerClient` or `ModelRegistryClient`. Contributors and AI agents had no reference for method signatures, parameters, or usage patterns for these clients. This PR closes that gap by adding a Client API Reference section.
Added:
- `train()` signature (`runtime`, `initializer`, `trainer`, `options`)
- `get_job_logs()` `step` parameter defaulting to `node-0`
- `wait_for_job_status()` with correct timeout defaults (600s Trainer, 3600s Optimizer)
Which issue(s) this PR fixes
Fixes #189
Checklist