What happened?
`TrainerClient.get_job()`, `delete_job()`, `get_job_logs()`, and `wait_for_job_status()` document `RuntimeError` for operational failures such as job-not-found. The Kubernetes backend follows this contract, but the LocalProcess and Container backends raise `ValueError` instead.
This means callers writing `except RuntimeError` to handle missing jobs will not catch the actual exception when using the local or container backends.
Steps to Reproduce
```python
from kubeflow.trainer.backends.localprocess.backend import LocalProcessBackend

class Cfg:
    auto_remove = False
    execution_dir = "/tmp"

backend = LocalProcessBackend(Cfg())

try:
    backend.get_job("nonexistent")
except RuntimeError:
    print("caught")  # never reached
except ValueError as e:
    print(f"BUG: {e}")  # "No TrainJob with name nonexistent"
```
The same happens with `delete_job`, `get_job_logs`, and `wait_for_job_status`.
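Until the backends are aligned, callers that need to work across all backends can catch both exception types. A minimal sketch, where `get_job` is a hypothetical stand-in mimicking the LocalProcess behavior (not the real SDK call):

```python
# Hypothetical stand-in for the backend call: mimics the LocalProcess
# behavior of raising ValueError instead of the documented RuntimeError.
def get_job(name):
    raise ValueError(f"No TrainJob with name {name}")

def get_job_safely(name):
    # Backend-agnostic handling: catch both exception types until the
    # backends converge on RuntimeError.
    try:
        return get_job(name)
    except (RuntimeError, ValueError) as e:
        return f"job lookup failed: {e}"

print(get_job_safely("nonexistent"))
```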
Affected locations
LocalProcess backend (`localprocess/backend.py`):
- `get_job` (line 172): `ValueError("No TrainJob with name {name}")`
- `get_job_logs` (line 201): same
- `wait_for_job_status` (line 227): same
- `delete_job` (line 256): same
- `get_runtime` (line 60): `ValueError("Runtime '{name}' not found.")`
Container backend (`container/backend.py`):
- `_get_job_containers` (line 509): `ValueError("No TrainJob with name {name}")`
- `__get_trainjob_from_containers` (lines 669, 674, 678, 687, 690): various `ValueError`s for metadata/network/runtime lookup failures
Kubernetes backend: correctly raises `RuntimeError` in all of these cases.
What did you expect to happen?
Operational failures (job not found, runtime not found) should raise `RuntimeError` consistently across all backends, matching the documented API contract.
Input validation errors (e.g. "CustomTrainer must be set", "polling_interval >= timeout") should remain `ValueError`.
Proposed Fix
Change `raise ValueError("No TrainJob with name ...")` to `raise RuntimeError(...)` in the affected methods across both backends. Keep `ValueError` for genuine input validation.
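A minimal sketch of the distinction the fix draws, using hypothetical names (`LocalProcessBackendSketch`, `_jobs`, and `create_job` are illustrations, not the real backend code):

```python
class LocalProcessBackendSketch:
    """Illustrative sketch only, not the real backend implementation."""

    def __init__(self):
        self._jobs = {}  # hypothetical internal job registry

    def create_job(self, name, spec):
        # Input validation: a malformed argument stays ValueError.
        if not name:
            raise ValueError("job name must be non-empty")
        self._jobs[name] = spec

    def get_job(self, name):
        if name not in self._jobs:
            # Before: raise ValueError(f"No TrainJob with name {name}")
            # After: operational failure, so RuntimeError per the contract.
            raise RuntimeError(f"No TrainJob with name {name}")
        return self._jobs[name]
```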
/kind bug
/area local