Skip to content

fix(trainer): LocalJob.logs removes print side-effect and enables real-time streaming#436

Open
G26karthik wants to merge 1 commit intokubeflow:mainfrom
G26karthik:fix/local-job-logs-streaming
Open

fix(trainer): LocalJob.logs removes print side-effect and enables real-time streaming#436
G26karthik wants to merge 1 commit intokubeflow:mainfrom
G26karthik:fix/local-job-logs-streaming

Conversation

@G26karthik
Copy link
Copy Markdown

What this PR does

Fixes #435

LocalJob.logs(follow=True) had two bugs:

  1. Hidden print() side-effect — it called print(chunk, end="", flush=True) internally, so any caller that iterated the result would see duplicate output (once printed directly to stdout, once from the returned list).
  2. Not actually streaming — it blocked until the job finished (by exhausting stream_logs()), then returned all lines as a single batch. Despite being named follow=True, no lines were yielded until the process completed.

Changes

kubeflow/trainer/backends/localprocess/job.py

  • Added from collections.abc import Iterator import
  • Converted logs() return type from list[str] to Iterator[str]
  • follow=False: yield from self._stdout.splitlines() (snapshot behavior preserved)
  • follow=True: yield from chunk.splitlines() for each chunk from stream_logs() — real-time, no side-effect

The existing backend.py caller (yield from _step.job.logs(follow=follow)) is fully compatible with a generator, so no changes needed there.

kubeflow/trainer/backends/localprocess/backend_test.py

  • Added imports for LocalJob, LocalBackendJobs, LocalBackendStep
  • test_get_job_logs_follow_false_yields_lines: verifies follow=False yields correct individual lines
  • test_get_job_logs_follow_true_no_print_side_effect: uses capsys + patch.object on stream_logs to assert no stdout side-effect and correct line streaming

Testing

make test-python

Signed-off-by: G26Karthik karthik26092005@gmail.com

Copilot AI review requested due to automatic review settings March 30, 2026 03:36
@google-oss-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes LocalJob.logs(follow=True) in the localprocess backend so it behaves like a true streaming iterator and no longer prints directly to stdout, aligning TrainerClient.get_job_logs(..., follow=True) with the documented “iterator of log lines” behavior.

Changes:

  • Remove the hidden print(..., flush=True) side-effect from LocalJob.logs(follow=True).
  • Change LocalJob.logs() to return an Iterator[str] and yield log lines incrementally when follow=True.
  • Add unit tests covering follow=False line yielding and asserting no stdout side-effect for follow=True.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
kubeflow/trainer/backends/localprocess/job.py Refactors LocalJob.logs() to yield log lines (and stream when follow=True) without printing.
kubeflow/trainer/backends/localprocess/backend_test.py Adds tests to validate non-follow behavior and ensure streaming doesn’t print to stdout.

@G26karthik G26karthik force-pushed the fix/local-job-logs-streaming branch 2 times, most recently from aa59baf to 1a19c55 Compare March 30, 2026 03:52
…l-time streaming

When follow=True, logs() was calling print() directly to stdout,
causing duplicate output for any caller that also printed the lines.
Additionally, it blocked until the job finished before returning,
making the follow=True path effectively non-streaming.

Fix: convert logs() to an Iterator[str] generator. The follow=True
path yields lines from stream_logs() chunks in real-time without
any side-effect. The follow=False path yields from
self._stdout.splitlines() as before.

Tests: added two new pytest cases to verify follow=False yields
correct lines and follow=True produces no stdout side-effect.

Fixes kubeflow#435

Signed-off-by: G26Karthik <karthik26092005@gmail.com>
@G26karthik
Copy link
Copy Markdown
Author

Thanks for the review @copilot-pull-request-reviewer — great catch on the snapshot timing issue.

Applied the suggestion: logs(follow=False) now takes an eager snapshot under self._lock and returns iter(snapshot) directly (no longer a generator). The follow=True streaming path is extracted to a private _follow_logs() generator helper.

Also added a dedicated test test_get_job_logs_follow_false_is_eager_snapshot that mutates _stdout after calling get_job_logs() and asserts the returned iterator does not include the new line — directly covering the race condition you described.

@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Mar 30, 2026
@G26karthik
Copy link
Copy Markdown
Author

@andreyvelich
Can you please review this pr and corresponding issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(trainer): LocalJob.logs(follow=True) prints directly to stdout instead of yielding to caller

2 participants