Skip to content

[Feature] Strip ANSI codes from run logs and store them as plain text instead of bytes #2876

Merged
peterschmidt85 merged 9 commits intomasterfrom
2841-ansistrip
Jul 11, 2025
Merged

[Feature] Strip ANSI codes from run logs and store them as plain text instead of bytes #2876
peterschmidt85 merged 9 commits intomasterfrom
2841-ansistrip

Conversation

@peterschmidt85
Copy link
Copy Markdown
Contributor

@peterschmidt85 peterschmidt85 commented Jul 4, 2025

Fixes #2841

Out of the scope:

  • Showing timestamps, descending support with UI, --start in UI, etc

^ This will be done in a separate PR

@peterschmidt85 peterschmidt85 linked an issue Jul 4, 2025 that may be closed by this pull request
11 tasks
@peterschmidt85 peterschmidt85 requested review from r4victor and un-def July 6, 2025 10:31
@peterschmidt85 peterschmidt85 marked this pull request as ready for review July 6, 2025 10:31
"time"

"github.com/creack/pty"
"github.com/dstackai/ansistrip"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit having this in a separate repo vs adding to dstack?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ansistrip docs should mention what it does with non-ascii, non-utf-8 sequences.

select {
case <-ctx.Done():
log.Error(ctx, "Job canceled")
stripper.Close()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing any benefit calling stripper.Close() from multiple places. Just close it with defer like other cleanups?

}()

logger := io.MultiWriter(runnerLogFile, os.Stdout, ex.runnerLogs)
stripper := ansistrip.NewWriter(ex.runnerLogs, AnsiStripFlushInterval, AnsiStripMaxDelay)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do we actually need stripper for runner logs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't use utf-8, we would have to use base64. That's why it's much easier to reuse the same code for both job and runner logs.

return {
"timestamp": runner_log_event.timestamp,
"message": b64encode_raw_message(runner_log_event.message),
"message": runner_log_event.message.decode(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.decode() will raise an exception if bytes are invalid utf-8. If it's handled by the runner, please leave a comment since it's very non-obvious when the runner itself returns base64 encoded bytes instead of strings.

Comment on lines -227 to +226
yield base64.b64decode(log.message)
yield log.message.encode()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to return bytes anymore – the HTTP API change is backward incompatible anyway since it no longer returns base64-encoded bytes, so we may break the Python API just as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest that we change this in a separate PR where I want to refactor the method and have two separate methods for attached and detached logs.

@peterschmidt85
Copy link
Copy Markdown
Contributor Author

@r4victor Just to confirm, this PR breaks backward compatibility. After upgrading, old logs won't be accessible.

… instead of bytes #2841

- Update `ansistrip` dependency to v0.0.6 supporting `MaxBufferSize`.
- Updated `RunExecutor.Run` in `executor.go` ( code cleanup)
… instead of bytes #2841

- Updated UI not to use base64
@peterschmidt85 peterschmidt85 requested a review from olgenn July 8, 2025 19:16
@r4victor
Copy link
Copy Markdown
Collaborator

@r4victor Just to confirm, this PR breaks backward compatibility. After upgrading, old logs won't be accessible.

What happens to the old logs exactly? I don't see the code filtering them out, so users will see old logs in base64 in the UI / CLI?

… instead of bytes #2841

Fix possible OOM when reading large log files
… instead of bytes #2841

Fix log decoding issues (in case of binary output)
@peterschmidt85 peterschmidt85 requested a review from jvstme July 11, 2025 11:10
@peterschmidt85 peterschmidt85 merged commit 3c9a9d5 into master Jul 11, 2025
26 checks passed
@peterschmidt85 peterschmidt85 deleted the 2841-ansistrip branch July 11, 2025 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Browsable, queryable, and searchable logs

4 participants