kubernetes: surface log-unavailable reason and observability link in placeholder #280
Closed
morgan-wowk wants to merge 1 commit into
…placeholder
When log acquisition fails, replace the empty return value with a human-
readable message that includes the pod name, namespace, and — when
TANGLE_LOG_SEARCH_URL_TEMPLATE is set — a direct link to the pod's logs in
the configured observability platform.
The URL template supports two placeholders substituted at runtime:
{pod_name} — Kubernetes pod name
{start_time} — relative start derived from started_at (e.g. "now-125m",
adding 5 min of padding); falls back to "now-1440m" (24 h)
if the start time is not available in memory.
Both started_at values (LaunchedKubernetesContainer from pod container state,
LaunchedKubernetesJob from job status) are in-memory reads — no additional
database queries are required to compute the time range.
The placeholder is stored in GCS via upload_log and returned verbatim by the
log-read API, so it surfaces wherever logs are displayed without any frontend
or schema changes.
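
A minimal sketch of how the relative {start_time} value described in this commit message could be derived. The function name, the constants, and the choice to round elapsed time up to whole minutes are assumptions for illustration, not taken from the change itself.

```python
import math
from datetime import datetime, timedelta, timezone

PADDING_MINUTES = 5          # padding added on top of the elapsed time
FALLBACK_MINUTES = 24 * 60   # 1440 minutes (24 h) when started_at is unknown


def relative_start(started_at, now=None):
    """Return a relative start such as "now-125m" (hypothetical helper).

    started_at must be a timezone-aware datetime, or None if unavailable.
    """
    if started_at is None:
        return f"now-{FALLBACK_MINUTES}m"
    now = now or datetime.now(timezone.utc)
    # Assumption: round elapsed time up to whole minutes, never below zero.
    elapsed = math.ceil((now - started_at).total_seconds() / 60)
    return f"now-{max(elapsed, 0) + PADDING_MINUTES}m"


if __name__ == "__main__":
    now = datetime(2026, 6, 17, 22, 0, tzinfo=timezone.utc)
    print(relative_start(now - timedelta(minutes=120), now=now))
    print(relative_start(None))
```

A pod that started 120 minutes ago yields "now-125m", matching the example in the message above.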
We chose a different solution, as indicated on issue #281.

Stacked on #279.
When log acquisition fails (broken UTF-8, truncated response, broken JSON from the Kubernetes API), instead of returning an empty string the launcher stores a human-readable placeholder containing the pod name, namespace, and, when TANGLE_LOG_SEARCH_URL_TEMPLATE is configured, a direct link to the pod's logs in the deployment's observability platform.

The placeholder is stored in GCS via upload_log and returned verbatim by the log-read API, so it surfaces wherever logs are displayed without any frontend or schema changes.

Time range: absolute ISO 8601 timestamps

The link uses a fixed, absolute time window so it stays accurate regardless of when the link is clicked (a relative now-Xm would drift and become useless after retention).

Both started_at and ended_at are already in memory, so no DB queries are needed:

| Launcher class | started_at source | ended_at source |
| --- | --- | --- |
| LaunchedKubernetesContainer | self._debug_pod.status container state | self._debug_pod.status terminated state |
| LaunchedKubernetesJob | self._debug_job.status.start_time | last_transition_time |

The window is started_at − 5 min → ended_at + 5 min, matching the padding used by the tangle-ui overlay schema. It falls back to now − 24 h → now when timestamps are unavailable (pod still pending, or status not yet populated).

OSS design

TANGLE_LOG_SEARCH_URL_TEMPLATE is a generic env var with three str.replace placeholders:

{pod_name}: Kubernetes pod name
{start_iso8601_ms}: started_at − 5 min, e.g. 2026-06-17T20:24:11.000Z, or now − 24 h as fallback
{end_iso8601_ms}: ended_at + 5 min, e.g. 2026-06-17T22:36:44.000Z, or now as fallback

The OSS code contains no observability-platform-specific naming or logic. Deployments that set the env var get a direct link; deployments that omit it get the pod name and namespace only.
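
The window and substitution rules described above can be sketched as follows. The function names, the example template URL, and the decision to apply each fallback bound independently are assumptions for illustration; only the placeholder names, the 5 min padding, and the 24 h fallback come from this description.

```python
from datetime import datetime, timedelta, timezone

PADDING = timedelta(minutes=5)
FALLBACK_WINDOW = timedelta(hours=24)


def to_iso8601_ms(moment):
    """Format a timezone-aware datetime as UTC with millisecond precision."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def log_search_window(started_at, ended_at, now=None):
    """Return (start, end): padded timestamps, or now - 24 h / now as fallback."""
    now = now or datetime.now(timezone.utc)
    # Assumption: each bound falls back on its own when its timestamp is missing.
    start = started_at - PADDING if started_at else now - FALLBACK_WINDOW
    end = ended_at + PADDING if ended_at else now
    return start, end


def build_log_search_url(template, pod_name, started_at, ended_at, now=None):
    """Fill the three placeholders with str.replace; None if no template is set."""
    if not template:
        return None
    start, end = log_search_window(started_at, ended_at, now)
    return (
        template.replace("{pod_name}", pod_name)
        .replace("{start_iso8601_ms}", to_iso8601_ms(start))
        .replace("{end_iso8601_ms}", to_iso8601_ms(end))
    )


if __name__ == "__main__":
    # Hypothetical template; a real one would be read from the
    # TANGLE_LOG_SEARCH_URL_TEMPLATE environment variable.
    template = (
        "https://logs.example.com/search?pod={pod_name}"
        "&from={start_iso8601_ms}&to={end_iso8601_ms}"
    )
    started = datetime(2026, 6, 17, 20, 29, 11, tzinfo=timezone.utc)
    ended = datetime(2026, 6, 17, 22, 31, 44, tzinfo=timezone.utc)
    print(build_log_search_url(template, "example-pod", started, ended))
```

Using plain str.replace rather than str.format means a template may contain other literal braces (common in query-string URLs) without raising an error.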
Example output
Without TANGLE_LOG_SEARCH_URL_TEMPLATE set (any OSS deployment):

With TANGLE_LOG_SEARCH_URL_TEMPLATE set (e.g. Shopify's Observe deployment):

The link opens Observe pre-filtered to that pod, on a 2h 17m window (the job's actual runtime ± 5 min padding).
Deployment config
The Observe URL template is set for production and staging in infrastructure/applications/oasis-backend/{production,staging}/app.yaml (see Shopify/infrastructure#52749).