Recover logs from crashed and terminated pods #179

Open: sebasnallar wants to merge 1 commit into beta from feat/dead-pod-logs

@sebasnallar (Contributor)

Two layered fixes for the case where a pod's container has died but the log workflow returns nothing:

  1. kube-logger-go now fetches the previous container instance's logs (`PodLogOptions.Previous`) on the first page for any pod whose application container has restarted or has a terminated last state. The kubelet retains exactly one prior instance per container, so this surfaces the crash output that `kubectl logs` without `--previous` does not show while the replacement container is starting. Previous-container fetching is gated on the absence of a per-pod pagination cursor, so paginated requests stay unchanged: no extra round-trips, no duplicated lines (the existing timestamp-based dedup in processor handles overlap), and no token format change.

  2. The Deployment template flips `terminationMessagePolicy` from `File` to `FallbackToLogsOnError` on the application and traffic containers. When the file at `/dev/termination-log` is empty and the container exits with an error, the kubelet populates `status.containerStatuses[].lastState.terminated.message` with the tail of the container log (the last 2048 bytes or 80 lines, whichever is smaller). That message is stored in the Pod object itself, so it survives container restarts. Apps that already write to the termination file are unaffected.
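
The gating rule from item 1 can be sketched roughly as follows. This is an illustration, not the code in this PR: `containerState`, `shouldFetchPrevious`, and the `cursor` parameter are hypothetical names, and the struct is a trimmed-down stand-in for the fields of `corev1.ContainerStatus` that the rule reads.

```go
package main

import "fmt"

// containerState is a hypothetical stand-in for the parts of
// corev1.ContainerStatus that the gating rule inspects.
type containerState struct {
	Name           string
	RestartCount   int32
	LastTerminated bool // true when lastState.terminated is set
}

// shouldFetchPrevious reports whether the previous container instance's
// logs should be requested. It returns true only on the first page (no
// per-pod pagination cursor) and only when the container has restarted
// or has a terminated last state.
func shouldFetchPrevious(cs containerState, cursor string) bool {
	if cursor != "" {
		// Paginated request: behave exactly as before.
		return false
	}
	return cs.RestartCount > 0 || cs.LastTerminated
}

func main() {
	crashed := containerState{Name: "application", RestartCount: 3, LastTerminated: true}
	healthy := containerState{Name: "application"}

	fmt.Println(shouldFetchPrevious(crashed, ""))              // true
	fmt.Println(shouldFetchPrevious(crashed, "opaque-cursor")) // false
	fmt.Println(shouldFetchPrevious(healthy, ""))              // false
}
```

Checking the cursor first keeps the paginated path free of any extra status inspection or log request.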

Together these cover CrashLoopBackOff and recently-terminated containers without any new in-cluster tooling. Pods that have been fully garbage-collected still require an external log aggregator.
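
For the second fix, a consumer could read the kubelet-populated message back out of the pod status along these lines. This is a minimal sketch under stated assumptions: `terminatedState`, `containerStatus`, and `crashSummaries` are hypothetical names that mirror the shape of `status.containerStatuses[].lastState.terminated`, and nothing here is taken from the PR's diff.

```go
package main

import "fmt"

// terminatedState is a hypothetical stand-in for the fields of
// lastState.terminated that matter for crash recovery.
type terminatedState struct {
	ExitCode int32
	Reason   string
	Message  string // filled by the kubelet under FallbackToLogsOnError
}

// containerStatus is a hypothetical stand-in for one entry of
// status.containerStatuses[].
type containerStatus struct {
	Name           string
	LastTerminated *terminatedState // nil when the container never terminated
}

// crashSummaries returns one formatted line for each container that has a
// terminated last state with a non-empty message, and skips the rest.
func crashSummaries(statuses []containerStatus) []string {
	var out []string
	for _, cs := range statuses {
		t := cs.LastTerminated
		if t == nil || t.Message == "" {
			continue
		}
		out = append(out, fmt.Sprintf("%s exit=%d reason=%s: %s",
			cs.Name, t.ExitCode, t.Reason, t.Message))
	}
	return out
}

func main() {
	statuses := []containerStatus{
		{Name: "application", LastTerminated: &terminatedState{
			ExitCode: 1, Reason: "Error", Message: "panic: boom",
		}},
		{Name: "traffic"}, // never terminated, so it is skipped
	}
	for _, line := range crashSummaries(statuses) {
		fmt.Println(line)
	}
}
```

Because the message lives in the Pod object, this path needs no log request at all, which makes it a useful fallback when the previous container's logs are no longer available.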
