[release-4.21] OCPBUGS-85270: fsync static pod cert and manifest writes for crash durability #2204
atomicdir.Sync writes files to a staging directory, atomically swaps it with the target directory via renameat2(RENAME_EXCHANGE), then deletes the old data. Without fsync, the newly written file data may exist only in the kernel page cache. After an ungraceful shutdown, the filesystem journal replays the swap and the deletion (both metadata operations), but file data that was never flushed is lost, leaving truncated or empty files. Introduce an fsutil package with two primitives: WriteFileFsync (write, fsync the file, fsync the parent directory) and Fsync (fsync a path). Use WriteFileFsync for all file writes so each file is individually durable, and fsync both parent directories after the swap to persist which inode each directory name points to.
writePod uses bare os.WriteFile plus a delete-then-write pattern for kubelet manifests. On ungraceful shutdown, the delete is journaled but the new file data may not have reached disk, leaving the manifest missing. Replace os.WriteFile with fsutil.WriteFileFsync, which writes, fsyncs the file, and fsyncs the parent directory in a single call, ensuring both the resource copy and the kubelet manifest are durable before the function returns.
@openshift-cherrypick-robot: Detected clone of Jira Issue OCPBUGS-84258 with correct target version. Will retitle the PR to link to the clone.
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-85270, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
@openshift-cherrypick-robot: all tests passed! Full PR test history. Your PR dashboard.
/jira refresh
@sanchezl: This pull request references Jira Issue OCPBUGS-85270, which is invalid:
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: openshift-cherrypick-robot, p0lyn0mial, sanchezl
This is an automated cherry-pick of #2176
/assign openshift-ci-robot