Skip to content

fix(prepare): enable containerd device_ownership_from_security_context for CDI block imports#47

Closed
Andrei Kvapil (kvaps) wants to merge 1 commit into
mainfrom
fix/containerd-device-ownership-cdi
Closed

fix(prepare): enable containerd device_ownership_from_security_context for CDI block imports#47
Andrei Kvapil (kvaps) wants to merge 1 commit into
mainfrom
fix/containerd-device-ownership-cdi

Conversation

@kvaps

@kvaps Andrei Kvapil (kvaps) commented Jun 1, 2026

Copy link
Copy Markdown
Member

Summary

KubeVirt's CDI importer writes VM disk images into raw block volumes from a non-root pod (runAsUser: 107). containerd only chowns the block device to the pod's SecurityContext when device_ownership_from_security_context is enabled on the CRI plugin, and k3s ships it disabled. Without it the importer fails with blockdev: cannot open /dev/cdi-block-volume: Permission denied, the DataVolume is stuck in ImportInProgress, and every VM referencing the disk stays Pending — one of the silent "VMs stuck in Pending" failure modes the README already warns about. The prepare playbooks configure every other node prerequisite but never touched the containerd CRI config; this adds it.

Changes

  • All three prepare playbooks (ubuntu/rhel/suse) drop a k3s containerd config (config-v3.toml.d/10-cozystack-cri.toml) setting device_ownership_from_security_context = true, gated behind cozystack_enable_kubevirt.
  • Drop-in directory overridable via cozystack_k3s_containerd_dropin_dir for containerd 1.x clusters.
  • Handler restarts k3s on re-runs against a running cluster (no-op on the full pipeline, where the file is read at first k3s start).
  • README: new "Node Prerequisites" subsection documenting the requirement.
  • CHANGELOG + galaxy.yml version bump to 1.4.3.

Test plan

  • ansible-lint passes (production profile, 0 failures / 0 warnings)
  • ansible-test sanity passes — not run (no plugin changes)
  • Tested on a live cluster (k3s + containerd 2.x): with the drop-in absent the CDI block import crash-looped on Permission denied for 5h; after applying the equivalent drop-in + k3s restart the DataVolume import resumed and completed (Succeeded, PVC Bound), and the VM booted with the disk attached.
  • Idempotency: copy/file tasks are idempotent; the handler fires only when the drop-in content changes.

All three prepare playbooks pass ansible-playbook --syntax-check.

Summary by CodeRabbit

Release Notes v1.4.3

  • Bug Fixes

    • Fixed CDI block imports hanging when writing disk images to raw block volumes as a non-root pod. This resolves issues where DataVolume imports would remain stuck in ImportInProgress state and prevent associated VMs from reaching Pending status (when KubeVirt is enabled).
  • Documentation

    • Updated changelog and configuration documentation to reflect new containerd settings for CDI block import support.

…t for CDI block imports

KubeVirt's CDI importer writes VM disk images into raw block volumes
from a non-root pod. containerd only chowns the block device to the
pod's SecurityContext when device_ownership_from_security_context is
enabled on the CRI plugin, and k3s ships it disabled. Without it the
importer fails with 'cannot open /dev/cdi-block-volume: Permission
denied', the DataVolume hangs in ImportInProgress, and VMs referencing
the disk stay Pending.

Add a k3s containerd drop-in (config-v3.toml.d/10-cozystack-cri.toml)
to all three prepare playbooks, gated behind cozystack_enable_kubevirt
and overridable via cozystack_k3s_containerd_dropin_dir.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <andrei.kvapil@aenix.io>
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR releases v1.4.3 with a fix for KubeVirt CDI block import failures. Playbooks now enable a containerd CRI configuration setting device_ownership_from_security_context across RHEL, SUSE, and Ubuntu distributions, with release documentation and version updates.

Changes

KubeVirt CDI containerd CRI configuration

Layer / File(s) Summary
Release documentation and version
CHANGELOG.rst, README.md, galaxy.yml
Release notes for v1.4.3 document the containerd CRI setting required for CDI block imports when KubeVirt is enabled, with technical guidance on drop-in paths and failure modes; version bumped to 1.4.3.
Playbook handlers and containerd configuration
examples/rhel/prepare-rhel.yml, examples/suse/prepare-suse.yml, examples/ubuntu/prepare-ubuntu.yml
Each distro playbook adds a k3s systemd restart handler and conditional KubeVirt-gated tasks that create the containerd drop-in directory and write TOML configuration enabling device_ownership_from_security_context for the CRI runtime plugin, with handler notification to apply the changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rabbit hops through container dreams,
Where CDI imports flow like streams—
With ownership context set just right,
Block volumes dance in KubeVirt light!
K3s restarts to seal the deal,
Cozy stacks with hardware's heal. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main fix: enabling a containerd CRI setting to resolve CDI block import failures.
Description check ✅ Passed The description comprehensively covers the template requirements: clear summary of the problem, detailed change list, and complete test plan with results for all specified checks.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/containerd-device-ownership-cdi

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables device_ownership_from_security_context on the containerd CRI plugin for KubeVirt CDI block imports across RHEL, SUSE, and Ubuntu playbooks, and updates the documentation and changelog. Feedback on these changes suggests avoiding the use of failed_when: false on the k3s restart handlers, as it silences legitimate failures; instead, you should use ansible.builtin.stat to verify if the services exist before conditionally restarting them. Additionally, to fully support containerd 1.x clusters as documented, the containerd config version and plugin name should be parameterized with variables rather than hardcoded.

Comment on lines +125 to +136
- name: Restart k3s to apply containerd config
ansible.builtin.systemd:
name: "{{ item }}"
state: restarted
loop:
- k3s
- k3s-agent
# Only the unit matching this node's role exists; the other is
# absent, and on the full-pipeline run prepare executes before
# k3s is installed (the drop-in is then read at first k3s start).
# failed_when: false tolerates both — a missing unit is not an error.
failed_when: false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using failed_when: false on the k3s restart handler is dangerous because it silences legitimate failures (e.g., if k3s fails to start due to a syntax error in the newly written containerd config). A safer approach is to check if the k3s services are installed using ansible.builtin.stat in the tasks, and then conditionally restart them in the handler only if they exist. This allows the playbook to fail loudly if an installed k3s service fails to restart.

    - name: Restart k3s to apply containerd config
      ansible.builtin.systemd:
        name: "{{ item.item }}"
        state: restarted
      loop: "{{ k3s_services_stat.results | default([]) }}"
      when: item.stat.exists | default(false)

Comment on lines +120 to +131
- name: Restart k3s to apply containerd config
ansible.builtin.systemd:
name: "{{ item }}"
state: restarted
loop:
- k3s
- k3s-agent
# Only the unit matching this node's role exists; the other is
# absent, and on the full-pipeline run prepare executes before
# k3s is installed (the drop-in is then read at first k3s start).
# failed_when: false tolerates both — a missing unit is not an error.
failed_when: false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using failed_when: false on the k3s restart handler is dangerous because it silences legitimate failures (e.g., if k3s fails to start due to a syntax error in the newly written containerd config). A safer approach is to check if the k3s services are installed using ansible.builtin.stat in the tasks, and then conditionally restart them in the handler only if they exist. This allows the playbook to fail loudly if an installed k3s service fails to restart.

    - name: Restart k3s to apply containerd config
      ansible.builtin.systemd:
        name: "{{ item.item }}"
        state: restarted
      loop: "{{ k3s_services_stat.results | default([]) }}"
      when: item.stat.exists | default(false)

Comment on lines +141 to +152
- name: Restart k3s to apply containerd config
ansible.builtin.systemd:
name: "{{ item }}"
state: restarted
loop:
- k3s
- k3s-agent
# Only the unit matching this node's role exists; the other is
# absent, and on the full-pipeline run prepare executes before
# k3s is installed (the drop-in is then read at first k3s start).
# failed_when: false tolerates both — a missing unit is not an error.
failed_when: false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using failed_when: false on the k3s restart handler is dangerous because it silences legitimate failures (e.g., if k3s fails to start due to a syntax error in the newly written containerd config). A safer approach is to check if the k3s services are installed using ansible.builtin.stat in the tasks, and then conditionally restart them in the handler only if they exist. This allows the playbook to fail loudly if an installed k3s service fails to restart.

    - name: Restart k3s to apply containerd config
      ansible.builtin.systemd:
        name: "{{ item.item }}"
        state: restarted
      loop: "{{ k3s_services_stat.results | default([]) }}"
      when: item.stat.exists | default(false)

Comment on lines +220 to +225
- name: Ensure k3s containerd config drop-in directory exists
ansible.builtin.file:
path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
state: directory
mode: "0755"
when: cozystack_enable_kubevirt | default(true) | bool

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To safely restart k3s only when it is actually installed, we can add a task to check if the systemd unit files exist using ansible.builtin.stat before ensuring the drop-in directory exists.

    - name: Check if k3s services are installed
      ansible.builtin.stat:
        path: "/etc/systemd/system/{{ item }}.service"
      loop:
        - k3s
        - k3s-agent
      register: k3s_services_stat
      when: cozystack_enable_kubevirt | default(true) | bool

    - name: Ensure k3s containerd config drop-in directory exists
      ansible.builtin.file:
        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
        state: directory
        mode: "0755"
      when: cozystack_enable_kubevirt | default(true) | bool

Comment on lines +227 to +237
- name: Enable device_ownership_from_security_context for CDI block imports
ansible.builtin.copy:
dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
mode: "0644"
content: |
version = 3

[plugins.'io.containerd.cri.v1.runtime']
device_ownership_from_security_context = true
when: cozystack_enable_kubevirt | default(true) | bool
notify: Restart k3s to apply containerd config

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The README and CHANGELOG state that containerd 1.x clusters are supported by overriding the drop-in directory and adjusting the plugin table. However, the config version (version = 3) and plugin name (io.containerd.cri.v1.runtime) are currently hardcoded in the task. To make this actually configurable for containerd 1.x, we should use variables with sensible defaults.

    - name: Enable device_ownership_from_security_context for CDI block imports
      ansible.builtin.copy:
        dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
        mode: "0644"
        content: |
          version = {{ cozystack_k3s_containerd_config_version | default(3) }}

          [plugins.'{{ cozystack_k3s_containerd_cri_plugin | default("io.containerd.cri.v1.runtime") }}']
            device_ownership_from_security_context = true
      when: cozystack_enable_kubevirt | default(true) | bool
      notify: Restart k3s to apply containerd config

Comment on lines +215 to +220
- name: Ensure k3s containerd config drop-in directory exists
ansible.builtin.file:
path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
state: directory
mode: "0755"
when: cozystack_enable_kubevirt | default(true) | bool

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To safely restart k3s only when it is actually installed, we can add a task to check if the systemd unit files exist using ansible.builtin.stat before ensuring the drop-in directory exists.

    - name: Check if k3s services are installed
      ansible.builtin.stat:
        path: "/etc/systemd/system/{{ item }}.service"
      loop:
        - k3s
        - k3s-agent
      register: k3s_services_stat
      when: cozystack_enable_kubevirt | default(true) | bool

    - name: Ensure k3s containerd config drop-in directory exists
      ansible.builtin.file:
        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
        state: directory
        mode: "0755"
      when: cozystack_enable_kubevirt | default(true) | bool

Comment on lines +222 to +232
- name: Enable device_ownership_from_security_context for CDI block imports
ansible.builtin.copy:
dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
mode: "0644"
content: |
version = 3

[plugins.'io.containerd.cri.v1.runtime']
device_ownership_from_security_context = true
when: cozystack_enable_kubevirt | default(true) | bool
notify: Restart k3s to apply containerd config

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The README and CHANGELOG state that containerd 1.x clusters are supported by overriding the drop-in directory and adjusting the plugin table. However, the config version (version = 3) and plugin name (io.containerd.cri.v1.runtime) are currently hardcoded in the task. To make this actually configurable for containerd 1.x, we should use variables with sensible defaults.

    - name: Enable device_ownership_from_security_context for CDI block imports
      ansible.builtin.copy:
        dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
        mode: "0644"
        content: |
          version = {{ cozystack_k3s_containerd_config_version | default(3) }}

          [plugins.'{{ cozystack_k3s_containerd_cri_plugin | default("io.containerd.cri.v1.runtime") }}']
            device_ownership_from_security_context = true
      when: cozystack_enable_kubevirt | default(true) | bool
      notify: Restart k3s to apply containerd config

Comment on lines +261 to +266
- name: Ensure k3s containerd config drop-in directory exists
ansible.builtin.file:
path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
state: directory
mode: "0755"
when: cozystack_enable_kubevirt | default(true) | bool

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To safely restart k3s only when it is actually installed, we can add a task to check if the systemd unit files exist using ansible.builtin.stat before ensuring the drop-in directory exists.

    - name: Check if k3s services are installed
      ansible.builtin.stat:
        path: "/etc/systemd/system/{{ item }}.service"
      loop:
        - k3s
        - k3s-agent
      register: k3s_services_stat
      when: cozystack_enable_kubevirt | default(true) | bool

    - name: Ensure k3s containerd config drop-in directory exists
      ansible.builtin.file:
        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
        state: directory
        mode: "0755"
      when: cozystack_enable_kubevirt | default(true) | bool

Comment on lines +268 to +278
- name: Enable device_ownership_from_security_context for CDI block imports
ansible.builtin.copy:
dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
mode: "0644"
content: |
version = 3

[plugins.'io.containerd.cri.v1.runtime']
device_ownership_from_security_context = true
when: cozystack_enable_kubevirt | default(true) | bool
notify: Restart k3s to apply containerd config

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The README and CHANGELOG state that containerd 1.x clusters are supported by overriding the drop-in directory and adjusting the plugin table. However, the config version (version = 3) and plugin name (io.containerd.cri.v1.runtime) are currently hardcoded in the task. To make this actually configurable for containerd 1.x, we should use variables with sensible defaults.

    - name: Enable device_ownership_from_security_context for CDI block imports
      ansible.builtin.copy:
        dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
        mode: "0644"
        content: |
          version = {{ cozystack_k3s_containerd_config_version | default(3) }}

          [plugins.'{{ cozystack_k3s_containerd_cri_plugin | default("io.containerd.cri.v1.runtime") }}']
            device_ownership_from_security_context = true
      when: cozystack_enable_kubevirt | default(true) | bool
      notify: Restart k3s to apply containerd config

@kvaps Andrei Kvapil (kvaps) marked this pull request as ready for review June 1, 2026 17:22

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
galaxy.yml (1)

4-4: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Version bump violates collection versioning policy.

(Duplicate of the concern raised in CHANGELOG.rst)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@galaxy.yml` at line 4, The version line "version: 1.4.3" in galaxy.yml
violates the collection versioning policy; revert or adjust this version to
comply with the policy (e.g., keep previous version or follow the permitted
increment scheme) and align CHANGELOG.rst with the intended release changes
instead of forcing a version bump; update the version value in galaxy.yml (the
"version" key) to the correct policy-compliant version and ensure CHANGELOG.rst
reflects any release notes rather than being used to justify a noncompliant
bump.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.rst`:
- Around line 6-24: The CHANGELOG entry bumped the collection version to v1.4.3
which violates the policy that collection version mirrors
roles/cozystack/defaults/main.yml:cozystack_chart_version; revert the version
header change (restore v1.4.2 or remove the explicit bump) in CHANGELOG.rst so
the bugfix entry is not presented as a collection version bump unless you also
update cozystack_chart_version in roles/cozystack/defaults/main.yml to 1.4.3;
ensure the changelog text for the containerd fix remains but do not change the
collection version number.

---

Duplicate comments:
In `@galaxy.yml`:
- Line 4: The version line "version: 1.4.3" in galaxy.yml violates the
collection versioning policy; revert or adjust this version to comply with the
policy (e.g., keep previous version or follow the permitted increment scheme)
and align CHANGELOG.rst with the intended release changes instead of forcing a
version bump; update the version value in galaxy.yml (the "version" key) to the
correct policy-compliant version and ensure CHANGELOG.rst reflects any release
notes rather than being used to justify a noncompliant bump.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f11114e0-a9e7-4015-90c4-2dfaad96a1eb

📥 Commits

Reviewing files that changed from the base of the PR and between d068844 and 28f800b.

📒 Files selected for processing (6)
  • CHANGELOG.rst
  • README.md
  • examples/rhel/prepare-rhel.yml
  • examples/suse/prepare-suse.yml
  • examples/ubuntu/prepare-ubuntu.yml
  • galaxy.yml

Comment thread CHANGELOG.rst
Comment on lines +6 to +24
v1.4.3
======

Bugfixes
--------

- Prepare playbooks now enable
``device_ownership_from_security_context`` on the containerd CRI
plugin (k3s drop-in
``config-v3.toml.d/10-cozystack-cri.toml``). KubeVirt's CDI importer
writes disk images into raw block volumes as a non-root pod, which
requires containerd to chown the block device to the pod's
SecurityContext; k3s disables this by default. Without it the
importer failed with ``blockdev: cannot open /dev/cdi-block-volume:
Permission denied``, the ``DataVolume`` hung in ``ImportInProgress``,
and VMs referencing the disk stayed ``Pending``. Gated behind
``cozystack_enable_kubevirt``; drop-in directory overridable via
``cozystack_k3s_containerd_dropin_dir`` for containerd 1.x clusters.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Version bump violates collection versioning policy.

The collection version is being bumped from 1.4.2 to 1.4.3 for a bugfix (containerd CRI configuration), but the coding guidelines state: "Collection version is inherited from cozystack/cozystack and tracks the upstream Cozystack chart version (the value in roles/cozystack/defaults/main.yml:cozystack_chart_version). Do NOT bump the collection version just because a PR adds features or fixes bugs."

Unless cozystack_chart_version in roles/cozystack/defaults/main.yml has also been updated to 1.4.3 (which is not shown in the provided files), this version bump should be reverted. The fix should be released as part of the next version that syncs with an upstream Cozystack chart release.

As per coding guidelines: Collection version is inherited from cozystack/cozystack and tracks the upstream Cozystack chart version (the value in roles/cozystack/defaults/main.yml:cozystack_chart_version). Do NOT bump the collection version just because a PR adds features or fixes bugs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.rst` around lines 6 - 24, The CHANGELOG entry bumped the collection
version to v1.4.3 which violates the policy that collection version mirrors
roles/cozystack/defaults/main.yml:cozystack_chart_version; revert the version
header change (restore v1.4.2 or remove the explicit bump) in CHANGELOG.rst so
the bugfix entry is not presented as a collection version bump unless you also
update cozystack_chart_version in roles/cozystack/defaults/main.yml to 1.4.3;
ensure the changelog text for the containerd fix remains but do not change the
collection version number.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants