ianm-nv commented Jan 28, 2026

BugLink

[ Impact ]

ARM Confidential Compute Architecture (CCA) provides hardware-enforced isolation for confidential virtual machines called "Realms" on ARM64 platforms. This patch series enables CCA support for NVIDIA Vera platforms.

This series is based on the ARM KVM RME host support patches (v10), rebased for the 6.17 kernel:
https://lore.kernel.org/linux-coco/20250820145606.180644-1-steven.price@arm.com/

This series enables:
- KVM host support for creating and managing Realms via the Realm Management Extension (RME)
- MECID (Memory Encryption Context ID) for improved isolation between Realms
- Required CCA kernel configuration options

[ Test Plan ]

Deploy and test on NVIDIA Vera platform with RMM firmware
Verify Realm guest VMs boot and run successfully
CCA testing requires specialized hardware and firmware. Testing performed by NVIDIA CCA team.

[ Where problems could occur ]

Bugs in the KVM/RME integration could cause Realm guest failures or host instability. Issues would be limited to CCA-enabled platforms running Realm workloads.

[ Other Info ]

Patch summary:
- 43 patches for upstream v10 KVM/RME host support - marked as SAUCE because not in upstream kernel yet.
- 3 upstream cherry-picks:
  - arm64: realm: ioremap: Allow mapping memory as encrypted
  - arm64: acpi: Enable ACPI CCEL support
  - arm64: Enable EFI secret area Securityfs support
- 4 SAUCE patches:
  - arm64: RME: Fix UBSAN shift-out-of-bounds in kvm_realm_unmap_range
  - arm64: RME: Add MECID support
  - arm64: RME: Add bounds check
  - [Config] Update ARM CCA annotations

Suzuki K Poulose and others added 29 commits January 28, 2026 17:51
BugLink: https://bugs.launchpad.net/bugs/2139249

Fix a potential build error (like below, when asm/kvm_emulate.h gets
included after the kvm/arm_psci.h) by including the missing header file
in kvm/arm_psci.h:

./include/kvm/arm_psci.h: In function ‘kvm_psci_version’:
./include/kvm/arm_psci.h:29:13: error: implicit declaration of function
   ‘vcpu_has_feature’; did you mean ‘cpu_have_feature’? [-Werror=implicit-function-declaration]
   29 |         if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2)) {
      |             ^~~~~~~~~~~~~~~~
      |             cpu_have_feature

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-2-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

If the host attempts to access granules that have been delegated for use
in a realm these accesses will be caught and will trigger a Granule
Protection Fault (GPF).

A fault during a page walk signals a bug in the kernel and is handled by
oopsing the kernel. A non-page walk fault could be caused by user space
having access to a page which has been delegated to the kernel and will
trigger a SIGBUS to allow debugging why user space is trying to access a
delegated page.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-3-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM (Realm Management Monitor) provides functionality that can be
accessed by SMC calls from the host.

The SMC definitions are based on DEN0137[1] version 1.0-rel0

[1] https://developer.arm.com/documentation/den0137/1-0rel0/

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-4-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The wrappers make the call sites easier to read and deal with the
boilerplate of handling the error codes from the RMM.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-5-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Query the RMI version number and check if it is a compatible version. A
static key is also provided to signal that a supported RMM is available.
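
As a rough illustration of the compatibility check, here is a small Python model (not the kernel code). The bit layout (major in bits 30:16, minor in bits 15:0), the host version constants, and the rule "same major, RMM minor at least the host's" are assumptions made for this sketch; DEN0137 and the patch itself are the authority on the real behavior.

```python
# Hypothetical host ABI version for this sketch (assumption, not taken from the patch).
HOST_RMI_MAJOR = 1
HOST_RMI_MINOR = 0


def rmi_abi_version(major: int, minor: int) -> int:
    """Pack a version as major[30:16] | minor[15:0] (assumed layout)."""
    return ((major & 0x7FFF) << 16) | (minor & 0xFFFF)


def rmi_version_major(version: int) -> int:
    """Extract the major field from a packed version."""
    return (version >> 16) & 0x7FFF


def rmi_version_minor(version: int) -> int:
    """Extract the minor field from a packed version."""
    return version & 0xFFFF


def rmm_is_compatible(rmm_version: int) -> bool:
    """Assumed rule: majors must match and the RMM minor must not be older."""
    return (rmi_version_major(rmm_version) == HOST_RMI_MAJOR
            and rmi_version_minor(rmm_version) >= HOST_RMI_MINOR)
```

In the real series the result of such a check gates a static key, so the rest of KVM can cheaply test whether a supported RMM is present.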

Functions are provided to query if a VM or VCPU is a realm (or rec)
which currently will always return false.

Later patches make use of struct realm and the states as the ioctl
interfaces are added to support realm and REC creation and destruction.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/all/20250820145606.180644-6-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

There is one (multiplexed) CAP which can be used to create, populate and
then activate the realm.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-7-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add the KVM_CAP_ARM_RME_CREATE_RD ioctl to create a realm. This involves
delegating pages to the RMM to hold the Realm Descriptor (RD) and for
the base level of the Realm Translation Tables (RTT). A VMID also needs
to be picked; since the RMM has a separate VMID address space, a
dedicated allocator is added for this purpose.
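
To make the idea of a dedicated VMID allocator concrete, here is a toy Python model. The class name `VmidAllocator`, the `vmid_bits` parameter, and the lowest-free-first policy are all hypothetical; the patch's allocator lives in the kernel and may differ in every detail.

```python
class VmidAllocator:
    """Toy allocator for realm VMIDs (hypothetical model, not the kernel code)."""

    def __init__(self, vmid_bits: int = 8):
        self._max = 1 << vmid_bits  # size of the separate realm VMID space
        self._used = set()          # VMIDs currently owned by a realm

    def alloc(self) -> int:
        """Return the lowest free VMID, or raise if the space is exhausted."""
        for vmid in range(self._max):
            if vmid not in self._used:
                self._used.add(vmid)
                return vmid
        raise RuntimeError("no free realm VMID")

    def free(self, vmid: int) -> None:
        """Release a VMID so a later realm can reuse it."""
        self._used.discard(vmid)
```

The point of the sketch is only that realm VMIDs are tracked independently of the VMIDs KVM hands out to normal guests.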

KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm
before it is created. Configuration options can be classified as:

 1. Parameters specific to the Realm stage2 (e.g. IPA Size, vmid, stage2
    entry level, entry level RTTs, number of RTTs in start level, LPA2)
    Most of these are not measured by RMM and come from KVM
    bookkeeping.

 2. Parameters controlling "Arm Architecture features for the VM". (e.g.
    SVE VL, PMU counters, number of HW BRPs/WPs), configured by the VMM
    using the "user ID register write" mechanism. These will be
    supported in the later patches.

 3. Parameters that are not part of the core Arm architecture but are defined
    by the RMM spec (e.g. Hash algorithm for measurement,
    Personalisation value). These are programmed via
    KVM_CAP_ARM_RME_CONFIG_REALM.

For the IPA size there is the possibility that the RMM supports a
different size to the IPA size supported by KVM for normal guests. At
the moment the 'normal limit' is exposed by KVM_CAP_ARM_VM_IPA_SIZE and
the IPA size is configured by the bottom bits of vm_type in
KVM_CREATE_VM. This means that it isn't easy for the VMM to discover
what IPA sizes are supported for Realm guests. Since the IPA is part of
the measurement of the realm guest the current expectation is that the
VMM will be required to pick the IPA size demanded by attestation and
therefore simply failing if this isn't available is fine. An option
would be to expose a new capability ioctl to obtain the RMM's maximum
IPA size if this is needed in the future.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-8-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…alm guests

BugLink: https://bugs.launchpad.net/bugs/2139249

RMM v1.0 provides no mechanism for the host to perform debug operations
on the guest. So don't expose KVM_CAP_SET_GUEST_DEBUG and report 0
breakpoints and 0 watchpoints.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-9-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…tion

BugLink: https://bugs.launchpad.net/bugs/2139249

Previously machine type was used purely for specifying the physical
address size of the guest. Reserve the higher bits to specify an ARM
specific machine type and declare a new type 'KVM_VM_TYPE_ARM_REALM'
used to create a realm guest.
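
A sketch of how such an encoding could be packed and unpacked, written as a Python model. The numeric values below (an 8-bit IPA size field, a type field starting at bit 8, realm type equal to 1 in that field) are assumptions for illustration; the authoritative definitions are in the patch's uAPI header.

```python
# Assumed layout for this sketch: low 8 bits hold the IPA size,
# the next 4 bits hold an Arm-specific machine type.
KVM_VM_TYPE_ARM_IPA_SIZE_MASK = 0xFF
KVM_VM_TYPE_ARM_SHIFT = 8
KVM_VM_TYPE_ARM_MASK = 0xF << KVM_VM_TYPE_ARM_SHIFT
KVM_VM_TYPE_ARM_NORMAL = 0 << KVM_VM_TYPE_ARM_SHIFT
KVM_VM_TYPE_ARM_REALM = 1 << KVM_VM_TYPE_ARM_SHIFT


def make_vm_type(ipa_bits: int, realm: bool) -> int:
    """Build the machine type argument for KVM_CREATE_VM."""
    if not 0 <= ipa_bits <= KVM_VM_TYPE_ARM_IPA_SIZE_MASK:
        raise ValueError("IPA size does not fit in the low bits")
    kind = KVM_VM_TYPE_ARM_REALM if realm else KVM_VM_TYPE_ARM_NORMAL
    return kind | ipa_bits


def split_vm_type(vm_type: int):
    """Return (arm_type_field, ipa_bits) from a packed machine type."""
    return (vm_type & KVM_VM_TYPE_ARM_MASK,
            vm_type & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
```

Because a normal guest encodes as type 0, existing VMMs that only pass an IPA size keep working unchanged under this layout.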

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-10-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM owns the stage 2 page tables for a realm, and KVM must request
that the RMM creates/destroys entries as necessary. The physical pages
to store the page tables are delegated to the realm as required, and can
be undelegated when no longer used.

Creating new RTTs is the easy part, tearing down is a little more
tricky. The result of realm_rtt_destroy() can be used to effectively
walk the tree and destroy the entries (undelegating pages that were
given to the realm).

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-11-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM maintains a data structure known as the Realm Execution Context
(or REC). It is similar to struct kvm_vcpu and tracks the state of the
virtual CPUs. KVM must delegate memory and request the structures are
created when vCPUs are created, and suitably tear down on destruction.

RECs must also be supplied with additional pages - auxiliary (or AUX)
granules - for storing the larger register state (e.g. for SVE). The
number of AUX granules for a REC depends on the parameters with which
the Realm was created - the RMM makes this information available via the
RMI_REC_AUX_COUNT call performed after creating the Realm Descriptor (RD).

Note that only some of the register state for the REC can be set by KVM, the
rest is defined by the RMM (zeroed). The register state then cannot be
changed by KVM after the REC is created (except when the guest
explicitly requests this e.g. by performing a PSCI call). The RMM also
requires that the VMM creates RECs in ascending order of the MPIDR.
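
The ordering requirement can be expressed as a simple check. This is a Python sketch with a hypothetical helper name; it only models the rule stated above, not how KVM or the RMM enforce it.

```python
def recs_in_ascending_mpidr_order(mpidrs) -> bool:
    """True if a sequence of REC MPIDR values is strictly ascending."""
    values = list(mpidrs)
    return all(a < b for a, b in zip(values, values[1:]))
```

A VMM that creates vCPUs out of order could sort them by MPIDR before asking for the RECs to be created, then use a check like this as a sanity assertion.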

See Realm Management Monitor specification (DEN0137) for more information:
https://developer.arm.com/documentation/den0137/

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-12-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…t registers

BugLink: https://bugs.launchpad.net/bugs/2139249

Currently the number of list registers available is stored in a global
(kvm_vgic_global_state.nr_lr). With Arm CCA the RMM is permitted to
reserve list registers for its own use and so the number of available
list registers can be fewer for a realm VM. Provide a wrapper function
to fetch the global in preparation for restricting nr_lr when dealing
with a realm VM.
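
The shape of such a wrapper, and the restriction it prepares for, can be sketched in Python. The names `Vm`, `realm_nr_lr` and `vgic_nr_lr`, and the value 16, are hypothetical; this commit only introduces the accessor, and the realm-specific limit shown here is an assumption about how a later patch might use it.

```python
from dataclasses import dataclass
from typing import Optional

GLOBAL_NR_LR = 16  # stand-in for kvm_vgic_global_state.nr_lr (illustrative value)


@dataclass
class Vm:
    is_realm: bool = False
    # Hypothetical: number of list registers the RMM leaves to the host.
    realm_nr_lr: Optional[int] = None


def vgic_nr_lr(vm: Vm) -> int:
    """Single access point for the number of usable list registers."""
    if vm.is_realm and vm.realm_nr_lr is not None:
        return min(GLOBAL_NR_LR, vm.realm_nr_lr)
    return GLOBAL_NR_LR
```

Routing every reader through one function means the later restriction touches a single place instead of every user of the global.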

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-13-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM provides emulation of a VGIC to the realm guest but delegates
much of the handling to the host. Implement support in KVM for
saving/restoring state to/from the REC structure.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-14-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM keeps track of the timer while the realm REC is running, but on
exit to the normal world KVM is responsible for handling the timers.

The RMM doesn't provide a mechanism to set the counter offset, so don't
expose KVM_CAP_COUNTER_OFFSET for a realm VM.

A later patch adds the support for propagating the timer values from the
exit data structure and calling kvm_realm_timers_update().

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-15-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Each page within the protected region of the realm guest can be marked
as either RAM or EMPTY. Allow the VMM to control this before the guest
has started and provide the equivalent functions to change this (with
the guest's approval) at runtime.

When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
unmapped from the guest and undelegated allowing the memory to be reused
by the host. When transitioning to RIPAS RAM the actual population of
the leaf RTTs is done later on stage 2 fault, however it may be
necessary to allocate additional RTTs to allow the RMM to track the RIPAS
for the requested range.

When freeing a block mapping it is necessary to temporarily unfold the
RTT which requires delegating an extra page to the RMM, this page can
then be recovered once the contents of the block mapping have been
freed.
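
The RIPAS transitions described above can be modelled per page. This Python sketch uses hypothetical names (`RealmMemoryModel`, `fault_in`, `set_ripas`) and ignores RTT allocation and block-mapping unfolding entirely; it only shows that RAM to EMPTY reclaims pages while EMPTY to RAM defers population to a later fault.

```python
from enum import IntEnum

PAGE_SIZE = 4096


class Ripas(IntEnum):
    EMPTY = 0
    RAM = 1


class RealmMemoryModel:
    """Toy per-page model of RIPAS state and page ownership."""

    def __init__(self):
        self.ripas = {}      # page-aligned IPA -> Ripas (missing means EMPTY)
        self.mapped = set()  # IPAs currently backed by a delegated page

    def fault_in(self, ipa: int) -> None:
        """Populate a page on a stage 2 fault; only valid for RIPAS RAM."""
        if self.ripas.get(ipa, Ripas.EMPTY) is not Ripas.RAM:
            raise ValueError("fault on a page that is not RIPAS RAM")
        self.mapped.add(ipa)

    def set_ripas(self, start: int, end: int, state: Ripas) -> list:
        """Apply a RIPAS change to [start, end); return the reclaimed IPAs."""
        reclaimed = []
        for ipa in range(start, end, PAGE_SIZE):
            self.ripas[ipa] = state
            if state is Ripas.EMPTY and ipa in self.mapped:
                # RAM -> EMPTY: unmap and undelegate so the host can reuse it.
                self.mapped.remove(ipa)
                reclaimed.append(ipa)
            # EMPTY -> RAM: nothing is mapped yet; population waits for a fault.
        return reclaimed
```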

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-16-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Entering a realm is done using a SMC call to the RMM. On exit the
exit-codes need to be handled slightly differently to the normal KVM
path so define our own functions for realm enter/exit and hook them
in if the guest is a realm guest.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-17-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The guest can request that a region of its protected address space is
switched between RIPAS_RAM and RIPAS_EMPTY (and back) using
RSI_IPA_STATE_SET. This causes a guest exit with the
RMI_EXIT_RIPAS_CHANGE code. We treat this as a request to convert a
protected region to unprotected (or back), exiting to the VMM to make
the necessary changes to the guest_memfd and memslot mappings. On the
next entry the RIPAS changes are committed by making RMI_RTT_SET_RIPAS
calls.

The VMM may wish to reject the RIPAS change requested by the guest. For
now it can only do this by no longer scheduling the VCPU as we don't
currently have a use case for returning that rejection to the guest, but
by postponing the RMI_RTT_SET_RIPAS changes to entry we leave the door
open for adding a new ioctl in the future for this purpose.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-18-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

MMIO emulation for a realm cannot be done directly with the VM's
registers as they are protected from the host. However, for emulatable
data aborts, the RMM uses GPRS[0] to provide the read/written value.
We can transfer this from/to the equivalent VCPU's register entry and
then depend on the generic MMIO handling code in KVM.

For a MMIO read, the value is placed in the shared RecExit structure
during kvm_handle_mmio_return() rather than in the VCPU's register
entry.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-19-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The VMM needs to populate the realm with some data before starting (e.g.
a kernel and initrd). This is measured by the RMM and used as part of
the attestation later on.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-20-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

At runtime if the realm guest accesses memory which hasn't yet been
mapped then KVM needs to either populate the region or fault the guest.

For memory in the lower (protected) region of IPA a fresh page is
provided to the RMM which will zero the contents. For memory in the
upper (shared) region of IPA, the memory from the memslot is mapped
into the realm VM non-secure.
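
The split between the two regions can be sketched as follows. This Python model assumes the boundary is the top bit of the realm's IPA space (lower half protected, upper half shared); the function names and the returned strings are hypothetical and only describe the two paths.

```python
def is_shared_ipa(ipa: int, ipa_bits: int) -> bool:
    """True if the IPA lies in the upper (shared) half of the IPA space."""
    if not 0 <= ipa < (1 << ipa_bits):
        raise ValueError("IPA outside the configured IPA space")
    return bool(ipa & (1 << (ipa_bits - 1)))


def fault_action(ipa: int, ipa_bits: int) -> str:
    """Describe how this model would satisfy a stage 2 fault at the IPA."""
    if is_shared_ipa(ipa, ipa_bits):
        return "map memslot page non-secure"
    return "delegate fresh page (zeroed by RMM)"
```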

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-21-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

When loading a realm VCPU much of the work is handled by the RMM so only
some of the actions are required. Rearrange kvm_arch_vcpu_load()
slightly so we can bail out early for a realm guest.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-22-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM only allows setting the GPRS (x0-x30) and PC for a realm
guest. Check this in kvm_arm_set_reg() so that the VMM can receive a
suitable error return if other registers are written to.

The RMM makes similar restrictions for reading of the guest's registers
(this is *confidential* compute after all), however we don't impose the
restriction here. This allows the VMM to read (stale) values from the
registers which might be useful to read back the initial values even if
the RMM doesn't provide the latest version. For migration of a realm VM,
a new interface will be needed so that the VMM can receive an
(encrypted) blob of the VM's state.
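
The asymmetry between writes and reads can be modelled like this. It is a Python sketch, not the `kvm_arm_set_reg()` change itself: the register names are plain strings, the helper names are hypothetical, and `EINVAL` is chosen only for illustration since the commit message does not say which error is returned.

```python
import errno

# Registers the RMM lets the host set, per the commit message: x0-x30 and PC.
WRITABLE_REALM_REGS = frozenset([f"x{i}" for i in range(31)] + ["pc"])


def realm_set_reg(regs: dict, name: str, value: int) -> int:
    """Return 0 on success or a negative errno, following kernel convention."""
    if name not in WRITABLE_REALM_REGS:
        return -errno.EINVAL  # error code is an assumption for this sketch
    regs[name] = value
    return 0


def realm_get_reg(regs: dict, name: str) -> int:
    """Reads are not restricted in this model; the value may be stale."""
    return regs.get(name, 0)
```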

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-23-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM needs to be informed of the target REC when a PSCI call is made
with an MPIDR argument. Expose an ioctl to the userspace in case the PSCI
is handled by it.

Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-24-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM doesn't allow injection of an undefined exception into a realm
guest. Add a WARN to catch if this ever happens.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-25-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

It doesn't make much sense as a realm guest wouldn't want to trust the
host. It will also need some extra work to ensure that KVM will only
attempt to write into a shared memory region. So for now just disable
it.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Steven Price <steven.price@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-26-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Extend KVM_SET_VCPU_EVENTS to support realms, where KVM cannot set the
system registers, and the RMM must perform it on next REC entry.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-27-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Forward RSI_HOST_CALLS to KVM's HVC handler.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-28-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Given we have different types of VMs supported, check the
support for SVE for the given instance of the VM to accurately
report the status.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-29-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Guest_memfd doesn't yet natively support huge pages, and there are
currently difficulties for a VMM to manage huge pages efficiently so for
now always split up mappings to PTE (4k).

The two issues that need progressing before supporting huge pages for
realms are:

 1. guest_memfd needs to be able to allocate from an appropriate
    allocator which can provide huge pages.

 2. The VMM needs to be able to repurpose private memory for a shared
    mapping when the guest VM requests memory is transitioned. Because
    this can happen at a 4k granularity it isn't possible to
    free/reallocate while huge pages are in use. Allowing the VMM to
    mmap() the shared portion of a huge page would allow the huge page
    to be recreated when the memory is unshared and made protected again.

These two issues are not specific to realms and don't affect the realm
API, so for now just break everything down to 4k pages in the RMM
controlled stage 2. Future work can add huge page support without
changing the uAPI.
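
Breaking a larger mapping down to 4k entries amounts to the following. This is a Python sketch with a hypothetical helper name; it only shows the arithmetic, not the RMM calls that create each mapping.

```python
PAGE_SIZE = 4096  # PTE-level granule used for realm stage 2 mappings


def split_to_ptes(ipa: int, size: int) -> list:
    """Split [ipa, ipa + size) into (address, PAGE_SIZE) chunks."""
    if ipa % PAGE_SIZE or size % PAGE_SIZE:
        raise ValueError("range must be 4k aligned")
    return [(ipa + off, PAGE_SIZE) for off in range(0, size, PAGE_SIZE)]
```

For example, a 2 MiB region that a normal guest could map with one block entry becomes 512 separate page mappings under this scheme.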

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-30-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>

ianm-nv commented Jan 28, 2026


nvmochs commented Jan 29, 2026

@ianm-nv A bit of feedback...

  • I did a range diff against the v10 origin branch and (excluding the one patch with the context adjustment) confirmed the only deltas were in the commit messages (as expected). I also confirmed the upstream picks were clean.

  • No issues with the 3 non-config SAUCE patches.

  • For "NVIDIA: VR: SAUCE: [Config] Update annotations for ARM CCA", isn't CONFIG_ARM_CCA_GUEST already set in the master annotations?

  • Random question: Any kernel page size restrictions for this host CCA support?

  • There are going to be conflicts with PR [linux-nvidia-6.17]: Backport nvgrace-gpu hugepfnmap, ecc patches and miscellaneous cleanups #287 (I recalled that PR touching some of the same files and just tried picking your PR to that branch). There may be conflicts with other PRs as well, since the -next branch is in need of an update. Wondering how best to facilitate this to ease the burden on whoever is merging this content. =)


clsotog commented Jan 29, 2026

Quick questions:

  • For commit e2e20df, I did not understand why the wording of the comments is not the same as in the lore discussion.

  • I have never run this code, but for commit a4ecd32, can realm_create_rd be called multiple times? Is there a chance we could see those 2 warnings over and over in dmesg?

Steven Price and others added 22 commits January 29, 2026 15:46
BugLink: https://bugs.launchpad.net/bugs/2139249

Physical device assignment is not yet supported by the RMM, so it
doesn't make much sense to allow device mappings within the realm.
Prevent them when the guest is a realm.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(backported from https://lore.kernel.org/20250820145606.180644-31-steven.price@arm.com/)
[context adjustment because of commit 6f43d59]
Signed-off-by: Ian May <ianm@nvidia.com>
…sical IRQ

BugLink: https://bugs.launchpad.net/bugs/2139249

Arm CCA assigns the physical PMU device to the guest running in realm
world, however the IRQs are routed via the host. To enter a realm guest
while a PMU IRQ is pending it is necessary to block the physical IRQ to
prevent an immediate exit. Provide a mechanism in the PMU driver for KVM
to control the physical IRQ.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-32-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Use the PMU registers from the RmiRecExit structure to identify when an
overflow interrupt is due and inject it into the guest. Also hook up the
configuration option for enabling the PMU within the guest.

When entering a realm guest with a PMU interrupt pending, it is
necessary to disable the physical interrupt. Otherwise when the RMM
restores the PMU state the physical interrupt will trigger causing an
immediate exit back to the host. The guest is expected to acknowledge
the interrupt causing a host exit (to update the GIC state) which gives
the opportunity to re-enable the physical interrupt before the next PMU
event.

The number of PMU counters is configured by the VMM by writing to PMCR.N.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-33-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…ests

BugLink: https://bugs.launchpad.net/bugs/2139249

For protected memory, read-only mappings aren't supported by the RMM. While
it may be possible to support read-only mappings for unprotected memory,
this isn't supported at the present time.

Note that this does mean that ROM (or flash) data cannot be emulated
correctly by the VMM as the stage 2 mappings are either always
read/write or are trapped as MMIO (so don't support operations where the
syndrome information doesn't allow emulation, e.g. load/store pair).

This restriction can be lifted in the future by allowing the unprotected
stage 2 mappings to be made read only.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-34-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…tchpoints to userspace

BugLink: https://bugs.launchpad.net/bugs/2139249

The RMM describes the maximum number of BPs/WPs available to the guest
in the Feature Register 0. Propagate those numbers into ID_AA64DFR0_EL1,
which is visible to userspace. A VMM needs this information in order to
set up realm parameters.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-35-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…ONE_REG

BugLink: https://bugs.launchpad.net/bugs/2139249

Allow userspace to configure the number of breakpoints and watchpoints
of a Realm VM through KVM_SET_ONE_REG ID_AA64DFR0_EL1.

The KVM sys_reg handler checks the user value against the maximum value
given by RMM (arm64_check_features() gets it from the
read_sanitised_id_aa64dfr0_el1() reset handler).

Userspace discovers that it can write these fields by issuing a
KVM_ARM_GET_REG_WRITABLE_MASKS ioctl.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-36-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…supported by RMM

BugLink: https://bugs.launchpad.net/bugs/2139249

Provide an accurate number of available PMU counters to userspace when
setting up a Realm.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-37-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

RMM provides the maximum vector length it supports for a guest in its
feature register. Make it visible to the rest of KVM and to userspace
via KVM_REG_ARM64_SVE_VLS.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-38-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…Realm

BugLink: https://bugs.launchpad.net/bugs/2139249

Obtain the max vector length configured by userspace on the vCPUs, and
write it into the Realm parameters. By default the vCPU is configured
with the max vector length reported by RMM, and userspace can reduce it
with a write to KVM_REG_ARM64_SVE_VLS.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-39-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…RME RECs

BugLink: https://bugs.launchpad.net/bugs/2139249

KVM_GET_REG_LIST should not be called before SVE is finalized. The ioctl
handler currently returns -EPERM in this case. But because it uses
kvm_arm_vcpu_is_finalized(), it now also rejects the call for
an unfinalized REC even though finalizing the REC can only be done late,
after Realm descriptor creation.

Move the check to copy_sve_reg_indices(). One adverse side effect of
this change is that a KVM_GET_REG_LIST call that only probes for the
array size will now succeed even if SVE is not finalized, but that seems
harmless since the following KVM_GET_REG_LIST with the full array will
fail.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-40-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Userspace can set a few registers with KVM_SET_ONE_REG (9 GP registers
at runtime, and 3 system registers during initialization). Update the
register list returned by KVM_GET_REG_LIST.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-41-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Select KVM_GENERIC_PRIVATE_MEM and provide the necessary support
functions.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-42-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Increment KVM_VCPU_MAX_FEATURES to expose the new capability to user
space.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-43-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add the ioctl to activate a realm and set the static branch to enable
access to the realm functionality if the RMM is detected.

Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
(cherry picked from https://lore.kernel.org/20250820145606.180644-44-steven.price@arm.com/)
Signed-off-by: Ian May <ianm@nvidia.com>
…ealm_unmap_range

BugLink: https://bugs.launchpad.net/bugs/2139249

Move the ia_bits bounds calculation after the kvm_realm_is_created()
check to avoid accessing realm->ia_bits before the realm is created.

When the realm hasn't been created yet, ia_bits is 0, so
BIT(realm->ia_bits - 1) evaluates to BIT(-1), which wraps to
BIT(4294967295) and triggers a UBSAN shift-out-of-bounds warning:

  UBSAN: shift-out-of-bounds in arch/arm64/kvm/rme.c:805:8
  shift exponent 4294967295 is too large for 64-bit type 'long unsigned int'
  ...
  kvm_realm_unmap_range+0x1c4/0x1e8
  kvm_arch_post_set_memory_attributes+0x58/0xd8
  kvm_vm_set_mem_attributes+0x37c/0x600
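
The reordering can be sketched in isolation as follows. This is a simplified model, not the code from rme.c: `struct realm_model` and `realm_clamp_end` are hypothetical names, and the only point shown is that the "is the realm created" check runs before ia_bits is used as a shift count.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical stand-in for the realm state referenced above. */
struct realm_model {
    bool created;
    unsigned int ia_bits;   /* stays 0 until the realm is created */
};

/*
 * Clamp 'end' to the size implied by ia_bits. Returns false, leaving
 * *clamped untouched, when the realm does not exist yet, so the shift
 * is never evaluated with ia_bits == 0.
 */
static bool realm_clamp_end(const struct realm_model *realm,
                            uint64_t end, uint64_t *clamped)
{
    uint64_t limit;

    if (!realm->created)
        return false;

    limit = BIT_ULL(realm->ia_bits - 1);
    *clamped = end < limit ? end : limit;
    return true;
}
```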

Fixes: 3b3ecd0c3dc7 ("NVIDIA: VR: SAUCE: arm64: RME: Allow VMM to set RIPAS")
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add Memory Encryption Context ID (MECID) support for Realms to provide
better isolation between them when RMM supports MEC.

- Bitmap-based private MECID allocation per Realm
- Reference-counted shared MECID for backward compatibility
- Userspace config via KVM_CAP_ARM_RME_CONFIG_REALM ioctl
- MEC capability query interface (no arm.c changes needed)
- Graceful fallback: MECID 0 when RMM lacks MEC support
- Unconfigured realms default to shared MECID

State managed via struct mecid_state with clear locking semantics.
Policy enum: MEC_POLICY_{UNCONFIGURED,PRIVATE,SHARED}.
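
To make the first two bullets concrete, here is a minimal sketch of a bitmap allocator for private IDs combined with a reference count for one shared ID. It is not taken from the patch: `struct mecid_pool`, the 64-entry pool size, and all function names are assumptions, and the locking that the patch describes is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define MECID_COUNT 64   /* assumed pool size, for illustration only */

struct mecid_pool {
    uint64_t used;             /* bit n set => MECID n is allocated */
    unsigned int shared_refs;  /* realms currently using the shared MECID */
};

/*
 * Allocate the lowest free private MECID in [first, MECID_COUNT).
 * 'first' lets the caller keep low IDs out of the private pool.
 * Returns -1 when the pool is exhausted.
 */
static int mecid_alloc_private(struct mecid_pool *p, unsigned int first)
{
    for (unsigned int id = first; id < MECID_COUNT; id++) {
        if (!(p->used & (UINT64_C(1) << id))) {
            p->used |= UINT64_C(1) << id;
            return (int)id;
        }
    }
    return -1;
}

/* Return a private MECID to the pool. */
static void mecid_free_private(struct mecid_pool *p, unsigned int id)
{
    if (id < MECID_COUNT)
        p->used &= ~(UINT64_C(1) << id);
}

/* Take a reference; true means this is the first user of the shared ID. */
static bool mecid_get_shared(struct mecid_pool *p)
{
    return p->shared_refs++ == 0;
}

/* Drop a reference; true means the last user has gone away. */
static bool mecid_put_shared(struct mecid_pool *p)
{
    if (p->shared_refs == 0)
        return false;
    return --p->shared_refs == 0;
}
```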

Signed-off-by: Raghu Krishnamurthy <raghupathyk@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add validation to ensure that the number of auxiliary granules returned
by RMM via rmi_rec_aux_count() does not exceed the maximum allowed value
of REC_PARAMS_AUX_GRANULES (16).

This prevents potential buffer overflow in the aux_pages array which is
statically defined with REC_PARAMS_AUX_GRANULES elements in struct
realm_rec.

If the RMM returns a value greater than 16, the realm creation is aborted
with proper cleanup to maintain system integrity.
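
The check described above amounts to validating a firmware-reported count against the fixed array size before it is used as a loop bound. A minimal sketch, assuming a hypothetical helper `rec_check_aux_count` and an -EINVAL return (the error code used by the actual patch is not stated here):

```c
#include <errno.h>

#define REC_PARAMS_AUX_GRANULES 16

/* Simplified stand-in for the structure holding the auxiliary pages. */
struct rec_model {
    void *aux_pages[REC_PARAMS_AUX_GRANULES];
};

/*
 * Reject a count that would index past aux_pages[]. The caller is
 * expected to abort creation and clean up on a non-zero return.
 */
static int rec_check_aux_count(unsigned long reported)
{
    if (reported > REC_PARAMS_AUX_GRANULES)
        return -EINVAL;   /* assumed error code */
    return 0;
}
```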

Signed-off-by: Raghu Krishnamurthy <raghupathyk@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
…meter

BugLink: https://bugs.launchpad.net/bugs/2139249

Add a module parameter to expose the KVM_CAP_ARM_RME capability number
via sysfs. This allows userspace (QEMU) to discover the correct
capability number at runtime rather than relying on hardcoded values
that may become stale when capability numbers shift due to other
patches being merged.

The value is exposed at:
  /sys/module/kvm/parameters/kvm_cap_arm_rme
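
A userspace consumer could read the value roughly as sketched below. This assumes the parameter is printed as a decimal integer followed by a newline, which is typical for integer module parameters but is not confirmed by this commit message; `parse_cap_number` and `read_cap_number` are hypothetical helper names.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define KVM_CAP_ARM_RME_PARAM "/sys/module/kvm/parameters/kvm_cap_arm_rme"

/* Parse the text of the parameter; returns -1 on malformed input. */
static long parse_cap_number(const char *text)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(text, &end, 10);
    if (errno || end == text || val < 0)
        return -1;
    if (*end != '\0' && *end != '\n')
        return -1;
    return val;
}

/* Read the capability number from 'path'; -1 if missing or unparsable. */
static long read_cap_number(const char *path)
{
    char buf[32];
    char *line;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line ? parse_cap_number(buf) : -1;
}
```

A caller would pass `KVM_CAP_ARM_RME_PARAM` to `read_cap_number()` and, on success, use the result with the KVM_CHECK_EXTENSION ioctl instead of a compiled-in constant.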

Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

For ioremap(), so far we only checked if it was a device (RIPAS_DEV) to choose
an encrypted vs decrypted mapping. However, we may have firmware reserved memory
regions exposed to the OS (e.g., EFI Coco Secret Securityfs, ACPI CCEL).
We need to make sure that anything that is RIPAS_RAM (i.e., Guest
protected memory with RMM guarantees) is also mapped as encrypted.

Rephrasing the above, anything that is not RIPAS_EMPTY is guaranteed to be
protected by the RMM. Thus we choose encrypted mapping for anything that is not
RIPAS_EMPTY. While at it, rename the helper function

  __arm64_is_protected_mmio => arm64_rsi_is_protected

to clearly indicate that this is not an arm64 generic helper, but something to do
with Realms.
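
The change in policy can be summarised with the sketch below. The enum is an illustrative model rather than the kernel's definition, and the real helper works on an address range queried through RSI rather than a single state value; both function names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative model of the RIPAS states named above. */
enum ripas_model {
    RIPAS_MODEL_EMPTY,
    RIPAS_MODEL_RAM,
    RIPAS_MODEL_DEV,
};

/* Previous behaviour: only device regions got an encrypted mapping. */
static bool map_encrypted_old(enum ripas_model ripas)
{
    return ripas == RIPAS_MODEL_DEV;
}

/* New behaviour: anything that is not RIPAS_EMPTY is protected. */
static bool map_encrypted_new(enum ripas_model ripas)
{
    return ripas != RIPAS_MODEL_EMPTY;
}
```

Under this model, a firmware-reserved region in RIPAS_RAM is the case that flips from a decrypted to an encrypted mapping.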

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit fa84e53)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Add support for ACPI CCEL by handling the EfiACPIMemoryNVS type memory.
As per UEFI specifications NVS memory is reserved for Firmware use even
after exiting boot services. Thus map the region as read-only.

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Gavin Shan <gshan@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit d02c2e4)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249

Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required
by the driver.

Cc: Sami Mujawar <sami.mujawar@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 9e8a3df)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Ian May <ianm@nvidia.com>
@ianm-nv ianm-nv force-pushed the 24.04_linux-nvidia-6.17-next+cca/latest branch from 060fb12 to aa740a4 Compare January 30, 2026 19:37
@ianm-nv
Collaborator Author

ianm-nv commented Jan 30, 2026

@nvmochs

@ianm-nv A bit of feedback...

  • I did a range diff against the v10 origin branch and (excluding the one patch with the context adjustment) confirmed the only deltas were in the commit messages (as expected). I also confirmed the upstream picks were clean.
  • No issues with the 3 non-config SAUCE patches.
  • For "NVIDIA: VR: SAUCE: [Config] Update annotations for ARM CCA", isn't CONFIG_ARM_CCA_GUEST already set in the master annotations?

Thanks, I've removed CONFIG_ARM_CCA_GUEST from debian.nvidia-6.17/config/annotations

  • Random question: Any kernel page size restrictions for this host CCA support?

Here is the location for the page size check, and I have confirmed this works as expected when booting the 4k vs the 64k kernel.

https://github.com/NVIDIA/NV-Kernels/pull/298/changes#diff-e9ce34a8974a92109e89d2bdbc8f16c55c8f1e0422dfd510a002b2c8e1a0f310R1944-R1948

void kvm_init_rme(void)
{
	if (PAGE_SIZE != SZ_4K)
		/* Only 4k page size on the host is supported */
		return;

My recommendation would be to wait until we merge the conflicting PRs and then I can rebase and update the PR and hopefully be ready to send.

@nvmochs
Collaborator

nvmochs commented Jan 30, 2026

My recommendation would be to wait until we merge the conflicting PRs and then I can rebase and update the PR and hopefully be ready to send.

I'm okay with that. I saw a Canonical ack on the dmabuf PR earlier today so hopefully they will be merged soon.

@ianm-nv
Collaborator Author

ianm-nv commented Jan 30, 2026

Quick questions:

  • For commit e2e20df, I did not understand why the wording of the comments differs from the lore discussion.

Good catch! Comments have been fixed and pushed.

  • I have never run this code, but for commit a4ecd32, can realm_create_rd be called multiple times? Is there a chance we would see those 2 warnings over and over in dmesg?

realm_create_rd is called once for every Realm VM launch, so the risk of spamming dmesg should be low. Since this warning indicates an RMM firmware bug, I think it is safer to keep it as a WARN_ON.


5 participants