[linux-nvidia-6.17] Add ARM CCA host support #298
base: 24.04_linux-nvidia-6.17-next
BugLink: https://bugs.launchpad.net/bugs/2139249 Fix a potential build error (like below, when asm/kvm_emulate.h gets included after the kvm/arm_psci.h) by including the missing header file in kvm/arm_psci.h: ./include/kvm/arm_psci.h: In function ‘kvm_psci_version’: ./include/kvm/arm_psci.h:29:13: error: implicit declaration of function ‘vcpu_has_feature’; did you mean ‘cpu_have_feature’? [-Werror=implicit-function-declaration] 29 | if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2)) { | ^~~~~~~~~~~~~~~~ | cpu_have_feature Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-2-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 If the host attempts to access granules that have been delegated for use in a realm these accesses will be caught and will trigger a Granule Protection Fault (GPF). A fault during a page walk signals a bug in the kernel and is handled by oopsing the kernel. A non-page walk fault could be caused by user space having access to a page which has been delegated to the kernel and will trigger a SIGBUS to allow debugging why user space is trying to access a delegated page. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-3-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM (Realm Management Monitor) provides functionality that can be accessed by SMC calls from the host. The SMC definitions are based on DEN0137[1] version 1.0-rel0 [1] https://developer.arm.com/documentation/den0137/1-0rel0/ Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-4-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The wrappers make the call sites easier to read and deal with the boilerplate of handling the error codes from the RMM. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-5-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Query the RMI version number and check if it is a compatible version. A static key is also provided to signal that a supported RMM is available. Functions are provided to query if a VM or VCPU is a realm (or rec) which currently will always return false. Later patches make use of struct realm and the states as the ioctls interfaces are added to support realm and REC creation and destruction. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/all/20250820145606.180644-6-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
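The version check described in this commit can be illustrated with a small userspace sketch. It assumes the RMI interface version packs the minor number in bits [15:0] and the major number in bits [30:16]; the `SKETCH_` macro names and the compatibility policy (same major, RMM minor at least the host's) are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: minor in bits [15:0], major in bits [30:16]. */
#define SKETCH_RMI_ABI_VERSION(major, minor) \
	((((uint32_t)(major) & 0x7fffu) << 16) | ((uint32_t)(minor) & 0xffffu))
#define SKETCH_RMI_ABI_MAJOR(v) (((uint32_t)(v) >> 16) & 0x7fffu)
#define SKETCH_RMI_ABI_MINOR(v) ((uint32_t)(v) & 0xffffu)

/*
 * Hypothetical policy: the RMM is usable when it reports the same major
 * version as the host expects and a minor version that is not older.
 */
static bool sketch_rmi_version_compatible(uint32_t host, uint32_t rmm)
{
	return SKETCH_RMI_ABI_MAJOR(host) == SKETCH_RMI_ABI_MAJOR(rmm) &&
	       SKETCH_RMI_ABI_MINOR(rmm) >= SKETCH_RMI_ABI_MINOR(host);
}
```

In the kernel the result of such a check would gate the static key mentioned above; here it is only a pure function so the decode logic can be exercised in isolation.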
BugLink: https://bugs.launchpad.net/bugs/2139249 There is one (multiplexed) CAP which can be used to create, populate and then activate the realm. Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-7-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Add the KVM_CAP_ARM_RME_CREATE_RD ioctl to create a realm. This involves delegating pages to the RMM to hold the Realm Descriptor (RD) and for the base level of the Realm Translation Tables (RTT). A VMID also needs to be picked; since the RMM has a separate VMID address space, a dedicated allocator is added for this purpose. KVM_CAP_ARM_RME_CONFIG_REALM is provided to allow configuring the realm before it is created. Configuration options can be classified as: 1. Parameters specific to the Realm stage2 (e.g. IPA Size, vmid, stage2 entry level, entry level RTTs, number of RTTs in start level, LPA2). Most of these are not measured by the RMM and come from KVM bookkeeping. 2. Parameters controlling "Arm Architecture features for the VM" (e.g. SVE VL, PMU counters, number of HW BRPs/WPs), configured by the VMM using the "user ID register write" mechanism. These will be supported in the later patches. 3. Parameters that are not part of the core Arm architecture but are defined by the RMM spec (e.g. Hash algorithm for measurement, Personalisation value). These are programmed via KVM_CAP_ARM_RME_CONFIG_REALM. For the IPA size there is the possibility that the RMM supports a different size to the IPA size supported by KVM for normal guests. At the moment the 'normal limit' is exposed by KVM_CAP_ARM_VM_IPA_SIZE and the IPA size is configured by the bottom bits of vm_type in KVM_CREATE_VM. This means that it isn't easy for the VMM to discover what IPA sizes are supported for Realm guests. Since the IPA size is part of the measurement of the realm guest, the current expectation is that the VMM will be required to pick the IPA size demanded by attestation, and therefore simply failing if this isn't available is fine. An option would be to expose a new capability ioctl to obtain the RMM's maximum IPA size if this is needed in the future.
Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-8-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…alm guests BugLink: https://bugs.launchpad.net/bugs/2139249 RMM v1.0 provides no mechanism for the host to perform debug operations on the guest. So don't expose KVM_CAP_SET_GUEST_DEBUG and report 0 breakpoints and 0 watch points. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-9-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…tion BugLink: https://bugs.launchpad.net/bugs/2139249 Previously machine type was used purely for specifying the physical address size of the guest. Reserve the higher bits to specify an ARM specific machine type and declare a new type 'KVM_VM_TYPE_ARM_REALM' used to create a realm guest. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-10-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
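A minimal sketch of the encoding this commit describes: the low bits of the machine type keep carrying the IPA size, and a higher field selects the Arm-specific VM type. The 8-bit IPA field matches the existing KVM convention; the position and width of the type field and the `SKETCH_` names are assumptions for illustration, not the values defined by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IPA_SIZE_MASK 0xffULL                     /* low bits: IPA size */
#define SKETCH_TYPE_SHIFT    8                           /* assumed field position */
#define SKETCH_TYPE_MASK     (0xfULL << SKETCH_TYPE_SHIFT)
#define SKETCH_TYPE_NORMAL   (0ULL << SKETCH_TYPE_SHIFT)
#define SKETCH_TYPE_REALM    (1ULL << SKETCH_TYPE_SHIFT) /* stand-in for KVM_VM_TYPE_ARM_REALM */

/* Combine a VM type and an IPA size into one machine-type argument. */
static uint64_t sketch_make_vm_type(uint64_t type, unsigned int ipa_bits)
{
	return (type & SKETCH_TYPE_MASK) | ((uint64_t)ipa_bits & SKETCH_IPA_SIZE_MASK);
}

/* True when the type field selects a realm guest. */
static bool sketch_vm_type_is_realm(uint64_t vm_type)
{
	return (vm_type & SKETCH_TYPE_MASK) == SKETCH_TYPE_REALM;
}

/* Extract the requested IPA size from the low bits. */
static unsigned int sketch_vm_type_ipa_bits(uint64_t vm_type)
{
	return (unsigned int)(vm_type & SKETCH_IPA_SIZE_MASK);
}
```

A VMM would pass the combined value as the machine-type argument of KVM_CREATE_VM; existing callers that only set the IPA size keep selecting a normal VM because the type field is zero.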
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM owns the stage 2 page tables for a realm, and KVM must request that the RMM creates/destroys entries as necessary. The physical pages to store the page tables are delegated to the realm as required, and can be undelegated when no longer used. Creating new RTTs is the easy part, tearing down is a little more tricky. The result of realm_rtt_destroy() can be used to effectively walk the tree and destroy the entries (undelegating pages that were given to the realm). Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-11-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM maintains a data structure known as the Realm Execution Context (or REC). It is similar to struct kvm_vcpu and tracks the state of the virtual CPUs. KVM must delegate memory and request the structures are created when vCPUs are created, and suitably tear down on destruction. RECs must also be supplied with additional pages - auxiliary (or AUX) granules - for storing the larger register state (e.g. for SVE). The number of AUX granules for a REC depends on the parameters with which the Realm was created - the RMM makes this information available via the RMI_REC_AUX_COUNT call performed after creating the Realm Descriptor (RD). Note that only some of the register state for the REC can be set by KVM, the rest is defined by the RMM (zeroed). The register state then cannot be changed by KVM after the REC is created (except when the guest explicitly requests this e.g. by performing a PSCI call). The RMM also requires that the VMM creates RECs in ascending order of the MPIDR. See Realm Management Monitor specification (DEN0137) for more information: https://developer.arm.com/documentation/den0137/ Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-12-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…t registers BugLink: https://bugs.launchpad.net/bugs/2139249 Currently the number of list registers available is stored in a global (kvm_vgic_global_state.nr_lr). With Arm CCA the RMM is permitted to reserve list registers for its own use and so the number of available list registers can be fewer for a realm VM. Provide a wrapper function to fetch the global in preparation for restricting nr_lr when dealing with a realm VM. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-13-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM provides emulation of a VGIC to the realm guest but delegates much of the handling to the host. Implement support in KVM for saving/restoring state to/from the REC structure. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-14-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM keeps track of the timer while the realm REC is running, but on exit to the normal world KVM is responsible for handling the timers. The RMM doesn't provide a mechanism to set the counter offset, so don't expose KVM_CAP_COUNTER_OFFSET for a realm VM. A later patch adds the support for propagating the timer values from the exit data structure and calling kvm_realm_timers_update(). Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-15-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Each page within the protected region of the realm guest can be marked as either RAM or EMPTY. Allow the VMM to control this before the guest has started and provide the equivalent functions to change this (with the guest's approval) at runtime. When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is unmapped from the guest and undelegated allowing the memory to be reused by the host. When transitioning to RIPAS RAM the actual population of the leaf RTTs is done later on stage 2 fault, however it may be necessary to allocate additional RTTs to allow the RMM to track the RIPAS for the requested range. When freeing a block mapping it is necessary to temporarily unfold the RTT, which requires delegating an extra page to the RMM; this page can then be recovered once the contents of the block mapping have been freed. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-16-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Entering a realm is done using a SMC call to the RMM. On exit the exit-codes need to be handled slightly differently to the normal KVM path so define our own functions for realm enter/exit and hook them in if the guest is a realm guest. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-17-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The guest can request that a region of its protected address space is switched between RIPAS_RAM and RIPAS_EMPTY (and back) using RSI_IPA_STATE_SET. This causes a guest exit with the RMI_EXIT_RIPAS_CHANGE code. We treat this as a request to convert a protected region to unprotected (or back), exiting to the VMM to make the necessary changes to the guest_memfd and memslot mappings. On the next entry the RIPAS changes are committed by making RMI_RTT_SET_RIPAS calls. The VMM may wish to reject the RIPAS change requested by the guest. For now it can only do this by no longer scheduling the VCPU as we don't currently have a usecase for returning that rejection to the guest, but by postponing the RMI_RTT_SET_RIPAS changes to entry we leave the door open for adding a new ioctl in the future for this purpose. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-18-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 MMIO emulation for a realm cannot be done directly with the VM's registers as they are protected from the host. However, for emulatable data aborts, the RMM uses GPRS[0] to provide the read/written value. We can transfer this from/to the equivalent VCPU's register entry and then depend on the generic MMIO handling code in KVM. For a MMIO read, the value is placed in the shared RecExit structure during kvm_handle_mmio_return() rather than in the VCPU's register entry. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-19-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The VMM needs to populate the realm with some data before starting (e.g. a kernel and initrd). This is measured by the RMM and used as part of the attestation later on. Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-20-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 At runtime if the realm guest accesses memory which hasn't yet been mapped then KVM needs to either populate the region or fault the guest. For memory in the lower (protected) region of IPA a fresh page is provided to the RMM which will zero the contents. For memory in the upper (shared) region of IPA, the memory from the memslot is mapped into the realm VM non secure. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-21-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
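The split between the lower (protected) and upper (shared) halves of the IPA space can be sketched as a test of the top IPA bit. The names below are hypothetical, and treating bit `ia_bits - 1` as the shared bit is an assumption inferred from the `BIT(realm->ia_bits - 1)` expression quoted elsewhere in this series.

```c
#include <stdint.h>

enum sketch_fault_action {
	SKETCH_MAP_FRESH_PRIVATE_PAGE,  /* lower half: hand a fresh page to the RMM */
	SKETCH_MAP_SHARED_FROM_MEMSLOT, /* upper half: map memslot memory non-secure */
};

/*
 * Decide how a stage 2 fault at 'ipa' would be resolved.
 * Precondition: 1 <= ia_bits <= 64 (the realm has been created).
 */
static enum sketch_fault_action sketch_classify_fault(uint64_t ipa, unsigned int ia_bits)
{
	uint64_t shared_bit = 1ULL << (ia_bits - 1);

	return (ipa & shared_bit) ? SKETCH_MAP_SHARED_FROM_MEMSLOT
				  : SKETCH_MAP_FRESH_PRIVATE_PAGE;
}
```

With a 40-bit IPA space, for example, every address with bit 39 set would fall in the shared half.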
BugLink: https://bugs.launchpad.net/bugs/2139249 When loading a realm VCPU much of the work is handled by the RMM so only some of the actions are required. Rearrange kvm_arch_vcpu_load() slightly so we can bail out early for a realm guest. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-22-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM only allows setting the GPRS (x0-x30) and PC for a realm guest. Check this in kvm_arm_set_reg() so that the VMM can receive a suitable error return if other registers are written to. The RMM makes similar restrictions for reading of the guest's registers (this is *confidential* compute after all), however we don't impose the restriction here. This allows the VMM to read (stale) values from the registers which might be useful to read back the initial values even if the RMM doesn't provide the latest version. For migration of a realm VM, a new interface will be needed so that the VMM can receive an (encrypted) blob of the VM's state. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-23-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
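As a rough model of the check this commit adds to kvm_arm_set_reg(), the sketch below accepts writes to x0-x30 and the PC for a realm guest and rejects everything else. The register classification enum, the function name and the use of -EINVAL as the error value are assumptions for illustration; the real code works on KVM register IDs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical register classes; the kernel decodes these from a register ID. */
enum sketch_reg_class {
	SKETCH_REG_GPR,    /* general purpose register, index 0..30 */
	SKETCH_REG_PC,     /* program counter */
	SKETCH_REG_SYSREG, /* anything else, e.g. a system register */
};

/* Return 0 when the write is allowed, a negative errno otherwise. */
static int sketch_realm_check_set_reg(bool is_realm, enum sketch_reg_class cls,
				      unsigned int index)
{
	if (!is_realm)
		return 0; /* normal guests are not restricted by this check */

	switch (cls) {
	case SKETCH_REG_GPR:
		return index <= 30 ? 0 : -EINVAL;
	case SKETCH_REG_PC:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Reads are deliberately not modelled, since the commit message leaves them unrestricted.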
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM needs to be informed of the target REC when a PSCI call is made with an MPIDR argument. Expose an ioctl to the userspace in case the PSCI is handled by it. Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-24-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM doesn't allow injection of an undefined exception into a realm guest. Add a WARN to catch if this ever happens. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-25-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 It doesn't make much sense as a realm guest wouldn't want to trust the host. It will also need some extra work to ensure that KVM will only attempt to write into a shared memory region. So for now just disable it. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-26-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Extend KVM_SET_VCPU_EVENTS to support realms, where KVM cannot set the system registers, and the RMM must perform it on next REC entry. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-27-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Forward RSI_HOST_CALLS to KVM's HVC handler. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-28-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Given we have different types of VMs supported, check the support for SVE for the given instance of the VM to accurately report the status. Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-29-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Guest_memfd doesn't yet natively support huge pages, and there are currently difficulties for a VMM to manage huge pages efficiently so for now always split up mappings to PTE (4k). The two issues that need progressing before supporting huge pages for realms are: 1. guest_memfd needs to be able to allocate from an appropriate allocator which can provide huge pages. 2. The VMM needs to be able to repurpose private memory for a shared mapping when the guest VM requests memory is transitioned. Because this can happen at a 4k granularity it isn't possible to free/reallocate while huge pages are in use. Allowing the VMM to mmap() the shared portion of a huge page would allow the huge page to be recreated when the memory is unshared and made protected again. These two issues are not specific to realms and don't affect the realm API, so for now just break everything down to 4k pages in the RMM controlled stage 2. Future work can add huge page support without changing the uAPI. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-30-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
Here is some info on how to test Realms
@ianm-nv A bit of feedback...
Quick questions:
BugLink: https://bugs.launchpad.net/bugs/2139249 Physical device assignment is not yet supported by the RMM, so it doesn't make much sense to allow device mappings within the realm. Prevent them when the guest is a realm. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (backported from https://lore.kernel.org/20250820145606.180644-31-steven.price@arm.com/) [context adjustment because of commit 6f43d59] Signed-off-by: Ian May <ianm@nvidia.com>
…sical IRQ BugLink: https://bugs.launchpad.net/bugs/2139249 Arm CCA assigns the physical PMU device to the guest running in realm world, however the IRQs are routed via the host. To enter a realm guest while a PMU IRQ is pending it is necessary to block the physical IRQ to prevent an immediate exit. Provide a mechanism in the PMU driver for KVM to control the physical IRQ. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-32-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Use the PMU registers from the RmiRecExit structure to identify when an overflow interrupt is due and inject it into the guest. Also hook up the configuration option for enabling the PMU within the guest. When entering a realm guest with a PMU interrupt pending, it is necessary to disable the physical interrupt. Otherwise when the RMM restores the PMU state the physical interrupt will trigger causing an immediate exit back to the host. The guest is expected to acknowledge the interrupt causing a host exit (to update the GIC state) which gives the opportunity to re-enable the physical interrupt before the next PMU event. Number of PMU counters is configured by the VMM by writing to PMCR.N. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-33-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…ests BugLink: https://bugs.launchpad.net/bugs/2139249 For protected memory read only isn't supported by the RMM. While it may be possible to support read only for unprotected memory, this isn't supported at the present time. Note that this does mean that ROM (or flash) data cannot be emulated correctly by the VMM as the stage 2 mappings are either always read/write or are trapped as MMIO (so don't support operations where the syndrome information doesn't allow emulation, e.g. load/store pair). This restriction can be lifted in the future by allowing the unprotected stage 2 mappings to be made read only. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-34-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…tchpoints to userspace BugLink: https://bugs.launchpad.net/bugs/2139249 The RMM describes the maximum number of BPs/WPs available to the guest in the Feature Register 0. Propagate those numbers into ID_AA64DFR0_EL1, which is visible to userspace. A VMM needs this information in order to set up realm parameters. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-35-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…ONE_REG BugLink: https://bugs.launchpad.net/bugs/2139249 Allow userspace to configure the number of breakpoints and watchpoints of a Realm VM through KVM_SET_ONE_REG ID_AA64DFR0_EL1. The KVM sys_reg handler checks the user value against the maximum value given by RMM (arm64_check_features() gets it from the read_sanitised_id_aa64dfr0_el1() reset handler). Userspace discovers that it can write these fields by issuing a KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-36-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…supported by RMM BugLink: https://bugs.launchpad.net/bugs/2139249 Provide an accurate number of available PMU counters to userspace when setting up a Realm. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-37-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 RMM provides the maximum vector length it supports for a guest in its feature register. Make it visible to the rest of KVM and to userspace via KVM_REG_ARM64_SVE_VLS. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-38-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…Realm BugLink: https://bugs.launchpad.net/bugs/2139249 Obtain the max vector length configured by userspace on the vCPUs, and write it into the Realm parameters. By default the vCPU is configured with the max vector length reported by RMM, and userspace can reduce it with a write to KVM_REG_ARM64_SVE_VLS. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-39-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…RME RECs BugLink: https://bugs.launchpad.net/bugs/2139249 KVM_GET_REG_LIST should not be called before SVE is finalized. The ioctl handler currently returns -EPERM in this case. But because it uses kvm_arm_vcpu_is_finalized(), it now also rejects the call for unfinalized REC even though finalizing the REC can only be done late, after Realm descriptor creation. Move the check to copy_sve_reg_indices(). One adverse side effect of this change is that a KVM_GET_REG_LIST call that only probes for the array size will now succeed even if SVE is not finalized, but that seems harmless since the following KVM_GET_REG_LIST with the full array will fail. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-40-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Userspace can set a few registers with KVM_SET_ONE_REG (9 GP registers at runtime, and 3 system registers during initialization). Update the register list returned by KVM_GET_REG_LIST. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-41-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Select KVM_GENERIC_PRIVATE_MEM and provide the necessary support functions. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-42-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Increment KVM_VCPU_MAX_FEATURES to expose the new capability to user space. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-43-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Add the ioctl to activate a realm and set the static branch to enable access to the realm functionality if the RMM is detected. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> (cherry picked from https://lore.kernel.org/20250820145606.180644-44-steven.price@arm.com/) Signed-off-by: Ian May <ianm@nvidia.com>
…ealm_unmap_range BugLink: https://bugs.launchpad.net/bugs/2139249 Move the ia_bits bounds calculation after the kvm_realm_is_created() check to avoid accessing realm->ia_bits before the realm is created. When the realm hasn't been created yet, ia_bits is 0, causing BIT(realm->ia_bits - 1) to evaluate BIT(-1) which wraps to BIT(4294967295), triggering a UBSAN shift-out-of-bounds warning: UBSAN: shift-out-of-bounds in arch/arm64/kvm/rme.c:805:8 shift exponent 4294967295 is too large for 64-bit type 'long unsigned int' ... kvm_realm_unmap_range+0x1c4/0x1e8 kvm_arch_post_set_memory_attributes+0x58/0xd8 kvm_vm_set_mem_attributes+0x37c/0x600 Fixes: 3b3ecd0c3dc7 ("NVIDIA: VR: SAUCE: arm64: RME: Allow VMM to set RIPAS") Signed-off-by: Ian May <ianm@nvidia.com>
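The ordering fix can be shown in miniature: the bound is only computed once the realm is known to exist, so the shift never sees `ia_bits == 0`. The struct and function names are hypothetical stand-ins for struct realm and the logic in kvm_realm_unmap_range(), not the patched code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct realm used here. */
struct sketch_realm {
	bool created;         /* models kvm_realm_is_created() */
	unsigned int ia_bits; /* 0 until the realm is created */
};

/*
 * Store the size of the protected half of the IPA space in *limit.
 * Returns false, leaving *limit untouched, when the realm has not been
 * created: shifting by (0u - 1) would be undefined behaviour.
 */
static bool sketch_protected_limit(const struct sketch_realm *realm, uint64_t *limit)
{
	if (!realm->created)
		return false;

	*limit = 1ULL << (realm->ia_bits - 1);
	return true;
}
```

The buggy ordering corresponds to computing the shift before the `created` test, which is what produced the 4294967295 shift exponent in the UBSAN report.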
BugLink: https://bugs.launchpad.net/bugs/2139249 Add Memory Encryption Context ID (MECID) support for Realms to provide better isolation between them when RMM supports MEC. - Bitmap-based private MECID allocation per Realm - Reference-counted shared MECID for backward compatibility - Userspace config via KVM_CAP_ARM_RME_CONFIG_REALM ioctl - MEC capability query interface (no arm.c changes needed) - Graceful fallback: MECID 0 when RMM lacks MEC support - Unconfigured realms default to shared MECID State managed via struct mecid_state with clear locking semantics. Policy enum: MEC_POLICY_{UNCONFIGURED,PRIVATE,SHARED}. Signed-off-by: Raghu Krishnamurthy <raghupathyk@nvidia.com> Signed-off-by: Ian May <ianm@nvidia.com>
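A minimal userspace sketch of the two allocation policies named above: bitmap-based private MECIDs and a reference-counted shared MECID. Everything here is an assumption made for illustration: the pool size, the ID reserved for the shared policy, and all `*_sketch` and `mecid_*` names. It is not the patch's struct mecid_state, and it omits the locking the commit message mentions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_MECIDS    64 /* assumed pool size, illustration only */
#define SKETCH_MECID_NONE    0 /* fallback ID when the RMM lacks MEC support */
#define SKETCH_MECID_SHARED  1 /* assumed ID reserved for the shared policy */

/* Hypothetical pool state; bit n of 'used' set means MECID n is taken. */
struct mecid_pool_sketch {
	uint64_t used;
	unsigned int shared_refs;
};

static void mecid_pool_init(struct mecid_pool_sketch *pool)
{
	/* Reserve the fallback and shared IDs so they are never handed out. */
	pool->used = (UINT64_C(1) << SKETCH_MECID_NONE) |
		     (UINT64_C(1) << SKETCH_MECID_SHARED);
	pool->shared_refs = 0;
}

/* Returns the lowest free private MECID, or -1 if the pool is exhausted. */
static int mecid_alloc_private(struct mecid_pool_sketch *pool)
{
	for (int id = 0; id < SKETCH_NR_MECIDS; id++) {
		uint64_t bit = UINT64_C(1) << id;

		if (!(pool->used & bit)) {
			pool->used |= bit;
			return id;
		}
	}
	return -1;
}

static void mecid_free_private(struct mecid_pool_sketch *pool, int id)
{
	assert(id > SKETCH_MECID_SHARED && id < SKETCH_NR_MECIDS);
	pool->used &= ~(UINT64_C(1) << id);
}

/* Every realm using the shared policy takes one reference. */
static int mecid_get_shared(struct mecid_pool_sketch *pool)
{
	pool->shared_refs++;
	return SKETCH_MECID_SHARED;
}

static void mecid_put_shared(struct mecid_pool_sketch *pool)
{
	assert(pool->shared_refs > 0);
	pool->shared_refs--;
}
```

In this sketch a private ID returns to the pool as soon as its realm is torn down, while the shared ID stays reserved and is only reference-counted.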
BugLink: https://bugs.launchpad.net/bugs/2139249 Add validation to ensure that the number of auxiliary granules returned by RMM via rmi_rec_aux_count() does not exceed the maximum allowed value of REC_PARAMS_AUX_GRANULES (16). This prevents potential buffer overflow in the aux_pages array which is statically defined with REC_PARAMS_AUX_GRANULES elements in struct realm_rec. If the RMM returns a value greater than 16, the realm creation is aborted with proper cleanup to maintain system integrity. Signed-off-by: Raghu Krishnamurthy <raghupathyk@nvidia.com> Signed-off-by: Ian May <ianm@nvidia.com>
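The validation can be sketched in userspace as follows. Only REC_PARAMS_AUX_GRANULES (16) and the aux_pages array come from the commit message; `rec_sketch`, `rec_set_aux_count`, and the choice of -EINVAL as the error code are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define REC_PARAMS_AUX_GRANULES 16

/* Hypothetical stand-in for struct realm_rec: a fixed-size aux page array. */
struct rec_sketch {
	void *aux_pages[REC_PARAMS_AUX_GRANULES];
	size_t aux_count;
};

/*
 * Accepts the count reported by the RMM only if it fits the static array.
 * Returns 0 on success or -EINVAL (an assumed error code) if the count
 * would overflow aux_pages; the stored count is left untouched on failure.
 */
static int rec_set_aux_count(struct rec_sketch *rec, unsigned long reported)
{
	assert(rec != NULL);

	if (reported > REC_PARAMS_AUX_GRANULES)
		return -EINVAL;

	rec->aux_count = reported;
	return 0;
}
```

Rejecting the value before it is stored means later loops over aux_pages can trust aux_count without re-checking it.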
…meter BugLink: https://bugs.launchpad.net/bugs/2139249 Add a module parameter to expose the KVM_CAP_ARM_RME capability number via sysfs. This allows userspace (QEMU) to discover the correct capability number at runtime rather than relying on hardcoded values that may become stale when capability numbers shift due to other patches being merged. The value is exposed at: /sys/module/kvm/parameters/kvm_cap_arm_rme Signed-off-by: Ian May <ianm@nvidia.com>
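A userspace consumer could read the parameter roughly as sketched below. The sysfs path is the one given in the commit message; the helper names are hypothetical, and the sketch assumes the value is exported as a plain decimal integer followed by a newline, which the commit message does not state.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Path taken from the commit message. */
#define KVM_CAP_ARM_RME_PARAM "/sys/module/kvm/parameters/kvm_cap_arm_rme"

/*
 * Parses a decimal capability number (assumed format) from 'text'.
 * Returns 0 and stores the value in *out, or -1 on malformed input.
 */
static int parse_cap_number(const char *text, int *out)
{
	char *end;
	long val;

	assert(text != NULL && out != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text || val < 0 || val > INT_MAX)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*out = (int)val;
	return 0;
}

/* Reads the first line of 'path' and parses it; -1 if unreadable. */
static int read_cap_number(const char *path, int *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_cap_number(buf, out);
}
```

A caller would pass KVM_CAP_ARM_RME_PARAM to read_cap_number() and, on success, hand the result to the KVM_CHECK_EXTENSION ioctl instead of a compile-time constant. If the file is missing, the kernel does not carry this patch.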
BugLink: https://bugs.launchpad.net/bugs/2139249 For ioremap(), so far we only checked if it was a device (RIPAS_DEV) to choose an encrypted vs decrypted mapping. However, we may have firmware reserved memory regions exposed to the OS (e.g., EFI Coco Secret Securityfs, ACPI CCEL). We need to make sure that anything that is RIPAS_RAM (i.e., Guest protected memory with RMM guarantees) is also mapped as encrypted. Rephrasing the above, anything that is not RIPAS_EMPTY is guaranteed to be protected by the RMM. Thus we choose encrypted mapping for anything that is not RIPAS_EMPTY. While at it, rename the helper function __arm64_is_protected_mmio => arm64_rsi_is_protected to clearly indicate that this is not an arm64 generic helper, but something to do with Realms. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit fa84e53) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ian May <ianm@nvidia.com>
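The change in the mapping decision can be summarised with a short sketch. The enum and both function names are hypothetical and per-state only; the real helper, arm64_rsi_is_protected, operates on an address range, and the enum values here are chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative RIPAS states; the numeric values are not authoritative. */
enum ripas_sketch {
	RIPAS_SKETCH_EMPTY,
	RIPAS_SKETCH_RAM,
	RIPAS_SKETCH_DESTROYED,
	RIPAS_SKETCH_DEV,
};

/* Previous rule: only device regions were mapped as encrypted. */
static bool old_map_encrypted(enum ripas_sketch state)
{
	return state == RIPAS_SKETCH_DEV;
}

/* New rule: anything that is not EMPTY is protected by the RMM. */
static bool map_encrypted(enum ripas_sketch state)
{
	return state != RIPAS_SKETCH_EMPTY;
}
```

The two rules differ for RIPAS_RAM regions such as the firmware-reserved areas mentioned above: the old rule maps them decrypted, the new rule maps them encrypted.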
BugLink: https://bugs.launchpad.net/bugs/2139249 Add support for ACPI CCEL by handling the EfiACPIMemoryNVS type memory. As per UEFI specifications NVS memory is reserved for Firmware use even after exiting boot services. Thus map the region as read-only. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit d02c2e4) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required by the driver. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit 9e8a3df) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ian May <ianm@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139249 Signed-off-by: Ian May <ianm@nvidia.com>
Force-pushed from 060fb12 to aa740a4.
Thanks, I've removed CONFIG_ARM_CCA_GUEST from debian.nvidia-6.17/config/annotations
Here is the location of the page size check; I have confirmed that it works as expected when booting both the 4k and the 64k kernels. void kvm_init_rme(void)
My recommendation would be to wait until the conflicting PRs are merged; I can then rebase and update this PR, which should hopefully be ready to send.
I'm okay with that. I saw a Canonical ack on the dmabuf PR earlier today, so hopefully they will be merged soon.
Good catch! Comments have been fixed and pushed.
realm_create_rd is called once for every Realm VM launch, so the risk of spamming dmesg should be low. Since this warning does indicate an RMM firmware bug, I think it is safer to keep it as a WARN_ON.
BugLink
[ Impact ]
ARM Confidential Compute Architecture (CCA) provides hardware-enforced isolation for confidential virtual machines called "Realms" on ARM64 platforms. This patch series enables CCA support for NVIDIA Vera platforms.
This series is based on the ARM KVM RME host support patches (v10), rebased for the 6.17 kernel:
https://lore.kernel.org/linux-coco/20250820145606.180644-1-steven.price@arm.com/
This series enables:
- KVM host support for creating and managing Realms via the Realm Management Extension (RME)
- MECID (Memory Encryption Context ID) for improved isolation between Realms
- Required CCA kernel configuration options
[ Test Plan ]
Deploy and test on NVIDIA Vera platform with RMM firmware
Verify Realm guest VMs boot and run successfully
CCA testing requires specialized hardware and firmware. Testing performed by NVIDIA CCA team.
[ Where problems could occur ]
Bugs in the KVM/RME integration could cause Realm guest failures or host instability. Issues would be limited to CCA-enabled platforms running Realm workloads.
[ Other Info ]
Patch summary:
43 patches for upstream v10 KVM/RME host support, marked as SAUCE because they are not yet in the upstream kernel.
3 upstream cherry-picks:
arm64: realm: ioremap: Allow mapping memory as encrypted
arm64: acpi: Enable ACPI CCEL support
arm64: Enable EFI secret area Securityfs support
4 SAUCE patches:
arm64: RME: Fix UBSAN shift-out-of-bounds in kvm_realm_unmap_range
arm64: RME: Add MECID support
arm64: RME: Add bounds check
[Config] Update ARM CCA annotations