[PW_SID:958089] context_tracking,x86: Defer some IPIs until a user->kernel transition #353
linux-riscv-bot wants to merge 26 commits into
call_dest_name() does not get passed the file pointer of validate_call(),
which means its invocation of insn_reloc() will always return NULL. Make it
take a file pointer.
While at it, make sure call_dest_name() uses arch_dest_reloc_offset(),
otherwise it gets the pv_ops[] offset wrong.
Fabricating an intentional warning shows the change; previously:
vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x4: call to {dynamic}() leaves .noinstr.text section
now:
vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x4: call to pv_ops[1]() leaves .noinstr.text section
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
I had to look into objtool itself to understand what this warning was about; make it more explicit.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
A later commit will reduce the size of the RCU watching counter to free up some bits for another purpose. Paul suggested adding a config option to test the extreme case where the counter is reduced to its minimum usable width for rcutorture to poke at, so do that.
Make it only configurable under RCU_EXPERT. While at it, add a comment to explain the layout of context_tracking->state.
Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
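As an illustration of why a minimum-width counter is worth torturing, here is a minimal userspace sketch of a state word whose counter field can be shrunk at build time. The `DYNTICKS_TORTURE` macro, the field positions, and the helper names are assumptions made for this example; they are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the new config option; 1 selects the tiny counter. */
#ifndef DYNTICKS_TORTURE
#define DYNTICKS_TORTURE 1
#endif

/* Assumed layout: low bits hold the context state, the counter sits above. */
#define CT_STATE_BITS	2
#if DYNTICKS_TORTURE
#define CT_RCU_BITS	2			/* minimum width used for testing */
#else
#define CT_RCU_BITS	(32 - CT_STATE_BITS)	/* counter takes all remaining bits */
#endif
#define CT_RCU_SHIFT	CT_STATE_BITS
#define CT_RCU_MASK	(((UINT32_C(1) << CT_RCU_BITS) - 1) << CT_RCU_SHIFT)

/* Build-time check in the spirit of the series: the fields must fit. */
static_assert(CT_STATE_BITS + CT_RCU_BITS <= 32, "state word too small");

/* Increment the counter field, wrapping inside its own width. */
static uint32_t ct_rcu_inc(uint32_t state)
{
	uint32_t counter = (state & CT_RCU_MASK) + (UINT32_C(1) << CT_RCU_SHIFT);

	/* Bits outside the counter field are preserved untouched. */
	return (state & ~CT_RCU_MASK) | (counter & CT_RCU_MASK);
}

/* Extract the current counter value. */
static uint32_t ct_rcu_count(uint32_t state)
{
	return (state & CT_RCU_MASK) >> CT_RCU_SHIFT;
}
```

With a 2-bit counter the value wraps after four increments, so any code that wrongly relies on the counter never repeating should fail quickly under test.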
We now have an RCU_EXPERT config for testing a small-sized RCU dynticks counter: CONFIG_RCU_DYNTICKS_TORTURE.
Modify scenario TREE04 to use this config in order to test a ridiculously small counter (2 bits).
Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Deferring a code patching IPI is unsafe if the patched code is in a noinstr region. In that case the text poke code must trigger an immediate IPI to all CPUs, which can rudely interrupt an isolated NO_HZ CPU running in userspace.
Some noinstr static branches may really need to be patched at runtime, despite the resulting disruption. Add DEFINE_STATIC_KEY_*_NOINSTR() variants for those. They don't do anything special yet; that will come later.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Deferring a code patching IPI is unsafe if the patched code is in a noinstr region. In that case the text poke code must trigger an immediate IPI to all CPUs, which can rudely interrupt an isolated NO_HZ CPU running in userspace.
If a noinstr static call only needs to be patched during boot, its key can be made ro-after-init to ensure it will never be patched at runtime.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Later commits will cause objtool to warn about static calls being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
pv_sched_clock is updated in:
  o __init vmware_paravirt_ops_setup()
  o __init xen_init_time_common()
  o kvm_sched_clock_init() <- __init kvmclock_init()
  o hv_setup_sched_clock() <- __init hv_init_tsc_clocksource()
IOW purely init context, and can thus be marked as __ro_after_init.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Later commits will cause objtool to warn about static calls being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
x86_idle is updated in:
  o xen_set_default_idle() <- __init xen_arch_setup()
  o __init select_idle_routine()
IOW purely init context, and can thus be marked as __ro_after_init.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The static call is only ever updated in:
  __init pv_time_init()
  __init xen_init_time_common()
  __init vmware_paravirt_ops_setup()
  __init xen_time_setup_guest()
so mark it appropriately as __ro_after_init.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The static call is only ever updated in:
  __init pv_time_init()
  __init xen_time_setup_guest()
so mark it appropriately as __ro_after_init.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The static call is only ever updated in:
  __init pv_time_init()
  __init xen_time_setup_guest()
so mark it appropriately as __ro_after_init.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The static call is only ever updated in:
  __init pv_time_init()
  __init xen_time_setup_guest()
so mark it appropriately as __ro_after_init.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The static call is only ever updated in:
  __init xen_time_setup_guest()
so mark it appropriately as __ro_after_init.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Later commits will cause objtool to warn about static calls being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
perf_lopwr_cb is used in .noinstr code, but is only ever updated in __init amd_brs_lopwr_init(), and can thus be marked as __ro_after_init.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
sched_clock_running is only ever enabled in the __init functions sched_clock_init() and sched_clock_init_late(), and is never disabled. Mark it __ro_after_init.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
The static key is only ever enabled in:
  __init hv_init_evmcs()
so mark it appropriately as __ro_after_init.
Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Later commits will cause objtool to warn about static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
mds_idle_clear is used in .noinstr code, and can be modified at runtime (SMT hotplug). Suppressing the text_poke_sync() IPI has little benefit for this key, as hotplug implies eventually going through takedown_cpu() -> stop_machine_cpuslocked(), which is going to cause interference on all online CPUs anyway.
Mark it to let objtool know not to warn about it.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Later commits will cause objtool to warn about static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
__sched_clock_stable is used in .noinstr code, and can be modified at runtime (e.g. time_cpufreq_notifier()). Suppressing the text_poke_sync() IPI has little benefit for this key, as NOHZ_FULL is incompatible with an unstable TSC anyway.
Mark it to let objtool know not to warn about it.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
…lowed in .noinstr
Later commits will cause objtool to warn about static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
These keys are used in .noinstr code, and can be modified at runtime (/proc/kernel/vmx* write). However, they are not expected to be flipped during latency-sensitive operations, and thus shouldn't be a source of interference wrt the text patching IPI.
Mark them to let objtool know not to warn about them.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Later commits will cause objtool to warn about static keys being used in .noinstr sections in order to safely defer instruction patching IPIs targeted at NOHZ_FULL CPUs.
stack_erasing_bypass is used in .noinstr code, and can be modified at runtime (/proc/sys/kernel/stack_erasing write). However, it is not expected to be flipped during latency-sensitive operations, and thus shouldn't be a source of interference wrt the text patching IPI.
Mark it to let objtool know not to warn about it.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Warn about static branches/calls in noinstr regions, unless the corresponding key is RO-after-init or has been manually whitelisted with DEFINE_STATIC_KEY_*_NOINSTR().
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
[Added NULL check for insn_call_dest() return value]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
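The rule described in this commit can be summarized as a small predicate. The sketch below is a userspace illustration only: `struct static_key_info` and `noinstr_key_warns()` are hypothetical names invented for this example and do not correspond to objtool's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what is known about a static key or static call. */
struct static_key_info {
	bool ro_after_init;	/* key can only be changed during boot */
	bool noinstr_allowed;	/* explicitly marked as allowed in .noinstr */
};

/*
 * Return true if a use of this key should be reported: the use sits in a
 * noinstr region and the key is neither RO-after-init nor whitelisted.
 */
static bool noinstr_key_warns(bool in_noinstr, const struct static_key_info *key)
{
	if (!in_noinstr)
		return false;

	return !key->ro_after_init && !key->noinstr_allowed;
}
```

Both escape hatches lead to the same "no warning" outcome but for different reasons: an RO-after-init key can never trigger a runtime patching IPI, while a whitelisted key still can and is accepted as a known source of interference.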
The text_size bit referred to by the comment has been removed as of commit
  ac3b432 ("module: replace module_layout with module_memory")
and is thus no longer relevant. Remove it and comment about the contents of the masks array instead.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
As pointed out by Sean [1], is_kernel_noinstr_text() will return false for an address contained within a module's .noinstr.text section. A later patch will require checking whether a text address is noinstr, and this can unfortunately be the case of modules - KVM is one such case.
A module's .noinstr.text section is already tracked as of commit
  66e9b07 ("kprobes: Prevent probes in .noinstr.text section")
for kprobe blacklisting purposes, but via an ad-hoc mechanism.
Add a MOD_NOINSTR_TEXT mem_type, and reorganize __layout_sections() so that it maps all the sections in a single invocation.
[1]: http://lore.kernel.org/r/Z4qQL89GZ_gk0vpu@google.com
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
smp_call_function() & friends have the unfortunate habit of sending IPIs to isolated, NOHZ_FULL, in-userspace CPUs, as they blindly target all online CPUs.
Some callsites can be bent into doing the right thing, such as done by commit:
  cc9e303 ("x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs")
Unfortunately, not all SMP callbacks can be omitted in this fashion. However, some of them only affect execution in kernelspace, which means they don't have to be executed *immediately* if the target CPU is in userspace: stashing the callback and executing it upon the next kernel entry would suffice. x86 kernel instruction patching and kernel TLB invalidation are prime examples of this.
Reduce the RCU dynticks counter width to free up some bits to be used as a deferred callback bitmask. Add some build-time checks to validate that setup.
Presence of CT_RCU_WATCHING in the ct_state prevents queuing deferred work.
Later commits introduce the bit:callback mappings.
Link: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
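To make the mechanism concrete, here is a minimal userspace sketch using C11 atomics. It assumes a 32-bit state word with a work bitmask placed below the RCU counter; the bit positions, the field widths, and the function names `ct_queue_work()`, `ct_kernel_enter()` and `ct_kernel_exit()` are assumptions for illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout of the state word (positions are assumptions):
 *   bits  0..1  : context state, unused in this sketch
 *   bits  2..9  : deferred-work bitmask
 *   bits 10..31 : RCU counter; its lowest bit doubles as the
 *                 "RCU is watching" flag (odd count means watching)
 */
#define CT_WORK_SHIFT	2
#define CT_WORK_BITS	8
#define CT_WORK_MASK	(((1u << CT_WORK_BITS) - 1) << CT_WORK_SHIFT)
#define CT_RCU_SHIFT	(CT_WORK_SHIFT + CT_WORK_BITS)
#define CT_RCU_WATCHING	(1u << CT_RCU_SHIFT)

/* Build-time checks that the fields fit and do not overlap. */
static_assert(CT_RCU_SHIFT < 32, "no room left for the RCU counter");
static_assert((CT_WORK_MASK & CT_RCU_WATCHING) == 0, "fields overlap");

/*
 * Try to queue deferred work (work_bit < CT_WORK_BITS) for a remote CPU.
 * Fails if RCU is watching, i.e. the CPU is in the kernel and must be
 * interrupted instead.
 */
static bool ct_queue_work(_Atomic uint32_t *state, unsigned int work_bit)
{
	uint32_t old = atomic_load(state);

	do {
		if (old & CT_RCU_WATCHING)
			return false;
	} while (!atomic_compare_exchange_weak(state, &old,
			old | (1u << (CT_WORK_SHIFT + work_bit))));

	return true;
}

/* Kernel entry: start watching and return the work bits that were pending. */
static uint32_t ct_kernel_enter(_Atomic uint32_t *state)
{
	uint32_t old = atomic_load(state);
	uint32_t new;

	do {
		/* Clear the work bits and bump the counter in one atomic step. */
		new = (old & ~CT_WORK_MASK) + CT_RCU_WATCHING;
	} while (!atomic_compare_exchange_weak(state, &old, new));

	return (old & CT_WORK_MASK) >> CT_WORK_SHIFT;
}

/* Kernel exit: bump the counter again so that RCU stops watching. */
static void ct_kernel_exit(_Atomic uint32_t *state)
{
	atomic_fetch_add(state, CT_RCU_WATCHING);
}
```

Keeping the work bits and the watching flag in the same word is what lets a single compare-and-exchange decide between "queued for later" and "must IPI now" without a race against a concurrent kernel entry.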
text_poke_bp_batch() sends IPIs to all online CPUs to synchronize them vs the newly patched instruction. CPUs that are executing in userspace do not need this synchronization to happen immediately, and this is actually harmful interference for NOHZ_FULL CPUs.
As the synchronization IPIs are sent using a blocking call, returning from text_poke_bp_batch() implies all CPUs will observe the patched instruction(s), and this should be preserved even if the IPI is deferred. In other words, to safely defer this synchronization, any kernel instruction leading to the execution of the deferred instruction sync (ct_work_flush()) must *not* be mutable (patchable) at runtime.
This means we must pay attention to mutable instructions in the early entry code:
- alternatives
- static keys
- static calls
- all sorts of probes (kprobes/ftrace/bpf/???)
The early entry code leading to ct_work_flush() is noinstr, which gets rid of the probes.
Alternatives are safe, because it's boot-time patching (before SMP is even brought up) which is before any IPI deferral can happen.
This leaves us with static keys and static calls.
Any static key used in early entry code should be only forever-enabled at boot time, IOW __ro_after_init (pretty much like alternatives). Exceptions are explicitly marked as allowed in .noinstr and will always generate an IPI when flipped.
The same applies to static calls - they should be only updated at boot time, or manually marked as an exception.
Objtool is now able to point at static keys/calls that don't respect this, and all static keys/calls used in early entry code have now been verified as behaving appropriately.
Leverage the new context_tracking infrastructure to defer sync_core() IPIs to a target CPU's next kernel entry.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
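The per-CPU decision described above (defer the sync if the CPU is in userspace, otherwise interrupt it) can be sketched in userspace as follows. This is an illustration under stated assumptions: `NR_CPUS`, the bit values, and the functions `defer_sync()` and `text_poke_sync_targets()` are hypothetical and only model the idea; they are not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS		4		/* small fixed CPU count for the sketch */
#define CT_WORK_SYNC	(1u << 0)	/* hypothetical "sync_core() pending" bit */
#define CT_RCU_WATCHING	(1u << 8)	/* hypothetical "CPU is in the kernel" bit */

/* One state word per CPU, zero-initialized (all CPUs start in userspace). */
static _Atomic uint32_t ct_state[NR_CPUS];

/* Try to leave a deferred sync for a CPU; fails if it is in the kernel. */
static bool defer_sync(int cpu)
{
	uint32_t old = atomic_load(&ct_state[cpu]);

	do {
		if (old & CT_RCU_WATCHING)
			return false;
	} while (!atomic_compare_exchange_weak(&ct_state[cpu], &old,
					       old | CT_WORK_SYNC));

	return true;
}

/*
 * Return a bitmask of the CPUs that still need an immediate IPI after a
 * text patch. CPUs in userspace get the deferred-sync bit set instead.
 */
static unsigned int text_poke_sync_targets(void)
{
	unsigned int ipi_mask = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!defer_sync(cpu))
			ipi_mask |= 1u << cpu;
	}

	return ipi_mask;
}
```

The guarantee that the patcher relies on is unchanged: once the function returns, every CPU has either been asked to synchronize immediately or will do so before it executes any patchable kernel instruction on its next entry.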
Patch 1: "[v5,01/25] objtool: Make validate_call() recognize indirect calls to pv_ops[]"
Patch 23: "[v5,23/25] module: Add MOD_NOINSTR_TEXT mem_type"
Patch 24: "[v5,24/25] context-tracking: Introduce work deferral infrastructure"
Patch 25: "[v5,25/25] context_tracking,x86: Defer kernel text patching IPIs"
4d9ad71 to 625be03
PR for series 958089 applied to workflow__riscv__fixes
Name: context_tracking,x86: Defer some IPIs until a user->kernel transition
URL: https://patchwork.kernel.org/project/linux-riscv/list/?series=958089
Version: 5