- Mar 05, 2023
-
-
Linus Torvalds authored
Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
include/linux/compiler-intel.h had no update in the past 3 years. We often forget about the third C compiler to build the kernel. For example, commit a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO") only mentioned GCC and Clang. init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC, and nobody has reported any issue. I guess the Intel Compiler support is broken, and nobody is caring about it. Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is deprecated: $ icc -v icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. icc version 2021.7.0 (gcc version 12.1.0 compatibility) Arnd Bergmann provided a link to the article, "Intel C/C++ compilers complete adoption of LLVM". lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept untouched for better sync with https://github.com/facebook/zstd Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org> Acked-by:
Arnd Bergmann <arnd@arndb.de> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Reviewed-by:
Nathan Chancellor <nathan@kernel.org> Reviewed-by:
Miguel Ojeda <ojeda@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 03, 2023
-
-
Marco Elver authored
Now that memcpy/memset/memmove are no longer overridden by KASAN, we can just use the normal symbol names in uninstrumented files. Drop the preprocessor redefinitions. Link: https://lkml.kernel.org/r/20230224085942.1791837-4-elver@google.com Fixes: 69d4c0d3 ("entry, kasan, x86: Disallow overriding mem*() functions") Signed-off-by:
Marco Elver <elver@google.com> Reviewed-by:
Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linux Kernel Functional Testing <lkft@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Mar 02, 2023
-
-
Al Viro authored
openrisc equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
nios2 equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
microblaze equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
ia64 equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
sparc equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
alpha equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
parisc equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Tested-by:
Helge Deller <deller@gmx.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
hexagon equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Acked-by:
Brian Cain <bcain@quicinc.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
riscv equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Tested-by:
Björn Töpel <bjorn@kernel.org> Tested-by:
Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
m68k equivalent of 26178ec1 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access. Tested-by:
Finn Thain <fthain@linux-m68k.org> Tested-by:
Geert Uytterhoeven <geert@linux-m68k.org> Acked-by:
Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Vasily Gorbik authored
Recent test_kprobe_missed kprobes kunit test uncovers the following problem. Once kprobe is triggered from another kprobe (kprobe reenter), all future kprobes on this cpu are considered as kprobe reenter, thus pre_handler and post_handler are not being called and kprobes are counted as "missed". Commit b9599798 ("[S390] kprobes: activation and deactivation") introduced a simpler scheme for kprobes (de)activation and status tracking by using push_kprobe/pop_kprobe, which supposed to work for both initial kprobe entry as well as kprobe reentry and helps to avoid handling those two cases differently. The problem is that a sequence of calls in case of kprobes reenter: push_kprobe() <- NULL (current_kprobe) push_kprobe() <- kprobe1 (current_kprobe) pop_kprobe() -> kprobe1 (current_kprobe) pop_kprobe() -> kprobe1 (current_kprobe) leaves "kprobe1" as "current_kprobe" on this cpu, instead of setting it to NULL. In fact push_kprobe/pop_kprobe can only store a single state (there is just one prev_kprobe in kprobe_ctlblk). Which is a hack but sufficient, there is no need to have another prev_kprobe just to store NULL. To make a simple and backportable fix simply reset "prev_kprobe" when kprobe is poped from this "stack". No need to worry about "kprobe_status" in this case, because its value is only checked when current_kprobe != NULL. Cc: stable@vger.kernel.org Fixes: b9599798 ("[S390] kprobes: activation and deactivation") Reviewed-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Vasily Gorbik authored
Recent test_kprobe_missed kprobes kunit test uncovers the following error (reported when CONFIG_DEBUG_ATOMIC_SLEEP is enabled): BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 662, name: kunit_try_catch preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 no locks held by kunit_try_catch/662. irq event stamp: 280 hardirqs last enabled at (279): [<00000003e60a3d42>] __do_pgm_check+0x17a/0x1c0 hardirqs last disabled at (280): [<00000003e3bd774a>] kprobe_exceptions_notify+0x27a/0x318 softirqs last enabled at (0): [<00000003e3c5c890>] copy_process+0x14a8/0x4c80 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 46 PID: 662 Comm: kunit_try_catch Tainted: G N 6.2.0-173644-g44c18d77f0c0 #2 Hardware name: IBM 3931 A01 704 (LPAR) Call Trace: [<00000003e60a3a00>] dump_stack_lvl+0x120/0x198 [<00000003e3d02e82>] __might_resched+0x60a/0x668 [<00000003e60b9908>] __mutex_lock+0xc0/0x14e0 [<00000003e60bad5a>] mutex_lock_nested+0x32/0x40 [<00000003e3f7b460>] unregister_kprobe+0x30/0xd8 [<00000003e51b2602>] test_kprobe_missed+0xf2/0x268 [<00000003e51b5406>] kunit_try_run_case+0x10e/0x290 [<00000003e51b7dfa>] kunit_generic_run_threadfn_adapter+0x62/0xb8 [<00000003e3ce30f8>] kthread+0x2d0/0x398 [<00000003e3b96afa>] __ret_from_fork+0x8a/0xe8 [<00000003e60ccada>] ret_from_fork+0xa/0x40 The reason for this error report is that kprobes handling code failed to restore irqs. The problem is that when kprobe is triggered from another kprobe post_handler current sequence of enable_singlestep / disable_singlestep is the following: enable_singlestep <- original kprobe (saves kprobe_saved_imask) enable_singlestep <- kprobe triggered from post_handler (clobbers kprobe_saved_imask) disable_singlestep <- kprobe triggered from post_handler (restores kprobe_saved_imask) disable_singlestep <- original kprobe (restores wrong clobbered kprobe_saved_imask) There is just one kprobe_ctlblk per cpu and both calls saves and loads irq mask to kprobe_saved_imask. To fix the problem simply move resume_execution (which calls disable_singlestep) before calling post_handler. This also fixes the problem that post_handler is called with pt_regs which were not yet adjusted after single-stepping. Cc: stable@vger.kernel.org Fixes: 4ba069b8 ("[S390] add kprobes support.") Reviewed-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Alexandre Ghiti authored
Increase COMMAND_LINE_SIZE as the current default value is too low for syzbot kernel command line. There has been considerable discussion on this patch that has led to a larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all ports. That's not quite done yet, but it's gotten far enough we're confident this is not a uABI change so this is safe. Reported-by:
Dmitry Vyukov <dvyukov@google.com> Signed-off-by:
Alexandre Ghiti <alex@ghiti.fr> Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr [Palmer: it's not uabi] Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
- Mar 01, 2023
-
-
Heiko Carstens authored
Keep the config S390 select list sorted. Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Gerald Schaefer authored
Commit f05f62d0 ("s390/vmem: get rid of memory segment list") reshuffled the call to vmem_add_mapping() in __segment_load(), which now overwrites rc after it was set to contain the segment type code. As result, __segment_load() will now always return 0 on success, which corresponds to the segment type code SEG_TYPE_SW, i.e. a writeable segment. This results in a kernel crash when loading a read-only segment as dcssblk block device, and trying to write to it. Instead of reshuffling code again, make sure to return the segment type on success, and also describe this rather delicate and unexpected logic in the function comment. Also initialize new segtype variable with invalid value, to prevent possible future confusion. Fixes: f05f62d0 ("s390/vmem: get rid of memory segment list") Cc: <stable@vger.kernel.org> # 5.9+ Signed-off-by:
Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Björn Töpel authored
The Zbb optimized strncmp has two parts; a fast path that does XLEN/8B per iteration, and a slow that does one byte per iteration. The idea is to compare aligned XLEN chunks for most of strings, and do the remainder tail in the slow path. The Zbb strncmp has two issues in the fast path: Incorrect remainder handling (wrong compare): Assume that the string length is 9. On 64b systems, the fast path should do one iteration, and one iteration in the slow path. Instead, both were done in the fast path, which lead to incorrect results. An example: strncmp("/dev/vda", "/dev/", 5); Correct by changing "bgt" to "bge". Missing NULL checks in the second string: This could lead to incorrect results for: strncmp("/dev/vda", "/dev/vda\0", 8); Correct by adding an additional check. Fixes: b6fcdb19 ("RISC-V: add zbb support to string functions") Suggested-by:
Heiko Stuebner <heiko.stuebner@vrull.eu> Signed-off-by:
Björn Töpel <bjorn@rivosinc.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Tested-by:
Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230228184211.1585641-1-bjorn@kernel.org Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
- Feb 28, 2023
-
-
Jiaxun Yang authored
Clang is unable to handle the situation that a chunk of inline assembly ends with a compat branch instruction and then compiler generates another control transfer instruction immediately after this compat branch. The later instruction will end up in forbidden slot and cause exception. Workaround by add a option to control the use of compact branch. Currently it's selected by CC_IS_CLANG and hopefully we can change it to a version check in future if clang manages to fix it. Fix boot on boston board. Link: https://github.com/llvm/llvm-project/issues/61045 Signed-off-by:
Jiaxun Yang <jiaxun.yang@flygoat.com> Acked-by:
Nathan Chancellor <nathan@kernel.org> Acked-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Sergio Paracuellos authored
To allow to access system controller registers from watchdog driver code add a phandle in the watchdog 'wdt' node. This avoid using arch dependent operations in driver code. Reviewed-by:
Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by:
Sergio Paracuellos <sergio.paracuellos@gmail.com> Acked-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Sergio Paracuellos authored
Watchdog nodes must use 'watchdog' for node name. When a 'make dtbs_check' is performed the following warning appears: wdt@100: $nodename:0: 'wdt@100' does not match '^watchdog(@.*|-[0-9a-f])?$' Fix this warning up properly renaming the node into 'watchdog'. Reviewed-by:
Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by:
Sergio Paracuellos <sergio.paracuellos@gmail.com> Acked-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Heiko Stuebner authored
Adapt the suggestions for the assembly string functions that Andrew suggested but that I didn't manage to include into the series that got applied. This includes improvements to two comments, removal of unneeded labels and moving one instruction slightly higher to contradict an explanatory comment. Suggested-by:
Andrew Jones <ajones@ventanamicro.com> Signed-off-by:
Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by:
Andrew Jones <ajones@ventanamicro.com> Tested-by:
Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230208225328.1636017-3-heiko@sntech.de Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
Vasily Gorbik authored
Produce arch/s390/boot/vmlinux.map link map for the decompressor, when CONFIG_VMLINUX_MAP option is enabled. Link map is quite useful during making kernel changes related to how the decompressor is composed and debugging linker scripts. Acked-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Heiko Carstens authored
Clear CPU state (e.g. all TLB entries, prefetched instructions, etc.) of the target CPU, however without clearing register contents before starting any work on it. This puts the target CPU in a more defined state compared to the current Stop + Restart sigp orders. Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Alexander Gordeev authored
If a machine check interrupt hits while user process is running __s390_handle_mcck() helper function is called directly from the interrupt handler and terminates the current process by calling make_task_dead() routine. The make_task_dead() is not allowed to be called from interrupt context which forces the machine check handler switch to the kernel stack and enable local interrupts first. The __s390_handle_mcck() could also be called to service pending work, but this time from the external interrupts handler. It is the machine check handler that establishes the work and schedules the external interrupt, therefore the machine check interrupt itself should be disabled while reading out the corresponding variable: local_mcck_disable(); mcck = *this_cpu_ptr(&cpu_mcck); memset(this_cpu_ptr(&cpu_mcck), 0, sizeof(mcck)); local_mcck_enable(); However, local_mcck_disable() does not have effect when __s390_handle_mcck() is called directly form the machine check handler, since the machine check interrupt is still disabled. Therefore, it is not the opening bracket to the following local_mcck_enable() function. Simplify the user process termination flow by scheduling the external interrupt and killing the affected process from the interrupt context. Assume a kernel-generated signal is always delivered and ignore a value returned by do_send_sig_info() funciton. Reviewed-by:
Heiko Carstens <hca@linux.ibm.com> Reviewed-by:
Sven Schnelle <svens@linux.ibm.com> Signed-off-by:
Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Heiko Carstens authored
Use READ_ONCE_ALIGNED_128() to read the previous value in front of a 128-bit cmpxchg loop, instead of (mis-)using a 128-bit cmpxchg operation to do the same. This makes the code more readable and is faster. Link: https://lore.kernel.org/r/20230224100237.3247871-3-hca@linux.ibm.com Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Heiko Carstens authored
Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for fast block concurrent (atomic) 128-bit accesses. The used lpq instruction requires 128-bit alignment. This is also the reason why the compiler doesn't emit this instruction if __READ_ONCE() is used for 128-bit accesses. Link: https://lore.kernel.org/r/20230224100237.3247871-2-hca@linux.ibm.com Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Ard Biesheuvel authored
Our virtual KASLR displacement is a randomly chosen multiple of 2 MiB plus an offset that is equal to the physical placement modulo 2 MiB. This arrangement ensures that we can always use 2 MiB block mappings (or contiguous PTE mappings for 16k or 64k pages) to map the kernel. This means that a KASLR offset of less than 2 MiB is simply the product of this physical displacement, and no randomization has actually taken place. Currently, we use 'kaslr_offset() > 0' to decide whether or not randomization has occurred, and so we misidentify this case. If the kernel image placement is not randomized, modules are allocated from a dedicated region below the kernel mapping, which is only used for modules and not for other vmalloc() or vmap() calls. When randomization is enabled, the kernel image is vmap()'ed randomly inside the vmalloc region, and modules are allocated in the vicinity of this mapping to ensure that relative references are always in range. However, unlike the dedicated module region below the vmalloc region, this region is not reserved exclusively for modules, and so ordinary vmalloc() calls may end up overlapping with it. This should rarely happen, given that vmalloc allocates bottom up, although it cannot be ruled out entirely. The misidentified case results in a placement of the kernel image within 2 MiB of its default address. However, the logic that randomizes the module region is still invoked, and this could result in the module region overlapping with the start of the vmalloc region, instead of using the dedicated region below it. If this happens, a single large vmalloc() or vmap() call will use up the entire region, and leave no space for loading modules after that. Since commit 82046702 ("efi/libstub/arm64: Replace 'preferred' offset with alignment check"), this is much more likely to occur on systems that boot via EFI but lack an implementation of the EFI RNG protocol, as in that case, the EFI stub will decide to leave the image where it found it, and the EFI firmware uses 64k alignment only. Fix this, by correctly identifying the case where the virtual displacement is a result of the physical displacement only. Signed-off-by:
Ard Biesheuvel <ardb@kernel.org> Reviewed-by:
Mark Brown <broonie@kernel.org> Acked-by:
Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230223204101.1500373-1-ardb@kernel.org Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
Florian reports that when building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, he sees "Misaligned patch-site" warnings at boot, e.g. | Misaligned patch-site bcm2836_arm_irqchip_handle_irq+0x0/0x88 | WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/ftrace.c:120 ftrace_call_adjust+0x4c/0x70 This is because GCC will silently ignore `-falign-functions=N` when passed `-Os`, resulting in functions not being aligned as we expect. This is a known issue, and to account for this we modified the kernel to avoid `-Os` generally. Unfortunately we forgot to account for CONFIG_CC_OPTIMIZE_FOR_SIZE. Forbid the use of CALL_OPS with CONFIG_CC_OPTIMIZE_FOR_SIZE=y to prevent this issue. All exising ftrace features will work as before, though without the performance benefit of CALL_OPS. Reported-by:
Florian Fainelli <f.fainelli@gmail.com> Link: http://lore.kernel.org/linux-arm-kernel/2d9284c3-3805-402b-5423-520ced56d047@gmail.com Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Stefan Wahren <stefan.wahren@i2se.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will@kernel.org> Tested-by:
Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230227115819.365630-1-mark.rutland@arm.com Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
Michael Ellerman authored
Although powerpc now has objtool mcount support, it's not enabled in all configurations due to dependencies. On those configurations, with some linkers (binutils 2.37 at least), it's still possible to hit the dreaded "recordmcount bug", eg. errors such as: CC kernel/kexec_file.o Cannot find symbol for section 10: .text.unlikely. kernel/kexec_file.o: failed make[1]: *** [scripts/Makefile.build:287 : kernel/kexec_file.o] Error 1 Those errors are much more prevalent when building with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, because it places every function in a separate section. CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is marked experimental and is not enabled in any powerpc defconfigs or by major distros. Although it does have at least some users on 32-bit where kernel size tends to be more important. Avoid the build errors by blocking CONFIG_LD_DEAD_CODE_DATA_ELIMINATION when the build is using recordmcount, rather than objtool. In practice that means for 64-bit big endian builds, or 64-bit clang builds - both because they lack CONFIG_MPROFILE_KERNEL. On 32-bit objtool is always used, so CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is still available there. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230221130331.2714199-1-mpe@ellerman.id.au
-
Michael Ellerman authored
When KASAN/KCSAN are enabled clang generates .text.asan/tsan sections. Because they are not mentioned in the linker script warnings are generated, and when orphan handling is set to error that becomes a build error, eg: ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' ld.lld: error: vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor' Fix it by adding the sections to our linker script, similar to the generic change made in 84837881 ("vmlinux.lds.h: Handle clang's module.{c,d}tor sections"). Reviewed-by:
Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
-
- Feb 27, 2023
-
-
Arınç ÜNAL authored
Currently, out of every Ralink SoC, only the dt-binding of the MT7621 SoC uses pinctrl. Because of this, PINCTRL is not selected at all. Make SOC_MT7621 select PINCTRL. Remove PINCTRL_MT7621, enabling it for the MT7621 SoC will be handled under the PINCTRL_MT7621 option. Signed-off-by:
Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by:
Sergio Paracuellos <sergio.paracuellos@gmail.com> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Arınç ÜNAL authored
All MIPS processors on the Ralink SoCs implement the MIPS32 Release 2 Architecture. Remove SYS_HAS_CPU_MIPS32_R1. Signed-off-by:
Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by:
Sergio Paracuellos <sergio.paracuellos@gmail.com> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Jiaxun Yang authored
In c0_compare_int_usable we clear compare interrupt by write value just read out from counter to compare register. However sometimes if those all instructions are graduated together then it's possible that at the time compare register is written, the counter haven't progressed, thus the interrupt is triggered again. It also applies to QEMU that instructions is executed significantly faster then counter. Offset the value used to clear interrupt by one to prevent that happen. Signed-off-by:
Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
Jiaxun Yang authored
CP0_CMGCRBASE is not always available on CPS enabled system such as early proAptiv. For early SMP bring up where we can't safely access memeory, we patch the entry of CPS NMI vector to inject CMGCR address directly into register during early core bringup. For VPE bringup as the core is already coherenct at that point we just read the variable to obtain the address. Signed-off-by:
Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by:
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
-
KP Singh authored
When plain IBRS is enabled (not enhanced IBRS), the logic in spectre_v2_user_select_mitigation() determines that STIBP is not needed. The IBRS bit implicitly protects against cross-thread branch target injection. However, with legacy IBRS, the IBRS bit is cleared on returning to userspace for performance reasons which leaves userspace threads vulnerable to cross-thread branch target injection against which STIBP protects. Exclude IBRS from the spectre_v2_in_ibrs_mode() check to allow for enabling STIBP (through seccomp/prctl() by default or always-on, if selected by spectre_v2_user kernel cmdline parameter). [ bp: Massage. ] Fixes: 7c693f54 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS") Reported-by:
José Oliveira <joseloliveira11@gmail.com> Reported-by:
Rodrigo Branco <rodrigo@kernelhacking.com> Signed-off-by:
KP Singh <kpsingh@kernel.org> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230220120127.1975241-1-kpsingh@kernel.org Link: https://lore.kernel.org/r/20230221184908.2349578-1-kpsingh@kernel.org
-
Harald Freudenberger authored
Introduce a new ap queue status register wrapper union to access register wide values. So the inline assembler only sees register wide values but the surrounding code may use a more structured view of the same value and a reader of the code (and the compiler) gets a clear understanding about the mapping between fields and register values. All the changes to access the ap queue status are local to the inline functions within ap.h. However, the struct ap_qirq_ctrl has been replaces by a union for same reason and this needed slight adaptions in the calling code. Suggested-by:
Halil Pasic <pasic@linux.ibm.com> Suggested-by:
Andreas Arnez <arnez@linux.ibm.com> Signed-off-by:
Harald Freudenberger <freude@linux.ibm.com> Acked-by:
Heiko Carstens <hca@linux.ibm.com> Reviewed-by:
Holger Dengler <dengler@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Nico Boehr authored
When a machine check is received while in SIE, it is reinjected into the guest in some cases. The respective code needs to access the sie_block, which is taken from the backed up R14. Since reinjection only occurs while we are in SIE (i.e. between the labels sie_entry and sie_leave in entry.S and thus if CIF_MCCK_GUEST is set), the backed up R14 will always contain a physical address in s390_backup_mcck_info. This currently works, because virtual and physical addresses are the same. Add phys_to_virt() to resolve the virtual-physical confusion. Signed-off-by:
Nico Boehr <nrb@linux.ibm.com> Reviewed-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by:
Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20230216121208.4390-2-nrb@linux.ibm.com Signed-off-by:
Janosch Frank <frankja@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-
Vasily Gorbik authored
Currently there are several kernel command line parameters which are only parsed and handled in decompressor and not known to the kernel. This leads to the following error message during kernel boot: Unknown kernel command line parameters "mem=3G nokaslr", will be passed to user space. To avoid confusion, register those parameters with an empty stub so that kernel does not complain about them. Reported-by:
Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com>
-