  1. Apr 25, 2019
    • x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering · fc93dfd9
      Ingo Molnar authored
      Here's the objdump -D output of the PATCH_XXL data table:
      
      0000000000000010 <patch_data_xxl>:
        10:   fa                      cli
        11:   fb                      sti
        12:   57                      push   %rdi
        13:   9d                      popfq
        14:   9c                      pushfq
        15:   58                      pop    %rax
        16:   0f 20 d0                mov    %cr2,%rax
        19:   0f 20 d8                mov    %cr3,%rax
        1c:   0f 22 df                mov    %rdi,%cr3
        1f:   0f 09                   wbinvd
        21:   0f 01 f8                swapgs
        24:   48 0f 07                sysretq
        27:   0f 01 f8                swapgs
        2a:   48 89 f8                mov    %rdi,%rax
      
      Note how this doesn't match up to the source code:
      
      static const struct patch_xxl patch_data_xxl = {
              .irq_irq_disable        = { 0xfa },             // cli
              .irq_irq_enable         = { 0xfb },             // sti
              .irq_save_fl            = { 0x9c, 0x58 },       // pushf; pop %[re]ax
              .mmu_read_cr2           = { 0x0f, 0x20, 0xd0 }, // mov %cr2, %[re]ax
              .mmu_read_cr3           = { 0x0f, 0x20, 0xd8 }, // mov %cr3, %[re]ax
              .irq_restore_fl         = { 0x57, 0x9d },       // push %rdi; popfq
              .mmu_write_cr3          = { 0x0f, 0x22, 0xdf }, // mov %rdi, %cr3
              .cpu_wbinvd             = { 0x0f, 0x09 },       // wbinvd
              .cpu_usergs_sysret64    = { 0x0f, 0x01, 0xf8,
                                          0x48, 0x0f, 0x07 }, // swapgs; sysretq
              .cpu_swapgs             = { 0x0f, 0x01, 0xf8 }, // swapgs
              .mov64                  = { 0x48, 0x89, 0xf8 }, // mov %rdi, %rax
              .irq_restore_fl         = { 0x50, 0x9d },       // push %eax; popf
              .mmu_write_cr3          = { 0x0f, 0x22, 0xd8 }, // mov %eax, %cr3
              .cpu_iret               = { 0xcf },             // iret
      };
      
      Note how they are reordered: in the generated code .irq_restore_fl comes
      before .irq_save_fl, etc. This is because the field ordering in struct
      patch_xxl does not match the initialization ordering of patch_data_xxl.
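
      For illustration only, a minimal C example (the names are made up, not
      taken from struct patch_xxl): the in-memory layout of an aggregate
      follows the order of the field definitions, while the order of the
      designated initializers is irrelevant:

      struct demo {
              unsigned char first[2];
              unsigned char second[2];
      };

      static const struct demo d = {
              .second = { 0x03, 0x04 },
              .first  = { 0x01, 0x02 },
      };

      /* objdump of 'd' shows 01 02 03 04: .first's bytes come first. */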
      
      Match up the initialization order with the definition order - this makes
      the disassembly easily reviewable:
      
      0000000000000010 <patch_data_xxl>:
        10:   fa                      cli
        11:   fb                      sti
        12:   9c                      pushfq
        13:   58                      pop    %rax
        14:   0f 20 d0                mov    %cr2,%rax
        17:   0f 20 d8                mov    %cr3,%rax
        1a:   0f 22 df                mov    %rdi,%cr3
        1d:   57                      push   %rdi
        1e:   9d                      popfq
        1f:   0f 09                   wbinvd
        21:   0f 01 f8                swapgs
        24:   48 0f 07                sysretq
        27:   0f 01 f8                swapgs
        2a:   48 89 f8                mov    %rdi,%rax
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190425081012.GA115378@gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fc93dfd9
    • x86/paravirt: Replace the paravirt patch asm magic · 0b9d2fc1
      Thomas Gleixner authored
      
      
      The magic macro DEF_NATIVE() in the paravirt patching code uses inline
      assembly to generate a data table for patching in the native instructions.
      
      While clever, this falls apart with LTO, and even aside from LTO the
      construct only works by chance, according to GCC folks.
      
      Aside from that, the tables are constant data and not some form of
      magic text.
      
      As these constructs are not subject to frequent changes, it is not a
      maintenance burden to convert them to regular data tables which are
      initialized with hex bytes.
      
      Create a new set of macros and data structures to store the instruction
      sequences and convert the code over.
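
      For illustration, a minimal sketch of the data-table approach (the
      names here are made up; the real patch uses struct patch_xxl and
      friends, quoted in the fc93dfd9 commit message above):

      struct native_insns {
              unsigned char irq_disable[1];
              unsigned char irq_enable[1];
      };

      static const struct native_insns native_insns = {
              .irq_disable = { 0xfa },        // cli
              .irq_enable  = { 0xfb },        // sti
      };

      /* The patcher then memcpy()s the selected byte sequence over the
         paravirt call site: plain constant data, no asm() trickery. */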
      
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20190424134223.690835713@linutronix.de
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0b9d2fc1
    • x86/paravirt: Unify the 32/64 bit paravirt patching code · fb2af071
      Thomas Gleixner authored
      
      
      Large parts of these two files are identical. Merge them together.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20190424134223.603491680@linutronix.de
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb2af071
    • x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call() · 11e86dc7
      Ingo Molnar authored
      paravirt_patch_call() currently handles patching failures inconsistently:
      we generate a warning in the retpoline case, but don't in other cases where
      we might end up with a non-working kernel as well.
      
      So just convert it all to a BUG_ON(): these patching calls are *not*
      supposed to fail, and if they do we want to know about it immediately.
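
      Roughly, the failure path then looks like this (a sketch, not the
      verbatim diff; the CALL instruction being patched in is 5 bytes):

      /* The patch site must be large enough to hold the 5-byte CALL: */
      BUG_ON(len < 5);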
      
      This also makes the kernel smaller and removes an ugly #ifdef.
      
      I tried it with a richly paravirt-enabled kernel and no patching bugs
      were detected.
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190425095039.GC115378@gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11e86dc7
    • x86/paravirt: Detect over-sized patching bugs in paravirt_patch_insns() · 2777cae2
      Ingo Molnar authored
      So paravirt_patch_insns() contains this gem of logic:
      
      unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
                                    const char *start, const char *end)
      {
              unsigned insn_len = end - start;
      
              if (insn_len > len || start == NULL)
                      insn_len = len;
              else
                      memcpy(insnbuf, start, insn_len);
      
              return insn_len;
      }
      
      Note how the length of the new instructions is checked against 'len'
      (the size of the original instructions at the patch site), and if they
      don't fit they are silently discarded, with no warning printed
      whatsoever.
      
      This crashes the kernel in funny ways if the patching template is buggy,
      and usually in much later places.
      
      Instead do a direct BUG_ON(): there's no way to continue successfully
      at that point.
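
      Roughly, the function then becomes (a sketch, not the verbatim patch):

      unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
                                    const char *start, const char *end)
      {
              unsigned insn_len = end - start;

              /* The replacement must exist and must fit into the patch site: */
              BUG_ON(insn_len > len || start == NULL);

              memcpy(insnbuf, start, insn_len);

              return insn_len;
      }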
      
      I've tested this patch: with a vanilla kernel the check never triggers,
      and if I intentionally increase the size of one of the patch templates
      to a value that is too large, the assertion triggers:
      
      [    0.164385] kernel BUG at arch/x86/kernel/paravirt.c:167!
      
      Without this patch a broken kernel randomly crashes in later places,
      after the silent patching failure.
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190425091717.GA72229@gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2777cae2
    • x86/paravirt: Remove bogus extern declarations · e0519640
      Thomas Gleixner authored
      
      
      These functions are already declared in asm/paravirt.h.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/20190424134223.501598258@linutronix.de
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e0519640
  2. Apr 19, 2019
  3. Apr 18, 2019
    • arm64: futex: Restore oldval initialization to work around buggy compilers · ff8acf92
      Nathan Chancellor authored
      Commit 045afc24 ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with
      non-zero result value") removed oldval's zero initialization in
      arch_futex_atomic_op_inuser because it is not necessary. Unfortunately,
      Android's arm64 GCC 4.9.4 [1] does not agree:
      
      ../kernel/futex.c: In function 'do_futex':
      ../kernel/futex.c:1658:17: warning: 'oldval' may be used uninitialized
      in this function [-Wmaybe-uninitialized]
         return oldval == cmparg;
                       ^
      In file included from ../kernel/futex.c:73:0:
      ../arch/arm64/include/asm/futex.h:53:6: note: 'oldval' was declared here
        int oldval, ret, tmp;
            ^
      
      GCC fails to see that, when ret is non-zero, futex_atomic_op_inuser
      returns right away, so the uninitialized use it warns about cannot
      happen. Restoring the zero initialization works around this issue.
      
      [1]: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/
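
      Roughly, the workaround is a one-liner in
      arch/arm64/include/asm/futex.h (a sketch of the change):

      -       int oldval, ret, tmp;
      +       int oldval = 0, ret, tmp;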
      
      
      
      Cc: stable@vger.kernel.org
      Fixes: 045afc24 ("arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value")
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      ff8acf92
    • perf/x86/amd: Add event map for AMD Family 17h · 3fe3331b
      Kim Phillips authored
      Family 17h differs from prior families in that it:

       - does not support an L2 cache miss event
       - has re-enumerated PMC counters for:
         - L2 cache references
         - front & back end stalled cycles
      
      So we add a new amd_f17h_perfmon_event_map[] so that the generic
      perf event names will resolve to the correct h/w events on
      family 17h and above processors.
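
      Roughly, the new map is then selected per family like this (a sketch;
      the actual event encodings in the new table come from the PPR sections
      referenced below):

      static u64 amd_pmu_event_map(int hw_event)
      {
              if (boot_cpu_data.x86 >= 0x17)
                      return amd_f17h_perfmon_event_map[hw_event];

              return amd_perfmon_event_map[hw_event];
      }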
      
      Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2):
      
        https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
      
      
      
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
      [ Improved the formatting a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3fe3331b
    • x86/mm/KASLR: Fix the size of the direct mapping section · ec393710
      Baoquan He authored
      
      
      kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
      the maximum amount of system RAM supported. The size of the direct
      mapping section is obtained from the smaller one of the below two
      values:
      
        (actual system RAM size + padding size) vs (max system RAM size supported)
      
      This calculation is wrong since commit
      
        b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").
      
      In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
      the kernel is using 4-level or 5-level page tables. Thus, it will always
      use 4 PB as the maximum amount of system RAM, even in 4-level paging
      mode where it should actually be 64 TB.
      
      Thus, the size of the direct mapping section will always
      be the sum of the actual system RAM size plus the padding size.
      
      Even when the amount of system RAM is 64 TB, the following layout will
      still be used. Obviously, KASLR will be weakened significantly.
      
         |____|_______actual RAM_______|_padding_|______the rest_______|
         0            64TB                                            ~120TB
      
      Instead, it should be like this:
      
         |____|_______actual RAM_______|_________the rest______________|
         0            64TB                                            ~120TB
      
      The size of the padding region is controlled by
      CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.
      
      The above issue only exists when
      CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
      which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
      using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.
      
      Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
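
      Roughly, the change in kernel_randomize_memory() amounts to (a sketch,
      not the verbatim diff):

      -       kaslr_regions[0].size_tb = 1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT);
      +       kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);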
      
       [ bp: Massage commit message. ]
      
      Fixes: b83ce5ee ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Thomas Garnier <thgarnie@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: frank.ramsay@hpe.com
      Cc: herbert@gondor.apana.org.au
      Cc: kirill@shutemov.name
      Cc: mike.travis@hpe.com
      Cc: thgarnie@google.com
      Cc: x86-ml <x86@kernel.org>
      Cc: yamada.masahiro@socionext.com
      Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
      ec393710
  4. Apr 17, 2019
  5. Apr 16, 2019
  6. Apr 15, 2019
    • MIPS: scall64-o32: Fix indirect syscall number load · 79b4a9cf
      Aurelien Jarno authored
      
      
      Commit 4c21b8fd (MIPS: seccomp: Handle indirect system calls (o32))
      added indirect syscall detection for O32 processes running on MIPS64,
      but it did not work correctly for big endian kernels/processes. The
      reason is that the syscall number is loaded from ARG1 using the lw
      instruction, while this is a 64-bit value, so zero is loaded instead of
      the syscall number.
      
      Fix the code by using the ld instruction instead. When running a
      32-bit process on a 64-bit CPU, the values are properly sign-extended,
      so this ensures the value passed to syscall_trace_enter is correct.
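
      As a C analogue of the bug (illustrative only; the actual fix is the
      lw -> ld change in the scall64-o32 trace path):

      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
              unsigned long slot = 4020;  /* a 64-bit register save slot holding an O32 syscall number */
              unsigned int lw_value;      /* what a 32-bit load from &slot sees */

              memcpy(&lw_value, &slot, sizeof(lw_value));
              printf("lw: %u ld: %lu\n", lw_value, slot);
              /* On a 64-bit big endian system this prints "lw: 0 ld: 4020":
                 the 32-bit load only sees the (zero) upper half of the value. */
              return 0;
      }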
      
      Recent systemd versions with seccomp enabled whitelist the getpid
      syscall for their internal processes (e.g. systemd-journald), but call
      it through syscall(SYS_getpid). This fix therefore allows O32 big endian
      systems with a 64-bit kernel to run recent systemd versions.
      
      Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
      Cc: <stable@vger.kernel.org> # v3.15+
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      79b4a9cf
    • arch: add pidfd and io_uring syscalls everywhere · 39036cd2
      Arnd Bergmann authored
      
      
      Add the io_uring and pidfd_send_signal system calls to all architectures.
      
      These system calls are designed to handle both native and compat
      tasks, so all entries are the same across architectures; only
      arm-compat and the generic table still use an old format.
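
      For reference, the new asm-generic entries look roughly like this (the
      per-architecture syscall tables gain equivalent lines):

      #define __NR_pidfd_send_signal 424
      __SYSCALL(__NR_pidfd_send_signal, sys_pidfd_send_signal)
      #define __NR_io_uring_setup 425
      __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
      #define __NR_io_uring_enter 426
      __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
      #define __NR_io_uring_register 427
      __SYSCALL(__NR_io_uring_register, sys_io_uring_register)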
      
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> (s390)
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      39036cd2