  1. Apr 28, 2017
    • x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails · da63b6b2
      Baoquan He authored
      
      
      Dave found that a kdump kernel with KASLR enabled will reset to the BIOS
      immediately if physical randomization fails to find a new position for
      the kernel. A kernel booted with the 'nokaslr' option works in this case.
      
      The reason is that KASLR installs a new page table for the identity
      mapping, but fails to build an identity mapping for the original kernel
      location when physical randomization fails.
      
      This only happens in the kexec/kdump kernel, because the first kernel has
      already built the identity mapping for the whole of memory for kexec/kdump
      by calling init_pgtable(). If physical randomization then fails, KASLR
      does not build an identity mapping for the kernel's original area, yet
      still switches to the new page table '_pgtable'. The kernel then triple
      faults immediately because no identity mapping covers the area it is
      running from.
      
      The normal kernel doesn't see this bug, because it comes here via
      startup_32() and CR3 is already set to _pgtable. In startup_32() the
      identity mapping is built for the 0~4G area. For KASLR we just append to
      that existing area on demand instead of overwriting it entirely, so the
      identity mapping for the kernel's original area is still there.
      
      To fix it, switch to the new identity-mapping page table only when
      physical KASLR succeeds. Otherwise, keep the old page table unchanged,
      just as "nokaslr" does.
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1493278940-5885-1-git-send-email-bhe@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Apr 26, 2017
    • 2fefc97b
    • CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now · 701cac61
      Al Viro authored
      
      
      all architectures converted
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • x86/mm: Fix flush_tlb_page() on Xen · dbd68d8e
      Andy Lutomirski authored
      
      
      flush_tlb_page() passes a bogus range to flush_tlb_others() and
      expects the latter to fix it up.  native_flush_tlb_others() has the
      fixup but Xen's version doesn't.  Move the fixup to
      flush_tlb_others().
      
      AFAICS the only real effect is that, without this fix, Xen would flush
      everything instead of just the one page on remote vCPUs when
      flush_tlb_page() was called.
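
      The net effect, sketched (illustrative, not the literal diff): the
      one-page range is made explicit before it reaches the paravirt
      flush_tlb_others() hook, so Xen's implementation sees the same
      fixed-up range as the native one (preemption handling elided):

        void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
        {
                struct mm_struct *mm = vma->vm_mm;

                /* Flush the single page locally if this mm is active here. */
                if (current->active_mm == mm && current->mm)
                        __flush_tlb_one(start);

                /* Remote CPUs see a proper [start, start + PAGE_SIZE) range. */
                if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
                        flush_tlb_others(mm_cpumask(mm), mm, start,
                                         start + PAGE_SIZE);
        }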
      
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e7b52ffd ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
      Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Make flush_tlb_mm_range() more predictable · ce27374f
      Andy Lutomirski authored
      
      
      I'm about to rewrite the function almost completely, but first I
      want to get a functional change out of the way.  Currently, if
      flush_tlb_mm_range() does not flush the local TLB at all, it will
      never do individual page flushes on remote CPUs.  This seems to be
      an accident, and preserving it will be awkward.  Let's change it
      first so that any regressions in the rewrite will be easier to
      bisect and so that the rewrite can attempt to change no visible
      behavior at all.
      
      The fix is simple: avoid short-circuiting the calculation of
      base_pages_to_flush.
      
      As a side effect, this also eliminates a potential corner case: if
      tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
      could have ended up flushing the entire address space one page at a
      time.
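
      Sketched with the variable names from arch/x86/mm/tlb.c (the enclosing
      function is elided and the exact diff may differ):

        /* Compute the flush size unconditionally, up front, so the local
         * and the remote flush decisions use the same value. */
        unsigned long base_pages_to_flush = TLB_FLUSH_ALL;

        if (end != TLB_FLUSH_ALL && !(vmflag & VM_HUGETLB))
                base_pages_to_flush = (end - start) >> PAGE_SHIFT;

        /* Clamp: a ceiling of TLB_FLUSH_ALL can then never make us flush
         * the whole address space one page at a time. */
        if (base_pages_to_flush > tlb_single_page_flush_ceiling)
                base_pages_to_flush = TLB_FLUSH_ALL;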
      
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Remove flush_tlb() and flush_tlb_current_task() · 29961b59
      Andy Lutomirski authored
      
      
      I was trying to figure out how flush_tlb_current_task() could possibly
      work correctly if current->mm != current->active_mm, but I realized I
      could spare myself the effort: it has no callers except the unused
      flush_tlb() macro.
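
      For reference, the removed surface amounts to this (sketched;
      declarations abbreviated):

        /* Removed: the unused wrapper macro and its only expansion target. */
        #define flush_tlb() flush_tlb_current_task()
        extern void flush_tlb_current_task(void);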
      
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() · 9ccee237
      Andy Lutomirski authored
      
      
      mark_screen_rdonly() is the last remaining caller of flush_tlb().
      flush_tlb_mm_range() is potentially faster and isn't obsolete.
      
      Compile-tested only because I don't know whether software that uses
      this mechanism even exists.
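
      The conversion itself is a one-line swap, roughly (the VGA-window
      constants here are illustrative of the call shape, not quoted from
      the diff):

        /*
         * Was: flush_tlb() - flush everything on the local CPU.
         * Now: flush only the 32-page VGA window being marked read-only.
         */
        flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, 0UL);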
      
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/64: Fix crash in remove_pagetable() · e6ab9c4d
      Kirill A. Shutemov authored
      
      
      remove_pagetable() does its page walk using p*d_page_vaddr() plus a
      cast. That's not the canonical approach -- we usually use p*d_offset()
      for that.
      
      It works fine as long as all page table levels are present. We broke
      that invariant by introducing the folded p4d page table level.
      
      As a result, remove_pagetable() interprets a PMD as a PUD, which leads
      to a crash:
      
      	BUG: unable to handle kernel paging request at ffff880300000000
      	IP: memchr_inv+0x60/0x110
      	PGD 317d067
      	P4D 317d067
      	PUD 3180067
      	PMD 33f102067
      	PTE 8000000300000060
      
      Let's fix this by using p*d_offset() instead of p*d_page_vaddr() for
      the page walk.
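
      A sketch of the corrected walk at one level (arch/x86/mm/init_64.c
      style; the loop plumbing is illustrative):

        for (; addr < end; addr = next, p4d++) {
                next = p4d_addr_end(addr, end);
                if (!p4d_present(*p4d))
                        continue;

                /* Was: pud = (pud_t *)p4d_page_vaddr(*p4d); */
                pud = pud_offset(p4d, 0);       /* honors folded levels */
                remove_pud_table(pud, addr, next, direct);
        }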
      
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Fixes: f2a6a705 ("x86: Convert the rest of the code to support p4d_t")
      Link: http://lkml.kernel.org/r/20170425092557.21852-1-kirill.shutemov@linux.intel.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/unwind: Dump all stacks in unwind_dump() · 262fa734
      Josh Poimboeuf authored
      
      
      Currently, unwind_dump() dumps only the most recently accessed stack,
      which has a few issues.
      
      In some cases, 'first_sp' can get out of sync with 'stack_info', causing
      unwind_dump() to start from the wrong address, flood the printk buffer,
      and eventually read a bad address.
      
      In other cases, dumping only the most recently accessed stack doesn't
      give enough data to diagnose the error.
      
      Fix both issues by dumping *all* stacks involved in the trace, not just
      the last one.
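
      Sketched, the dump becomes an outer walk over every stack the trace
      touched, following stack_info.next_sp between them (helper names as in
      arch/x86/kernel/; details illustrative):

        struct stack_info stack_info = {0};
        unsigned long visit_mask = 0;
        unsigned long *sp, word;

        /* Walk each stack in turn, then dump every word on it. */
        for (sp = state->orig_sp; sp;
             sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
                if (get_stack_info(sp, state->task, &stack_info, &visit_mask))
                        break;

                for (; sp < stack_info.end; sp++) {
                        word = READ_ONCE_NOCHECK(*sp);
                        printk_deferred("%p: %016lx (%pB)\n",
                                        sp, word, (void *)word);
                }
        }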
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8b5e99f0 ("x86/unwind: Dump stack data on warnings")
      Link: http://lkml.kernel.org/r/016d6a9810d7d1bfc87ef8c0e6ee041c6744c909.1493171120.git.jpoimboe@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/unwind: Silence more entry-code related warnings · b0d50c7b
      Josh Poimboeuf authored
      
      
      Borislav Petkov reported the following unwinder warning:
      
        WARNING: kernel stack regs at ffffc9000024fea8 in udevadm:92 has bad 'bp' value 00007fffc4614d30
        unwind stack type:0 next_sp:          (null) mask:0x6 graph_idx:0
        ffffc9000024fea8: 000055a6100e9b38 (0x55a6100e9b38)
        ffffc9000024feb0: 000055a6100e9b35 (0x55a6100e9b35)
        ffffc9000024feb8: 000055a6100e9f68 (0x55a6100e9f68)
        ffffc9000024fec0: 000055a6100e9f50 (0x55a6100e9f50)
        ffffc9000024fec8: 00007fffc4614d30 (0x7fffc4614d30)
        ffffc9000024fed0: 000055a6100eaf50 (0x55a6100eaf50)
        ffffc9000024fed8: 0000000000000000 ...
        ffffc9000024fee0: 0000000000000100 (0x100)
        ffffc9000024fee8: ffff8801187df488 (0xffff8801187df488)
        ffffc9000024fef0: 00007ffffffff000 (0x7ffffffff000)
        ffffc9000024fef8: 0000000000000000 ...
        ffffc9000024ff10: ffffc9000024fe98 (0xffffc9000024fe98)
        ffffc9000024ff18: 00007fffc4614d00 (0x7fffc4614d00)
        ffffc9000024ff20: ffffffffffffff10 (0xffffffffffffff10)
        ffffc9000024ff28: ffffffff811c6c1f (SyS_newlstat+0xf/0x10)
        ffffc9000024ff30: 0000000000000010 (0x10)
        ffffc9000024ff38: 0000000000000296 (0x296)
        ffffc9000024ff40: ffffc9000024ff50 (0xffffc9000024ff50)
        ffffc9000024ff48: 0000000000000018 (0x18)
        ffffc9000024ff50: ffffffff816b2e6a (entry_SYSCALL_64_fastpath+0x18/0xa8)
        ...
      
      The unwinder unwound from an interrupt which came in right after entry
      code called into a C syscall handler, before it had a chance to set up
      the frame pointer, so regs->bp still held its user-space value.
      
      Add a check to silence warnings in such a case, where an interrupt
      has occurred and regs->sp is almost at the end of the stack.
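
      Conceptually, the added check is a predicate along these lines (a
      hedged sketch only - the helper name and margin are illustrative, not
      the kernel's exact test):

        /* Did an interrupt land in the syscall-entry window, i.e. with
         * regs->sp within a few words of the end of the task stack? */
        static bool interrupted_entry_window(struct task_struct *task,
                                             struct pt_regs *regs)
        {
                unsigned long stack_end =
                        (unsigned long)task_stack_page(task) + THREAD_SIZE;

                /* the three-word margin here is illustrative */
                return regs->sp >= stack_end - 3 * sizeof(long) &&
                       regs->sp < stack_end;
        }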
      
      Reported-by: Borislav Petkov <bp@suse.de>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: c32c47c6 ("x86/unwind: Warn on bad frame pointer")
      Link: http://lkml.kernel.org/r/c695f0d0d4c2cfe6542b90e2d0520e11eb901eb5.1493171120.git.jpoimboe@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. Apr 23, 2017
    • Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Ingo Molnar authored
      
      
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and that a pretty robust-looking
      bisection points to this commit, suggesting there are unknown bugs in
      the conversion as well, revert it for the time being - we'll re-try in
      v4.13.
      
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. Apr 21, 2017
    • x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder · dc912c30
      Steven Rostedt (VMware) authored
      
      
      Fengguang Wu's zero day bot triggered a stack unwinder dump. This can
      be easily triggered when CONFIG_FRAME_POINTERS is enabled and -mfentry
      is in use on x86_32.
      
       ># cd /sys/kernel/debug/tracing
       ># echo 'p:schedule schedule' > kprobe_events
       ># echo stacktrace > events/kprobes/schedule/trigger
      
      This is because the code that implemented fentry in ftrace_regs_caller
      tried to use as few #ifdefs as possible and, when CC_USING_FENTRY was
      defined, modified ebp to point to the parent ip, as it does when
      CC_USING_FENTRY is not defined. But when CONFIG_FRAME_POINTERS is set,
      this corrupts the ebp register for that frame while doing the tracing.
      
      NOTE, it does not corrupt ebp in any other way. It is just a bad frame
      pointer when calling into the tracing infrastructure. The original ebp is
      restored before returning from the fentry call. But if a stack trace is
      performed inside the tracing, the unwinder will notice the bad ebp.
      
      Instead of toying with ebp under CC_USING_FENTRY, just slap the parent
      ip into the second parameter (%edx), and have an #else that does it the
      original way.
      
      The unwinder will unfortunately miss the function being traced, as the
      stack frame is not yet set up for it the way it is on x86_64. But
      fixing that is a bit more complex and did not work before anyway.
      
      This has been tested with and without FRAME_POINTERS being set while using
      -mfentry, as well as using an older compiler that uses mcount.
      
      Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Fixes: 644e0e8d ("x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set")
      Reported-by: kernel test robot <fengguang.wu@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lists.01.org/pipermail/lkp/2017-April/006165.html
      Link: http://lkml.kernel.org/r/20170420172236.7af7f6e5@gandalf.local.home
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>