Skip to content
  1. Sep 30, 2015
    • Thomas Gleixner's avatar
      x86/process: Add proper bound checks in 64bit get_wchan() · eddd3826
      Thomas Gleixner authored
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
      
      ).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      eddd3826
    • Vitaly Kuznetsov's avatar
      x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case · 1e034743
      Vitaly Kuznetsov authored
      
      
      Recent changes in the Hyper-V driver:
      
        b4370df2 ("Drivers: hv: vmbus: add special crash handler")
      
      broke the build when CONFIG_KEXEC_CORE is not set:
      
        arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
        arch/x86/kernel/cpu/mshyperv.c:112: undefined reference to `native_machine_crash_shutdown'
      
      Decorate all kexec related code with #ifdef CONFIG_KEXEC_CORE.
      
      Reported-by: default avatarJim Davis <jim.epost@gmail.com>
      Reported-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1443002577-25370-1-git-send-email-vkuznets@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1e034743
  2. Sep 23, 2015
  3. Sep 22, 2015
    • Andy Lutomirski's avatar
      x86/paravirt: Replace the paravirt nop with a bona fide empty function · fc57a7c6
      Andy Lutomirski authored
      
      
      PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
      example, trimmed for readability):
      
          ff 15 00 00 00 00       callq  *0x0(%rip)        # 2796 <nmi+0x6>
                    2792: R_X86_64_PC32     pv_irq_ops+0x2c
      
      That's a call through a function pointer to regular C function that
      does nothing on native boots, but that function isn't protected
      against kprobes, isn't marked notrace, and is certainly not
      guaranteed to preserve any registers if the compiler is feeling
      perverse.  This is bad news for a CLBR_NONE operation.
      
      Of course, if everything works correctly, once paravirt ops are
      patched, it gets nopped out, but what if we hit this code before
      paravirt ops are patched in?  This can potentially cause breakage
      that is very difficult to debug.
      
      A more subtle failure is possible here, too: if _paravirt_nop uses
      the stack at all (even just to push RBP), it will overwrite the "NMI
      executing" variable if it's called in the NMI prologue.
      
      The Xen case, perhaps surprisingly, is fine, because it's already
      written in asm.
      
      Fix all of the cases that default to paravirt_nop (including
      adjust_exception_frame) with a big hammer: replace paravirt_nop with
      an asm function that is just a ret instruction.
      
      The Xen case may have other problems, so document them.
      
      This is part of a fix for some random crashes that Sasha saw.
      
      Reported-and-tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      fc57a7c6
  4. Sep 17, 2015
  5. Sep 16, 2015
  6. Sep 14, 2015
    • Shaohua Li's avatar
      x86/apic: Serialize LVTT and TSC_DEADLINE writes · 5d7c631d
      Shaohua Li authored
      
      
      The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
      MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
      guaranteed that the write to LVTT has reached the APIC before the
      TSC_DEADLINE MSR is written. In such a case the write to the MSR is
      ignored and as a consequence the local timer interrupt never fires.
      
      The SDM decribes this issue for xAPIC and x2APIC modes. The
      serialization methods recommended by the SDM differ.
      
      xAPIC:
       "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
        2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
        3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
        4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
      
      x2APIC:
       "To allow for efficient access to the APIC registers in x2APIC mode,
        the serializing semantics of WRMSR are relaxed when writing to the
        APIC registers. Thus, system software should not use 'WRMSR to APIC
        registers in x2APIC mode' as a serializing instruction. Read and write
        accesses to the APIC registers will occur in program order. A WRMSR to
        an APIC register may complete before all preceding stores are globally
        visible; software can prevent this by inserting a serializing
        instruction, an SFENCE, or an MFENCE before the WRMSR."
      
      The xAPIC method is to just wait for the memory mapped write to hit
      the LVTT by checking whether the MSR write has reached the hardware.
      There is no reason why a proper MFENCE after the memory mapped write would
      not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
      xAPIC case as well.
      
      Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
      unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
      support.
      
      [ tglx: Massaged the changelog ]
      
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: <Kernel-team@fb.com>
      Cc: <lenb@kernel.org>
      Cc: <fenghua.yu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org #v3.7+
      Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      5d7c631d
    • Thomas Gleixner's avatar
      x86/ioapic: Force affinity setting in setup_ioapic_dest() · 4857c91f
      Thomas Gleixner authored
      
      
      The recent ioapic cleanups changed the affinity setting in
      setup_ioapic_dest() from a direct write to the hardware to the delayed
      affinity setup via irq_set_affinity().
      
      That results in a warning from chained_irq_exit():
      WARNING: CPU: 0 PID: 5 at kernel/irq/migration.c:32 irq_move_masked_irq
      [<ffffffff810a0a88>] irq_move_masked_irq+0xb8/0xc0
      [<ffffffff8103c161>] ioapic_ack_level+0x111/0x130
      [<ffffffff812bbfe8>] intel_gpio_irq_handler+0x148/0x1c0
      
      The reason is that irq_set_affinity() does not write directly to the
      hardware. It marks the affinity setting as pending and executes it
      from the next interrupt. The chained handler infrastructure does not
      take the irq descriptor lock for performance reasons because such a
      chained interrupt is not visible to any interfaces. So the delayed
      affinity setting triggers the warning in irq_move_masked_irq().
      
      Restore the old behaviour by calling the set_affinity function of the
      ioapic chip in setup_ioapic_dest(). This is safe as none of the
      interrupts can be on the fly at this point.
      
      Fixes: aa5cb97f 'x86/irq: Remove x86_io_apic_ops.set_affinity and related interfaces'
      Reported-and-tested-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: jarkko.nikula@linux.intel.com
      4857c91f
    • Jan Beulich's avatar
      x86/ldt: Fix small LDT allocation for Xen · f454b478
      Jan Beulich authored
      
      
      While the following commit:
      
        37868fe1 ("x86/ldt: Make modify_ldt synchronous")
      
      added a nice comment explaining that Xen needs page-aligned
      whole page chunks for guest descriptor tables, it then
      nevertheless used kzalloc() on the small size path.
      
      As I'm unaware of guarantees for kmalloc(PAGE_SIZE, ) to return
      page-aligned memory blocks, I believe this needs to be switched
      back to __get_free_page() (or better get_zeroed_page()).
      
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/55E735D6020000780009F1E6@prv-mh.provo.novell.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f454b478
  7. Sep 13, 2015
  8. Sep 11, 2015
  9. Sep 10, 2015
    • Christoph Hellwig's avatar
      dma-mapping: consolidate dma_set_mask · 452e06af
      Christoph Hellwig authored
      
      
      Almost everyone implements dma_set_mask the same way, although some time
      that's hidden in ->set_dma_mask methods.
      
      This patch consolidates those into a common implementation that either
      calls ->set_dma_mask if present or otherwise uses the default
      implementation.  Some architectures used to only call ->set_dma_mask
      after the initial checks, and those instance have been fixed to do the
      full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
      been fixed.
      
      Unfortunately some architectures overload unrelated semantics like changing
      the dma_ops into it so we still need to allow for an architecture override
      for now.
      
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      452e06af
    • Christoph Hellwig's avatar
      dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} · 6894258e
      Christoph Hellwig authored
      
      
      Since 2009 we have a nice asm-generic header implementing lots of DMA API
      functions for architectures using struct dma_map_ops, but unfortunately
      it's still missing a lot of APIs that all architectures still have to
      duplicate.
      
      This series consolidates the remaining functions, although we still need
      arch opt outs for two of them as a few architectures have very
      non-standard implementations.
      
      This patch (of 5):
      
      The coherent DMA allocator works the same over all architectures supporting
      dma_map operations.
      
      This patch consolidates them and converges the minor differences:
      
       - the debug_dma helpers are now called from all architectures, including
         those that were previously missing them
       - dma_alloc_from_coherent and dma_release_from_coherent are now always
         called from the generic alloc/free routines instead of the ops
         dma-mapping-common.h always includes dma-coherent.h to get the defintions
         for them, or the stubs if the architecture doesn't support this feature
       - checks for ->alloc / ->free presence are removed.  There is only one
         magic instead of dma_map_ops without them (mic_dma_ops) and that one
         is x86 only anyway.
      
      Besides that only x86 needs special treatment to replace a default devices
      if none is passed and tweak the gfp_flags.  An optional arch hook is provided
      for that.
      
      [linux@roeck-us.net: fix build]
      [jcmvbkbc@gmail.com: fix xtensa]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6894258e
    • Dave Young's avatar
      kexec: split kexec_load syscall from kexec core code · 2965faa5
      Dave Young authored
      
      
      There are two kexec load syscalls, kexec_load another and kexec_file_load.
       kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
      split kexec_load syscall code to kernel/kexec.c.
      
      And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
      use kexec_file_load only, or vice verse.
      
      The original requirement is from Ted Ts'o, he want kexec kernel signature
      being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
      kexec_load syscall can bypass the checking.
      
      Vivek Goyal proposed to create a common kconfig option so user can compile
      in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
      KEXEC_CORE so that old config files still work.
      
      Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
      architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
      KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
      kexec_load syscall.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2965faa5
  10. Sep 08, 2015
    • Mark Salter's avatar
      x86: use generic early mem copy · 5dd2c4bd
      Mark Salter authored
      
      
      The early_ioremap library now has a generic copy_from_early_mem()
      function.  Use the generic copy function for x86 relocate_initrd().
      
      [akpm@linux-foundation.org: remove MAX_MAP_CHUNK define, per Yinghai Lu]
      Signed-off-by: default avatarMark Salter <msalter@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5dd2c4bd
  11. Sep 05, 2015
    • Andy Lutomirski's avatar
      x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0 · 76fc5e7b
      Andy Lutomirski authored
      
      
      vm86 exposes an interesting attack surface against the entry
      code. Since vm86 is mostly useless anyway if mmap_min_addr != 0,
      just turn it off in that case.
      
      There are some reports that vbetool can work despite setting
      mmap_min_addr to zero.  This shouldn't break that use case,
      as CAP_SYS_RAWIO already overrides mmap_min_addr.
      
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      76fc5e7b
  12. Sep 04, 2015
  13. Sep 03, 2015
    • Thomas Gleixner's avatar
      x86/alternatives: Make optimize_nops() interrupt safe and synced · 66c117d7
      Thomas Gleixner authored
      
      
      Richard reported the following crash:
      
      [    0.036000] BUG: unable to handle kernel paging request at 55501e06
      [    0.036000] IP: [<c0aae48b>] common_interrupt+0xb/0x38
      [    0.036000] Call Trace:
      [    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
      [    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630
      
      Chuck decoded:
      
       "  0:   8d 90 90 83 04 24       lea    0x24048390(%eax),%edx
          6:   80 fc 0f                cmp    $0xf,%ah
          9:   a8 0f                   test   $0xf,%al
       >> b:   a0 06 1e 50 55          mov    0x55501e06,%al
         10:   57                      push   %edi
         11:   56                      push   %esi
      
       Interrupt 0x30 occurred while the alternatives code was replacing the
       initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the
       optimized version, 0x8d,0x76,0x00. Only the first byte has been
       replaced so far, and it makes a mess out of the insn decoding."
      
      optimize_nops() is buggy in two aspects:
      
      - It's not disabling interrupts across the modification
      - It's lacking a sync_core() call
      
      Add both.
      
      Fixes: 4fd4b6e5 'x86/alternatives: Use optimized NOPs for padding'
      Reported-and-tested-by: default avatar"Richard W.M. Jones" <rjones@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Richard W.M. Jones <rjones@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509031232340.15006@nanos
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      66c117d7
  14. Aug 28, 2015
    • Thomas Gleixner's avatar
      x86/irq: Do not dereference irq descriptor before checking it · a47d4576
      Thomas Gleixner authored
      
      
      Having the IS_NULL_OR_ERR() check after dereferencing the pointer is
      not really working well.
      
      Move the dereference after the check.
      
      Fixes: a782a7e4 'x86/irq: Store irq descriptor in vector array'
      Reported-and-tested-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      a47d4576
    • Luis R. Rodriguez's avatar
      x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del() · 2baa891e
      Luis R. Rodriguez authored
      The effort to replace mtrr_add() with architecture agnostic
      arch_phys_wc_add() is complete, this will ensure write-combining
      implementations (PAT on x86) is taken advantage instead of using
      MTRR. With the effort done now, hide direct MTRR access for
      drivers.
      
      The legacy user-space /proc/mtrr ABI is not affected.
      
      Update x86 documentation on MTRR to reflect the completion of
      the phasing out of direct access to MTRR, also add a note on
      platform firmware code use of MTRRs based on the obituary
      discussion of MTRRs on Linux [0].
      
        [0] http://lkml.kernel.org/r/1438991330.3109.196.camel@hp.com
      
      
      
      Signed-off-by: default avatarLuis R. Rodriguez <mcgrof@suse.com>
      Cc: <syrjala@sci.fi>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Walls <awalls@md.metrocast.net>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Ville Syrjälä <syrjala@sci.fi>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: airlied@linux.ie
      Cc: benh@kernel.crashing.org
      Cc: bhelgaas@google.com
      Cc: dan.j.williams@intel.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-fbdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Cc: mst@redhat.com
      Cc: netdev@vger.kernel.org
      Cc: vinod.koul@intel.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1440443613-13696-12-git-send-email-mcgrof@do-not-panic.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2baa891e
  15. Aug 26, 2015
    • Jiang Liu's avatar
      ACPI, PCI: Penalize legacy IRQ used by ACPI SCI · 5d0ddfeb
      Jiang Liu authored
      Nick Meier reported a regression with HyperV that "
        After rebooting the VM, the following messages are logged in syslog
        when trying to load the tulip driver:
          tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007)
          tulip: 0000:00:0a.0: PCI INT A: failed to register GSI
          tulip: Cannot enable tulip board #0, aborting
          tulip: probe of 0000:00:0a.0 failed with error -16
        Errors occur in 3.19.0 kernel
        Works in 3.17 kernel.
      "
      
      According to the ACPI dump file posted by Nick at
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072
      
      The ACPI MADT table includes an interrupt source overridden entry for
      ACPI SCI:
      [236h 0566  1]                Subtable Type : 02 <Interrupt Source Override>
      [237h 0567  1]                       Length : 0A
      [238h 0568  1]                          Bus : 00
      [239h 0569  1]                       Source : 09
      [23Ah 0570  4]                    Interrupt : 00000009
      [23Eh 0574  2]        Flags (decoded below) : 000D
                                         Polarity : 1
                                     Trigger Mode : 3
      
      And in DSDT table, we have _PRT method to define PCI interrupts, which
      eventually goes to:
              Name (PRSA, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
              Name (PRSB, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
              Name (PRSC, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
              Name (PRSD, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
      
      According to the MADT and DSDT tables, IRQ 9 may be used for:
       1) ACPI SCI in level, high mode
       2) PCI legacy IRQ in level, low mode
      So there's a conflict in polarity setting for IRQ 9.
      
      Prior to commit cd68f6bd ("x86, irq, acpi: Get rid of special
      handling of GSI for ACPI SCI"), ACPI SCI is handled specially and
      there's no check for conflicts between ACPI SCI and PCI legagy IRQ.
      And it seems that the HyperV hypervisor doesn't make use of the
      polarity configuration in IOAPIC entry, so it just works.
      
      Commit cd68f6bd gets rid of the specially handling of ACPI SCI,
      and then the pin attribute checking code discloses the conflicts
      between ACPI SCI and PCI legacy IRQ on HyperV virtual machine,
      and rejects the request to assign IRQ9 to PCI devices.
      
      So penalize legacy IRQ used by ACPI SCI and mark it unusable if ACPI
      SCI attributes conflict with PCI IRQ attributes.
      
      Please refer to following links for more information:
      https://bugzilla.kernel.org/show_bug.cgi?id=101301
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072
      
      
      
      Fixes: cd68f6bd ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI")
      Reported-and-tested-by: default avatarNick Meier <nmeier@microsoft.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: 3.19+ <stable@vger.kernel.org> # 3.19+
      Signed-off-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5d0ddfeb
  16. Aug 25, 2015
  17. Aug 23, 2015
  18. Aug 22, 2015
    • Thomas Gleixner's avatar
      x86/apic: Fix fallout from x2apic cleanup · a57e456a
      Thomas Gleixner authored
      
      
      In the recent x2apic cleanup I got two things really wrong:
      1) The safety check in __disable_x2apic which allows the function to
         be called unconditionally is backwards. The check is there to
         prevent access to the apic MSR in case that the machine has no
         apic. Though right now it returns if the machine has an apic and
         therefor the disabling of x2apic is never invoked.
      
      2) x2apic_disable() sets x2apic_mode to 0 after registering the local
         apic. That's wrong, because register_lapic_address() checks x2apic
         mode and therefor takes the wrong code path.
      
      This results in boot failures on machines with x2apic preenabled by
      BIOS and can also lead to an fatal MSR access on machines without
      apic.
      
      The solutions are simple:
      1) Correct the sanity check for apic availability
      2) Clear x2apic_mode _before_ calling register_lapic_address()
      
      Fixes: 659006bf 'x86/x2apic: Split enable and setup function'
      Reported-and-tested-by: default avatarJavier Monteagudo <javiermon@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
      Cc: stable@vger.kernel.org # 4.0+
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      a57e456a
    • Huang Rui's avatar
      x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer · b466bdb6
      Huang Rui authored
      
      
      MWAITX can enable a timer and a corresponding timer value
      specified in SW P0 clocks. The SW P0 frequency is the same as
      TSC. The timer provides an upper bound on how long the
      instruction waits before exiting.
      
      This way, a delay function in the kernel can leverage that
      MWAITX timer of MWAITX.
      
      When a CPU core executes MWAITX, it will be quiesced in a
      waiting phase, diminishing its power consumption. This way, we
      can save power in comparison to our default TSC-based delays.
      
      A simple test shows that:
      
      	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
      	$ sleep 10000s
      	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
      
      Results:
      
      	* TSC-based default delay:      485115 uWatts average power
      	* MWAITX-based delay:           252738 uWatts average power
      
      Thus, that's about 240 milliWatts less power consumption. The
      test method relies on the support of AMD CPU accumulated power
      algorithm in fam15h_power for which patches are forthcoming.
      
      Suggested-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Suggested-by: default avatarBorislav Petkov <bp@suse.de>
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarHuang Rui <ray.huang@amd.com>
      [ Fix delay truncation. ]
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
      Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
      Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b466bdb6
    • Andy Lutomirski's avatar
      x86/traps: Weaken context tracking entry assertions · f0a97af8
      Andy Lutomirski authored
      
      
      We were asserting that we were all the way in CONTEXT_KERNEL
      when exception handlers were called.  While having this be true
      is, I think, a nice goal (or maybe a variant in which we assert
      that we're in CONTEXT_KERNEL or some new IRQ context), we're not
      quite there.
      
      In particular, if an IRQ interrupts the SYSCALL prologue and the
      IRQ handler in turn causes an exception, the exception entry
      will be called in RCU IRQ mode but with CONTEXT_USER.
      
      This is okay (nothing goes wrong), but until we fix up the
      SYSCALL prologue, we need to avoid warning.
      
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c81faf3916346c0e04346c441392974f49cd7184.1440133286.git.luto@kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f0a97af8
    • Ingo Molnar's avatar
      x86/fpu/math-emu: Fix crash in fork() · 827409b2
      Ingo Molnar authored
      
      
      During later stages of math-emu bootup the following crash triggers:
      
      	 math_emulate: 0060:c100d0a8
      	 Kernel panic - not syncing: Math emulation needed in kernel
      	 CPU: 0 PID: 1511 Comm: login Not tainted 4.2.0-rc7+ #1012
      	 [...]
      	 Call Trace:
      	  [<c181d50d>] dump_stack+0x41/0x52
      	  [<c181c918>] panic+0x77/0x189
      	  [<c1003530>] ? math_error+0x140/0x140
      	  [<c164c2d7>] math_emulate+0xba7/0xbd0
      	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
      	  [<c1109c3c>] ? __alloc_pages_nodemask+0x12c/0x870
      	  [<c136ac20>] ? proc_clear_tty+0x40/0x70
      	  [<c136ac6e>] ? session_clear_tty+0x1e/0x30
      	  [<c1003530>] ? math_error+0x140/0x140
      	  [<c1003575>] do_device_not_available+0x45/0x70
      	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
      	  [<c18258e6>] error_code+0x5a/0x60
      	  [<c1003530>] ? math_error+0x140/0x140
      	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
      	  [<c100c205>] arch_dup_task_struct+0x25/0x30
      	  [<c1048cea>] copy_process.part.51+0xea/0x1480
      	  [<c115a8e5>] ? dput+0x175/0x200
      	  [<c136af70>] ? no_tty+0x30/0x30
      	  [<c1157242>] ? do_vfs_ioctl+0x322/0x540
      	  [<c104a21a>] _do_fork+0xca/0x340
      	  [<c1057b06>] ? SyS_rt_sigaction+0x66/0x90
      	  [<c104a557>] SyS_clone+0x27/0x30
      	  [<c1824a80>] sysenter_do_call+0x12/0x12
      
      The reason is the incorrect assumption in fpu_copy(), that FNSAVE
      can be executed from math-emu kernels as well.
      
      Don't try to copy the registers, the soft state will be copied
      by fork anyway, so the child task inherits the parent task's
      soft math state.
      
      With this fix applied math-emu kernels boot up fine on modern
      hardware and the 'no387 nofxsr' boot options.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      827409b2
    • Ingo Molnar's avatar
      x86/fpu/math-emu: Fix math-emu boot crash · 5fc96038
      Ingo Molnar authored
      
      
      On a math-emu bootup the following crash occurs:
      
      	Initializing CPU#0
      	------------[ cut here ]------------
      	kernel BUG at arch/x86/kernel/traps.c:779!
      	invalid opcode: 0000 [#1] SMP
      	[...]
      	EIP is at do_device_not_available+0xe/0x70
      	[...]
      	Call Trace:
      	 [<c18238e6>] error_code+0x5a/0x60
      	 [<c1002bd0>] ? math_error+0x140/0x140
      	 [<c100bbd9>] ? fpu__init_cpu+0x59/0xa0
      	 [<c1012322>] cpu_init+0x202/0x330
      	 [<c104509f>] ? __native_set_fixmap+0x1f/0x30
      	 [<c1b56ab0>] trap_init+0x305/0x346
      	 [<c1b548af>] start_kernel+0x1a5/0x35d
      	 [<c1b542b4>] i386_start_kernel+0x82/0x86
      
      The reason is that in the following commit:
      
        b1276c48 ("x86/fpu: Initialize fpregs in fpu__init_cpu_generic()")
      
      I failed to consider math-emu's limitation that it cannot execute the
      FNINIT instruction in kernel mode.
      
      The long term fix might be to allow math-emu to execute (certain) kernel
      mode FPU instructions, but for now apply the safe (albeit somewhat ugly)
      fix: initialize the emulation state explicitly without trapping out to
      the FPU emulator.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5fc96038
  19. Aug 21, 2015
    • Vitaly Kuznetsov's avatar
      x86/hyperv: Mark the Hyper-V TSC as unstable · 88c9281a
      Vitaly Kuznetsov authored
      
      
      The Hyper-V top-level functional specification states, that
      "algorithms should be resilient to sudden jumps forward or
      backward in the TSC value", this means that we should consider
      TSC as unstable. In some cases tsc tests are able to detect the
      instability, it was detected in 543 out of 646 boots in my
      testing:
      
       Measured 6277 cycles TSC warp between CPUs, turning off TSC clock.
       tsc: Marking TSC unstable due to check_tsc_sync_source failed
      
      This is, however, just a heuristic. On Hyper-V platform there
      are two good clocksources: MSR-based hyperv_clocksource and
      recently introduced TSC page.
      
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/1440003264-9949-1-git-send-email-vkuznets@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      88c9281a
    • Ingo Molnar's avatar
      perf/x86/msr: Fix the MSR driver build · 82819ffb
      Ingo Molnar authored
      
      
      The new MSR PMU driver made use of rdtsc() which does not exist (yet) in
      this tree:
      
        arch/x86/kernel/cpu/perf_event_msr.c:91:3: error: implicit declaration of function 'rdtsc'
      
      Use the old rdtscll() primitive for now.
      
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      82819ffb
  20. Aug 20, 2015
  21. Aug 19, 2015
    • Dan Williams's avatar
      libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option · 7a67832c
      Dan Williams authored
      
      
      We currently register a platform device for e820 type-12 memory and
      register a nvdimm bus beneath it.  Registering the platform device
      triggers the device-core machinery to probe for a driver, but that
      search currently comes up empty.  Building the nvdimm-bus registration
      into the e820_pmem platform device registration in this way forces
      libnvdimm to be built-in.  Instead, convert the built-in portion of
      CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
      rest of the logic to the driver for e820_pmem, for the following
      reasons:
      
      1/ Letting e820_pmem support be a module allows building and testing
         libnvdimm.ko changes without rebooting
      
      2/ All the normal policy around modules can be applied to e820_pmem
         (unbind to disable and/or blacklisting the module from loading by
         default)
      
      3/ Moving the driver to a generic location and converting it to scan
         "iomem_resource" rather than "e820.map" means any other architecture can
         take advantage of this simple nvdimm resource discovery mechanism by
         registering a resource named "Persistent Memory (legacy)"
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      7a67832c
  22. Aug 18, 2015
  23. Aug 17, 2015
    • Len Brown's avatar
      x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted · 656bba30
      Len Brown authored
      
      
      Both the per-APIC flag ".wait_for_init_deassert",
      and the global atomic_t "init_deasserted"
      are dead code -- remove them.
      
      For all APIC types, "wait_for_master()"
      prevents an AP from proceeding until the BSP has set
      cpu_callout_mask, making "init_deasserted" {unnecessary}:
      
      	BSP: <de-assert INIT>
      	...
      	BSP: {set init_deasserted}
      	AP: wait_for_master()
      		set cpu_initialized_mask
      		wait for cpu_callout_mask
      	BSP: test cpu_initialized_mask
      	BSP: set cpu_callout_mask
      	AP: test cpu_callout_mask
      	AP: {wait for init_deasserted}
      	...
      	AP: <touch APIC>
      
      Deleting the {dead code} above is necessary to enable
      some parallelism in a future patch.
      
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/de4b3a9bab894735e285870b5296da25ee6a8a5a.1439739165.git.len.brown@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      656bba30
    • Len Brown's avatar
      x86/smpboot: Remove SIPI delays from cpu_up() · a9bcaa02
      Len Brown authored
      
      
      MPS 1.4 example code shows the following required delays during processor
      on-lining:
      
      	INIT
      	 udelay(10,000)
      	SIPI
      	 udelay(200)
      	SIPI
      	 udelay(200) /* Linux actually implements this as udelay(300) */
      
      Linux skips the udelay(10,000) on modern processors.
      This patch removes the udelay(200) after each SIPI
      on those same processors.
      
      All three legacy delays can be restored by the cmdline
      "cpu_init_udelay=10000".
      
      As measured by analyze_suspend.py, this patch speeds
      processor resume time on my desktop from 2.4ms to 1.8ms, per AP.
      
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/a5dfdbc8fbfdd813784da204aad5677fe459ac37.1439739165.git.len.brown@intel.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a9bcaa02
Loading