Skip to content
  1. Jul 17, 2019
  2. Jul 12, 2019
    • Mike Rapoport's avatar
      asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel] · 5fba4af4
      Mike Rapoport authored
      Most architectures have identical or very similar implementation of
      pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and
      pte_free().
      
      Add a generic implementation that can be reused across architectures and
      enable its use on x86.
      
      The generic implementation uses
      
      	GFP_KERNEL | __GFP_ZERO
      
      for the kernel page tables and
      
      	GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
      
      for the user page tables.
      
      The "base" functions for PTE allocation, namely __pte_alloc_one_kernel()
      and __pte_alloc_one() are intended for the architectures that require
      additional actions after actual memory allocation or must use non-default
      GFP flags.
      
      x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and
      pte_free().
      
      x86 still implements pte_alloc_one() to allow run-time control of GFP
      flags required for "userpte" command line option.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.com
      
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5fba4af4
    • Christoph Hellwig's avatar
      mm: lift the x86_32 PAE version of gup_get_pte to common code · 39656e83
      Christoph Hellwig authored
      The split low/high access is the only non-READ_ONCE version of gup_get_pte
      that did show up in the various arch implemenations.  Lift it to common
      code and drop the ifdef based arch override.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
      
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      39656e83
    • Christoph Hellwig's avatar
      mm: simplify gup_fast_permitted · 26f4c328
      Christoph Hellwig authored
      Pass in the already calculated end value instead of recomputing it, and
      leave the end > start check in the callers instead of duplicating them in
      the arch code.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de
      
      
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      26f4c328
    • Marco Elver's avatar
      asm-generic, x86: add bitops instrumentation for KASAN · 751ad98d
      Marco Elver authored
      This adds a new header to asm-generic to allow optionally instrumenting
      architecture-specific asm implementations of bitops.
      
      This change includes the required change for x86 as reference and
      changes the kernel API doc to point to bitops-instrumented.h instead.
      Rationale: the functions in x86's bitops.h are no longer the kernel API
      functions, but instead the arch_ prefixed functions, which are then
      instrumented via bitops-instrumented.h.
      
      Other architectures can similarly add support for asm implementations of
      bitops.
      
      The documentation text was derived from x86 and existing bitops
      asm-generic versions: 1) references to x86 have been removed; 2) as a
      result, some of the text had to be reworded for clarity and consistency.
      
      Tested using lib/test_kasan with bitops tests (pre-requisite patch).
      Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=198439
      
      Link: http://lkml.kernel.org/r/20190613125950.197667-4-elver@google.com
      
      
      Signed-off-by: default avatarMarco Elver <elver@google.com>
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Reviewed-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      751ad98d
  3. Jul 11, 2019
    • Eric Hankland's avatar
      KVM: x86: PMU Event Filter · 66bb8a06
      Eric Hankland authored
      
      
      Some events can provide a guest with information about other guests or the
      host (e.g. L3 cache stats); providing the capability to restrict access
      to a "safe" set of events would limit the potential for the PMU to be used
      in any side channel attacks. This change introduces a new VM ioctl that
      sets an event filter. If the guest attempts to program a counter for
      any blacklisted or non-whitelisted event, the kernel counter won't be
      created, so any RDPMC/RDMSR will show 0 instances of that event.
      
      Signed-off-by: default avatarEric Hankland <ehankland@google.com>
      [Lots of changes. All remaining bugs are probably mine. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      66bb8a06
  4. Jul 10, 2019
  5. Jul 09, 2019
  6. Jul 08, 2019
  7. Jul 07, 2019
  8. Jul 03, 2019
    • Thomas Gleixner's avatar
      x86/fsgsbase: Revert FSGSBASE support · 049331f2
      Thomas Gleixner authored
      
      
      The FSGSBASE series turned out to have serious bugs and there is still an
      open issue which is not fully understood yet.
      
      The confidence in those changes has become close to zero especially as the
      test cases which have been shipped with that series were obviously never
      run before sending the final series out to LKML.
      
        ./fsgsbase_64 >/dev/null
        Segmentation fault
      
      As the merge window is close, the only sane decision is to revert FSGSBASE
      support. The revert is necessary as this branch has been merged into
      perf/core already and rebasing all of that a few days before the merge
      window is not the most brilliant idea.
      
      I could definitely slap myself for not noticing the test case fail when
      merging that series, but TBH my expectations weren't that low back
      then. Won't happen again.
      
      Revert the following commits:
      539bca53 ("x86/entry/64: Fix and clean up paranoid_exit")
      2c7b5ac5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
      f987c955 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
      2032f1f9 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
      5bf0cab6 ("x86/entry/64: Document GSBASE handling in the paranoid path")
      708078f6 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
      79e1932f ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      1d07316b ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
      f60a83df ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
      1ab5f3f7 ("x86/process/64: Use FSBSBASE in switch_to() if available")
      a86b4625 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
      8b71340d ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
      b64ed19b ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      049331f2
    • Michael Kelley's avatar
      clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic · dd2cb348
      Michael Kelley authored
      
      
      Continue consolidating Hyper-V clock and timer code into an ISA
      independent Hyper-V clocksource driver.
      
      Move the existing clocksource code under drivers/hv and arch/x86 to the new
      clocksource driver while separating out the ISA dependencies. Update
      Hyper-V initialization to call initialization and cleanup routines since
      the Hyper-V synthetic clock is not independently enumerated in ACPI.
      
      Update Hyper-V clocksource users in KVM and VDSO to get definitions from
      the new include file.
      
      No behavior is changed and no new functionality is added.
      
      Suggested-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "bp@alien8.de" <bp@alien8.de>
      Cc: "will.deacon@arm.com" <will.deacon@arm.com>
      Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
      Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
      Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: "sashal@kernel.org" <sashal@kernel.org>
      Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
      Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
      Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
      Cc: "arnd@arndb.de" <arnd@arndb.de>
      Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
      Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
      Cc: "paul.burton@mips.com" <paul.burton@mips.com>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "salyzyn@android.com" <salyzyn@android.com>
      Cc: "pcc@google.com" <pcc@google.com>
      Cc: "shuah@kernel.org" <shuah@kernel.org>
      Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
      Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
      Cc: "huw@codeweavers.com" <huw@codeweavers.com>
      Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
      Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
      Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
      Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
      Link: https://lkml.kernel.org/r/1561955054-1838-3-git-send-email-mikelley@microsoft.com
      dd2cb348
    • Michael Kelley's avatar
      clocksource/drivers: Make Hyper-V clocksource ISA agnostic · fd1fea68
      Michael Kelley authored
      
      
      Hyper-V clock/timer code and data structures are currently mixed
      in with other code in the ISA independent drivers/hv directory as
      well as the ISA dependent Hyper-V code under arch/x86.
      
      Consolidate this code and data structures into a Hyper-V clocksource driver
      to better follow the Linux model. In doing so, separate out the ISA
      dependent portions so the new clocksource driver works for x86 and for the
      in-process Hyper-V on ARM64 code.
      
      To start, move the existing clockevents code to create the new clocksource
      driver. Update the VMbus driver to call initialization and cleanup routines
      since the Hyper-V synthetic timers are not independently enumerated in
      ACPI.
      
      No behavior is changed and no new functionality is added.
      
      Suggested-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "bp@alien8.de" <bp@alien8.de>
      Cc: "will.deacon@arm.com" <will.deacon@arm.com>
      Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>
      Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>
      Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: Sunil Muthuswamy <sunilmut@microsoft.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: "sashal@kernel.org" <sashal@kernel.org>
      Cc: "vincenzo.frascino@arm.com" <vincenzo.frascino@arm.com>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
      Cc: "linux-mips@vger.kernel.org" <linux-mips@vger.kernel.org>
      Cc: "linux-kselftest@vger.kernel.org" <linux-kselftest@vger.kernel.org>
      Cc: "arnd@arndb.de" <arnd@arndb.de>
      Cc: "linux@armlinux.org.uk" <linux@armlinux.org.uk>
      Cc: "ralf@linux-mips.org" <ralf@linux-mips.org>
      Cc: "paul.burton@mips.com" <paul.burton@mips.com>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "salyzyn@android.com" <salyzyn@android.com>
      Cc: "pcc@google.com" <pcc@google.com>
      Cc: "shuah@kernel.org" <shuah@kernel.org>
      Cc: "0x7f454c46@gmail.com" <0x7f454c46@gmail.com>
      Cc: "linux@rasmusvillemoes.dk" <linux@rasmusvillemoes.dk>
      Cc: "huw@codeweavers.com" <huw@codeweavers.com>
      Cc: "sfr@canb.auug.org.au" <sfr@canb.auug.org.au>
      Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>
      Cc: "rkrcmar@redhat.com" <rkrcmar@redhat.com>
      Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
      Link: https://lkml.kernel.org/r/1561955054-1838-2-git-send-email-mikelley@microsoft.com
      fd1fea68
    • Thomas Gleixner's avatar
      x86/irq: Seperate unused system vectors from spurious entry again · f8a8fe61
      Thomas Gleixner authored
      
      
      Quite some time ago the interrupt entry stubs for unused vectors in the
      system vector range got removed and directly mapped to the spurious
      interrupt vector entry point.
      
      Sounds reasonable, but it's subtly broken. The spurious interrupt vector
      entry point pushes vector number 0xFF on the stack which makes the whole
      logic in __smp_spurious_interrupt() pointless.
      
      As a consequence any spurious interrupt which comes from a vector != 0xFF
      is treated as a real spurious interrupt (vector 0xFF) and not
      acknowledged. That subsequently stalls all interrupt vectors of equal and
      lower priority, which brings the system to a grinding halt.
      
      This can happen because even on 64-bit the system vector space is not
      guaranteed to be fully populated. A full compile time handling of the
      unused vectors is not possible because quite some of them are conditonally
      populated at runtime.
      
      Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
      but gains the proper handling back. There is no point to selectively spare
      some of the stubs which are known at compile time as the required code in
      the IDT management would be way larger and convoluted.
      
      Do not route the spurious entries through common_interrupt and do_IRQ() as
      the original code did. Route it to smp_spurious_interrupt() which evaluates
      the vector number and acts accordingly now that the real vector numbers are
      handed in.
      
      Fixup the pr_warn so the actual spurious vector (0xff) is clearly
      distiguished from the other vectors and also note for the vectored case
      whether it was pending in the ISR or not.
      
       "Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
       "Spurious interrupt vector 0xed on CPU#1. Acked."
       "Spurious interrupt vector 0xee on CPU#1. Not pending!."
      
      Fixes: 2414e021 ("x86: Avoid building unused IRQ entry stubs")
      Reported-by: default avatarJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Link: https://lkml.kernel.org/r/20190628111440.550568228@linutronix.de
      f8a8fe61
    • Thomas Gleixner's avatar
      x86/irq: Handle spurious interrupt after shutdown gracefully · b7107a67
      Thomas Gleixner authored
      
      
      Since the rework of the vector management, warnings about spurious
      interrupts have been reported. Robert provided some more information and
      did an initial analysis. The following situation leads to these warnings:
      
         CPU 0                  CPU 1               IO_APIC
      
                                                    interrupt is raised
                                                    sent to CPU1
      			  Unable to handle
      			  immediately
      			  (interrupts off,
      			   deep idle delay)
         mask()
         ...
         free()
           shutdown()
           synchronize_irq()
           clear_vector()
                                do_IRQ()
                                  -> vector is clear
      
      Before the rework the vector entries of legacy interrupts were statically
      assigned and occupied precious vector space while most of them were
      unused. Due to that the above situation was handled silently because the
      vector was handled and the core handler of the assigned interrupt
      descriptor noticed that it is shut down and returned.
      
      While this has been usually observed with legacy interrupts, this situation
      is not limited to them. Any other interrupt source, e.g. MSI, can cause the
      same issue.
      
      After adding proper synchronization for level triggered interrupts, this
      can only happen for edge triggered interrupts where the IO-APIC obviously
      cannot provide information about interrupts in flight.
      
      While the spurious warning is actually harmless in this case it worries
      users and driver developers.
      
      Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead
      of VECTOR_UNUSED when the vector is freed up.
      
      If that above late handling happens the spurious detector will not complain
      and switch the entry to VECTOR_UNUSED. Any subsequent spurious interrupt on
      that line will trigger the spurious warning as before.
      
      Fixes: 464d1230 ("x86/vector: Switch IOAPIC to global reservation mode")
      Reported-by: default avatarRobert Hodaszi <Robert.Hodaszi@digi.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de&gt;->
      Tested-by: default avatarRobert Hodaszi <Robert.Hodaszi@digi.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Link: https://lkml.kernel.org/r/20190628111440.459647741@linutronix.de
      b7107a67
  9. Jul 01, 2019
  10. Jun 29, 2019
    • Thomas Gleixner's avatar
      x86/timer: Skip PIT initialization on modern chipsets · c8c40767
      Thomas Gleixner authored
      
      
      Recent Intel chipsets including Skylake and ApolloLake have a special
      ITSSPRC register which allows the 8254 PIT to be gated.  When gated, the
      8254 registers can still be programmed as normal, but there are no IRQ0
      timer interrupts.
      
      Some products such as the Connex L1430 and exone go Rugged E11 use this
      register to ship with the PIT gated by default. This causes Linux to fail
      to boot:
      
        Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
        apic=debug and send a report.
      
      The panic happens before the framebuffer is initialized, so to the user, it
      appears as an early boot hang on a black screen.
      
      Affected products typically have a BIOS option that can be used to enable
      the 8254 and make Linux work (Chipset -> South Cluster Configuration ->
      Miscellaneous Configuration -> 8254 Clock Gating), however it would be best
      to make Linux support the no-8254 case.
      
      Modern sytems allow to discover the TSC and local APIC timer frequencies,
      so the calibration against the PIT is not required. These systems have
      always running timers and the local APIC timer works also in deep power
      states.
      
      So the setup of the PIT including the IO-APIC timer interrupt delivery
      checks are a pointless exercise.
      
      Skip the PIT setup and the IO-APIC timer interrupt checks on these systems,
      which avoids the panic caused by non ticking PITs and also speeds up the
      boot process.
      
      Thanks to Daniel for providing the changelog, initial analysis of the
      problem and testing against a variety of machines.
      
      Reported-by: default avatarDaniel Drake <drake@endlessm.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarDaniel Drake <drake@endlessm.com>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Cc: linux@endlessm.com
      Cc: rafael.j.wysocki@intel.com
      Cc: hdegoede@redhat.com
      Link: https://lkml.kernel.org/r/20190628072307.24678-1-drake@endlessm.com
      c8c40767
  11. Jun 27, 2019
  12. Jun 26, 2019
    • Zhenzhong Duan's avatar
      x86/speculation/mds: Eliminate leaks by trace_hardirqs_on() · ab3765a0
      Zhenzhong Duan authored
      
      
      Move mds_idle_clear_cpu_buffers() after trace_hardirqs_on() to ensure
      all store buffer entries are flushed.
      
      Signed-off-by: default avatarZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Cc: jgross@suse.com
      Cc: ndesaulniers@google.com
      Cc: gregkh@linuxfoundation.org
      Link: https://lkml.kernel.org/r/1561260904-29669-2-git-send-email-zhenzhong.duan@oracle.com
      ab3765a0
    • Thomas Gleixner's avatar
      lib/vdso: Make delta calculation work correctly · 9d90b93b
      Thomas Gleixner authored
      
      
      The x86 vdso implementation on which the generic vdso library is based on
      has subtle (unfortunately undocumented) twists:
      
       1) The code assumes that the clocksource mask is U64_MAX which means that
          no bits are masked. Which is true for any valid x86 VDSO clocksource.
          Stupidly it still did the mask operation for no reason and at the wrong
          place right after reading the clocksource.
      
       2) It contains a sanity check to catch the case where slightly
          unsynchronized TSC values can be observed which would cause the delta
          calculation to make a huge jump. It therefore checks whether the
          current TSC value is larger than the value on which the current
          conversion is based on. If it's not larger the base value is used to
          prevent time jumps.
      
      #1 Is not only stupid for the X86 case because it does the masking for no
      reason it is also completely wrong for clocksources with a smaller mask
      which can legitimately wrap around during a conversion period. The core
      timekeeping code does it correct by applying the mask after the delta
      calculation:
      
      	(now - base) & mask
      
      #2 is equally broken for clocksources which have smaller masks and can wrap
      around during a conversion period because there the now > base check is
      just wrong and causes stale time stamps and time going backwards issues.
      
      Unbreak it by:
      
        1) Removing the mask operation from the clocksource read which makes the
           fallback detection work for all clocksources
      
        2) Replacing the conditional delta calculation with a overrideable inline
           function.
      
      #2 could reuse clocksource_delta() from the timekeeping code but that
      results in a significant performance hit for the x86 VSDO. The timekeeping
      core code must have the non optimized version as it has to operate
      correctly with clocksources which have smaller masks as well to handle the
      case where TSC is discarded as timekeeper clocksource and replaced by HPET
      or pmtimer. For the VDSO there is no replacement clocksource. If TSC is
      unusable the syscall is enforced which does the right thing.
      
      To accommodate to the needs of various architectures provide an
      override-able inline function which defaults to the regular delta
      calculation with masking:
      
      	(now - base) & mask
      
      Override it for x86 with the non-masking and checking version.
      
      This unbreaks the ARM64 syscall fallback operation, allows to use
      clocksources with arbitrary width and preserves the performance
      optimization for x86.
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarVincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: linux-arch@vger.kernel.org
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux@armlinux.org.uk
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: paul.burton@mips.com
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: salyzyn@android.com
      Cc: pcc@google.com
      Cc: shuah@kernel.org
      Cc: 0x7f454c46@gmail.com
      Cc: linux@rasmusvillemoes.dk
      Cc: huw@codeweavers.com
      Cc: sthotton@marvell.com
      Cc: andre.przywara@arm.com
      Cc: Andy Lutomirski <luto@kernel.org>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906261159230.32342@nanos.tec.linutronix.de
      9d90b93b
  13. Jun 25, 2019
  14. Jun 23, 2019
    • Fenghua Yu's avatar
      x86/umwait: Initialize umwait control values · bd688c69
      Fenghua Yu authored
      
      
      umwait or tpause allows the processor to enter a light-weight
      power/performance optimized state (C0.1 state) or an improved
      power/performance optimized state (C0.2 state) for a period specified by
      the instruction or until the system time limit or until a store to the
      monitored address range in umwait.
      
      IA32_UMWAIT_CONTROL MSR register allows the OS to enable/disable C0.2 on
      the processor and to set the maximum time the processor can reside in C0.1
      or C0.2.
      
      By default C0.2 is enabled so the user wait instructions can enter the
      C0.2 state to save more power with slower wakeup time.
      
      Andy Lutomirski proposed to set the maximum umwait time to 100000 cycles by
      default. A quote from Andy:
      
        "What I want to avoid is the case where it works dramatically differently
         on NO_HZ_FULL systems as compared to everything else. Also, UMWAIT may
         behave a bit differently if the max timeout is hit, and I'd like that
         path to get exercised widely by making it happen even on default
         configs."
      
      A sysfs interface to adjust the time and the C0.2 enablement is provided in
      a follow up change.
      
      [ tglx: Renamed MSR_IA32_UMWAIT_CONTROL_MAX_TIME to
        	MSR_IA32_UMWAIT_CONTROL_TIME_MASK because the constant is used as
        	mask throughout the code.
      	Massaged comments and changelog ]
      
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Reviewed-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-3-git-send-email-fenghua.yu@intel.com
      bd688c69
    • Fenghua Yu's avatar
      x86/cpufeatures: Enumerate user wait instructions · 6dbbf5ec
      Fenghua Yu authored
      
      
      umonitor, umwait, and tpause are a set of user wait instructions.
      
      umonitor arms address monitoring hardware using an address. The
      address range is determined by using CPUID.0x5. A store to
      an address within the specified address range triggers the
      monitoring hardware to wake up the processor waiting in umwait.
      
      umwait instructs the processor to enter an implementation-dependent
      optimized state while monitoring a range of addresses. The optimized
      state may be either a light-weight power/performance optimized state
      (C0.1 state) or an improved power/performance optimized state
      (C0.2 state).
      
      tpause instructs the processor to enter an implementation-dependent
      optimized state C0.1 or C0.2 state and wake up when time-stamp counter
      reaches specified timeout.
      
      The three instructions may be executed at any privilege level.
      
      The instructions provide power saving method while waiting in
      user space. Additionally, they can allow a sibling hyperthread to
      make faster progress while this thread is waiting. One example of an
      application usage of umwait is when waiting for input data from another
      application, such as a user level multi-threaded packet processing
      engine.
      
      Availability of the user wait instructions is indicated by the presence
      of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5].
      
      Detailed information on the instructions and CPUID feature WAITPKG flag
      can be found in the latest Intel Architecture Instruction Set Extensions
      and Future Features Programming Reference and Intel 64 and IA-32
      Architectures Software Developer's Manual.
      
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAshok Raj <ashok.raj@intel.com>
      Reviewed-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: "H Peter Anvin" <hpa@zytor.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1560994438-235698-2-git-send-email-fenghua.yu@intel.com
      6dbbf5ec
    • Andy Lutomirski's avatar
      x86/vdso: Give the [ph]vclock_page declarations real types · ecf9db3d
      Andy Lutomirski authored
      
      
      Clean up the vDSO code a bit by giving pvclock_page and hvclock_page
      their actual types instead of u8[PAGE_SIZE].  This shouldn't
      materially affect the generated code.
      
      Heavily based on a patch from Linus.
      
      [ tglx: Adapted to the unified VDSO code ]
      
      Co-developed-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/6920c5188f8658001af1fc56fd35b815706d300c.1561241273.git.luto@kernel.org
      ecf9db3d
  15. Jun 22, 2019
  16. Jun 20, 2019
    • Christian Brauner's avatar
      arch: handle arches who do not yet define clone3 · d68dbb0c
      Christian Brauner authored
      
      
      This cleanly handles arches who do not yet define clone3.
      
      clone3() was initially placed under __ARCH_WANT_SYS_CLONE under the
      assumption that this would cleanly handle all architectures. It does
      not.
      Architectures such as nios2 or h8300 simply take the asm-generic syscall
      definitions and generate their syscall table from it. Since they don't
      define __ARCH_WANT_SYS_CLONE the build would fail complaining about
      sys_clone3 missing. The reason this doesn't happen for legacy clone is
      that nios2 and h8300 provide assembly stubs for sys_clone. This seems to
      be done for architectural reasons.
      
      The build failures for nios2 and h8300 were caught int -next luckily.
      The solution is to define __ARCH_WANT_SYS_CLONE3 that architectures can
      add. Additionally, we need a cond_syscall(clone3) for architectures such
      as nios2 or h8300 that generate their syscall table in the way I
      explained above.
      
      Fixes: 8f3220a8 ("arch: wire-up clone3() syscall")
      Signed-off-by: default avatarChristian Brauner <christian@brauner.io>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Adrian Reber <adrian@lisas.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: x86@kernel.org
      d68dbb0c
Loading