  1. Jan 31, 2017
  2. Dec 29, 2016
    • Linus Torvalds's avatar
      mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Linus Torvalds authored
      
      
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And on architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hiccup x86 has, for
      example, but some other architectures may not even care.
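
      As a rough userspace analogue (not the kernel implementation), the
      semantics described above can be sketched with C11 atomics.  The names
      unlock_then_test(), unlock_is_negative_byte(), LOCK_BIT and WAITERS_BIT
      are made up for this sketch, and the lock bit position is an assumption:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_BIT    0u  /* hypothetical position of the lock bit */
#define WAITERS_BIT 7u  /* waiters bit: the sign bit of the low byte */

/* Fallback shape: clear the lock bit, then read the same word a second time. */
static bool unlock_then_test(atomic_uchar *flags)
{
    atomic_fetch_and_explicit(flags, (unsigned char)~(1u << LOCK_BIT),
                              memory_order_release);
    return (atomic_load_explicit(flags, memory_order_relaxed) >> WAITERS_BIT) & 1u;
}

/* Single-access shape: one atomic AND, waiters bit taken from its result.
 * The AND never changes bit 7, so the old value answers the question. */
static bool unlock_is_negative_byte(atomic_uchar *flags)
{
    unsigned char old = atomic_fetch_and_explicit(
        flags, (unsigned char)~(1u << LOCK_BIT), memory_order_release);
    return (old >> WAITERS_BIT) & 1u;
}
```

      Both helpers return the same answer; the second avoids touching the
      word again after the atomic update, which is the point of the commit.
      How a given compiler lowers the second form is not something this
      sketch guarantees.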
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, the unlock_page() costs alone were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so roughly a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      
      Acked-by: Nick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b91e1302
  3. Dec 27, 2016
  4. Dec 26, 2016
    • Al Viro's avatar
      arm64: don't pull uaccess.h into *.S · b4b8664d
      Al Viro authored
      
      
      Split asm-only parts of arm64 uaccess.h into a new header and use that
      from *.S.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b4b8664d
    • Larry Finger's avatar
      powerpc: Fix build warning on 32-bit PPC · 8ae679c4
      Larry Finger authored
      
      
      I am getting the following warning when I build kernel 4.9-git on my
      PowerBook G4 with a 32-bit PPC processor:
      
          AS      arch/powerpc/kernel/misc_32.o
        arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
      
      This problem is evident after commit 989cea5c ("kbuild: prevent
      lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
      error that has been in the code since 2005 when this source file was
      created.  That was with commit 9994a338 ("powerpc: Introduce
      entry_{32,64}.S, misc_{32,64}.S, systbl.S").
      
      The offending line does not make a lot of sense.  This error does not
      seem to cause any errors in the executable, thus I am not recommending
      that it be applied to any stable versions.
      
      Thanks to Nicholas Piggin for suggesting this solution.
      
      Fixes: 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
      Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ae679c4
  5. Dec 25, 2016
    • Thomas Gleixner's avatar
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner authored
      
      
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
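
      A minimal model of why the zero-seconds case collapses to an
      assignment, assuming the time value is a plain signed 64-bit
      nanosecond count.  ktime_model_t and ktime_set_model() are
      hypothetical stand-ins, and any overflow clamping is left out:

```c
#include <stdint.h>

typedef int64_t ktime_model_t;            /* nanoseconds (assumed scalar storage) */
#define NSEC_PER_SEC_MODEL 1000000000LL

/* Combine a seconds part and a nanoseconds part into one scalar value. */
static ktime_model_t ktime_set_model(int64_t secs, unsigned long nsecs)
{
    return secs * NSEC_PER_SEC_MODEL + (int64_t)nsecs;
}
```

      With secs == 0 the result is just nsecs, so a call such as
      ktime_set_model(0, 500) carries no more information than writing 500.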
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
    • Thomas Gleixner's avatar
      clocksource: Use a plain u64 instead of cycle_t · a5a1d1c2
      Thomas Gleixner authored
      
      
      There is no point in having an extra type for extra confusion. u64 is
      unambiguous.
      
      Conversion was done with the following coccinelle script:
      
      @rem@
      @@
      -typedef u64 cycle_t;
      
      @fix@
      typedef cycle_t;
      @@
      -cycle_t
      +u64
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      a5a1d1c2
    • Thomas Gleixner's avatar
      cpu/hotplug: Cleanup state names · 73c1b41e
      Thomas Gleixner authored
      
      
      When the state names got added a script was used to add the extra argument
      to the calls. The script basically converted the state constant to a
      string, but the cleanup to convert these strings into meaningful ones did
      not happen.
      
      Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
      are used in all the other places already.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      73c1b41e
    • Thomas Gleixner's avatar
      x86/msr: Remove bogus cleanup from the error path · 59fefd08
      Thomas Gleixner authored
      
      
      The error cleanup which is invoked when the hotplug state setup failed
      tries to remove the failed state, which is broken.
      
      Fixes: 8fba38c9 ("x86/msr: Convert to hotplug state machine")
      Reported-by: kernel test robot <fengguang.wu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      59fefd08
    • Thomas Gleixner's avatar
      perf/x86/intel/cstate: Prevent hotplug callback leak · 834fcd29
      Thomas Gleixner authored
      
      
      If the pmu registration fails the registered hotplug callbacks are not
      removed. Wrong in any case, but fatal in case of a modular driver.
      
      Replace the nonsensical state names with proper ones while at it.
      
      Fixes: 77c34ef1 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      834fcd29
    • Thomas Gleixner's avatar
      ARM/imx/mmcd: Fix broken cpu hotplug handling · a051f220
      Thomas Gleixner authored
      
      
      The cpu hotplug support of this perf driver is broken in several ways:
      
      1) It adds an instance before setting up the state.
      
      2) The state for the instance is different from the state of the
         callback. It's just a randomly chosen state.
      
      3) The instance registration is not error checked so nobody noticed that
         the call can never succeed.
      
      4) The state for the multi install callbacks is chosen randomly and
         overwrites existing state. This is now prevented by the core code so the
         call is guaranteed to fail.
      
      5) The error exit path in the init function leaves the instance registered
         and then frees the memory which contains the enqueued hlist node.
      
      6) The remove function is removing the state and not the instance.
      
      Fix it by:
      
      - Setting up the state before adding instances. Use a dynamically allocated
        state for it.
      
      - Installing instances after the state has been set up
      
      - Removing the instance in the error path before freeing memory
      
      - Removing the instance not the state in the driver remove callback
      
      While at it, use raw_cpu_processor_id(), because cpu_processor_id() cannot
      be used in preemptible context, and set the driver data after successful
      registration of the pmu.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Shawn Guo <shawnguo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Frank Li <frank.li@nxp.com>
      Cc: Zhengyu Shen <zhengyu.shen@nxp.com>
      Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a051f220
  6. Dec 24, 2016
  7. Dec 23, 2016
    • Josh Poimboeuf's avatar
      Revert "x86/unwind: Detect bad stack return address" · c280f773
      Josh Poimboeuf authored
      
      
      Revert the following commit:
      
        b6959a36 ("x86/unwind: Detect bad stack return address")
      
      ... because Andrey Konovalov reported an unwinder warning:
      
        WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467
      
      The unwind was initiated from an interrupt which occurred while running in the
      generated code for a kprobe.  The unwinder printed the warning because it
      expected regs->ip to point to a valid text address, but instead it pointed to
      the generated code.
      
      Eventually we may want to come up with a way to identify generated kprobe
      code so the unwinder can know that it's a valid return address.  Until
      then, just remove the warning.
      
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c280f773
  8. Dec 22, 2016
    • Peter Zijlstra's avatar
      perf/x86: Fix overlap counter scheduling bug · 1134c2b5
      Peter Zijlstra authored
      
      
      Jiri reported the overlap scheduling exceeding its max stack.
      
      Looking at the constraint that triggered this, it turns out the
      overlap marker isn't needed.
      
      The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
      the counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight".
      
      Especially that latter part is of interest here, I think: our overlapping
      mask is 0x0e, which has 3 bits set and is the highest weight mask on the
      PMU, therefore it will be placed last. Can we still create a scenario
      where we would need to rewind that?
      
      The scenario for AMD Fam15h is we're having masks like:
      
      	0x3F -- 111111
      	0x38 -- 111000
      	0x07 -- 000111
      
      	0x09 -- 001001
      
      And we mark 0x09 as overlapping, because it is not a direct subset of
      0x38 or 0x07 and has less weight than either of those. This means we'll
      first try and place the 0x09 event, then try and place 0x38/0x07 events.
      Now imagine we have:
      
      	3 * 0x07 + 0x09
      
      and the initial pick for the 0x09 event is counter 0, then we'll fail to
      place all 0x07 events. So we'll pop back, try counter 4 for the 0x09
      event, and then re-try all 0x07 events, which will now work.
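
      The rewind in that scenario can be reproduced with a small
      stand-alone model.  This is a simplified sketch, not the kernel
      scheduler: place_greedy() and place_backtrack() are hypothetical
      names, counters are numbered from bit 0, and the backtracking
      version rewinds on every event rather than only on events marked as
      overlapping:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_COUNTERS 6

/* First-fit placement without any rewind. */
static bool place_greedy(const uint8_t *masks, int n, int *assign)
{
    uint8_t used = 0;

    for (int i = 0; i < n; i++) {
        int c;

        for (c = 0; c < MAX_COUNTERS; c++) {
            uint8_t bit = (uint8_t)(1u << c);

            if ((masks[i] & bit) && !(used & bit))
                break;
        }
        if (c == MAX_COUNTERS)
            return false;           /* no free counter allowed by this mask */
        assign[i] = c;
        used |= (uint8_t)(1u << c);
    }
    return true;
}

/* Place events idx..n-1; on failure, pop back and try the next counter. */
static bool place_backtrack(const uint8_t *masks, int n, int idx,
                            uint8_t used, int *assign)
{
    if (idx == n)
        return true;
    for (int c = 0; c < MAX_COUNTERS; c++) {
        uint8_t bit = (uint8_t)(1u << c);

        if (!(masks[idx] & bit) || (used & bit))
            continue;
        assign[idx] = c;
        if (place_backtrack(masks, n, idx + 1, (uint8_t)(used | bit), assign))
            return true;
    }
    return false;
}
```

      For the event list { 0x09, 0x07, 0x07, 0x07 }, the greedy version
      puts the 0x09 event on the lowest counter and then runs out of room
      for the third 0x07 event, while the backtracking version moves the
      0x09 event to its other allowed counter and places everything.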
      
      The masks on the PMU in question are:
      
        0x01 - 0001
        0x03 - 0011
        0x0e - 1110
        0x0c - 1100
      
      But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
      0x1) are of heavier weight, it should all work out.
      
      Reported-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Liang Kan <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1134c2b5
    • Stephane Eranian's avatar
      perf/x86/pebs: Fix handling of PEBS buffer overflows · daa864b8
      Stephane Eranian authored
      
      
      This patch solves a race condition between PEBS and the PMU handler.
      
      In case multiple PEBS events are sampled at the same time,
      it is possible to have GLOBAL_STATUS bit 62 set indicating
      PEBS buffer overflow and also seeing at most 3 PEBS counters
      having their bits set in the status register. This is a sign
      that there was at least one PEBS record pending at the time
      of the PMU interrupt. PEBS counters must only be processed
      via the drain_pebs() calls, and not via the regular sample
      processing loop coming after that function, otherwise
      phony regular samples may be generated in the sampling buffer
      not marked with the EXACT tag.
      
      Another possibility is to have one PEBS event and at least
      one non-PEBS event which overflows while PEBS is armed. In this
      case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
      status bit for the PEBS counter will be on Skylake.
      
      To avoid this problem, we systematically ignore the PEBS-enabled
      counters from the GLOBAL_STATUS mask and we always process PEBS
      events via drain_pebs().
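
      The masking step can be pictured with a stand-alone sketch.  This
      is not the kernel code: struct pmi_work and split_status() are
      hypothetical, the layout of pebs_enabled (one bit per counter,
      matching the status bits) is an assumption, and requesting a drain
      whenever a PEBS counter bit is set is this sketch's reading of
      "always":

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_PEBS_OVF (1ULL << 62)  /* GLOBAL_STATUS bit 62, per the text */

struct pmi_work {
    uint64_t regular;  /* counter bits left for the regular sample loop */
    bool     drain;    /* whether the PEBS drain path should run */
};

/* Split the raw status so PEBS counters never reach the regular loop. */
static struct pmi_work split_status(uint64_t status, uint64_t pebs_enabled)
{
    struct pmi_work w;

    w.drain = (status & (STATUS_PEBS_OVF | pebs_enabled)) != 0;
    w.regular = status & ~(STATUS_PEBS_OVF | pebs_enabled);
    return w;
}
```

      In the second case described above (no bit 62, but a PEBS counter
      bit set next to a non-PEBS one), only the non-PEBS bit survives in
      the regular field, so no non-exact sample is produced for the PEBS
      event.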
      
      The problem manifested itself by having non-exact samples when
      sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
      not have the EXACT flag set.
      
      Note that this problem is only present on Skylake processors.
      This fix is harmless on older processors.
      
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      daa864b8
    • Peter Zijlstra's avatar
      x86/paravirt: Mark unused patch_default label · cef4402d
      Peter Zijlstra authored
      
      
      A bugfix commit:
      
        45dbea5f ("x86/paravirt: Fix native_patch()")
      
      ... introduced a harmless warning:
      
        arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
        arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]
      
      Fix it by annotating the label as __maybe_unused.
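
      Outside the kernel, the same kind of annotation can be sketched
      with the GCC/Clang attribute directly.  MAYBE_UNUSED and
      patch_len() are made-up names standing in for __maybe_unused and
      the real function:

```c
/* Stand-in for the kernel's __maybe_unused macro. */
#define MAYBE_UNUSED __attribute__((unused))

static int patch_len(int type)
{
    if (type == 1)
        return 4;

    /* No goto targets this label in this build; the attribute tells the
     * compiler not to warn about that. */
patch_default: MAYBE_UNUSED;
    return 0;
}
```

      Without the attribute, building with -Wunused-label would flag
      patch_default, and -Werror would turn that into a build failure.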
      
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Reported-by: Piotr Gregor <piotrgregor@rsyncme.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 45dbea5f ("x86/paravirt: Fix native_patch()")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cef4402d
  9. Dec 21, 2016
  10. Dec 20, 2016
    • Helge Deller's avatar
      parisc: Optimize timer interrupt function · 160494d3
      Helge Deller authored
      
      
      Restructure the timer interrupt function to better cope with missed timer irqs.
      Optimize the calculation of when the next interrupt should happen and skip
      irqs if they would happen too soon after the irq function exits.
      
      The update_process_times() call is done anyway at every timer irq, so we can
      safely drop the prof_counter and prof_multiplier variables from the per_cpu
      structure.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      160494d3
    • Dongpo Li's avatar
      ARM: dts: hix5hd2: don't change the existing compatible string · 48fed73a
      Dongpo Li authored
      
      
      The SoC hix5hd2 compatible string has the suffix "-gmac" and
      we should not change it.
      We should only add the generic compatible string "hisi-gmac-v1".
      
      Fixes: 0855950b ("ARM: dts: hix5hd2: add gmac generic compatible and clock names")
      Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48fed73a
    • Madalin Bucur's avatar
      powerpc: fsl/fman: remove fsl,fman from of_device_ids[] · ae6021d4
      Madalin Bucur authored
      
      
      The fsl/fman drivers will use of_platform_populate() on all
      supported platforms. Call of_platform_populate() to probe the
      FMan sub-nodes.
      
      Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
      Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
      Acked-by: Scott Wood <oss@buserror.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae6021d4
    • Alexander Popov's avatar
      arm64: setup: introduce kaslr_offset() · 7ede8665
      Alexander Popov authored
      Introduce kaslr_offset() similar to x86_64 to fix kcov.
      
      [ Updated by Will Deacon ]
      
      Link: http://lkml.kernel.org/r/1481417456-28826-2-git-send-email-alex.popov@linux.com
      
      
      Signed-off-by: Alexander Popov <alex.popov@linux.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ede8665
    • Thiago Jung Bauermann's avatar
      powerpc: ima: send the kexec buffer to the next kernel · ab6b1d1f
      Thiago Jung Bauermann authored
      The IMA kexec buffer allows the currently running kernel to pass the
      measurement list via a kexec segment to the kernel that will be kexec'd.
      
      This is the architecture-specific part of setting up the IMA kexec
      buffer for the next kernel.  It will be used in the next patch.
      
      Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.com
      
      
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andreas Steffen <andreas.steffen@strongswan.org>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab6b1d1f
    • Thiago Jung Bauermann's avatar
      powerpc: ima: get the kexec buffer passed by the previous kernel · 467d2782
      Thiago Jung Bauermann authored
      Patch series "ima: carry the measurement list across kexec", v8.
      
      The TPM PCRs are only reset on a hard reboot.  In order to validate a
      TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
      list of the running kernel must be saved and then restored on the
      subsequent boot, possibly of a different architecture.
      
      The existing securityfs binary_runtime_measurements file conveniently
      provides a serialized format of the IMA measurement list.  This patch
      set serializes the measurement list in this format and restores it.
      
      Up to now, the binary_runtime_measurements was defined as architecture
      native format.  The assumption being that userspace could and would
      handle any architecture conversions.  With the ability of carrying the
      measurement list across kexec, possibly from one architecture to a
      different one, the per boot architecture information is lost and with it
      the ability of recalculating the template digest hash.  To resolve this
      problem, without breaking the existing ABI, this patch set introduces
      the boot command line option "ima_canonical_fmt", which is arbitrarily
      defined as little endian.
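
      To make "canonical" concrete: a fixed little-endian encoding writes
      the same bytes on every host, whatever its native byte order.  The
      helpers below are an illustrative sketch with hypothetical names
      (put_le32(), get_le32()), not the IMA serialization code, and they
      say nothing about which fields the measurement list contains:

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value least-significant byte first, on any host. */
static size_t put_le32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
    return 4;
}

/* Read the value back, again independent of host byte order. */
static uint32_t get_le32(const uint8_t *in)
{
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

      A reader on a different architecture can then recompute digests
      over the stored bytes without knowing which machine wrote them.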
      
      The need for this boot command line option will be limited to the
      existing version 1 format of the binary_runtime_measurements.
      Subsequent formats will be defined as canonical format (eg.  TPM 2.0
      support for larger digests).
      
      A simplified method of Thiago Bauermann's "kexec buffer handover" patch
      series for carrying the IMA measurement list across kexec is included in
      this patch set.  The simplified method requires all file measurements be
      taken prior to executing the kexec load, as subsequent measurements will
      not be carried across the kexec and restored.
      
      This patch (of 10):
      
      The IMA kexec buffer allows the currently running kernel to pass the
      measurement list via a kexec segment to the kernel that will be kexec'd.
      The second kernel can check whether the previous kernel sent the buffer
      and retrieve it.
      
      This is the architecture-specific part which enables IMA to receive the
      measurement list passed by the previous kernel.  It will be used in the
      next patch.
      
      The change in machine_kexec_64.c is to factor out the logic of removing
      an FDT memory reservation so that it can be used by remove_ima_buffer.
      
      Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com
      
      
      Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andreas Steffen <andreas.steffen@strongswan.org>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      467d2782
    • Nicolas Iooss's avatar
      x86/platform/intel/quark: Add printf attribute to imr_self_test_result() · 9120cf4f
      Nicolas Iooss authored
      
      
      __printf() attributes help detect issues in printf() format strings at
      compile time.
      
      Even though imr_selftest.c is only compiled with
      CONFIG_DEBUG_IMR_SELFTEST=y, GCC complains about a missing format
      attribute when compiling allmodconfig with -Wmissing-format-attribute.
      
      Silence this warning by adding the attribute.
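
      For readers outside the kernel tree, the underlying compiler
      attribute looks roughly like this.  PRINTF_LIKE() and
      format_result() are hypothetical names used only for this sketch;
      the kernel's own macro is __printf():

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the kernel's __printf(a, b): argument a is the format
 * string, argument b is the first variadic argument to check. */
#define PRINTF_LIKE(a, b) __attribute__((format(printf, a, b)))

PRINTF_LIKE(3, 4)
static int format_result(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

      With the attribute in place, a call such as
      format_result(buf, sizeof(buf), "%s", 42) draws a -Wformat warning
      at compile time instead of misbehaving at run time.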
      
      Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Acked-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161219132144.4108-1-nicolas.iooss_linux@m4x.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9120cf4f
    • Linus Walleij's avatar
      x86/platform/intel-mid: Switch MPU3050 driver to IIO · 634b847b
      Linus Walleij authored
      
      
      The Intel Mid goes in and creates an I2C device for the
      MPU3050 if the input driver for MPU-3050 is activated.
      
      As of commit:
      
        3904b28e ("iio: gyro: Add driver for the MPU-3050 gyroscope")
      
      ... there is a proper and fully featured IIO driver for this
      device, so deprecate the use of the incomplete input driver
      by augmenting the device population code to react to the
      presence of the IIO driver's Kconfig symbol instead.
      
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1481722794-4348-1-git-send-email-linus.walleij@linaro.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      634b847b
    • Borislav Petkov's avatar
      x86/alternatives: Do not use sync_core() to serialize I$ · 34bfab0e
      Borislav Petkov authored
      
      
      We use sync_core() in the alternatives code to stop speculative
      execution of prefetched instructions because we are potentially changing
      them and don't want to execute stale bytes.
      
      What it does on most machines is call CPUID which is a serializing
      instruction. And that's expensive.
      
      However, the instruction cache is serialized when we're on the local CPU
      and are changing the data through the same virtual address. So then, we
      don't need the serializing CPUID but a simple control flow change. The
      latter is accomplished with a CALL/RET, which the noinline causes.
      
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Matthew Whitehead <tedheadster@gmail.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnic
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      34bfab0e
    • Vitaly Kuznetsov's avatar
      x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic · 59107e2f
      Vitaly Kuznetsov authored
      
      
      There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
      which injects NMI to the guest. We may want to crash the guest and do kdump
      on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to
      allow the kdump kernel to re-establish VMBus connection so it will see
      VMBus devices (storage, network, ...).
      
      To properly unload VMBus making it possible to start over during kdump we
      need to do the following:
      
       - Send an 'unload' message to the hypervisor. This can be done on any CPU
         so we do this on the crashing CPU.
      
       - Receive the 'unload finished' reply message. WS2012R2 delivers this
         message to the CPU which was used to establish VMBus connection during
         module load and this CPU may differ from the CPU sending 'unload'.
      
      Receiving a VMBus message means the following:
      
       - There is a per-CPU slot in memory for one message. This slot can in
         theory be accessed by any CPU.
      
       - We get an interrupt on the CPU when a message was placed into the slot.
      
       - When we read the message we need to clear the slot and signal the fact
         to the hypervisor. In case there are more messages to this CPU pending
         the hypervisor will deliver the next message. The signaling is done by
         writing to an MSR so this can only be done on the appropriate CPU.
      
      To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload()
      function which checks message slots for all CPUs in a loop waiting for the
      'unload finished' messages. However, there is an issue which arises when
      these conditions are met:
      
       - We're crashing on a CPU which is different from the one which was used
         to initially contact the hypervisor.
      
       - The CPU which was used for the initial contact is blocked with interrupts
         disabled and there is a message pending in the message slot.
      
      In this case we won't be able to read the 'unload finished' message on the
      crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
      simultaneously: the first CPU entering panic() will proceed to crash and
      all other CPUs will stop themselves with interrupts disabled.
      
      The suggested solution is to handle unknown NMIs for Hyper-V guests on the
      first CPU which gets them only. This will allow us to rely on VMBus
      interrupt handler being able to receive the 'unload finished' message in
      case it is delivered to a different CPU.
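
      One way to express "only the first CPU handles it" is an atomic
      claim, sketched here with C11 atomics in userspace.  The names
      nmi_owner and claim_unknown_nmi() are hypothetical, and the commit
      message does not say that the patch is implemented this way:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int nmi_owner = -1;  /* -1: no CPU has claimed the NMI yet */

/* Returns true only for the first caller; later CPUs see the slot taken. */
static bool claim_unknown_nmi(int cpu)
{
    int expected = -1;

    return atomic_compare_exchange_strong(&nmi_owner, &expected, cpu);
}
```

      A CPU for which the claim fails would treat the NMI as handled and
      return, so it keeps running with interrupts enabled and can still
      receive the 'unload finished' message.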
      
      The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the
      boot CPU only; WS2012R2 and earlier Hyper-V versions are affected.
      
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: devel@linuxdriverproject.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59107e2f
  11. Dec 19, 2016