  1. Sep 09, 2017
  2. Aug 30, 2017
  3. Aug 29, 2017
  4. Aug 25, 2017
    • lib/raid6: align AVX512 constants to 512 bits, not bytes · b5e0fff1
      Denys Vlasenko authored
      
      
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: mingo@redhat.com
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Megha Dey <megha.dey@linux.intel.com>
      Cc: Gayatri Kammela <gayatri.kammela@intel.com>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Shaohua Li <shli@fb.com>
    • locking/lockdep/selftests: Add mixed read-write ABBA tests · e9149858
      Peter Zijlstra authored
      
      
      Currently lockdep has limited support for recursive readers; add a few
      mixed read-write ABBA selftests to show the extent of these
      limitations.
      
        [    0.000000] ----------------------------------------------------------------------------
        [    0.000000]                                  | spin |wlock |rlock |mutex | wsem | rsem |
        [    0.000000]   --------------------------------------------------------------------------
      
        [    0.000000]   mixed read-lock/lock-write ABBA:             |FAILED|             |  ok  |
        [    0.000000]    mixed read-lock/lock-read ABBA:             |  ok  |             |  ok  |
        [    0.000000]  mixed write-lock/lock-write ABBA:             |  ok  |             |  ok  |
      
      This clearly illustrates the case where lockdep fails to find a
      deadlock.
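
      For readers less familiar with the pattern, here is a minimal
      illustrative sketch (not the selftest code; lock names are made up)
      of the mixed read-lock/lock-write ABBA case that the FAILED entry
      above refers to:

        static DEFINE_SPINLOCK(lock_A);
        static DEFINE_RWLOCK(lock_B);

        /* Thread 1: A, then B as a reader. */
        static void thread_1(void)
        {
                spin_lock(&lock_A);
                read_lock(&lock_B);             /* A -> B (read) */
                read_unlock(&lock_B);
                spin_unlock(&lock_A);
        }

        /* Thread 2: B as a writer, then A.  The writer excludes the
         * reader above, so the two threads can deadlock (ABBA), yet the
         * table above shows lockdep missing this for rwlocks. */
        static void thread_2(void)
        {
                write_lock(&lock_B);
                spin_lock(&lock_A);             /* B (write) -> A */
                spin_unlock(&lock_A);
                write_unlock(&lock_B);
        }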
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boqun.feng@gmail.com
      Cc: byungchul.park@lge.com
      Cc: david@fromorbit.com
      Cc: johannes@sipsolutions.net
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. Aug 22, 2017
  6. Aug 18, 2017
    • drm/i915: Replace execbuf vma ht with an idr · d1b48c1e
      Chris Wilson authored
      
      
      This was the competing idea long ago, but it was only the rewrite of
      the idr as a radixtree, the decision to use the radixtree directly
      ourselves, and the realisation that we can store the vma directly in
      the radixtree and only need a list for the reverse mapping, that made
      the patch performant enough to displace using a hashtable. Though the vma ht
      is fast and doesn't require any extra allocation (as we can embed the node
      inside the vma), it does require a thread for resizing and serialization
      and will have the occasional slow lookup. That is hairy enough to
      investigate alternatives and favour them if equivalent in peak performance.
      One advantage of allocating an indirection entry is that we can support a
      single shared bo between many clients, something that was done on a
      first-come first-serve basis for shared GGTT vma previously. To offset
      the extra allocations, we create yet another kmem_cache for them.
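
      As a rough, hedged sketch of the indirection described above
      (illustrative names and types, not the i915 code): execbuf handles
      map to vma pointers through a radix tree, with the reverse mapping
      kept on a separate list.

        #include <linux/radix-tree.h>

        struct i915_vma;                        /* opaque here */

        struct ctx_handles {
                /* handle -> struct i915_vma *; INIT_RADIX_TREE()'d at
                 * context creation. */
                struct radix_tree_root vma_by_handle;
        };

        static int ctx_add_vma(struct ctx_handles *c, u32 handle,
                               struct i915_vma *vma)
        {
                return radix_tree_insert(&c->vma_by_handle, handle, vma);
        }

        static struct i915_vma *ctx_lookup_vma(struct ctx_handles *c,
                                               u32 handle)
        {
                return radix_tree_lookup(&c->vma_by_handle, handle);
        }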
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
    • kernel/watchdog: Prevent false positives with turbo modes · 7edaeb68
      Thomas Gleixner authored
      
      
      The hardlockup detector on x86 uses a performance counter based on unhalted
      CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
      performance counter period, so the hrtimer should fire 2-3 times before the
      performance counter NMI fires. The NMI code checks whether the hrtimer
      fired since the last invocation. If not, it assumes a hard lockup.
      
      The calculation of those periods is based on the nominal CPU
      frequency. Turbo modes increase the CPU clock frequency and therefore
      shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
      nominal frequency) the perf/NMI period is shorter than the hrtimer period
      which leads to false positives.
      
      A simple fix would be to shorten the hrtimer period, but that comes with
      the side effect of more frequent hrtimer and softlockup thread wakeups,
      which is not desired.
      
      Implement a low pass filter, which checks the perf/NMI period against
      kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
      elapsed then the event is ignored and postponed to the next perf/NMI.
      
      That solves the problem and avoids the overhead of shorter hrtimer periods
      and more frequent softlockup thread wakeups.
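
      A hedged sketch of the filter described above (simplified; variable
      and function names here are illustrative, not the actual watchdog
      code):

        static DEFINE_PER_CPU(u64, last_nmi_ts);

        /* True if this perf NMI arrived before 4/5 of the watchdog period
         * has elapsed since the last accepted one, i.e. the perf period
         * was shortened by turbo frequencies and the event should be
         * ignored until the next NMI. */
        static bool nmi_arrived_too_early(void)
        {
                u64 now = ktime_get_mono_fast_ns();     /* NMI-safe clock */
                u64 thresh = (u64)watchdog_thresh * NSEC_PER_SEC * 4 / 5;

                if (now - __this_cpu_read(last_nmi_ts) < thresh)
                        return true;

                __this_cpu_write(last_nmi_ts, now);
                return false;
        }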
      
      Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector")
      Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: dzickus@redhat.com
      Cc: prarit@redhat.com
      Cc: ak@linux.intel.com
      Cc: babu.moger@oracle.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: acme@redhat.com
      Cc: stable@vger.kernel.org
      Cc: atomlin@redhat.com
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
  7. Aug 17, 2017
  8. Aug 14, 2017
    • debugobjects: Make kmemleak ignore debug objects · caba4cbb
      Waiman Long authored
      
      
      The allocated debug objects are either on the free list or in the
      hashed bucket lists. So they won't get lost. However if both debug
      objects and kmemleak are enabled and kmemleak scanning is done
      while some of the debug objects are transitioning from one list to
      the other, false positive reporting of memory leaks may happen for
      those objects. For example,
      
      [38687.275678] kmemleak: 12 new suspected memory leaks (see
      /sys/kernel/debug/kmemleak)
      unreferenced object 0xffff92e98aabeb68 (size 40):
        comm "ksmtuned", pid 4344, jiffies 4298403600 (age 906.430s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 d0 bc db 92 e9 92 ff ff  ................
          01 00 00 00 00 00 00 00 38 36 8a 61 e9 92 ff ff  ........86.a....
        backtrace:
          [<ffffffff8fa5378a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffff8f47c019>] kmem_cache_alloc+0xe9/0x320
          [<ffffffff8f62ed96>] __debug_object_init+0x3e6/0x400
          [<ffffffff8f62ef01>] debug_object_activate+0x131/0x210
          [<ffffffff8f330d9f>] __call_rcu+0x3f/0x400
          [<ffffffff8f33117d>] call_rcu_sched+0x1d/0x20
          [<ffffffff8f4a183c>] put_object+0x2c/0x40
          [<ffffffff8f4a188c>] __delete_object+0x3c/0x50
          [<ffffffff8f4a18bd>] delete_object_full+0x1d/0x20
          [<ffffffff8fa535c2>] kmemleak_free+0x32/0x80
          [<ffffffff8f47af07>] kmem_cache_free+0x77/0x350
          [<ffffffff8f453912>] unlink_anon_vmas+0x82/0x1e0
          [<ffffffff8f440341>] free_pgtables+0xa1/0x110
          [<ffffffff8f44af91>] exit_mmap+0xc1/0x170
          [<ffffffff8f29db60>] mmput+0x80/0x150
          [<ffffffff8f2a7609>] do_exit+0x2a9/0xd20
      
      The references in the debug objects may also hide a real memory leak.
      
      As there is no point in having kmemleak track debug object
      allocations, kmemleak checking is now disabled for debug objects.
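
      In practice this amounts to marking each debug object right after it
      is allocated, roughly along these lines (hedged sketch of the
      allocation path):

        new = kmem_cache_zalloc(obj_cache, gfp);
        if (!new)
                return;

        /* Debug objects live on their own free/bucket lists, so tell
         * kmemleak not to scan or report them. */
        kmemleak_ignore(new);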
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1502718733-8527-1-git-send-email-longman@redhat.com
  9. Aug 10, 2017
  10. Aug 09, 2017
    • bpf: add BPF_J{LT,LE,SLT,SLE} instructions · 92b31a9a
      Daniel Borkmann authored
      Currently, eBPF only understands the BPF_JGT (>), BPF_JGE (>=),
      BPF_JSGT (s>) and BPF_JSGE (s>=) instructions; this means that in
      particular the *JLT/*JLE counterparts involving immediates need
      to be rewritten from e.g. X < [IMM] by swapping arguments into
      [IMM] > X, meaning the immediate first has to be loaded into a
      register Y := [IMM] so that we can then compare Y > X. Note that
      the destination operand is always required to be a register.
      
      This has the downside of unnecessarily increasing register
      pressure, meaning complex programs would need to spill other
      registers temporarily to the stack in order to obtain an unused
      register for the [IMM]. Loading into registers will thus also
      affect state pruning, since we need to account for that register
      use and potentially for the registers that had to be spilled and
      filled again. As a consequence, slightly more stack space might
      be used due to spilling, and BPF programs end up a bit longer
      due to the extra code involving the register load and the
      potentially required spill/fills.
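
      As an illustration, using the instruction-building macros from
      include/linux/filter.h (register numbers and jump offsets are
      arbitrary), testing r1 < 42 before and after this change:

        /* Without BPF_JLT: load the immediate, swap the comparison. */
        struct bpf_insn before[] = {
                BPF_MOV64_IMM(BPF_REG_2, 42),                   /* r2 = 42         */
                BPF_JMP_REG(BPF_JGT, BPF_REG_2, BPF_REG_1, 1),  /* if r2 > r1 goto */
        };

        /* With BPF_JLT: one instruction, no scratch register. */
        struct bpf_insn after[] = {
                BPF_JMP_IMM(BPF_JLT, BPF_REG_1, 42, 1),         /* if r1 < 42 goto */
        };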
      
      Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
      counterparts to the eBPF instruction set. Modifying LLVM to
      remove the NegateCC() workaround in a PoC patch at [1] and
      allowing it to also emit the new instructions resulted in
      cilium's BPF programs that are injected into the fast-path to
      have a reduced program length in the range of 2-3% (e.g.
      accumulated main and tail call sections from one of the object
      files reduced from 4864 to 4729 insns), reduced complexity in
      the range of 10-30% (e.g. accumulated sections reduced in one
      of the cases from 116432 to 88428 insns), and reduced stack
      usage in the range of 1-5% (e.g. accumulated sections from one
      of the object files reduced from 824 to 784b).
      
      The modification for LLVM will be incorporated in a backwards
      compatible way. The plan is for LLVM to have i) a target specific
      option that lets the user explicitly enable the extension (as we
      have with -m target specific extensions today for various CPU
      insns), and ii) the ability to probe the kernel for the presence
      of the extensions and enable them transparently when the user
      selects more aggressive options such as -march=native in a bpf
      target context. (Other frontends generating BPF byte code, e.g.
      ply, can probe the kernel directly for its code generation.)
      
        [1] https://github.com/borkmann/llvm/tree/bpf-insns
      
      
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • md/raid6: implement recovery using ARM NEON intrinsics · 6ec4e251
      Ard Biesheuvel authored
      
      
      Provide a NEON accelerated implementation of the recovery algorithm,
      which supersedes the default byte-by-byte one.
      
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • md/raid6: use faster multiplication for ARM NEON delta syndrome · 35129dde
      Ard Biesheuvel authored
      
      
      The P/Q left side optimization in the delta syndrome simply involves
      repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
      that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
      can accelerate this substantially by performing up to 4 such operations
      at once, using the NEON instructions for polynomial multiplication.
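
      For reference, a scalar sketch of the arithmetic being batched
      (RAID-6 works in GF(2^8) with the 0x11d field polynomial; the patch
      performs four of these steps at once using NEON polynomial
      multiplies):

        /* Multiply by the polynomial 'x' in GF(2^8), reducing mod 0x11d. */
        static u8 gf_mul_x(u8 a)
        {
                return (a << 1) ^ ((a & 0x80) ? 0x1d : 0);
        }

        /* Multiply by x^4: four xtimes steps, which the NEON code folds
         * into a single 8x8 polynomial multiply plus one reduction. */
        static u8 gf_mul_x4(u8 a)
        {
                return gf_mul_x(gf_mul_x(gf_mul_x(gf_mul_x(a))));
        }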
      
      Results on a Cortex-A57 running in 64-bit mode:
      
        Before:
        -------
        raid6: neonx1   xor()  1680 MB/s
        raid6: neonx2   xor()  2286 MB/s
        raid6: neonx4   xor()  3162 MB/s
        raid6: neonx8   xor()  3389 MB/s
      
        After:
        ------
        raid6: neonx1   xor()  2281 MB/s
        raid6: neonx2   xor()  3362 MB/s
        raid6: neonx4   xor()  3787 MB/s
        raid6: neonx8   xor()  4239 MB/s
      
      While we're at it, simplify MASK() by using a signed shift rather than
      a vector compare involving a temp register.
      
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  11. Jul 31, 2017
    • netlink: Introduce nla_strdup() · 2cf0c8b3
      Phil Sutter authored
      
      
      This is similar to strdup() for netlink string attributes.
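
      A hedged usage sketch (the attribute name is hypothetical, and the
      helper is assumed to take the attribute plus a GFP mask, in line
      with other netlink allocation helpers):

        char *name = nla_strdup(info->attrs[MYDRV_ATTR_NAME], GFP_KERNEL);

        if (!name)
                return -ENOMEM;
        /* ... use the NUL-terminated copy ... */
        kfree(name);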
      
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net netlink: Add new type NLA_BITFIELD32 · 64c83d83
      Jamal Hadi Salim authored
      
      
      Generic bitflags attribute content sent to the kernel by the
      user. With this netlink attr type the user can either set or
      unset a flag in the kernel.
      
      The value is a bitmap that defines the bit values being set.
      The selector is a bitmask that defines which value bits are to
      be considered.
      
      A check is made to ensure that a kernel subsystem only accepts
      bitflags it already knows about, i.e. if the user tries to set
      a bit flag that is not understood then it will be rejected.
      
      In the most basic form, the user specifies the attribute policy as:
      [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
      
      where myvalidflags is the bit mask of the flags the kernel understands.
      
      If the user _does not_ provide myvalidflags then the attribute will
      also be rejected.
      
      Examples:
      value = 0x0, and selector = 0x1
      implies we are selecting bit 1 and we want to set its value to 0.
      
      value = 0x2, and selector = 0x2
      implies we are selecting bit 2 and we want to set its value to 1.
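
      On the receiving side, a consumer might unpack the pair along these
      lines (hedged sketch; ATTR_GOO and MY_FLAG are illustrative):

        struct nla_bitfield32 *bf = nla_data(attrs[ATTR_GOO]);

        if (bf->selector & MY_FLAG) {                   /* flag was addressed */
                bool enable = bf->value & MY_FLAG;      /* desired set/unset  */
                /* apply 'enable' to the subsystem's state here */
        }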
      
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. Jul 26, 2017
    • errseq: rename __errseq_set to errseq_set · 3acdfd28
      Jeff Layton authored
      
      
      Nothing calls this wrapper anymore, so just remove it and rename the
      old function to get rid of the double underscore prefix.
      
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • x86/kconfig: Make it easier to switch to the new ORC unwinder · a34a766f
      Josh Poimboeuf authored
      
      
      A couple of Kconfig changes which make it much easier to switch to the
      new CONFIG_ORC_UNWINDER:
      
      1) Remove x86 dependencies on CONFIG_FRAME_POINTER for lockdep,
         latencytop, and fault injection.  x86 has a 'guess' unwinder which
         just scans the stack for kernel text addresses.  It's not 100%
         accurate but in many cases it's good enough.  This allows those users
         who don't want the text overhead of the frame pointer or ORC
         unwinders to still use these features.  More importantly, this also
         makes it much more straightforward to disable frame pointers.
      
      2) Make CONFIG_ORC_UNWINDER depend on !CONFIG_FRAME_POINTER.  While it
         would be possible to have both enabled, it doesn't really make sense
         to do so.  So enforce a sane configuration to prevent the user from
         making a dumb mistake.
      
      With these changes, when you disable CONFIG_FRAME_POINTER, "make
      oldconfig" will ask if you want to enable CONFIG_ORC_UNWINDER.
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/9985fb91ce5005fe33ea5cc2a20f14bd33c61d03.1500938583.git.jpoimboe@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/unwind: Add the ORC unwinder · ee9f8fce
      Josh Poimboeuf authored
      
      
      Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
      It plugs into the existing x86 unwinder framework.
      
      It relies on objtool to generate the needed .orc_unwind and
      .orc_unwind_ip sections.
      
      For more details on why ORC is used instead of DWARF, see
      Documentation/x86/orc-unwinder.txt - but the short version is
      that it's a simplified, fundamentally more robust debuginfo
      data structure, which also allows up to two orders of magnitude
      faster lookups than the DWARF unwinder - which matters to
      profiling workloads like perf.
      
      Thanks to Andy Lutomirski for the performance improvement ideas:
      splitting the ORC unwind table into two parallel arrays and creating a
      fast lookup table to search a subset of the unwind table.
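
      A hedged sketch of that parallel-array lookup (types and field names
      simplified, not the kernel's exact layout): binary-search the sorted
      instruction-address array, then index the entry array with the
      result.

        /* Grossly simplified stand-in for the real ORC entry. */
        struct orc_entry {
                s16 sp_offset;
                s16 bp_offset;
        };

        static const struct orc_entry *orc_lookup(unsigned long ip,
                                                  const unsigned long *ip_table,
                                                  const struct orc_entry *orc_table,
                                                  unsigned int num)
        {
                unsigned int lo = 0, hi = num;

                /* Find the last entry whose address is <= ip. */
                while (lo < hi) {
                        unsigned int mid = lo + (hi - lo) / 2;

                        if (ip_table[mid] <= ip)
                                lo = mid + 1;
                        else
                                hi = mid;
                }
                return lo ? &orc_table[lo - 1] : NULL;
        }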
      
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
      
      
      [ Extended the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. Jul 25, 2017
  14. Jul 24, 2017
    • lib: test_rhashtable: fix for large entry counts · e859afe1
      Phil Sutter authored
      
      
      During concurrent access testing, threadfunc() concatenated thread ID
      and object index to create a unique key like so:
      
      | tdata->objs[i].value = (tdata->id << 16) | i;
      
      This breaks if a user passes an entries parameter of 64k or higher,
      since 'i' might use more than 16 bits then. Effectively, this will lead
      to duplicate keys in the table.
      
      Fix the problem by introducing a struct holding object and thread ID and
      using that as key instead of a single integer type field.
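
      The shape of the fix, as a hedged sketch (field names are
      illustrative): key on an explicit pair instead of a packed integer,
      so no width assumption is made about the index.

        /* Used as the rhashtable key, e.g. key_len = sizeof(struct
         * test_obj_key) in the table parameters. */
        struct test_obj_key {
                int     tid;    /* thread ID                      */
                int     index;  /* object index, full 'int' range */
        };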
      
      Fixes: f4a3e90b ("rhashtable-test: extend to test concurrency")
      Reported-by: Manuel Messner <mm@skelett.io>
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>