  1. Feb 03, 2019
  2. Feb 01, 2019
  3. Jan 26, 2019
  4. Jan 25, 2019
  5. Jan 23, 2019
  6. Jan 22, 2019
  7. Jan 20, 2019
    • x86: uaccess: Inhibit speculation past access_ok() in user_access_begin() · 6e693b3f
      Will Deacon authored
      
      
      Commit 594cc251 ("make 'user_access_begin()' do 'access_ok()'")
      makes the access_ok() check part of the user_access_begin() preceding a
      series of 'unsafe' accesses.  This has the desirable effect of ensuring
      that all 'unsafe' accesses have been range-checked, without having to
      pick through all of the callsites to verify whether the appropriate
      checking has been made.
      
      However, the consolidated range check does not inhibit speculation, so
      it is still up to the caller to ensure that they are not susceptible to
      any speculative side-channel attacks for user addresses that ultimately
      fail the access_ok() check.
      
      This is an oversight, so use __uaccess_begin_nospec() to ensure that
      speculation is inhibited until the access_ok() check has passed.
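      
      For reference, a paraphrased sketch of the resulting x86 helper
      (simplified, not the literal kernel source); callers still pair it with
      unsafe_get_user()/unsafe_put_user() and a closing user_access_end():
      
      | static __always_inline bool user_access_begin(const void __user *ptr,
      |                                               size_t len)
      | {
      |         if (unlikely(!access_ok(ptr, len)))
      |                 return 0;
      |         /* barrier: no user access may be speculated past the range check */
      |         __uaccess_begin_nospec();
      |         return 1;
      | }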
      
      Reported-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. Jan 19, 2019
  9. Jan 18, 2019
    • ARCv2: lib: memset: fix doing prefetchw outside of buffer · e6a72b7d
      Eugeniy Paltsev authored
      
      
      ARCv2 optimized memset uses the PREFETCHW instruction to prefetch the
      next cache line, but doesn't ensure that the line is not past the end of
      the buffer. PREFETCHW changes the line ownership and marks it dirty,
      which can cause issues in SMP configs when the next line is already
      owned by another core. Fix the issue by avoiding the PREFETCHW.
      
      Some more details:
      
      The current code has 3 logical loops (ignoring the unaligned part):
        (a) Big loop for doing aligned 64 bytes per iteration with PREALLOC
        (b) Loop for 32 x 2 bytes with PREFETCHW
        (c) any left over bytes
      
      loop (a) was already eliding the last 64 bytes, so PREALLOC was
      safe. The fix was removing PREFETCHW from (b).
      
      Another potential issue (applicable to configs with 32- or 128-byte L1
      cache lines) is that PREALLOC assumes a 64-byte cache line and may not
      do the right thing, especially for 32-byte lines. While it would be easy
      to adapt, there are no known configs with those line sizes, so for now
      just compile out PREALLOC in such cases.
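      
      For illustration only (the real routine is hand-written ARC assembly),
      a C-level sketch of why the prefetch in loop (a) always stays inside the
      buffer, while the one in loop (b) could not be guaranteed to:
      
      | #include <stddef.h>
      | #include <string.h>
      |
      | void memset_sketch(char *buf, int c, size_t len)
      | {
      |         char *p   = buf;
      |         char *end = buf + len;
      |
      |         /* (a) 64-byte loop: stops one line early, so a write-prefetch
      |          *     of p + 64 never touches a line beyond 'end' */
      |         while (p + 128 <= end) {
      |                 __builtin_prefetch(p + 64, 1);  /* write intent */
      |                 memset(p, c, 64);
      |                 p += 64;
      |         }
      |
      |         /* (b)+(c) tail: no prefetch at all; a PREFETCHW here could
      |          *         dirty a cache line past the end of the buffer */
      |         while (p < end)
      |                 *p++ = c;
      | }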
      
      Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Cc: stable@vger.kernel.org #4.4+
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      [vgupta: rewrote changelog, used asm .macro vs. "C" macro]
    • ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault · 4d447455
      Vineet Gupta authored
      
      
      do_page_fault() forgot to relinquish mmap_sem if a signal arrived while
      handle_mm_fault() was in progress - due to, say, a Ctrl+C or an OOM kill.
      This would later cause a deadlock by acquiring it twice.
      
      This came to light when running the libc testsuite tst-tls3-malloc test,
      but is likely also the cause of previously seen LTP failures. Using
      lockdep clearly showed what the issue was.
      
      | # while true; do ./tst-tls3-malloc ; done
      | Didn't expect signal from child: got `Segmentation fault'
      | ^C
      | ============================================
      | WARNING: possible recursive locking detected
      | 4.17.0+ #25 Not tainted
      | --------------------------------------------
      | tst-tls3-malloc/510 is trying to acquire lock:
      | 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
      |
      | but task is already holding lock:
      | 606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
      |
      | other info that might help us debug this:
      |  Possible unsafe locking scenario:
      |
      |       CPU0
      |       ----
      |  lock(&mm->mmap_sem);
      |  lock(&mm->mmap_sem);
      |
      | *** DEADLOCK ***
      |
      
      ------------------------------------------------------------
      What the change does is not obvious (note to myself)
      
      prior code was
      
      | do_page_fault
      |
      |   down_read()		<-- lock taken
      |   handle_mm_fault	<-- signal pending as this runs
      |   if fatal_signal_pending
      |       if VM_FAULT_ERROR
      |           up_read
      |       if user_mode
      |          return	<-- lock still held, this was the BUG
      
      New code
      
      | do_page_fault
      |
      |   down_read()		<-- lock taken
      |   handle_mm_fault	<-- signal pending as this runs
      |   if fatal_signal_pending
      |       if VM_FAULT_RETRY
      |          return       <-- not same case as above, but still OK since
      |                           core mm already relinq lock for FAULT_RETRY
      |    ...
      |
      |   < Now falls through for bug case above >
      |
      |   up_read()		<-- lock relinquished
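      
      Paraphrased as C (a sketch of the fixed flow above, not the literal diff):
      
      |         fault = handle_mm_fault(vma, address, flags);
      |
      |         if (fatal_signal_pending(current) && (fault & VM_FAULT_RETRY)) {
      |                 /* core mm already dropped mmap_sem for the RETRY case */
      |                 if (user_mode(regs))
      |                         return;
      |         }
      |         ...
      |         up_read(&mm->mmap_sem);   /* every other path now reaches here */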
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  10. Jan 17, 2019
    • x86/entry/64/compat: Fix stack switching for XEN PV · fc24d75a
      Jan Beulich authored
      
      
      While in the native case entry into the kernel happens on the trampoline
      stack, PV Xen kernels get entered with the current thread stack right
      away. Hence source and destination stacks are identical in that case,
      and special care is needed.
      
      Unlike in sync_regs(), the copying done on the INT80 path isn't
      NMI / #MC safe, as either of these events occurring in the middle of the
      stack copy would clobber data on the (source) stack.
      
      There is similar code in interrupt_entry() and nmi(), but there is no fixup
      required because those code paths are unreachable in XEN PV guests.
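      
      Conceptually (a hedged C-style illustration with made-up helper names;
      the actual fix lives in the 64-bit compat entry assembly):
      
      |         if (entry_stack != thread_stack) {
      |                 /* native: entered on the trampoline stack, so copy the
      |                  * frame over and switch to the thread stack */
      |                 copy_entry_frame(thread_stack, entry_stack);   /* made up */
      |                 run_on(thread_stack);                          /* made up */
      |         }
      |         /* Xen PV: already on the thread stack, so the copy would read
      |          * and write the same memory; an NMI or #MC landing mid-copy
      |          * would clobber the source frame, hence it is skipped */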
      
      [ tglx: Sanitized subject, changelog, Fixes tag and stable mail address. Sigh ]
      
      Fixes: 7f2590a1 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: xen-devel@lists.xenproject.org
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/5C3E1128020000780020DFAD@prv1-mh.provo.novell.com
    • net: introduce SO_BINDTOIFINDEX sockopt · f5dd3d0c
      David Rheinsberg authored
      
      
      This introduces a new generic SOL_SOCKET-level socket option called
      SO_BINDTOIFINDEX. It behaves similarly to SO_BINDTODEVICE, but takes a
      network interface index as its argument, rather than the network
      interface name.
      
      User-space often refers to network-interfaces via their index, but has
      to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
      This might pose problems when the network-device is renamed
      asynchronously by other parts of the system. When this happens, the
      SO_BINDTODEVICE call might either fail or, worse, bind to the wrong
      device.
      
      In most cases user-space only ever operates on devices which it either
      manages itself, or for which it otherwise has a guarantee that the device
      name will not change (e.g., devices that are UP cannot be renamed).
      However, particularly in libraries this guarantee is non-obvious, and it
      would be nice if that race condition simply did not exist. It would
      make it easier for those libraries to operate even in situations where
      the device name might change under the hood.
      
      A real use-case that we recently hit is trying to start the network
      stack early in the initrd but make it survive into the real system.
      Existing distributions rename network-interfaces during the transition
      from initrd into the real system. This, obviously, cannot affect
      devices that are up and running (unless you also consider moving them
      between network-namespaces). However, the network manager now has to
      make sure its management engine for dormant devices will not run in
      parallel to these renames. Particularly, when you offload operations
      like DHCP into separate processes, these might setup their sockets
      early, and thus have to resolve the device-name possibly running into
      this race-condition.
      
      By avoiding a call to resolve the device-name, we no longer depend on
      the name and can run network setup of dormant devices in parallel to
      the transition off the initrd. The SO_BINDTOIFINDEX socket option plugs
      this race.
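      
      A user-space usage sketch (the fallback define below mirrors the
      asm-generic value, but treat it as an assumption for older headers):
      
      | #include <sys/socket.h>
      |
      | #ifndef SO_BINDTOIFINDEX
      | #define SO_BINDTOIFINDEX 62     /* asm-generic/socket.h value */
      | #endif
      |
      | /* Bind a socket to an interface by index: no name lookup, so a
      |  * concurrent rename cannot redirect the socket to another device. */
      | static int bind_to_ifindex(int fd, int ifindex)
      | {
      |         return setsockopt(fd, SOL_SOCKET, SO_BINDTOIFINDEX,
      |                           &ifindex, sizeof(ifindex));
      | }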
      
      Reviewed-by: Tom Gundersen <teg@jklm.no>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARC: show_regs: lockdep: re-enable preemption · f731a8e8
      Vineet Gupta authored
      
      
      The signal handling core calls show_regs() with preemption disabled,
      which on ARC takes mmap_sem for mm/vma access, causing a lockdep splat.
      
      | [ARCLinux]# ./segv-null-ptr
      | potentially unexpected fatal signal 11.
      | BUG: sleeping function called from invalid context at kernel/fork.c:1011
      | in_atomic(): 1, irqs_disabled(): 0, pid: 70, name: segv-null-ptr
      | no locks held by segv-null-ptr/70.
      | CPU: 0 PID: 70 Comm: segv-null-ptr Not tainted 4.18.0+ #69
      |
      | Stack Trace:
      |  arc_unwind_core+0xcc/0x100
      |  ___might_sleep+0x17a/0x190
      |  mmput+0x16/0xb8
      |  show_regs+0x52/0x310
      |  get_signal+0x5ee/0x610
      |  do_signal+0x2c/0x218
      |  resume_user_mode_begin+0x90/0xd8
      
      Work around this by re-enabling preemption temporarily.
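      
      A sketch of the workaround (paraphrased, not the exact diff):
      
      | void show_regs(struct pt_regs *regs)
      | {
      |         /* Caller (print_fatal_signal) has preemption disabled, but the
      |          * mm/vma dump below takes mmap_sem and may sleep, so re-enable
      |          * preemption here and restore it before returning. */
      |         preempt_enable();
      |
      |         /* ... register and faulting-vma dump ... */
      |
      |         preempt_disable();
      | }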
      
      Note that the preemption disabling in core code around show_regs()
      was introduced by commit 3a9f84d3 ("signals, debug: fix BUG: using
      smp_processor_id() in preemptible code in print_fatal_signal()")
      to silence a different lockdep splat seen on x86 back in 2009.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: show_regs: lockdep: avoid page allocator... · ab6c0367
      Vineet Gupta authored
      
      
      and use smaller/on-stack buffer instead
      
      The motivation for this change was a lockdep splat like the one below.
      
      | potentially unexpected fatal signal 11.
      | BUG: sleeping function called from invalid context at ../mm/page_alloc.c:4317
      | in_atomic(): 1, irqs_disabled(): 0, pid: 57, name: segv
      | no locks held by segv/57.
      | Preemption disabled at:
      | [<8182f17e>] get_signal+0x4a6/0x7c4
      | CPU: 0 PID: 57 Comm: segv Not tainted 4.17.0+ #23
      |
      | Stack Trace:
      |  arc_unwind_core.constprop.1+0xd0/0xf4
      |  __might_sleep+0x1f6/0x234
      |  __get_free_pages+0x174/0xca0
      |  show_regs+0x22/0x330
      |  get_signal+0x4ac/0x7c4     # print_fatal_signals() -> preempt_disable()
      |  do_signal+0x30/0x224
      |  resume_user_mode_begin+0x90/0xd8
      
      So the signal handling core calls show_regs() with preemption disabled,
      but an ensuing GFP_KERNEL page allocator call is flagged by lockdep.
      
      We could have switched to GFP_NOWAIT, but it turns out that is not enough
      anyway, and eliding the page allocator call leads to less code and fewer
      instruction traces to sift through when debugging pesky crashes.
      
      FWIW, this patch doesn't cure the lockdep splat (the next patch does).
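      
      The shape of the change, paraphrased (the buffer size is illustrative):
      
      |         /* before: may sleep, hence the splat with preemption disabled */
      |         char *buf = (char *)__get_free_page(GFP_KERNEL);
      |         ...
      |         free_page((unsigned long)buf);
      |
      |         /* after: small on-stack buffer, safe in atomic context */
      |         char buf[256];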
      
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: perf: avoid kernel killing where it is possible · 29133260
      Eugeniy Paltsev authored
      
      
      No, not gonna die tonight.
      
      Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: perf: move HW events mapping to separate function · baf9cc85
      Eugeniy Paltsev authored
      
      
      Move the HW events mapping to a separate function to make the code more
      readable.
      
      Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: perf: introduce Kernel PMU events support · 0e956150
      Eugeniy Paltsev authored
      
      
      Export all available ARC architected hardware events as
      kernel PMU events to make non-generic events accessible.
      
      The ARC PMU HW allows us to read the list of all available
      event names, so we generate the kernel PMU event list
      dynamically in arc_pmu_device_probe(), using the
      human-readable event names we get from HW instead of
      a pre-defined event list.
      
      -------------------------->8--------------------------
      $ perf list
        [snip]
        arc_pmu/bdata64/                  [Kernel PMU event]
        arc_pmu/bdcstall/                 [Kernel PMU event]
        arc_pmu/bdslot/                   [Kernel PMU event]
        arc_pmu/bfbmp/                    [Kernel PMU event]
        arc_pmu/bfirqex/                  [Kernel PMU event]
        arc_pmu/bflgstal/                 [Kernel PMU event]
        arc_pmu/bflush/                   [Kernel PMU event]
      -------------------------->8--------------------------
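      
      A loose sketch of the probe-time idea (the helpers below are hypothetical
      placeholders, not the actual driver functions):
      
      |         /* in arc_pmu_device_probe(): register one PMU event alias per
      |          * condition name read back from the hardware */
      |         for (i = 0; i < nr_hw_conditions; i++) {
      |                 char name[16];
      |
      |                 read_hw_condition_name(i, name);        /* hypothetical */
      |                 add_pmu_event_alias(arc_pmu, name, i);  /* hypothetical */
      |         }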
      
      Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: perf: trivial code cleanup · 14f81a91
      Eugeniy Paltsev authored
      
      
      * Use BIT(), lower_32_bits(), upper_32_bits() macros; fix code style
        violations (see the illustrative one-liner after this list).
      * Use u32, u64, s64 instead of uint32_t, uint64_t, int64_t
      * Fix description comment as this code doesn't belong only to
        ARC700 anymore.
      * Use SPDX License Identifier.
      * Remove useless ifdefs. The ifdef around the 'arc_pmu_match' structure
        declaration is useless as we refer to 'arc_pmu_match' in
        several places which aren't guarded with an ifdef. Moreover, the
        'ARC' option selects 'OF' unconditionally, so we can simply
        get rid of this ifdef.
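      
      Illustrative one-liner for the first item (a made-up example, not a line
      from the driver):
      
      | -       uint32_t lo = (uint32_t)(cnt & 0xFFFFFFFF);
      | -       uint32_t hi = (uint32_t)(cnt >> 32);
      | +       u32 lo = lower_32_bits(cnt);
      | +       u32 hi = upper_32_bits(cnt);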
      
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: perf: map generic branches to correct hardware condition · 3affbf0e
      Eugeniy Paltsev authored
      
      
      So far we've mapped branches to "ijmp", which also counts conditional
      branches NOT taken. This makes us different from other architectures
      such as ARM, which seem to count only taken branches.
      
      So use the "ijmptak" hardware condition, which counts only jump
      instructions that are actually taken.
      
      The 'ijmptak' event is available on both ARCompact and ARCv2 ISA based
      cores.
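      
      Roughly, the change is one entry in the driver's generic-event mapping
      (sketched, not the literal diff):
      
      |         static const char * const arc_pmu_ev_hw_map[] = {
      | -               [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmp",
      | +               [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmptak",
      |                 ...
      |         };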
      
      Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      [vgupta: reworked changelog]