Skip to content
  1. Feb 21, 2019
  2. Feb 03, 2019
    • Deepa Dinamani's avatar
      sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW · a9beb86a
      Deepa Dinamani authored
      
      
      Add new socket timeout options that are y2038 safe.
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Cc: ccaulfie@redhat.com
      Cc: davem@davemloft.net
      Cc: deller@gmx.de
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: cluster-devel@redhat.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9beb86a
    • Deepa Dinamani's avatar
      socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes · 45bdc661
      Deepa Dinamani authored
      
      
      SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
      as the time format. struct timeval is not y2038 safe.
      The subsequent patches in the series add support for new socket
      timeout options with _NEW suffix that will use y2038 safe
      data structures. Although the existing struct timeval layout
      is sufficiently wide to represent timeouts, because of the way
      libc will interpret time_t based on user defined flag, these
      new flags provide a way of having a structure that is the same
      for all architectures consistently.
      Rename the existing options with _OLD suffix forms so that the
      right option is enabled for userspace applications according
      to the architecture and time_t definition of libc.
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Cc: ccaulfie@redhat.com
      Cc: deller@gmx.de
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: cluster-devel@redhat.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45bdc661
    • Deepa Dinamani's avatar
      socket: Add SO_TIMESTAMPING_NEW · 9718475e
      Deepa Dinamani authored
      
      
      Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
      This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
      for all architectures.
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Cc: chris@zankel.net
      Cc: fenghua.yu@intel.com
      Cc: rth@twiddle.net
      Cc: tglx@linutronix.de
      Cc: ubraun@linux.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9718475e
    • Deepa Dinamani's avatar
      socket: Add SO_TIMESTAMP[NS]_NEW · 887feae3
      Deepa Dinamani authored
      
      
      Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
      socket timestamp options.
      These are the y2038 safe versions of the SO_TIMESTAMP_OLD
      and SO_TIMESTAMPNS_OLD for all architectures.
      
      Note that the format of scm_timestamping.ts[0] is not changed
      in this patch.
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Cc: jejb@parisc-linux.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      887feae3
    • Deepa Dinamani's avatar
      sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD · 7f1bc6e9
      Deepa Dinamani authored
      
      
      SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
      way they are currently defined, are not y2038 safe.
      Subsequent patches in the series add new y2038 safe versions
      of these options which provide 64 bit timestamps on all
      architectures uniformly.
      Hence, rename existing options with OLD tag suffixes.
      
      Also note that kernel will not use the untagged SO_TIMESTAMP*
      and SCM_TIMESTAMP* options internally anymore.
      
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Cc: deller@gmx.de
      Cc: dhowells@redhat.com
      Cc: jejb@parisc-linux.org
      Cc: ralf@linux-mips.org
      Cc: rth@twiddle.net
      Cc: linux-afs@lists.infradead.org
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f1bc6e9
  3. Jan 17, 2019
    • David Rheinsberg's avatar
      net: introduce SO_BINDTOIFINDEX sockopt · f5dd3d0c
      David Rheinsberg authored
      
      
      This introduces a new generic SOL_SOCKET-level socket option called
      SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a
      network interface index as argument, rather than the network interface
      name.
      
      User-space often refers to network-interfaces via their index, but has
      to temporarily resolve it to a name for a call into SO_BINDTODEVICE.
      This might pose problems when the network-device is renamed
      asynchronously by other parts of the system. When this happens, the
      SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong
      device.
      
      In most cases user-space only ever operates on devices which they
      either manage themselves, or otherwise have a guarantee that the device
      name will not change (e.g., devices that are UP cannot be renamed).
      However, particularly in libraries this guarantee is non-obvious and it
      would be nice if that race-condition would simply not exist. It would
      make it easier for those libraries to operate even in situations where
      the device-name might change under the hood.
      
      A real use-case that we recently hit is trying to start the network
      stack early in the initrd but make it survive into the real system.
      Existing distributions rename network-interfaces during the transition
      from initrd into the real system. This, obviously, cannot affect
      devices that are up and running (unless you also consider moving them
      between network-namespaces). However, the network manager now has to
      make sure its management engine for dormant devices will not run in
      parallel to these renames. Particularly, when you offload operations
      like DHCP into separate processes, these might setup their sockets
      early, and thus have to resolve the device-name possibly running into
      this race-condition.
      
      By avoiding a call to resolve the device-name, we no longer depend on
      the name and can run network setup of dormant devices in parallel to
      the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this
      race.
      
      Reviewed-by: default avatarTom Gundersen <teg@jklm.no>
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5dd3d0c
  4. Jan 06, 2019
  5. Dec 10, 2018
    • Firoz Khan's avatar
      parisc: generate uapi header and system call table files · 575afc4d
      Firoz Khan authored
      
      
      System call table generation script must be run to gener-
      ate unistd_32/64.h and syscall_table_32/64/c32.h files.
      This patch will have changes which will invokes the script.
      
      This patch will generate unistd_32/64.h and syscall_table-
      _32/64/c32.h files by the syscall table generation script
      invoked by parisc/Makefile and the generated files against
      the removed files must be identical.
      
      The generated uapi header file will be included in uapi/-
      asm/unistd.h and generated system call table header file
      will be included by kernel/syscall.S file.
      
      Signed-off-by: default avatarFiroz Khan <firoz.khan@linaro.org>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      575afc4d
    • Firoz Khan's avatar
      parisc: remove __NR_Linux from uapi header file. · 28ff62a4
      Firoz Khan authored
      
      
      The __NR_Linux defined as 0 to support HP-UX syscalls along
      with an offset to other system call. But support for HP-UX
      is gone and there is no need to define __NR_Linux as 0.
      
      One of the patch in this patch series will generate uapi header
      file which does have offset logic support. But here the offset
      value __NR_Linux defined as 0 and it doesn't make much effect.
      So remove the offset  __NR_Linux from uapi header file.
      
      Signed-off-by: default avatarFiroz Khan <firoz.khan@linaro.org>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      28ff62a4
    • Firoz Khan's avatar
      parisc: add __NR_syscalls along with __NR_Linux_syscalls · dbf91a54
      Firoz Khan authored
      
      
      __NR_Linux_syscalls macro holds the number of system call
      exist in parisc architecture. We have to change the value
      of __NR_Linux_syscalls, if we add or delete a system call.
      
      One of the patch in this patch series has a script which
      will generate a uapi header based on syscall.tbl file.
      The syscall.tbl file contains the total number of system
      calls information. So we have two option to update __NR-
      _Linux_syscalls value.
      
      1. Update __NR_Linux_syscalls in asm/unistd.h manually by
         counting the no.of system calls. No need to update __NR-
         _Linux_syscalls until we either add a new system call or
         delete existing system call.
      
      2. We can keep this feature it above mentioned script,
         that will count the number of syscalls and keep it in
         a generated file. In this case we don't need to expli-
         citly update __NR_Linux_syscalls in asm/unistd.h file.
      
      The 2nd option will be the recommended one. For that, I
      added the __NR_syscalls macro in uapi/asm/unistd.h along
      with __NR_Linux_syscalls asm/unistd.h. The macro __NR_sys-
      calls also added for making the name convention same across
      all architecture. While __NR_syscalls isn't strictly part
      of the uapi, having it as part of the generated header to
      simplifies the implementation. We also need to enclose
      this macro with #ifdef __KERNEL__ to avoid side effects.
      
      Signed-off-by: default avatarFiroz Khan <firoz.khan@linaro.org>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      dbf91a54
    • Firoz Khan's avatar
      parisc: move __IGNORE* entries to non uapi header · dfddd1a8
      Firoz Khan authored
      
      
      All the __IGNORE* entries are resides in the uapi header
      file move to non uapi header asm/unistd.h as it is not
      used by any user space applications.
      
      It is correct to keep __IGNORE* entry in non uapi header
      asm/unistd.h while uapi/asm/unistd.h must hold information
      only useful for user space applications.
      
      One of the patch in this patch series will generate uapi
      header file. The information which directly used by the
      user space application must be present in uapi file.
      
      Signed-off-by: default avatarFiroz Khan <firoz.khan@linaro.org>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      dfddd1a8
  6. Oct 26, 2018
    • Arnd Bergmann's avatar
      parisc64: change __kernel_suseconds_t to match glibc · 9a298b44
      Arnd Bergmann authored
      
      
      There are only two 64-bit architecture ports that have a 32-bit
      suseconds_t: sparc64 and parisc64. I've encountered a number of problems
      with this, while trying to get a proper 64-bit time_t working on 32-bit
      architectures. Having a 32-bit suseconds_t combined with a 64-bit time_t
      means that we get extra padding in data structures that may leak kernel
      stack data to user space, and it breaks all code that assumes that
      timespec and timeval have the same layout.
      
      While we can't change sparc64, it seems that glibc on parisc64 has always
      set suseconds_t to 'long', and the current version would give incorrect
      results for gettimeofday() and many other interfaces: timestamps passed
      from user space into the kernel result in tv_usec being always zero
      (the lower bits contain the intended value but are ignored) while data
      passed from the kernel to user space contains either zeroes or random
      data in tv_usec.
      
      Based on that, it seems best to change the user API in the kernel in
      an incompatible way to match what glibc expects.
      
      Note that the distros I could find (gentoo and debian) all just
      have 32-bit user space, which does not suffer from this problem.
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      9a298b44
  7. Oct 03, 2018
  8. Oct 02, 2018
  9. Aug 13, 2018
    • Helge Deller's avatar
      parisc: Drop architecture-specific ENOTSUP define · 93cb8e20
      Helge Deller authored
      parisc is the only Linux architecture which has defined a value for ENOTSUP.
      All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers.
      
      Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives
      problems with userspace programs which expect both to be the same.  One such
      example is a build error in the libuv package, as can be seen in
      https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237
      
      .
      
      Since we dropped HP-UX support, there is no real benefit in keeping an own
      value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the
      kernel sources. glibc needs no patch, it reuses the exported headers.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      93cb8e20
  10. Jul 04, 2018
  11. Jun 28, 2018
  12. Apr 20, 2018
    • Arnd Bergmann's avatar
      y2038: parisc: Extend sysvipc data structures · f69c97f6
      Arnd Bergmann authored
      
      
      parisc, uses a nonstandard variation of the generic sysvipc
      data structures, intended to have the padding moved around
      so it can deal with big-endian 32-bit user space that has
      64-bit time_t.
      
      Unlike most architectures, parisc actually succeeded in
      defining this right for big-endian CPUs, but as everyone else
      got it wrong, we just use the same hack everywhere.
      
      This takes just take the same approach here that we have for
      the asm-generic headers and adds separate 32-bit fields for the
      upper halves of the timestamps, to let libc deal with the mess
      in user space.
      
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      f69c97f6
  13. Apr 11, 2018
    • Michal Hocko's avatar
      mm: introduce MAP_FIXED_NOREPLACE · a4ff8e86
      Michal Hocko authored
      Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.
      
      This has started as a follow up discussion [3][4] resulting in the
      runtime failure caused by hardening patch [5] which removes MAP_FIXED
      from the elf loader because MAP_FIXED is inherently dangerous as it
      might silently clobber an existing underlying mapping (e.g.  stack).
      The reason for the failure is that some architectures enforce an
      alignment for the given address hint without MAP_FIXED used (e.g.  for
      shared or file backed mappings).
      
      One way around this would be excluding those archs which do alignment
      tricks from the hardening [6].  The patch is really trivial but it has
      been objected, rightfully so, that this screams for a more generic
      solution.  We basically want a non-destructive MAP_FIXED.
      
      The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
      address but unlike MAP_FIXED it fails with EEXIST if the given range
      conflicts with an existing one.  The flag is introduced as a completely
      new one rather than a MAP_FIXED extension because of the backward
      compatibility.  We really want a never-clobber semantic even on older
      kernels which do not recognize the flag.  Unfortunately mmap sucks
      wrt flags evaluation because we do not EINVAL on unknown flags.  On
      those kernels we would simply use the traditional hint based semantic so
      the caller can still get a different address (which sucks) but at least
      not silently corrupt an existing mapping.  I do not see a good way
      around that.  Except we won't export expose the new semantic to the
      userspace at all.
      
      It seems there are users who would like to have something like that.
      Jemalloc has been mentioned by Michael Ellerman [7]
      
      Florian Weimer has mentioned the following:
      : glibc ld.so currently maps DSOs without hints.  This means that the kernel
      : will map right next to each other, and the offsets between them a completely
      : predictable.  We would like to change that and supply a random address in a
      : window of the address space.  If there is a conflict, we do not want the
      : kernel to pick a non-random address. Instead, we would try again with a
      : random address.
      
      John Hubbard has mentioned CUDA example
      : a) Searches /proc/<pid>/maps for a "suitable" region of available
      : VA space.  "Suitable" generally means it has to have a base address
      : within a certain limited range (a particular device model might
      : have odd limitations, for example), it has to be large enough, and
      : alignment has to be large enough (again, various devices may have
      : constraints that lead us to do this).
      :
      : This is of course subject to races with other threads in the process.
      :
      : Let's say it finds a region starting at va.
      :
      : b) Next it does:
      :     p = mmap(va, ...)
      :
      : *without* setting MAP_FIXED, of course (so va is just a hint), to
      : attempt to safely reserve that region. If p != va, then in most cases,
      : this is a failure (almost certainly due to another thread getting a
      : mapping from that region before we did), and so this layer now has to
      : call munmap(), before returning a "failure: retry" to upper layers.
      :
      :     IMPROVEMENT: --> if instead, we could call this:
      :
      :             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
      :
      :         , then we could skip the munmap() call upon failure. This
      :         is a small thing, but it is useful here. (Thanks to Piotr
      :         Jaroszynski and Mark Hairgrove for helping me get that detail
      :         exactly right, btw.)
      :
      : c) After that, CUDA suballocates from p, via:
      :
      :      q = mmap(sub_region_start, ... MAP_FIXED ...)
      :
      : Interestingly enough, "freeing" is also done via MAP_FIXED, and
      : setting PROT_NONE to the subregion. Anyway, I just included (c) for
      : general interest.
      
      Atomic address range probing in the multithreaded programs in general
      sounds like an interesting thing to me.
      
      The second patch simply replaces MAP_FIXED use in elf loader by
      MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
      should follow.  Actually real MAP_FIXED usages should be docummented
      properly and they should be more of an exception.
      
      [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
      [3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
      [4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
      [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
      [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
      [7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
      
      This patch (of 2):
      
      MAP_FIXED is used quite often to enforce mapping at the particular range.
      The main problem of this flag is, however, that it is inherently dangerous
      because it unmaps existing mappings covered by the requested range.  This
      can cause silent memory corruptions.  Some of them even with serious
      security implications.  While the current semantic might be really
      desiderable in many cases there are others which would want to enforce the
      given range but rather see a failure than a silent memory corruption on a
      clashing range.  Please note that there is no guarantee that a given range
      is obeyed by the mmap even when it is free - e.g.  arch specific code is
      allowed to apply an alignment.
      
      Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
      behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
      request with a single exception that it fails with EEXIST if the requested
      address is already covered by an existing mapping.  We still do rely on
      get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
      check for a conflicting vma after it returns.
      
      The flag is introduced as a completely new one rather than a MAP_FIXED
      extension because of the backward compatibility.  We really want a
      never-clobber semantic even on older kernels which do not recognize the
      flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
      EINVAL on unknown flags.  On those kernels we would simply use the
      traditional hint based semantic so the caller can still get a different
      address (which sucks) but at least not silently corrupt an existing
      mapping.  I do not see a good way around that.
      
      [mpe@ellerman.id.au: fix whitespace]
      [fail on clashing range with EEXIST as per Florian Weimer]
      [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
      Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
      
      
      Reviewed-by: default avatarKhalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Jason Evans <jasone@google.com>
      Cc: David Goldblatt <davidtgoldblatt@gmail.com>
      Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a4ff8e86
    • Helge Deller's avatar
      parisc/signal: Add FPE_CONDTRAP for conditional trap handling · 75abf642
      Helge Deller authored
      
      
      Posix and common sense requires that SI_USER not be a signal specific
      si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      75abf642
  14. Mar 27, 2018
    • Helge Deller's avatar
      parisc: Convert MAP_TYPE to cover 4 bits on parisc · d5b59a71
      Helge Deller authored
      
      
      On parisc we want to be as much as possible compatible to the major
      architectures like x86. Those architectures have MAP_TYPE defined as 0x0f which
      covers MAP_SHARED and MAP_PRIVATE and leaves two more bits unused.
      
      In contrast, on parisc we have MAP_TYPE defined to 0x03 which covers MAP_SHARED
      and MAP_PRIVATE only. But we don't have the 2 bits free as x86.
      
      Usually that's not a problem, but during the discussions for pmem+dax support
      the idea came up to use the two remaining bits of MAP_TYPE (on x86 and others)
      for the new MAP_DIRECT and MAP_SYNC flags. One requirement is, that an old
      kernel should correctly handle MAP_DIRECT and MAP_SYNC and fail on those if
      set. This only works if MAP_TYPE has 4 bits.
      
      Even though the pmem+dax people now choosed another solution via
      MAP_SHARED_VALIDATE, let's still proceed to be more compatible to x86 by adding
      two more bits for future usage.
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      d5b59a71
  15. Jan 12, 2018
    • Eric W. Biederman's avatar
      signal/parisc: Document a conflict with SI_USER with SIGFPE · b5daf2b9
      Eric W. Biederman authored
      Setting si_code to 0 results in a userspace seeing an si_code of 0.
      This is the same si_code as SI_USER.  Posix and common sense requires
      that SI_USER not be a signal specific si_code.  As such this use of 0
      for the si_code is a pretty horribly broken ABI.
      
      Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
      value of __SI_KILL and now sees a value of SIL_KILL with the result
      that uid and pid fields are copied and which might copying the si_addr
      field by accident but certainly not by design.  Making this a very
      flakey implementation.
      
      Utilizing FPE_FIXME siginfo_layout will now return SIL_FAULT and the
      appropriate fields will reliably be copied.
      
      This bug is 13 years old and parsic machines are no longer being built
      so I don't know if it possible or worth fixing it.  But it is at least
      worth documenting this so other architectures don't make the same
      mistake.
      
      Possible ABI fixes includee:
        - Send the signal without siginfo
        - Don't generate a signal
        - Possibly assign and use an appropriate si_code
        - Don't handle cases which can't happen
      
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Ref: 313c01d3e3fd ("[PATCH] PA-RISC update for 2.6.0")
      Histroy Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      
      
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      b5daf2b9
  16. Dec 05, 2017
    • Hendrik Brueckner's avatar
      bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type · c895f6f7
      Hendrik Brueckner authored
      
      
      Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT
      program type") introduced the bpf_perf_event_data structure which
      exports the pt_regs structure.  This is OK for multiple architectures
      but fail for s390 and arm64 which do not export pt_regs.  Programs
      using them, for example, the bpf selftest fail to compile on these
      architectures.
      
      For s390, exporting the pt_regs is not an option because s390 wants
      to allow changes to it.  For arm64, there is a user_pt_regs structure
      that covers parts of the pt_regs structure for use by user space.
      
      To solve the broken uapi for s390 and arm64, introduce an abstract
      type for pt_regs and add an asm/bpf_perf_event.h file that concretes
      the type.  An asm-generic header file covers the architectures that
      export pt_regs today.
      
      The arch-specific enablement for s390 and arm64 follows in separate
      commits.
      
      Reported-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
      Signed-off-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-and-tested-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      c895f6f7
  17. Nov 17, 2017
  18. Nov 03, 2017
    • Dan Williams's avatar
      mm: introduce MAP_SHARED_VALIDATE, a mechanism to safely define new mmap flags · 1c972597
      Dan Williams authored
      
      
      The mmap(2) syscall suffers from the ABI anti-pattern of not validating
      unknown flags. However, proposals like MAP_SYNC need a mechanism to
      define new behavior that is known to fail on older kernels without the
      support. Define a new MAP_SHARED_VALIDATE flag pattern that is
      guaranteed to fail on all legacy mmap implementations.
      
      It is worth noting that the original proposal was for a standalone
      MAP_VALIDATE flag. However, when that  could not be supported by all
      archs Linus observed:
      
          I see why you *think* you want a bitmap. You think you want
          a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC
          etc, so that people can do
      
          ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED
      		    | MAP_SYNC, fd, 0);
      
          and "know" that MAP_SYNC actually takes.
      
          And I'm saying that whole wish is bogus. You're fundamentally
          depending on special semantics, just make it explicit. It's already
          not portable, so don't try to make it so.
      
          Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value
          of 0x3, and make people do
      
          ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE
      		    | MAP_SYNC, fd, 0);
      
          and then the kernel side is easier too (none of that random garbage
          playing games with looking at the "MAP_VALIDATE bit", but just another
          case statement in that map type thing.
      
          Boom. Done.
      
      Similar to ->fallocate() we also want the ability to validate the
      support for new flags on a per ->mmap() 'struct file_operations'
      instance basis.  Towards that end arrange for flags to be generically
      validated against a mmap_supported_flags exported by 'struct
      file_operations'. By default all existing flags are implicitly
      supported, but new flags require MAP_SHARED_VALIDATE and
      per-instance-opt-in.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Suggested-by: default avatarChristoph Hellwig <hch@lst.de>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      1c972597
  19. Nov 02, 2017
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX license identifier to uapi header files with a license · e2be04c7
      Greg Kroah-Hartman authored
      
      
      Many user space API headers have licensing information, which is either
      incomplete, badly formatted or just a shorthand for referring to the
      license under which the file is supposed to be.  This makes it hard for
      compliance tools to determine the correct license.
      
      Update these files with an SPDX license identifier.  The identifier was
      chosen based on the license information in the file.
      
      GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
      identifier with the added 'WITH Linux-syscall-note' exception, which is
      the officially assigned exception identifier for the kernel syscall
      exception:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      This exception makes it possible to include GPL headers into non GPL
      code, without confusing license compliance tools.
      
      Headers which have either explicit dual licensing or are just licensed
      under a non GPL license are updated with the corresponding SPDX
      identifier and the GPLv2 with syscall exception identifier.  The format
      is:
              ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)
      
      SPDX license identifiers are a legally binding shorthand, which can be
      used instead of the full boiler plate text.  The update does not remove
      existing license information as this has to be done on a case by case
      basis and the copyright holders might have to be consulted. This will
      happen in a separate step.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2be04c7
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman authored
      
      
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  20. Sep 07, 2017
    • Rik van Riel's avatar
      mm,fork: introduce MADV_WIPEONFORK · d2cd9ede
      Rik van Riel authored
      Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
      in the child process after fork.  This differs from MADV_DONTFORK in one
      important way.
      
      If a child process accesses memory that was MADV_WIPEONFORK, it will get
      zeroes.  The address ranges are still valid, they are just empty.
      
      If a child process accesses memory that was MADV_DONTFORK, it will get a
      segmentation fault, since those address ranges are no longer valid in
      the child after fork.
      
      Since MADV_DONTFORK also seems to be used to allow very large programs
      to fork in systems with strict memory overcommit restrictions, changing
      the semantics of MADV_DONTFORK might break existing programs.
      
      MADV_WIPEONFORK only works on private, anonymous VMAs.
      
      The use case is libraries that store or cache information, and want to
      know that they need to regenerate it in the child process after fork.
      
      Examples of this would be:
       - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
         check, which is too slow without a PID cache)
       - PKCS#11 API reinitialization check (mandated by specification)
       - glibc's upcoming PRNG (reseed after fork)
       - OpenSSL PRNG (reseed after fork)
      
      The security benefits of a forking server having a re-inialized PRNG in
      every child process are pretty obvious.  However, due to libraries
      having all kinds of internal state, and programs getting compiled with
      many different versions of each library, it is unreasonable to expect
      calling programs to re-initialize everything manually after fork.
      
      A further complication is the proliferation of clone flags, programs
      bypassing glibc's functions to call clone directly, and programs calling
      unshare, causing the glibc pthread_atfork hook to not get called.
      
      It would be better to have the kernel take care of this automatically.
      
      The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
      MADV_WIPEONFORK.
      
      This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
      
          https://man.openbsd.org/minherit.2
      
      [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
      Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
      
      
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Reported-by: default avatarFlorian Weimer <fweimer@redhat.com>
      Reported-by: default avatarColm MacCártaigh <colm@allcosts.net>
      Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d2cd9ede
    • Mike Kravetz's avatar
      mm: arch: consolidate mmap hugetlb size encodings · aafd4562
      Mike Kravetz authored
      A non-default huge page size can be encoded in the flags argument of the
      mmap system call.  The definitions for these encodings are in arch
      specific header files.  However, all architectures use the same values.
      
      Consolidate all the definitions in the primary user header file
      (uapi/linux/mman.h).  Include definitions for all known huge page sizes.
      Use the generic encoding definitions in hugetlb_encode.h as the basis
      for these definitions.
      
      Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
      
      
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      aafd4562
  21. Aug 22, 2017
  22. Aug 04, 2017
  23. Jul 17, 2017
  24. Jul 11, 2017
    • Masahiro Yamada's avatar
      parisc: move generic-y of exported headers to uapi/asm/Kbuild · bd78acad
      Masahiro Yamada authored
      
      
      Since commit fcc8487d ("uapi: export all headers under uapi
      directories"), all (and only) headers under uapi directories are
      exported, but asm-generic wrappers are still exceptions.
      
      To complete de-coupling the uapi from kernel headers, move generic-y
      of exported headers to uapi/asm/Kbuild.
      
      With this change, "make headers_install" will just need to parse
      uapi/asm/Kbuild to build up exported headers.
      
      Also, move "generic-y += kprobes.h" up in order to keep the entries
      sorted.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      bd78acad
  25. Jun 21, 2017
    • David Rheinsberg's avatar
      net: introduce SO_PEERGROUPS getsockopt · 28b5ba2a
      David Rheinsberg authored
      
      
      This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
      retrieve the auxiliary groups of the remote peer. It is designed to
      naturally extend SO_PEERCRED. That is, the underlying data is from the
      same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
      is, if the provided buffer is too small, ERANGE is returned and @optlen
      is updated. Otherwise, the information is copied, @optlen is set to the
      actual size, and 0 is returned.
      
      While SO_PEERCRED (and thus `struct ucred') already returns the primary
      group, it lacks the auxiliary group vector. However, nearly all access
      controls (including kernel side VFS and SYSVIPC, but also user-space
      polkit, DBus, ...) consider the entire set of groups, rather than just
      the primary group. But this is currently not possible with pure
      SO_PEERCRED. Instead, user-space has to work around this and query the
      system database for the auxiliary groups of a UID retrieved via
      SO_PEERCRED.
      
      Unfortunately, there is no race-free way to query the auxiliary groups
      of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
      solution is to use getgrouplist(3p), which itself falls back to NSS and
      whatever is configured in nsswitch.conf(3). This effectively checks
      which groups we *would* assign to the user if it logged in *now*. On
      normal systems it is as easy as reading /etc/group, but with NSS it can
      resort to quering network databases (eg., LDAP), using IPC or network
      communication.
      
      Long story short: Whenever we want to use auxiliary groups for access
      checks on IPC, we need further IPC to talk to the user/group databases,
      rather than just relying on SO_PEERCRED and the incoming socket. This
      is unfortunate, and might even result in dead-locks if the database
      query uses the same IPC as the original request.
      
      So far, those recursions / dead-locks have been avoided by using
      primitive IPC for all crucial NSS modules. However, we want to avoid
      re-inventing the wheel for each NSS module that might be involved in
      user/group queries. Hence, we would preferably make DBus (and other IPC
      that supports access-management based on groups) work without resorting
      to the user/group database. This new SO_PEERGROUPS ioctl would allow us
      to make dbus-daemon work without ever calling into NSS.
      
      Cc: Michal Sekletar <msekleta@redhat.com>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Reviewed-by: default avatarTom Gundersen <teg@jklm.no>
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28b5ba2a
  26. Jun 09, 2017
    • Aleksa Sarai's avatar
      tty: add TIOCGPTPEER ioctl · 54ebbfb1
      Aleksa Sarai authored
      
      
      When opening the slave end of a PTY, it is not possible for userspace to
      safely ensure that /dev/pts/$num is actually a slave (in cases where the
      mount namespace in which devpts was mounted is controlled by an
      untrusted process). In addition, there are several unresolvable
      race conditions if userspace were to attempt to detect attacks through
      stat(2) and other similar methods [in addition it is not clear how
      userspace could detect attacks involving FUSE].
      
      Resolve this by providing an interface for userpace to safely open the
      "peer" end of a PTY file descriptor by using the dentry cached by
      devpts. Since it is not possible to have an open master PTY without
      having its slave exposed in /dev/pts this interface is safe. This
      interface currently does not provide a way to get the master pty (since
      it is not clear whether such an interface is safe or even useful).
      
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Valentin Rothberg <vrothberg@suse.com>
      Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54ebbfb1
Loading