  1. Jan 16, 2019
  2. Dec 18, 2018
  3. Dec 14, 2018
  4. Aug 07, 2018
    • vsock: split dwork to avoid reinitializations · 455f05ec
      Cong Wang authored
      
      
      syzbot reported that we reinitialize an active delayed
      work in vsock_stream_connect():
      
      	ODEBUG: init active (active state 0) object type: timer_list hint:
      	delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
      	WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
      	debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      
      The pattern is apparently wrong: we should initialize the delayed
      work only once and can then schedule it repeatedly. So we have to
      move the initializations out to the allocation side.
      And to avoid confusion, we can split the shared dwork
      into two, instead of re-using the same one.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      455f05ec
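
The rule stated in this commit message (initialize a delayed work item once, where the object is allocated, and only schedule it afterwards) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `demo_work`, `demo_sock`, `demo_init_work()` and `demo_schedule_work()` are hypothetical stand-ins, and the `active` flag only imitates the state that the ODEBUG warning complains about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a delayed work item. */
struct demo_work {
	bool initialized;
	bool active;	/* queued and not yet run */
};

/* Hypothetical socket with two separate work items, one per purpose. */
struct demo_sock {
	struct demo_work connect_work;
	struct demo_work pending_work;
};

/* Returns false if the caller re-initializes an item that is still active. */
static bool demo_init_work(struct demo_work *w)
{
	if (w->initialized && w->active)
		return false;	/* the kind of misuse the warning reports */
	w->initialized = true;
	w->active = false;
	return true;
}

/* Scheduling an initialized item is allowed, even repeatedly. */
static bool demo_schedule_work(struct demo_work *w)
{
	if (!w->initialized)
		return false;
	w->active = true;
	return true;
}

/* Allocation side: the only place where the work items are initialized. */
static void demo_sock_alloc(struct demo_sock *sk)
{
	assert(sk != NULL);
	sk->connect_work = (struct demo_work){ false, false };
	sk->pending_work = (struct demo_work){ false, false };
	demo_init_work(&sk->connect_work);
	demo_init_work(&sk->pending_work);
}

/* Connect path: schedules only, never re-initializes. */
static bool demo_sock_connect(struct demo_sock *sk)
{
	return demo_schedule_work(&sk->connect_work);
}
```

Calling `demo_sock_connect()` twice succeeds both times, while calling `demo_init_work()` on the already scheduled `connect_work` is rejected, which mirrors the misuse that syzbot found.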
  5. Jun 28, 2018
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds authored
      
      
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  6. Jun 22, 2018
  7. May 26, 2018
  8. Apr 17, 2018
    • VSOCK: make af_vsock.ko removable again · 05e489b1
      Stefan Hajnoczi authored
      
      
      Commit c1eef220 ("vsock: always call
      vsock_init_tables()") introduced a module_init() function without a
      corresponding module_exit() function.
      
      Modules with an init function can only be removed if they also have an
      exit function.  Therefore the vsock module was considered "permanent"
      and could not be removed.
      
      This patch adds an empty module_exit() function so that "rmmod vsock"
      works.  No explicit cleanup is required because:
      
      1. Transports call vsock_core_exit() upon exit and cannot be removed
         while sockets are still alive.
      2. vsock_diag.ko does not perform any action that requires cleanup by
         vsock.ko.
      
      Fixes: c1eef220 ("vsock: always call vsock_init_tables()")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05e489b1
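
The removability rule described above can be modeled in a few lines of userspace C. This is a simplified sketch, assuming a hypothetical `demo_module` structure that holds only the two entry points; it is not the kernel's module loader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a loadable module's entry points. */
struct demo_module {
	int (*init)(void);
	void (*exit)(void);
};

/*
 * The rule from the commit message: a module that has an init function
 * can be removed only if it also has an exit function.
 */
static bool demo_module_removable(const struct demo_module *mod)
{
	assert(mod != NULL);
	return !(mod->init != NULL && mod->exit == NULL);
}

static int demo_vsock_init(void)
{
	return 0;
}

/* Intentionally empty: nothing needs to be cleaned up. */
static void demo_vsock_exit(void)
{
}

/* Before the fix: init without exit, so the module counts as permanent. */
static const struct demo_module demo_before = { demo_vsock_init, NULL };

/* After the fix: an empty exit function makes the module removable. */
static const struct demo_module demo_after = { demo_vsock_init, demo_vsock_exit };
```

In this model `demo_module_removable(&demo_before)` is false and `demo_module_removable(&demo_after)` is true, even though the exit function does nothing.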
  9. Feb 12, 2018
    • net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko authored
      
      
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b2c45d4
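
The calling-convention change can be sketched in userspace C as a before/after pair. The names here (`demo_addr`, `demo_getname_old()`, `demo_getname_new()`, `demo_copy_name()`) are hypothetical and stand in for any protocol's getname function; only the return-value convention is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address type standing in for a protocol's sockaddr. */
struct demo_addr {
	unsigned short family;
	unsigned int port;
};

/* Old style: 0 on success, length stored through an out-parameter. */
static int demo_getname_old(const struct demo_addr *src,
			    struct demo_addr *dst, int *socklen)
{
	assert(dst != NULL && socklen != NULL);
	if (!src)
		return -ENOTCONN;
	memcpy(dst, src, sizeof(*dst));
	*socklen = (int)sizeof(*dst);
	return 0;
}

/* New style: negative errno on failure, otherwise the address length. */
static int demo_getname_new(const struct demo_addr *src,
			    struct demo_addr *dst)
{
	assert(dst != NULL);
	if (!src)
		return -ENOTCONN;
	memcpy(dst, src, sizeof(*dst));
	return (int)sizeof(*dst);
}

/* Caller that does not care about the length: no dummy variable needed. */
static int demo_copy_name(const struct demo_addr *src, struct demo_addr *dst)
{
	int err = demo_getname_new(src, dst);

	if (err < 0)	/* was "if (err)" under the old convention */
		return err;
	return 0;
}
```

Because a successful call now returns a positive length, a caller that kept the old `if (err)` test would treat every success as a failure, which is why the callers were changed to `if (err < 0)`.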
  10. Feb 11, 2018
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      
      
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  11. Jan 26, 2018
  12. Dec 05, 2017
  13. Nov 28, 2017
  14. Nov 27, 2017
  15. Nov 25, 2017
  16. Nov 02, 2017
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      
      
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  17. Oct 26, 2017
  18. Oct 21, 2017
    • hv_sock: add locking in the open/close/release code paths · b4562ca7
      Dexuan Cui authored
      
      
      Without the patch, hvs_open_connection() can be caught before it has
      completely established a connection: it has changed sk->sk_state to
      SS_CONNECTED but has not yet inserted the sock into the connected queue.
      vsock_stream_connect() may then see the sk_state change and return the
      connection to userspace. If userspace closes the connection quickly,
      hvs_release() may not see the connection in the connected queue. Finally,
      hvs_open_connection() inserts the connection into the queue, and we can
      never purge it.
      
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4562ca7
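
The idea behind the fix can be sketched in userspace C with a mutex: the state change and the queue insertion happen under one lock, and the release path takes the same lock, so it cannot observe the half-open state described above. This is only an illustration; `demo_sock`, `demo_open_connection()` and `demo_release()` are hypothetical names, and a POSIX mutex stands in for whatever lock the kernel code actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_UNCONNECTED, DEMO_CONNECTED };

/* Hypothetical socket: the lock protects both fields below it. */
struct demo_sock {
	pthread_mutex_t lock;
	enum demo_state state;
	bool in_connected_queue;
};

static void demo_sock_init(struct demo_sock *sk)
{
	assert(sk != NULL);
	pthread_mutex_init(&sk->lock, NULL);
	sk->state = DEMO_UNCONNECTED;
	sk->in_connected_queue = false;
}

/* Open path: state change and queue insertion form one critical section. */
static void demo_open_connection(struct demo_sock *sk)
{
	pthread_mutex_lock(&sk->lock);
	sk->state = DEMO_CONNECTED;
	sk->in_connected_queue = true;
	pthread_mutex_unlock(&sk->lock);
}

/*
 * Release path: takes the same lock, so it sees either a fully opened
 * connection or none at all. Returns true if a queue entry was removed.
 */
static bool demo_release(struct demo_sock *sk)
{
	bool removed;

	pthread_mutex_lock(&sk->lock);
	removed = sk->in_connected_queue;
	sk->in_connected_queue = false;
	sk->state = DEMO_UNCONNECTED;
	pthread_mutex_unlock(&sk->lock);
	return removed;
}
```

Without the shared lock, a release that runs between the two assignments in `demo_open_connection()` would find nothing to remove, and the entry added afterwards would never be cleaned up.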
  19. Oct 06, 2017
    • VSOCK: add sock_diag interface · 413a4317
      Stefan Hajnoczi authored
      
      
      This patch adds the sock_diag interface for querying sockets from
      userspace.  Tools like ss(8) and netstat(8) can use this interface to
      list open sockets.
      
      The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes
      netlink request and response structs.  The request can query sockets
      based on their sk_state (e.g. listening sockets only) and the response
      contains socket information fields including the local/remote addresses,
      inode number, etc.
      
      This patch does not dump VMCI pending sockets because I have only tested
      the virtio transport, which does not use pending sockets.  Support can
      be added later by extending vsock_diag_dump() if needed by VMCI users.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      413a4317
    • VSOCK: use TCP state constants for sk_state · 3b4477d2
      Stefan Hajnoczi authored
      
      
      There are two state fields: socket->state and sock->sk_state.  The
      socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
      sock->sk_state typically uses values that match TCP state constants
      (TCP_CLOSE, TCP_ESTABLISHED).  AF_VSOCK does not follow this convention
      and instead uses SS_* constants for both fields.
      
      The sk_state field will be exposed to userspace through the vsock_diag
      interface for ss(8), netstat(8), and other programs.
      
      This patch switches sk_state to TCP state constants so that the meaning
      of this field is consistent with other address families.  Not just
      AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.
      
      The following mapping was used to convert the code:
      
        SS_FREE -> TCP_CLOSE
        SS_UNCONNECTED -> TCP_CLOSE
        SS_CONNECTING -> TCP_SYN_SENT
        SS_CONNECTED -> TCP_ESTABLISHED
        SS_DISCONNECTING -> TCP_CLOSING
        VSOCK_SS_LISTEN -> TCP_LISTEN
      
      In __vsock_create() the sk_state initialization was dropped because
      sock_init_data() already initializes sk_state to TCP_CLOSE.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b4477d2
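
The mapping table in this commit message can be written as a small conversion function. The sketch below is illustrative only: the `DEMO_*` enumerators are local stand-ins named after the constants in the commit message, and their numeric values are arbitrary rather than the values from the kernel headers.

```c
#include <assert.h>

/* Local stand-ins for the SS_* constants; values are arbitrary. */
enum demo_ss_state {
	DEMO_SS_FREE,
	DEMO_SS_UNCONNECTED,
	DEMO_SS_CONNECTING,
	DEMO_SS_CONNECTED,
	DEMO_SS_DISCONNECTING,
	DEMO_VSOCK_SS_LISTEN,
};

/* Local stand-ins for the TCP state constants; values are arbitrary. */
enum demo_tcp_state {
	DEMO_TCP_ESTABLISHED = 1,
	DEMO_TCP_SYN_SENT,
	DEMO_TCP_CLOSE,
	DEMO_TCP_LISTEN,
	DEMO_TCP_CLOSING,
};

/* Apply the mapping listed in the commit message, one case per row. */
static enum demo_tcp_state demo_ss_to_tcp(enum demo_ss_state ss)
{
	switch (ss) {
	case DEMO_SS_FREE:
	case DEMO_SS_UNCONNECTED:
		return DEMO_TCP_CLOSE;
	case DEMO_SS_CONNECTING:
		return DEMO_TCP_SYN_SENT;
	case DEMO_SS_CONNECTED:
		return DEMO_TCP_ESTABLISHED;
	case DEMO_SS_DISCONNECTING:
		return DEMO_TCP_CLOSING;
	case DEMO_VSOCK_SS_LISTEN:
		return DEMO_TCP_LISTEN;
	}
	/* Not reached for the states listed above. */
	assert(0);
	return DEMO_TCP_CLOSE;
}
```

Note that two source states collapse into one target state, so the mapping cannot be inverted: both `DEMO_SS_FREE` and `DEMO_SS_UNCONNECTED` become `DEMO_TCP_CLOSE`.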
    • VSOCK: move __vsock_in_bound/connected_table() to af_vsock.h · bf359b81
      Stefan Hajnoczi authored
      
      
      The vsock_diag.ko module will need to check socket table membership.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf359b81
    • VSOCK: export socket tables for sock_diag interface · 44f20980
      Stefan Hajnoczi authored
      
      
      The socket table symbols need to be exported from vsock.ko so that the
      vsock_diag.ko module will be able to traverse sockets.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44f20980
  20. Sep 19, 2017
  21. Aug 28, 2017
    • hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK) · ae0078fc
      Dexuan Cui authored
      Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
      mechanism between the host and the guest. It uses VMBus ringbuffer as the
      transportation layer.
      
      With hv_sock, applications between the host (Windows 10, Windows Server
      2016 or newer) and the guest can talk with each other using the traditional
      socket APIs.
      
      More info about Hyper-V Sockets is available here:
      
      "Make your own integration services":
      https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service
      
      
      
      The patch implements the necessary support in Linux guest by introducing a new
      vsock transport for AF_VSOCK.
      
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      Cc: George Zhang <georgezhang@vmware.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Reilly Grant <grantr@vmware.com>
      Cc: Asias He <asias@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae0078fc
  22. Jun 20, 2017
  23. Jun 16, 2017
    • networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg authored
      
      
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4df864c1
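
The effect of returning `void *` can be seen in a small userspace model of a tail-append helper. This is a sketch with hypothetical names (`demo_buf`, `demo_hdr`, `demo_put()`), not the kernel's `sk_buff` API; it only shows which call sites lose their casts and which one still needs a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size buffer standing in for a packet's data area. */
struct demo_buf {
	unsigned char data[64];
	size_t len;
};

/* Hypothetical header made of bytes only, so any offset is suitably aligned. */
struct demo_hdr {
	unsigned char type;
	unsigned char flags;
};

/* Reserve len bytes at the tail and return a pointer to them as void *. */
static void *demo_put(struct demo_buf *b, size_t len)
{
	void *tail;

	assert(b->len + len <= sizeof(b->data));
	tail = b->data + b->len;
	b->len += len;
	return tail;
}

static struct demo_hdr *demo_add_hdr(struct demo_buf *b, unsigned char type)
{
	/* With a void * return type, no (struct demo_hdr *) cast is needed. */
	struct demo_hdr *hdr = demo_put(b, sizeof(*hdr));

	hdr->type = type;
	hdr->flags = 0;
	return hdr;
}

static void demo_add_byte(struct demo_buf *b, unsigned char value)
{
	/* Dereferencing the result directly is the case that needs a cast. */
	*(unsigned char *)demo_put(b, 1) = value;
}
```

`demo_add_hdr()` corresponds to the second rule of the semantic patch (cast removed) and `demo_add_byte()` to the first (cast added where the pointer is dereferenced directly).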
    • networking: introduce and use skb_put_data() · 59ae1d12
      Johannes Berg authored
      
      
      A common pattern with skb_put() is to just memcpy() some data into
      the new space; introduce skb_put_data() for this.
      
      An spatch similar to the one for skb_put_zero() converts many
      of the places using it:
      
          @@
          identifier p, p2;
          expression len, skb, data;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, len);
          |
          -memcpy(p, data, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb, data;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, sizeof(*p));
          |
          -memcpy(p, data, sizeof(*p));
          )
      
          @@
          expression skb, len, data;
          @@
          -memcpy(skb_put(skb, len), data, len);
          +skb_put_data(skb, data, len);
      
      (again, manually post-processed to retain some comments)
      
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59ae1d12
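
The helper introduced here combines "reserve space at the tail" with "copy data into it". A userspace sketch of that pattern follows; `demo_buf`, `demo_put()` and `demo_put_data()` are hypothetical stand-ins and do not reproduce the kernel's `sk_buff` handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for a packet's data area. */
struct demo_buf {
	unsigned char data[64];
	size_t len;
};

/* Reserve len bytes at the tail and return a pointer to them. */
static void *demo_put(struct demo_buf *b, size_t len)
{
	void *tail;

	assert(b->len + len <= sizeof(b->data));
	tail = b->data + b->len;
	b->len += len;
	return tail;
}

/* Reserve len bytes, copy data into them, and return the new area. */
static void *demo_put_data(struct demo_buf *b, const void *data, size_t len)
{
	void *tail = demo_put(b, len);

	memcpy(tail, data, len);
	return tail;
}
```

A call site that used to read `memcpy(demo_put(b, len), data, len)` becomes `demo_put_data(b, data, len)`, which is the shape of the last rule in the semantic patch above.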
  24. May 22, 2017
  25. May 02, 2017
  26. Apr 24, 2017
  27. Mar 30, 2017
  28. Mar 21, 2017
  29. Mar 10, 2017
    • net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells authored
      
      
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set is
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_tcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cdfbabfb
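
Steps (1) and (2) of the fix can be modeled in userspace C: two tables of lock class names, with the choice between them driven by a flag stored in the socket at allocation time. Everything here is a hypothetical stand-in (`demo_sock`, `demo_sk_alloc()`, `demo_sk_clone()`, the table contents and the three-entry family range); the sketch shows only the selection logic, not lockdep itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_AF_MAX 3

/* Hypothetical lock class names: one table per socket origin. */
static const char *const demo_user_keys[DEMO_AF_MAX] = {
	"sk_lock-AF_UNSPEC", "sk_lock-AF_UNIX", "sk_lock-AF_INET",
};
static const char *const demo_kern_keys[DEMO_AF_MAX] = {
	"k-sk_lock-AF_UNSPEC", "k-sk_lock-AF_UNIX", "k-sk_lock-AF_INET",
};

struct demo_sock {
	int family;
	bool kern_sock;		/* remembers the kern argument of the allocator */
	const char *lock_class;
};

/* Pick the lock class from the table that matches the socket's origin. */
static void demo_lock_init(struct demo_sock *sk)
{
	assert(sk->family >= 0 && sk->family < DEMO_AF_MAX);
	sk->lock_class = sk->kern_sock ? demo_kern_keys[sk->family]
				       : demo_user_keys[sk->family];
}

/* Allocation: store the kern flag, then initialize the lock class. */
static void demo_sk_alloc(struct demo_sock *sk, int family, bool kern)
{
	sk->family = family;
	sk->kern_sock = kern;
	demo_lock_init(sk);
}

/* A cloned child inherits the parent's kern setting. */
static void demo_sk_clone(const struct demo_sock *parent,
			  struct demo_sock *child)
{
	*child = *parent;
	demo_lock_init(child);
}
```

With this split, a kernel-internal socket and a userspace socket of the same family end up in different lock classes, so a dependency recorded for one no longer applies to the other.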
  30. Mar 02, 2017
  31. Feb 27, 2017
  32. Dec 17, 2016
  33. Dec 15, 2016