- Jun 12, 2015
Chuck Lever authored
WARN during transport destruction if ib_dealloc_pd() fails. This is a sign that xprtrdma orphaned one or more RDMA API objects at some point, which can pin lower-layer kernel modules and cause shutdown to hang.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- May 04, 2015
Scott Mayhew authored
In an environment where the KDC is running Active Directory, the exported composite name field returned in the context could be large enough to span a page boundary. Attaching a scratch buffer to the decoding xdr_stream helps deal with those cases. The case where we saw this was actually due to behavior that's been fixed in newer gss-proxy versions, but we're fixing it here too.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
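The scratch-buffer idea can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel's xdr_stream code (whose own hook for this is xdr_set_scratch_buffer()): struct page_stream, ps_inline_decode() and the tiny PAGE_SZ are hypothetical. When a requested item lies within one page, the decoder returns a pointer into that page; when it straddles a page boundary, it copies the pieces into the scratch buffer and returns that instead. Without a scratch buffer, the straddling case can only fail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 8   /* hypothetical, tiny page size to keep the example small */

/* Hypothetical decoder state over an array of fixed-size pages. */
struct page_stream {
	unsigned char **pages;
	size_t npages;
	size_t pos;              /* absolute byte offset into the page array */
	unsigned char *scratch;  /* optional; used only for items that straddle pages */
	size_t scratch_len;
};

/*
 * Return a pointer to nbytes contiguous bytes and advance past them.
 * Items inside one page are returned in place; items that cross a page
 * boundary are gathered into the scratch buffer. Returns NULL (without
 * advancing) if nbytes is 0, data runs out, or no usable scratch space exists.
 */
static const unsigned char *ps_inline_decode(struct page_stream *ps, size_t nbytes)
{
	size_t page = ps->pos / PAGE_SZ;
	size_t off = ps->pos % PAGE_SZ;

	if (nbytes == 0 || ps->pos + nbytes > ps->npages * PAGE_SZ)
		return NULL;
	if (off + nbytes <= PAGE_SZ) {
		ps->pos += nbytes;
		return ps->pages[page] + off;
	}
	if (ps->scratch == NULL || nbytes > ps->scratch_len)
		return NULL;            /* the failure mode a scratch buffer avoids */

	/* Copy the tail of this page, then the head of the following page(s). */
	for (size_t copied = 0; copied < nbytes; page++, off = 0) {
		size_t chunk = PAGE_SZ - off;

		if (chunk > nbytes - copied)
			chunk = nbytes - copied;
		memcpy(ps->scratch + copied, ps->pages[page] + off, chunk);
		copied += chunk;
	}
	ps->pos += nbytes;
	return ps->scratch;
}
```

Returning a pointer into the page in the common case keeps decoding copy-free; only items that actually cross a boundary pay for the memcpy().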
-
- Apr 23, 2015
Jeff Layton authored
v2: gracefully handle the case where some dentry pointers end up NULL and be more diligent about zeroing out dentry pointers
We currently have a problem that SELinux policy is being enforced when creating debugfs files. If a debugfs file is created as a side effect of doing some syscall, then that creation can fail if the SELinux policy for that process prevents it. This seems wrong. We don't do that for files under /proc, for instance, so Bruce has proposed a patch to fix that.
While discussing that patch however, Greg K.H. stated: "No kernel code should care / fail if a debugfs function fails, so please fix up the sunrpc code first."
This patch converts all of the sunrpc debugfs setup code to be void return functions, and the callers to not look for errors from those functions. This should allow rpc_clnt and rpc_xprt creation to work, even if the kernel fails to create debugfs files for some reason.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- Apr 15, 2015
Rasmus Villemoes authored
The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf).
So change the semantics for string_escape_mem to be more snprintf-like: return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] while leaving buf[0] and buf[1] with whatever they previously contained.
This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops.
In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work.
In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this.
In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals.
[andriy.shevchenko@linux.intel.com: simplify qword_add]
[andriy.shevchenko@linux.intel.com: add missed curly braces]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
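The new snprintf-like contract can be illustrated with a toy escaper. The sketch assumes a hypothetical escape_oct() that turns non-printable bytes (and backslash) into three-digit octal escapes; it is not string_escape_mem() itself. It writes only the bytes that fit, emits partial escape sequences, never NUL-terminates, and returns the full length, leaving truncation detection and the terminating '\0' to a caller such as the hypothetical escape_to_string().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical escaper with an snprintf-like contract: writes at most osz
 * bytes to dst (partial escape sequences included), never NUL-terminates,
 * and returns the length the complete output would need.
 */
static size_t escape_oct(char *dst, size_t osz, const unsigned char *src, size_t len)
{
	size_t out = 0;

	for (size_t i = 0; i < len; i++) {
		char seq[4];
		size_t n = 0;

		if (isprint(src[i]) && src[i] != '\\') {
			seq[n++] = (char)src[i];
		} else {                    /* three-digit octal escape, e.g. \012 */
			seq[n++] = '\\';
			seq[n++] = (char)('0' + ((src[i] >> 6) & 7));
			seq[n++] = (char)('0' + ((src[i] >> 3) & 7));
			seq[n++] = (char)('0' + (src[i] & 7));
		}
		for (size_t k = 0; k < n; k++, out++)
			if (out < osz)      /* only write the part that fits */
				dst[out] = seq[k];
	}
	return out;
}

/* Hypothetical caller: detects truncation and adds the '\0' itself. */
static size_t escape_to_string(char *buf, size_t size, const unsigned char *src, size_t len)
{
	size_t need = escape_oct(buf, size, src, len);

	if (size > 0)
		buf[need < size ? need : size - 1] = '\0';
	return need;                /* need >= size means the output was truncated */
}
```

With a 3-byte buffer and the input byte 0x01, the buffer ends up holding a backslash, a '0' and the terminating NUL, and the return value of 4 tells the caller the output was truncated. Passing a NULL buffer with size 0 only counts, which is the property kasprintf("%pE") relies on.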
-
Iulia Manda authored
There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary.
This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under the CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities.
The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, the capable() function was moved in order to avoid adding two ifdef blocks.
This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much.)
The kernel was booted in Qemu. All the common functionality works. Adding users/groups is not possible, failing with -ENOSYS.
Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
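The compile-out pattern can be modelled in plain C. In this sketch CONFIG_MULTIUSER is just a preprocessor symbol and the sketch_* functions are hypothetical stand-ins, not the kernel's syscall plumbing. With the symbol left undefined, the ID always reads as 0, capability checks always pass, and the ID-changing call fails with -ENOSYS, matching the behaviour described above.

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of a CONFIG_MULTIUSER=n build: the symbol is left undefined here.
 * All names below are hypothetical stand-ins, not kernel interfaces.
 */
/* #define CONFIG_MULTIUSER */

typedef unsigned int uid_sketch_t;

#ifdef CONFIG_MULTIUSER
static uid_sketch_t cur_uid;

static uid_sketch_t sketch_getuid(void) { return cur_uid; }
/* Permission checks omitted for brevity. */
static long sketch_setuid(uid_sketch_t uid) { cur_uid = uid; return 0; }
/* Simplified: real capability checks consult capability sets, not the UID. */
static int sketch_capable(int cap) { (void)cap; return cur_uid == 0; }
#else
/* Single-user build: everything is root:root with every capability. */
static uid_sketch_t sketch_getuid(void) { return 0; }
/* The ID-changing syscalls are compiled out and report -ENOSYS. */
static long sketch_setuid(uid_sketch_t uid) { (void)uid; return -ENOSYS; }
static int sketch_capable(int cap) { (void)cap; return 1; }
#endif
```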
-
David Howells authored
socket inodes and sunrpc filesystems - inodes owned by that code
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- Apr 11, 2015
Al Viro authored
it's equal to iov_iter_count(&msg->msg_iter) in all cases
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 31, 2015
Jeff Layton authored
We currently have a problem that SELinux policy is being enforced when creating debugfs files. If a debugfs file is created as a side effect of doing some syscall, then that creation can fail if the SELinux policy for that process prevents it. This seems wrong. We don't do that for files under /proc, for instance, so Bruce has proposed a patch to fix that.
While discussing that patch however, Greg K.H. stated: "No kernel code should care / fail if a debugfs function fails, so please fix up the sunrpc code first."
This patch converts all of the sunrpc debugfs setup code to be void return functions, and the callers to not look for errors from those functions. This should allow rpc_clnt and rpc_xprt creation to work, even if the kernel fails to create debugfs files for some reason.
Symptoms were failing krb5 mounts on systems using gss-proxy and selinux.
Fixes: 388f0c77 ("sunrpc: add a debugfs rpc_xprt directory...")
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
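A minimal sketch of the "debugfs failures are not errors" pattern, assuming hypothetical types and helpers (struct dentry_sketch, sketch_debugfs_create(), clnt_create()) rather than the real debugfs and sunrpc APIs. The registration helper returns void, a failed create simply leaves a NULL pointer, and client creation reports success either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for debugfs objects; not the kernel API. */
struct dentry_sketch { int unused; };

static int debugfs_denied;               /* simulate an SELinux denial */
static struct dentry_sketch dir_dentry;

/* Like a debugfs create call, this may fail; here failure means NULL. */
static struct dentry_sketch *sketch_debugfs_create(const char *name)
{
	(void)name;
	return debugfs_denied ? NULL : &dir_dentry;
}

struct clnt_sketch {
	struct dentry_sketch *debugfs;   /* may stay NULL; nothing else depends on it */
	int ready;
};

/* Setup returns void: a failed create just leaves the pointer NULL. */
static void clnt_debugfs_register(struct clnt_sketch *clnt)
{
	clnt->debugfs = sketch_debugfs_create("rpc_clnt");
}

/* Teardown must not assume setup succeeded; a NULL pointer is fine here. */
static void clnt_debugfs_unregister(struct clnt_sketch *clnt)
{
	clnt->debugfs = NULL;
}

/* Client creation succeeds regardless of what happened in debugfs. */
static int clnt_create(struct clnt_sketch *clnt)
{
	clnt->ready = 1;
	clnt_debugfs_register(clnt);     /* deliberately no error check */
	return 0;
}
```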
-
Chuck Lever authored
These functions are called in a loop for each page transferred via RDMA READ or WRITE. Extract loop invariants and inline them to reduce CPU overhead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Allow each memory registration mode to plug in a callout that handles the completion of a memory registration operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
The open op determines the size of various transport data structures based on device capabilities and memory registration mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Memory Region objects associated with a transport instance are destroyed before the instance is shut down and destroyed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
This method is invoked when a transport instance is about to be reconnected. Each Memory Region object is reset to its initial state.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
This method is used when setting up a new transport instance to create a pool of Memory Region objects that will be used to register memory during operation. Memory Regions are not needed for "physical" registration, since ->prepare and ->release are no-ops for that mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
There is very little common processing among the different external memory deregistration functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
There is very little common processing among the different external memory registration functions. Have rpcrdma_create_chunks() call the registration method directly. This removes a stack frame and a switch statement from the external registration path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
The max_payload computation is generalized to ensure that the payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of data segments that can be transmitted in an inline buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Instead of employing switch() statements, let's use the typical Linux kernel idiom for handling behavioral variation: virtual functions. Start by defining a vector of operations for each supported memory registration mode, and by adding a source file for each mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
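The ops-vector idiom can be sketched as a struct of function pointers, one instance per registration mode, selected once and then called directly. The struct memreg_ops layout, the bodies of the two modes, and register_chunk() are hypothetical simplifications, not xprtrdma's definitions; they only show how dispatch through the vector replaces a switch() on the mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types; not xprtrdma's actual definitions. */
struct seg_sketch {
	size_t len;
};

struct memreg_ops {
	const char *name;
	int (*map)(struct seg_sketch *seg, int nsegs);  /* returns segments consumed */
	int (*unmap)(struct seg_sketch *seg);
};

/* "physical" mode: nothing to register, one segment per chunk (illustrative). */
static int physical_map(struct seg_sketch *seg, int nsegs)
{
	(void)seg;
	(void)nsegs;
	return 1;
}

static int physical_unmap(struct seg_sketch *seg)
{
	(void)seg;
	return 0;
}

/* FRWR-like mode: one registration covers every segment (illustrative). */
static int frwr_map(struct seg_sketch *seg, int nsegs)
{
	(void)seg;
	return nsegs;
}

static int frwr_unmap(struct seg_sketch *seg)
{
	(void)seg;
	return 0;
}

static const struct memreg_ops physical_ops = { "physical", physical_map, physical_unmap };
static const struct memreg_ops frwr_ops = { "frwr", frwr_map, frwr_unmap };

/* The caller holds a pointer chosen at setup time and never switches on the mode. */
static int register_chunk(const struct memreg_ops *ops, struct seg_sketch *seg, int nsegs)
{
	return ops->map(seg, nsegs);
}
```

Adding a mode then means adding a new ops instance (and, in this series, a new source file) rather than touching every switch statement.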
-
Chuck Lever authored
If a provider advertises a zero max_fast_reg_page_list_len, FRWR depth detection loops forever. Instead of just failing the mount, try other memory registration modes.
Fixes: 0fc6c4e7 ("xprtrdma: mind the device's max fast . . .")
Reported-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
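To see why a zero max_fast_reg_page_list_len is fatal to a depth loop, consider a calculation that subtracts the per-MR page limit from the remaining segment count until nothing is left: with a limit of 0 it never terminates. The sketch below is a hypothetical reconstruction, not xprtrdma's actual code; SKETCH_MAX_DATA_SEGS, frwr_rounds_needed() and pick_mode() are illustrative. It uses an overflow-safe ceiling division with an explicit zero check so the caller can fall back to another mode.

```c
#include <assert.h>

#define SKETCH_MAX_DATA_SEGS 64u   /* hypothetical segment limit */

/*
 * Hypothetical: number of fast-registration rounds needed to cover
 * SKETCH_MAX_DATA_SEGS pages when one MR maps at most max_page_list pages.
 * A loop of the form "while (left > 0) left -= max_page_list;" never ends
 * when max_page_list is 0, so that case is rejected explicitly.
 */
static int frwr_rounds_needed(unsigned int max_page_list)
{
	if (max_page_list == 0)
		return -1;
	/* Ceiling division that cannot overflow for large max_page_list. */
	return (int)(1 + (SKETCH_MAX_DATA_SEGS - 1) / max_page_list);
}

enum reg_mode_sketch { MODE_FRWR, MODE_FALLBACK };

/* Instead of failing outright, fall back to another registration mode. */
static enum reg_mode_sketch pick_mode(unsigned int max_page_list)
{
	return frwr_rounds_needed(max_page_list) < 0 ? MODE_FALLBACK : MODE_FRWR;
}
```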
-
Chuck Lever authored
The RPC/RDMA transport's FRWR registration logic registers whole pages. This means areas in the first and last pages that are not involved in the RDMA I/O are needlessly exposed to the server. Buffered I/O is typically page-aligned, so not a problem there. But for direct I/O, which can be byte-aligned, and for reply chunks, which are nearly always smaller than a page, the transport could expose memory outside the I/O buffer. FRWR allows byte-aligned memory registration, so let's use it as it was intended.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
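The exposure difference is simple arithmetic. Assuming 4 KiB pages and hypothetical helper names, rounding a registration out to whole pages exposes everything from the start of the first page to the end of the last one, while byte-aligned registration exposes exactly the I/O buffer.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)   /* assumption: 4 KiB pages */

/* Bytes exposed when the registration is rounded out to whole pages. */
static size_t page_granular_exposure(size_t offset, size_t len)
{
	size_t start = offset & ~(SKETCH_PAGE_SIZE - 1);                              /* round down */
	size_t end = (offset + len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1); /* round up */

	return end - start;
}

/* With byte-aligned registration, the exposed window is exactly the buffer. */
static size_t byte_granular_exposure(size_t offset, size_t len)
{
	(void)offset;
	return len;
}
```

For example, a 200-byte reply chunk at offset 100 exposes 4096 bytes with page-granular registration but only 200 bytes with byte-aligned registration, while a page-aligned 4096-byte buffer sees no difference.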
-
Chuck Lever authored
Commit 6ab59945 ("xprtrdma: Update rkeys after transport reconnect") added logic in the ->send_request path to update the chunk list when an RPC/RDMA request is retransmitted. Note that rpc_xdr_encode() resets and re-encodes the entire RPC send buffer for each retransmit of an RPC. The RPC send buffer is not preserved from the previous transmission of an RPC. Revert 6ab59945, and instead, just force each request to be fully marshaled every time through ->send_request. This should preserve the fix from 6ab59945, while also performing pullup during retransmits.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- Mar 27, 2015
Trond Myklebust authored
If the task needs to give up the socket lock in order to allow a reconnect to occur, then it must also clear the 'rq_bytes_sent' field so that when it retransmits, it knows to start from the beginning.
Fixes: 718ba5b8 ("SUNRPC: Add helpers to prevent socket create from racing")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
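The invariant behind this fix can be sketched with hypothetical structures (struct rqst_sketch, struct xprt_sketch and their helpers; not the real rpc_rqst or rpc_xprt). The progress counter for a partial send only makes sense on the connection where that progress was made, so releasing the transport for a reconnect also resets it, and the next transmit starts at byte 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified request and transport state. */
struct rqst_sketch {
	size_t len;          /* total bytes to transmit */
	size_t bytes_sent;   /* progress of a partial send (cf. rq_bytes_sent) */
};

struct xprt_sketch {
	bool locked;         /* stands in for the transport's socket lock */
};

/*
 * Giving up the lock so a reconnect can happen: the partial progress belonged
 * to the old connection, so reset it and retransmit from the beginning.
 */
static void release_for_reconnect(struct xprt_sketch *xprt, struct rqst_sketch *req)
{
	req->bytes_sent = 0;
	xprt->locked = false;
}

/* Send up to budget bytes, resuming where the previous attempt stopped. */
static size_t send_some(struct rqst_sketch *req, size_t budget)
{
	size_t left = req->len - req->bytes_sent;
	size_t n = budget < left ? budget : left;

	req->bytes_sent += n;
	return n;
}
```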
-
- Mar 13, 2015
Nicholas Mc Guire authored
Fix a build warning introduced by commit f0eede10 ("SUNRPC: use jiffies_to_msecs for converting jiffies"), which did not fix up the format properly (my bad).
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- Mar 12, 2015
Nicholas Mc Guire authored
Use jiffies_to_msecs for converting jiffies as it handles all of the corner cases reliably and also helps readability.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- Mar 08, 2015
Al Viro authored
POLL_OUT isn't what callers of ->poll() are expecting to see; it's actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap bit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
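A small userspace check makes the distinction concrete: POLLOUT comes from <poll.h> and is an event-mask bit, while POLL_OUT comes from <signal.h> and is an si_code value for SIGPOLL. The helper writable_mask() is hypothetical; it only shows the kind of mask a write-ready ->poll() would be expected to report. The feature-test macro is an assumption needed to expose POLLWRNORM and POLL_OUT in strict compilation modes.

```c
#define _XOPEN_SOURCE 700   /* expose POLLWRNORM and the SIGPOLL si_code values */
#include <assert.h>
#include <poll.h>
#include <signal.h>

/*
 * Hypothetical helper: the kind of event mask a "ready for writing"
 * ->poll() implementation is expected to return.
 */
static short writable_mask(void)
{
	return POLLOUT | POLLWRNORM;
}
```

In the Linux userspace headers POLL_OUT is 2, which happens to be the same bit as POLLPRI, so a mask built from it would signal urgent data rather than writability.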
-
- Mar 06, 2015
Masanari Iida authored
This patch fixes a spelling typo in printk messages.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- Feb 26, 2015
Dan Carpenter authored
If we call groups_alloc() with invalid values, it might lead to memory corruption. For example, with a negative value we might not allocate enough for sizeof(struct group_info). (We're doing this in the caller for consistency with other callers of groups_alloc(). The other alternative might be to move the check out of all the callers into groups_alloc().)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
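A sketch of why the count must be checked before allocating, using a hypothetical struct group_info_sketch with a flexible array and an assumed limit of 65536. If a negative int reaches the size computation, the conversion to size_t and the multiplication wrap around, and the request can end up smaller than the header itself. Rejecting the bad count up front avoids that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical analog of a group list: a header followed by n group IDs. */
struct group_info_sketch {
	int ngroups;
	unsigned int gid[];          /* flexible array member */
};

#define SKETCH_NGROUPS_MAX 65536 /* assumed limit, mirroring Linux's NGROUPS_MAX */

/*
 * Validate an untrusted count before it reaches the size computation. If n
 * were -1, (size_t)n * sizeof(gid[0]) would wrap around, and with this struct
 * layout the total would come out as 0 bytes, smaller than the header.
 */
static struct group_info_sketch *groups_alloc_checked(int n)
{
	struct group_info_sketch *gi;

	if (n < 0 || n > SKETCH_NGROUPS_MAX)
		return NULL;
	gi = malloc(sizeof(*gi) + (size_t)n * sizeof(gi->gid[0]));
	if (gi != NULL)
		gi->ngroups = n;
	return gi;
}
```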
-
- Feb 23, 2015
Chuck Lever authored
Dan Carpenter's static checker pointed out:
    net/sunrpc/xprtrdma/rpc_rdma.c:879 rpcrdma_reply_handler()
    warn: can 'credits' be negative?
"credits" is defined as an int. The credits value comes from the server as a 32-bit unsigned integer. A malicious or broken server can plant a large unsigned integer in that field, which would result in an underflow in the following logic, potentially triggering a deadlock of the mount point by blocking the client from issuing more RPC requests.
net/sunrpc/xprtrdma/rpc_rdma.c:
    876         credits = be32_to_cpu(headerp->rm_credit);
    877         if (credits == 0)
    878                 credits = 1;    /* don't deadlock */
    879         else if (credits > r_xprt->rx_buf.rb_max_requests)
    880                 credits = r_xprt->rx_buf.rb_max_requests;
    881
    882         cwnd = xprt->cwnd;
    883         xprt->cwnd = credits << RPC_CWNDSHIFT;
    884         if (xprt->cwnd > cwnd)
    885                 xprt_release_rqst_cong(rqst->rq_task);
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: eba8ff66 ("xprtrdma: Move credit update to RPC . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
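The quoted clamping logic only behaves as intended if credits is unsigned. This sketch, with hypothetical function names and max_requests standing in for rb_max_requests, contrasts an unsigned version with a signed one. In the signed version, a wire value of 0xffffffff becomes -1 on common two's-complement compilers, slips past the upper-bound check, and would then reach the congestion-window computation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp with an unsigned credit value: a huge wire value is simply capped.
 * raw is the host-order 32-bit field; max_requests stands in for rb_max_requests.
 */
static uint32_t clamp_credits(uint32_t raw, uint32_t max_requests)
{
	uint32_t credits = raw;

	if (credits == 0)
		credits = 1;                 /* don't deadlock */
	else if (credits > max_requests)
		credits = max_requests;
	return credits;
}

/* The same logic with a signed int, for contrast. */
static int clamp_credits_signed(uint32_t raw, int max_requests)
{
	int credits = (int)raw;          /* 0xffffffff becomes -1 on gcc/clang */

	if (credits == 0)
		credits = 1;
	else if (credits > max_requests) /* -1 is not greater, so it slips through */
		credits = max_requests;
	return credits;
}
```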
-
- Feb 17, 2015
David Ramos authored
Our UC-KLEE tool found a kernel memory leak of 512 bytes (on x86_64) for each call to gssp_accept_sec_context_upcall() (net/sunrpc/auth_gss/gss_rpc_upcall.c). Since it appears that this call can be triggered by remote connections (at least, from a cursory glance at the call chain), it may be exploitable to cause kernel memory exhaustion. We found the bug in kernel 3.16.3, but it appears to date back to commit 9dfd87da (2013-08-20).
The gssp_accept_sec_context_upcall() function performs a pair of calls to gssp_alloc_receive_pages() and gssp_free_receive_pages(). The first allocates memory for arg->pages. The second then frees the pages pointed to by the arg->pages array, but not the array itself.
Reported-by: David A. Ramos <daramos@stanford.edu>
Fixes: 9dfd87da ("rpc: fix huge kmalloc's in gss-proxy")
Signed-off-by: David A. Ramos <daramos@stanford.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
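The leak pattern is easy to reproduce in userspace. In this hypothetical analog (not the gss-proxy code), each page and the array holding the page pointers are separate allocations, so the release path has to free both; freeing only the pages leaves the array behind on every call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace analog of a receive-page array. */
struct recv_pages_sketch {
	void **pages;      /* the array itself is one allocation ... */
	size_t npages;     /* ... and each page is another */
};

static int alloc_receive_pages(struct recv_pages_sketch *arg, size_t npages)
{
	arg->npages = 0;
	arg->pages = calloc(npages, sizeof(*arg->pages));
	if (arg->pages == NULL)
		return -1;
	arg->npages = npages;
	for (size_t i = 0; i < npages; i++) {
		arg->pages[i] = malloc(4096);
		if (arg->pages[i] == NULL)
			return -1;          /* caller still calls free_receive_pages() */
	}
	return 0;
}

/* Free every page and then the array; dropping the last free() is the leak. */
static void free_receive_pages(struct recv_pages_sketch *arg)
{
	for (size_t i = 0; i < arg->npages; i++)
		free(arg->pages[i]);    /* free(NULL) is fine for slots never filled */
	free(arg->pages);
	arg->pages = NULL;
	arg->npages = 0;
}
```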
-
- Feb 13, 2015
Chuck Lever authored
Other code that accesses rq_bc_pa_list holds xprt->bc_pa_lock. xprt_complete_bc_request() should do the same.
Fixes: 2ea24497 ("SUNRPC: RPC callbacks may be split . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 10, 2015
Trond Myklebust authored
xs_tcp_close() is now just a call to xs_tcp_shutdown(), so remove it, and replace the entry in xs_tcp_ops.
Suggested-by: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 09, 2015
Trond Myklebust authored
Yes, kernel_setsockopt() hates you for using a char argument.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Now that the linger code is gone, the xs_tcp_fin_timeout variable has no real function. Keep it for now, since it is part of the /proc interface, but only define it if that /proc interface is enabled.
Suggested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
If the connection reset is due to an active call on our side, then the state change is sometimes not reported. Catch those instances using xs_error_report() instead. Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as the change in behaviour makes it redundant.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Use of socket shutdown() means that we monitor the shutdown process through the xs_tcp_state_change() callback, so it is preferable to a full close in all cases unless we're destroying the transport.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
The previous behaviour left the connection half-open in order to try to scrape the last replies from the socket. Now that we have more reliable reconnection, change the behaviour to close down the socket faster.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Now that we no longer use the partial shutdown code when closing the socket, we no longer need to worry about the TCP linger2 state.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-