- Aug 20, 2019
-
Chuck Lever authored
Clean up. There is only one remaining rpcrdma_mr_put call site, and it can be directly replaced with unmap_and_put because mr->mr_dir is set to DMA_NONE just before the call. Now all the call sites do a DMA unmap, and we can just rename mr_unmap_and_put to mr_put, which nicely matches mr_get.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: rpcrdma_mr_pop call sites check if the list is empty first. Let's replace the list_empty with less costly logic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
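A minimal sketch of the cheaper pop, with the mr_list field name assumed for illustration: returning NULL directly from the pop folds the emptiness test into the fetch, so call sites no longer need a separate list_empty() check.

    static struct rpcrdma_mr *
    rpcrdma_mr_pop(struct list_head *list)
    {
            struct rpcrdma_mr *mr;

            /* list_first_entry_or_null() combines the emptiness test
             * with the fetch, so the list head is examined only once. */
            mr = list_first_entry_or_null(list, struct rpcrdma_mr, mr_list);
            if (mr)
                    list_del_init(&mr->mr_list);
            return mr;
    }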
-
Chuck Lever authored
Clean up: There are other "all" list heads. For code clarity, distinguish this one as being for MRs only by renaming it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Make the field name the same for all trace points that handle pointers to struct rpcrdma_rep. That makes it easy to grep for matching rep points in trace output.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Although I haven't seen any performance results that justify it, I've received several complaints that NFS/RDMA no longer supports a maximum rsize and wsize of 1MB. These days it is somewhat smaller. To simplify the logic that determines whether a chunk list is necessary, the implementation uses a fixed maximum size of the transport header. Currently that maximum size is 256 bytes, one quarter of the default inline threshold size for RPC/RDMA v1. Since commit a7886849 ("xprtrdma: Reduce max_frwr_depth"), the size of chunks is also smaller to take advantage of inline page lists in device internal MR data structures. The combination of these two design choices has reduced the maximum NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing the maximum transport header size and the maximum number of RDMA segments it can contain increases the negotiated maximum rsize/wsize on common RNIC/HCAs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- Jul 09, 2019
-
Chuck Lever authored
Clean up. There is only one remaining function, rpcrdma_buffer_put(), that uses this field. Its caller can supply a pointer to the correct rpcrdma_buffer, enabling the removal of an 8-byte pointer field from a frequently-allocated shared data structure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
rb_lock is contended between rpcrdma_buffer_create, rpcrdma_buffer_put, and rpcrdma_post_recvs. Commit e340c2d6 ("xprtrdma: Reduce the doorbell rate (Receive)") causes rpcrdma_post_recvs to take the rb_lock repeatedly when it determines more Receives are needed. Streamline this code path so it takes the lock just once in most cases to build the Receive chain that is about to be posted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
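A sketch of the streamlined path, with structure and field names assumed for illustration: rb_lock is taken once, all the reps needed are harvested and chained, and the chain is posted after the lock is dropped.

    struct ib_recv_wr *wr = NULL;
    struct rpcrdma_rep *rep;

    spin_lock(&buf->rb_lock);               /* one acquisition, not one per Receive */
    while (needed--) {
            rep = list_first_entry_or_null(&buf->rb_recv_bufs,
                                           struct rpcrdma_rep, rr_list);
            if (!rep)
                    break;
            list_del(&rep->rr_list);
            rep->rr_recv_wr.next = wr;      /* link this WR into the chain */
            wr = &rep->rr_recv_wr;
    }
    spin_unlock(&buf->rb_lock);
    /* ... post the whole chain outside the lock ... */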
-
Chuck Lever authored
Clean up. Commit 7c8d9e7c ("xprtrdma: Move Receive posting to Receive handler") reduced the number of rpcrdma_rep_create call sites to one. After that commit, the backchannel code no longer invokes it. Therefore the free list logic added by commit d698c4a0 ("xprtrdma: Fix backchannel allocation of extra rpcrdma_reps") is no longer necessary, and in fact adds some extra overhead that we can do without. Simply post any newly created reps. They will get added back to the rb_recv_bufs list when they subsequently complete.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Eliminate a context switch in the path that handles RPC wake-ups when a Receive completion has to wait for a Send completion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Since commit ba69cd12 ("xprtrdma: Remove support for FMR memory registration"), FRWR is the only supported memory registration mode. We can take advantage of the asynchronous nature of FRWR's LOCAL_INV Work Requests to get rid of the completion wait by having the LOCAL_INV completion handler take care of DMA unmapping MRs and waking the upper layer RPC waiter. This eliminates two context switches when local invalidation is necessary. As a side benefit, we will no longer need the per-xprt deferred completion work queue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
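Roughly, and with the frwr field names assumed, the LOCAL_INV completion handler now finishes the work that a sleeping RPC task used to do. A sketch of the shape, not the patch itself:

    static void frwr_wc_localinv_done(struct ib_cq *cq, struct ib_wc *wc)
    {
            struct rpcrdma_frwr *frwr =
                    container_of(wc->wr_cqe, struct rpcrdma_frwr, fr_cqe);
            struct rpcrdma_mr *mr =
                    container_of(frwr, struct rpcrdma_mr, frwr);

            /* The DMA unmap happens here, in completion context,
             * rather than in an RPC task that had to be switched in... */
            rpcrdma_mr_unmap_and_put(mr);

            /* ... and the handler wakes the waiting RPC directly,
             * eliminating the deferred-completion workqueue bounce. */
    }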
-
Chuck Lever authored
Under high I/O workloads, I've noticed that an RPC/RDMA transport occasionally deadlocks (IOPS goes to zero, and doesn't recover). Diagnosis shows that the sendctx queue is empty, but when sendctxs are returned to the queue, the xprt_write_space wake-up never occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy. I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented via an atomic bit. Just one of those is sufficient. Removing EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock un-reproducible. Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and is therefore removed. Unfortunately this patch does not apply cleanly to stable. If needed, someone will have to port it and test it.
Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
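A sketch of the single-bit arbitration (the exact call sites are assumptions): because test_and_clear_bit() hands the bit to exactly one caller, the wake-up can no longer be lost in the race window.

    /* Sender finds the sendctx queue empty: */
    set_bit(XPRT_WRITE_SPACE, &xprt->state);

    /* A sendctx is later returned to the queue: */
    if (test_and_clear_bit(XPRT_WRITE_SPACE, &xprt->state))
            xprt_write_space(xprt);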
-
- Jul 02, 2019
-
Chuck Lever authored
Dereference wr->next /before/ the memory backing wr has been released. This issue was found by code inspection. It is not expected to be a significant problem because it is in an error path that is almost never executed.
Fixes: 7c8d9e7c ("xprtrdma: Move Receive posting to ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
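A sketch of the corrected error-path loop (rep and field names assumed): the next pointer is copied out of the WR before the rep backing that WR can be released.

    while (wr) {
            struct ib_recv_wr *next = wr->next;     /* read before rep is freed */
            struct rpcrdma_rep *rep =
                    container_of(wr, struct rpcrdma_rep, rr_recv_wr);

            rpcrdma_recv_buffer_put(rep);           /* may free the memory backing wr */
            wr = next;
    }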
-
- May 28, 2019
-
Gustavo A. R. Silva authored
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:

    struct foo {
        int stuff;
        struct boo entry[];
    };

    instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:

    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- Apr 25, 2019
-
Chuck Lever authored
Commit e1ede312 ("xprtrdma: Fix helper that drains the transport") replaced the ib_drain_qp() call, so update documenting comments to reflect current operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: rely on the trace points instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up. Move the remaining field in rpcrdma_create_data_internal so the structure can be removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up. The inline settings are actually a characteristic of the endpoint, and not related to the device. They are also modified after the transport instance is created, so they do not belong in the cdata structure either. Lastly, let's use names that are more natural to RDMA than to NFS: inline_write -> inline_send and inline_read -> inline_recv. The /proc files retain their names to avoid breaking user space.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up. Since commit 54cbd6b0 ("xprtrdma: Delay DMA mapping Send and Receive buffers"), a pointer to the device is now saved in each regbuf when it is DMA mapped.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Instead of using a fixed number, allow the amount of Send completion batching to vary based on the client's maximum credit limit.
- A larger default gives a small boost to IOPS throughput.
- Reducing it based on max_requests gives a safe result when the max credit limit is cranked down (eg. when the device has a small max_qp_wr).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
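A hedged sketch of the scaling; the ceiling and divisor here are illustrative assumptions, not the patch's actual values:

    /* Scale Send completion batching with the credit limit so that a
     * transport with a small max_qp_wr never leaves more unsignaled
     * Sends outstanding than its queue can absorb. */
    ep->rep_send_batch = min_t(unsigned int, 16 /* assumed ceiling */,
                               cdata->max_requests >> 3);
    if (!ep->rep_send_batch)
            ep->rep_send_batch = 1;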
-
Chuck Lever authored
Minor clean-ups I've stumbled on since sendctx was merged last year. In particular, making Send completion processing more efficient appears to have a measurable impact on IOPS throughput. Note: test_and_clear_bit() returns a value, thus an explicit memory barrier is not necessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
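The note about test_and_clear_bit() refers to the kernel's atomic ordering rule: an atomic operation that returns a value is fully ordered. A sketch with hypothetical flag and wait-queue names:

    if (test_and_clear_bit(RPCRDMA_SC_SIGNALED /* hypothetical */, &sc->sc_flags)) {
            /* test_and_clear_bit() returned a value, so it already acts
             * as a full memory barrier; no smp_mb__after_atomic() is
             * needed before waking the waiter. */
            wake_up(&sc->sc_wait);
    }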
-
Chuck Lever authored
For code legibility, clean up the function names to be consistent with the pattern: "rpcrdma" _ object-type _ action. Also rpcrdma_regbuf_alloc and rpcrdma_regbuf_free no longer have any callers outside of verbs.c, and can thus be made static.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up by providing an API to do this common task. At this point, the difference between rpcrdma_get_sendbuf and rpcrdma_get_recvbuf has become tiny. These can be collapsed into a single helper.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Allocating an rpcrdma_req's regbufs at xprt create time enables a pair of micro-optimizations: First, if these regbufs are always there, we can eliminate two conditional branches from the hot xprt_rdma_allocate path. Second, by allocating a 1KB buffer, it places a lower bound on the size of these buffers, without adding yet another conditional branch. The lower bound reduces the number of hardway re-allocations. In fact, for some workloads it completely eliminates hardway allocations.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
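A sketch of the create-time allocation with a size floor (the helper signature and constant placement are assumptions based on the surrounding entries):

    /* Allocate the send regbuf when the req is created, with a 1KB
     * floor so most workloads never hit a hardway re-allocation. */
    req->rl_sendbuf = rpcrdma_regbuf_alloc(max_t(size_t, size, 1024),
                                           DMA_TO_DEVICE, GFP_KERNEL);
    if (!req->rl_sendbuf)
            goto out_fail;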
-
Chuck Lever authored
Allocate the struct rpcrdma_regbuf separately from the I/O buffer to better guarantee the alignment of the I/O buffer and eliminate the wasted space between the rpcrdma_regbuf metadata and the buffer itself.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
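A sketch with the data-pointer field name assumed: metadata and I/O buffer come from separate allocations, so the buffer starts at a naturally kmalloc-aligned address and no pad bytes sit between the two.

    struct rpcrdma_regbuf *rb;

    rb = kmalloc(sizeof(*rb), flags);
    if (!rb)
            return NULL;
    rb->rg_data = kmalloc(size, flags);     /* aligned I/O buffer, no waste */
    if (!rb->rg_data) {
            kfree(rb);
            return NULL;
    }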
-
Chuck Lever authored
For code legibility, clean up the function names to be consistent with the pattern: "rpcrdma" _ object-type _ action.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Eventually, I'd like to invoke rpcrdma_create_req() during the call_reserve step. Memory allocation there probably needs to use GFP_NOIO. Therefore a set of GFP flags needs to be passed in. As an additional clean up, just return a pointer or NULL, because the only error return code here is -ENOMEM. Lastly, clean up the function names to be consistent with the pattern: "rpcrdma" _ object-type _ action.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- Apr 11, 2019
-
Chuck Lever authored
We want to drain only the RQ first. Otherwise the transport can deadlock on ->close if there are outstanding Send completions.
Fixes: 6d2d0ee2 ("xprtrdma: Replace rpcrdma_receive_wq ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Feb 13, 2019
-
Chuck Lever authored
Post RECV WRs in batches to reduce the hardware doorbell rate per transport. This helps the RPC-over-RDMA client scale better in number of transports.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
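Posting a pre-built chain with one verb call rings the hardware doorbell once for the whole batch; a sketch, with xprtrdma names assumed:

    const struct ib_recv_wr *bad_wr;
    int rc;

    /* wr heads a linked chain of Receive WRs; a single ib_post_recv()
     * posts them all and generates one doorbell. */
    rc = ib_post_recv(ia->ri_id->qp, wr, &bad_wr);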
-
- Feb 12, 2019
-
Nicolas Morey-Chaisemartin authored
Make sure the device has at least 2 completion vectors before allocating to compvec #1.
Fixes: a4699f56 ("xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode")
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
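A sketch of the guard (surrounding parameters assumed): devices exposing a single completion vector fall back to vector 0.

    /* Use completion vector 1 only when the device actually has one. */
    sendcq = ib_alloc_cq(ia->ri_device, NULL,
                         ep->rep_attr.cap.max_send_wr + 1,
                         ia->ri_device->num_comp_vectors > 1 ? 1 : 0,
                         IB_POLL_WORKQUEUE);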
-
- Jan 08, 2019
-
Dan Carpenter authored
The clean up is handled by the caller, rpcrdma_buffer_create(), so this call to rpcrdma_sendctxs_destroy() leads to a double free.
Fixes: ae72950a ("xprtrdma: Add data structure to manage RDMA Send arguments")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
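A sketch of the ownership rule the fix restores (labels assumed): the callee only reports failure, and rpcrdma_buffer_create()'s single unwind path does all the freeing.

    rc = rpcrdma_sendctxs_create(r_xprt);
    if (rc)
            goto out;                       /* no rpcrdma_sendctxs_destroy() here */
    return 0;
    out:
            rpcrdma_buffer_destroy(buf);    /* the one and only cleanup point */
            return rc;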
-
Dan Carpenter authored
This should return -ENOMEM if __alloc_workqueue_key() fails, but it returns success.
Fixes: 6d2d0ee2 ("xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
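A sketch of the fix (field name assumed): the error path sets a status before the goto instead of falling through with the stale zero from earlier.

    buf->rb_completion_wq = alloc_workqueue("rpcrdma_receive", 0, 0);
    if (!buf->rb_completion_wq) {
            rc = -ENOMEM;   /* previously missing: the function returned 0 */
            goto out;
    }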
-
- Jan 02, 2019
-
Chuck Lever authored
Make a note of the function's dependency on an earlier ib_drain_qp.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Since commit 7c8d9e7c ("xprtrdma: Move Receive posting to Receive handler"), rpcrdma_ep_post is no longer responsible for posting Receive buffers. Update the documenting comment to reflect this change.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
These are rare, but can be helpful at tracking down DMAR and other problems.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: Remove dprintk() call sites that report rare or impossible errors. Leave a few that display high-value low noise status information.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: There's little chance of contention between the use of rb_lock and rb_reqslock, so merge the two. This avoids having to take both in some (possibly future) cases. Transport tear-down is already serialized, thus there is no need for locking at all when destroying rpcrdma_reqs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: Now that there is only FRWR, there is no need for a memory registration switch. The indirect calls to the memreg operations can be replaced with faster direct calls.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
FMR is not supported on most recent RDMA devices. It is also less secure than FRWR because an FMR memory registration can expose adjacent bytes to remote reading or writing. As discussed during the RDMA BoF at LPC 2018, it is time to remove support for FMR in the NFS/RDMA client stack. Note that NFS/RDMA server-side uses either local memory registration or FRWR. FMR is not used. There are a few Infiniband/RoCE devices in the kernel tree that do not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will not support client-side NFS/RDMA after this patch. These are:
- mthca
- qib
- hns (RoCE)
Users of these devices can use NFS/TCP on IPoIB instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Transport disconnect processing does a "wake pending tasks" at various points. Suppose an RPC Reply is being processed. The RPC task that Reply goes with is waiting on the pending queue. If a disconnect wake-up happens before reply processing is done, that reply, even if it is good, is thrown away, and the RPC has to be sent again. This window apparently does not exist for socket transports because there is a lock held while a reply is being received which prevents the wake-up call until after reply processing is done. To resolve this, all RPC replies being processed on an RPC-over-RDMA transport have to complete before pending tasks are awoken due to a transport disconnect. Callers that already hold the transport write lock may invoke ->ops->close directly. Others use a generic helper that schedules a close when the write lock can be taken safely.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
After thinking about this more, and auditing other kernel ULP implementations, I believe that a DISCONNECT cm_event will occur after a fatal QP event. If that's the case, there's no need for an explicit disconnect in the QP event handler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-