xprtrdma: Fix occasional transport deadlock

Under high I/O workloads, I've noticed that an RPC/RDMA transport
occasionally deadlocks (IOPS goes to zero, and doesn't recover).
Diagnosis shows that the sendctx queue is empty, but when sendctxs
are returned to the queue, the xprt_write_space wake-up never
occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy.

I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented
via an atomic bit. Just one of those is sufficient. Removing
EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock
un-reproducible.

Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and
is therefore removed.

Unfortunately this patch does not apply cleanly to stable. If
needed, someone will have to port it and test it.

Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
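
The idea of relying on a single atomic "waiter present" bit can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `model_xprt`, `model_init`, `model_get`, and `model_put` are hypothetical names, C11 atomics stand in for the kernel's bit operations, and the re-check after setting the bit is this sketch's own way of closing the lost-wake-up window, which may differ from how the actual patch does it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_xprt {
    atomic_bool write_space_wanted; /* stands in for the XPRT_WRITE_SPACE bit */
    atomic_int  free_ctxs;          /* number of free send contexts */
    atomic_int  wakeups;            /* wake-ups delivered (for inspection) */
};

static void model_init(struct model_xprt *x, int depth)
{
    atomic_init(&x->write_space_wanted, false);
    atomic_init(&x->free_ctxs, depth);
    atomic_init(&x->wakeups, 0);
}

/*
 * Sender side: take a send context if one is free. Returns false when the
 * caller should sleep until model_put() delivers a wake-up.
 */
static bool model_get(struct model_xprt *x)
{
    for (;;) {
        int n = atomic_load(&x->free_ctxs);

        /* On CAS failure, n is reloaded and the loop re-tests it. */
        while (n > 0) {
            if (atomic_compare_exchange_weak(&x->free_ctxs, &n, n - 1))
                return true;
        }

        /* Queue looked empty: announce a waiter with the single bit. */
        atomic_store(&x->write_space_wanted, true);

        /*
         * Re-check: a put may have run between the failed take and the
         * store above. If the queue is still empty, any later put will
         * observe the bit and wake us.
         */
        if (atomic_load(&x->free_ctxs) == 0)
            return false;

        /* A context appeared; retry. A leftover bit only causes a
         * spurious wake-up, which is harmless. */
    }
}

/* Completion side: return a context, then wake a waiter if the bit was set. */
static void model_put(struct model_xprt *x)
{
    atomic_fetch_add(&x->free_ctxs, 1);

    /* Test-and-clear in one atomic step, so only one put claims the wake-up. */
    if (atomic_exchange(&x->write_space_wanted, false))
        atomic_fetch_add(&x->wakeups, 1);
}
```

In this model there is exactly one piece of state recording that a sender is waiting, so the put path cannot clear one flag while the waiter depends on another. With a queue depth of 1, a second `model_get` fails and sets the bit, and the next `model_put` clears it and records one wake-up.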
Changed files:
- include/trace/events/rpcrdma.h: 27 additions, 0 deletions
- net/sunrpc/xprtrdma/frwr_ops.c: 5 additions, 1 deletion
- net/sunrpc/xprtrdma/rpc_rdma.c: 12 additions, 14 deletions
- net/sunrpc/xprtrdma/verbs.c: 3 additions, 8 deletions
- net/sunrpc/xprtrdma/xprt_rdma.h: 0 additions, 6 deletions