    c7e86acf
    rxrpc: Fix lockup due to no error backoff after ack transmit error
    David Howells authored
    
    
    If the network becomes (partially) unavailable, say by disabling IPv6, the
    background ACK transmission routine can get stuck in a loop by repeatedly
    proposing immediate ACK retransmission.  Since we're in the call event
    processor, each retransmission happens immediately without returning to
    the workqueue manager, so the processor can spin and lock up.
    
    The condition should clear after a while when either the network comes back
    or the call times out.
    
    Fix this by:
    
     (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
         This will allow a certain amount of time to elapse before we try
         again.
    
     (2) Enforce a return to the workqueue manager after a certain number of
         iterations of the call processing loop.
    
     (3) Add a backoff delay that lengthens the delay on deferred ACKs by one
         jiffy per failed transmission, up to a limit of HZ (one second).  The
         backoff delay is cleared on a successful return from kernel_sendmsg().
    
     (4) Cancel calls immediately if the opening sendmsg fails.  The layer
         above can arrange retransmission or rotate to another server.
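Items (1) and (3) together amount to a capped linear backoff. The following is a minimal userspace sketch of that idea, assuming a simplified model rather than the actual kernel code. `struct sim_call`, its `tx_backoff` field, `SIM_HZ` (a stand-in for the kernel's `HZ`, assumed to be 100 here) and both helper functions are hypothetical names chosen for illustration. A negative `ret` plays the role of a failed `kernel_sendmsg()` return.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's HZ (jiffies per second); 100 is assumed. */
#define SIM_HZ 100UL

/* Hypothetical, heavily simplified call state; not the real struct rxrpc_call. */
struct sim_call {
	unsigned long tx_backoff; /* extra delay, in jiffies, added to deferred ACKs */
};

/*
 * Record the outcome of one transmission attempt. A negative ret models a
 * failed kernel_sendmsg(): grow the backoff by one jiffy, capped at SIM_HZ.
 * Any non-negative ret models success and clears the backoff.
 */
static void sim_note_tx_result(struct sim_call *call, int ret)
{
	if (ret < 0) {
		if (call->tx_backoff < SIM_HZ)
			call->tx_backoff++;
	} else {
		call->tx_backoff = 0;
	}
	assert(call->tx_backoff <= SIM_HZ); /* the cap is never exceeded */
}

/*
 * Time (in jiffies) at which a deferred ACK should next be tried: the normal
 * ACK delay plus the accumulated backoff. After any failure the backoff is at
 * least one jiffy, so a re-proposed ACK is never scheduled for "now".
 */
static unsigned long sim_next_ack_time(const struct sim_call *call,
				       unsigned long now, unsigned long ack_delay)
{
	return now + ack_delay + call->tx_backoff;
}
```

With `SIM_HZ` at 100, any run of 100 or more consecutive failures leaves the backoff pinned at 100 jiffies, which is one second. The next successful send resets it to zero. The cap keeps a flapping network from pushing ACKs arbitrarily far into the future.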
    
    Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>