Skip to content
  1. Apr 26, 2021
  2. Apr 14, 2021
    • Chris Dion's avatar
      SUNRPC: Handle major timeout in xprt_adjust_timeout() · 09252177
      Chris Dion authored
      
      
      Currently if a major timeout value is reached, but the minor value has
      not been reached, an ETIMEOUT will not be sent back to the caller.
      This can occur if the v4 server is not responding to requests and
      retrans is configured larger than the default of two.
      
      For example, A TCP mount with a configured timeout value of 50 and a
      retransmission count of 3 to a v4 server which is not responding:
      
      1. Initial value and increment set to 5s, maxval set to 20s, retries at 3
      2. Major timeout is set to 20s, minor timeout set to 5s initially
      3. xport_adjust_timeout() is called after 5s, retry with 10s timeout,
         minor timeout is bumped to 10s
      4. And again after another 10s, 15s total time with minor timeout set
         to 15s
      5. After 20s total time xport_adjust_timeout is called as major timeout is
         reached, but skipped because the minor timeout is not reached
             - After this time the cpu spins continually calling
             	 xport_adjust_timeout() and returning 0 for 10 seconds.
      	 As seen on perf sched:
         	 39243.913182 [0005]  mount.nfs[3794] 4607.938      0.017   9746.863
      6. This continues until the 15s minor timeout condition is reached (in
         this case for 10 seconds). After which the ETIMEOUT is processed
         back to the caller, the cpu spinning stops, and normal operations
         continue
      
      Fixes: 7de62bc0 ("SUNRPC dont update timeout value on connection reset")
      Signed-off-by: default avatarChris Dion <Christopher.Dion@dell.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      09252177
    • Chuck Lever's avatar
      SUNRPC: Remove trace_xprt_transmit_queued · 6cf23783
      Chuck Lever authored
      
      
      This tracepoint can crash when dereferencing snd_task because
      when some transports connect, they put a cookie in that field
      instead of a pointer to an rpc_task.
      
      BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
      Read of size 2 at addr ffff8881a83bd3a0 by task git/331872
      
      CPU: 11 PID: 331872 Comm: git Tainted: G S                5.12.0-rc2-00007-g3ab6e585a7f9 #1453
      Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Call Trace:
       dump_stack+0x9c/0xcf
       print_address_description.constprop.0+0x18/0x239
       kasan_report+0x174/0x1b0
       trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
       xprt_prepare_transmit+0x8e/0xc1 [sunrpc]
       call_transmit+0x4d/0xc6 [sunrpc]
      
      Fixes: 9ce07ae5 ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      6cf23783
    • Chuck Lever's avatar
      SUNRPC: Add tracepoint that fires when an RPC is retransmitted · e936a597
      Chuck Lever authored
      
      
      A separate tracepoint can be left enabled all the time to capture
      rare but important retransmission events. So for example:
      
      kworker/u26:3-568   [009]   156.967933: xprt_retransmit:      task:44093@5 xid=0xa25dbc79 nfsv3 WRITE ntrans=2
      
      Or, for example, enable all nfs and nfs4 tracepoints, and set up a
      trigger to disable tracing when xprt_retransmit fires to capture
      everything that leads up to it.
      
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      e936a597
    • Chuck Lever's avatar
      SUNRPC: Move fault injection call sites · 7638e0bf
      Chuck Lever authored
      
      
      I've hit some crashes that occur in the xprt_rdma_inject_disconnect
      path. It appears that, for some provides, rdma_disconnect() can
      take so long that the transport can disconnect and release its
      hardware resources while rdma_disconnect() is still running,
      resulting in a UAF in the provider.
      
      The transport's fault injection method may depend on the stability
      of transport data structures. That means it needs to be invoked
      only from contexts that hold the transport write lock.
      
      Fixes: 4a068258 ("SUNRPC: Transport fault injection")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      7638e0bf
  3. Apr 05, 2021
  4. Mar 23, 2021
    • Vladimir Oltean's avatar
      net: bridge: don't notify switchdev for local FDB addresses · 6ab4c311
      Vladimir Oltean authored
      As explained in this discussion:
      https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
      
      the switchdev notifiers for FDB entries managed to have a zero-day bug.
      The bridge would not say that this entry is local:
      
      ip link add br0 type bridge
      ip link set swp0 master br0
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      and the switchdev driver would be more than happy to offload it as a
      normal static FDB entry. This is despite the fact that 'local' and
      non-'local' entries have completely opposite directions: a local entry
      is locally terminated and not forwarded, whereas a static entry is
      forwarded and not locally terminated. So, for example, DSA would install
      this entry on swp0 instead of installing it on the CPU port as it should.
      
      There is an even sadder part, which is that the 'local' flag is implicit
      if 'static' is not specified, meaning that this command produces the
      same result of adding a 'local' entry:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master
      
      I've updated the man pages for 'bridge', and after reading it now, it
      should be pretty clear to any user that the commands above were broken
      and should have never resulted in the 00:01:02:03:04:05 address being
      forwarded (this behavior is coherent with non-switchdev interfaces):
      https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/
      
      
      If you're a user reading this and this is what you want, just use:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master static
      
      Because switchdev should have given drivers the means from day one to
      classify FDB entries as local/non-local, but didn't, it means that all
      drivers are currently broken. So we can just as well omit the switchdev
      notifications for local FDB entries, which is exactly what this patch
      does to close the bug in stable trees. For further development work
      where drivers might want to trap the local FDB entries to the host, we
      can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
      selectively make drivers act upon that bit, while all the others ignore
      those entries if the 'is_local' bit is set.
      
      Fixes: 6b26b51b ("net: bridge: Add support for notifying devices about FDB add/del")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6ab4c311
    • Marcelo Ricardo Leitner's avatar
      net/sched: act_ct: clear post_ct if doing ct_clear · 8ca1b090
      Marcelo Ricardo Leitner authored
      
      
      Invalid detection works with two distinct moments: act_ct tries to find
      a conntrack entry and set post_ct true, indicating that that was
      attempted. Then, when flow dissector tries to dissect CT info and no
      entry is there, it knows that it was tried and no entry was found, and
      synthesizes/sets
                        key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
                                        TCA_FLOWER_KEY_CT_FLAGS_INVALID;
      mimicing what OVS does.
      
      OVS has this a bit more streamlined, as it recomputes the key after
      trying to find a conntrack entry for it.
      
      Issue here is, when we have 'tc action ct clear', it didn't clear
      post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to
      the above. The fix, thus, is to clear it.
      
      Reproducer rules:
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \
      	protocol ip flower ip_proto tcp ct_state -trk \
      	action ct zone 1 pipe \
      	action goto chain 2
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \
      	protocol ip flower \
      	action ct clear pipe \
      	action goto chain 4
      tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \
      	protocol ip flower ct_state -trk \
      	action mirred egress redirect dev enp130s0f1np1_0
      
      With the fix, the 3rd rule matches, like it does with OVS kernel
      datapath.
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarwenxu <wenxu@ucloud.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ca1b090
    • George McCollister's avatar
      net: dsa: don't assign an error value to tag_ops · e0c755a4
      George McCollister authored
      
      
      Use a temporary variable to hold the return value from
      dsa_tag_driver_get() instead of assigning it to dst->tag_ops. Leaving
      an error value in dst->tag_ops can result in deferencing an invalid
      pointer when a deferred switch configuration happens later.
      
      Fixes: 357f203b ("net: dsa: keep a copy of the tagging protocol in the DSA switch tree")
      
      Signed-off-by: default avatarGeorge McCollister <george.mccollister@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e0c755a4
  5. Mar 22, 2021
    • Vladimir Oltean's avatar
      net: ipconfig: ic_dev can be NULL in ic_close_devs · a50a151e
      Vladimir Oltean authored
      
      
      ic_close_dev contains a generalization of the logic to not close a
      network interface if it's the host port for a DSA switch. This logic is
      disguised behind an iteration through the lowers of ic_dev in
      ic_close_dev.
      
      When no interface for ipconfig can be found, ic_dev is NULL, and
      ic_close_dev:
      - dereferences a NULL pointer when assigning selected_dev
      - would attempt to search through the lower interfaces of a NULL
        net_device pointer
      
      So we should protect against that case.
      
      The "lower_dev" iterator variable was shortened to "lower" in order to
      keep the 80 character limit.
      
      Fixes: f68cbaed ("net: ipconfig: avoid use-after-free in ic_close_devs")
      Fixes: 46acf7bd ("Revert "net: ipv4: handle DSA enabled master network devices"")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarHeiko Thiery <heiko.thiery@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a50a151e
  6. Mar 20, 2021
  7. Mar 19, 2021
  8. Mar 18, 2021
  9. Mar 17, 2021
  10. Mar 16, 2021
Loading