  1. Jan 23, 2023
  2. Dec 19, 2022
  3. Dec 13, 2022
  4. Dec 12, 2022
  5. Dec 10, 2022
  6. Dec 09, 2022
    • net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP · b534dc46
      Willem de Bruijn authored
      
      Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP sockets
      from write_seq instead of snd_una.
      
      This should have been the behavior from the start. Because processes
      may now exist that rely on the established behavior, do not change
      the behavior of the existing option; instead, add the correct behavior
      under a new flag. Processes are encouraged to always set
      SOF_TIMESTAMPING_OPT_ID_TCP on stream sockets along with the existing
      SOF_TIMESTAMPING_OPT_ID.
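      The following is a minimal userspace sketch of requesting both flags on a
      TCP socket. It assumes a Linux 6.2+ kernel. TS_OPT_ID_TCP is a
      hypothetical local name for SOF_TIMESTAMPING_OPT_ID_TCP, which is bit 16
      in linux/net_tstamp.h. It is spelled out so the sketch also builds
      against older UAPI headers. The helper names are hypothetical as well.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Hypothetical local name for SOF_TIMESTAMPING_OPT_ID_TCP (bit 16 in
 * linux/net_tstamp.h since Linux 6.2), so older UAPI headers still work. */
enum { TS_OPT_ID_TCP = 1 << 16 };

/* Software TX timestamps keyed by byte offset. OPT_ID alone starts the key
 * counter at snd_una; adding OPT_ID_TCP starts it at write_seq instead. */
static unsigned int tcp_tx_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_ID |
	       TS_OPT_ID_TCP;
}

/* Returns 0 on success or -1 with errno set: e.g. EINVAL from kernels that
 * do not know the new flag, or EBADF for an invalid descriptor. */
static int enable_tcp_tx_tstamps(int fd)
{
	unsigned int flags = tcp_tx_tstamp_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

      If this is applied before any data is queued, OPT_ID and OPT_ID_TCP
      agree, because snd_una == write_seq on an idle socket. The new flag only
      matters when the option is set while data is in flight.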
      
      Intuitively, the contract is that the counter is zero right after the
      setsockopt, so that a subsequent write of N bytes results in a
      notification for its last byte, N - 1.
      
      On idle sockets snd_una == write_seq, so both options behave the same.
      But on sockets with data in transmission, snd_una records the unacked
      offset in the stream, which depends on ACKs from the peer. A process
      cannot learn this offset in a race-free manner (ioctl SIOCOUTQ is one
      racy approach).
      
      write_seq records the offset at the last byte written by the process.
      This is a better starting point. It matches the intuitive contract in
      all circumstances, unaffected by external behavior.
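      The next block is a simplified model of the two starting points. It
      assumes the key reported for a byte is its sequence number minus a base
      captured at setsockopt time. struct tcp_model and both helpers are
      hypothetical, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified sender state (hypothetical; not the kernel's struct tcp_sock). */
struct tcp_model {
	uint32_t snd_una;   /* oldest byte not yet acknowledged by the peer */
	uint32_t write_seq; /* sequence number of the next byte the process writes */
};

/* Counter base captured when the timestamping option is set. */
static uint32_t tskey_base(const struct tcp_model *tp, bool opt_id_tcp)
{
	return opt_id_tcp ? tp->write_seq : tp->snd_una;
}

/* Key reported for the last byte of an n-byte write (n >= 1) issued right
 * after the setsockopt; unsigned arithmetic wraps like TCP sequence numbers. */
static uint32_t last_byte_key(const struct tcp_model *tp, uint32_t base,
			      uint32_t n)
{
	uint32_t last_seq = tp->write_seq + n - 1;

	return last_seq - base;
}
```

      Take 1000 unacked bytes in flight (snd_una = 5000, write_seq = 6000).
      A 100-byte write is then reported as key 1099 under OPT_ID alone, but
      as 99 (N - 1) with OPT_ID_TCP. On an idle socket both give 99.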
      
      The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
      Move the field in struct sock to avoid growing the socket (for some
      common CONFIG variants). The UAPI interface so_timestamping.flags is
      already int, so 32 bits wide.
      
      Reported-by: Sotirios Delimanolis <sotodel@meta.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. Dec 08, 2022
  8. Dec 06, 2022
  9. Dec 05, 2022
  10. Dec 02, 2022
  11. Dec 01, 2022
  12. Nov 30, 2022
  13. Nov 24, 2022
  14. Nov 23, 2022
  15. Nov 18, 2022
  16. Nov 16, 2022
    • udp: Introduce optional per-netns hash table. · 9804985b
      Kuniyuki Iwashima authored

      The maximum hash table size is 64K due to the nature of the protocol. [0]
      This is smaller than TCP's, so fewer sockets can cause a performance drop.
      
      On an EC2 c5.24xlarge instance (192 GiB memory), with iperf3 running in
      a different netns, creating up to 32Mi sockets without data transfer in
      the root netns degrades the iperf3 connection's throughput.
      
        uhash_entries    sockets    length    Gbps
                  64K          1         1    5.69
                             1Mi        16    5.27
                             2Mi        32    4.90
                             4Mi        64    4.09
                             8Mi       128    2.96
                            16Mi       256    2.06
                            32Mi       512    1.12
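      The length column matches sockets divided by uhash_entries, rounded up.
      That is what an even spread over 64K buckets would give. Here is a quick
      check under that uniform-hash assumption (chain_len() is a hypothetical
      helper):

```c
#include <stdint.h>

/* Average chain length, rounded up, for `sockets` sockets spread evenly
 * over `buckets` hash buckets (assumes a uniform hash). */
static uint64_t chain_len(uint64_t sockets, uint64_t buckets)
{
	return (sockets + buckets - 1) / buckets;
}
```

      For comparison, take a netns with its own 128-bucket table holding 1,000
      sockets. It would see chains of about 8, however many sockets the root
      netns creates.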
      
      The per-netns hash table breaks the lengthy lists into shorter ones.  It is
      useful on a multi-tenant system with thousands of netns.  With smaller hash
      tables, we can look up sockets faster, isolate noisy neighbours, and reduce
      lock contention.
      
      The max size of the per-netns table is 64K as well.  This is because,
      within the same netns, the hashes produced by udp_hashfn() always span
      64K consecutive values, so buckets beyond 64K could never be used.
      
        /* 0 < num < 64K  ->  X < hash < X + 64K */
        (num + net_hash_mix(net)) & mask;
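      The sketch below shows why a larger per-netns table would not help. It
      hashes every 16-bit port with a fixed mix and counts the distinct
      buckets hit. The mix stands in for net_hash_mix(net), and its value is
      arbitrary. hashfn() mirrors the expression above; the other names are
      hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same shape as the snippet above; `mix` stands in for net_hash_mix(net). */
static uint32_t hashfn(uint32_t num, uint32_t mix, uint32_t mask)
{
	return (num + mix) & mask;
}

/* Number of distinct buckets (out of `buckets`, a power of two) that one
 * netns can reach when every port 0..65535 is hashed with a fixed mix.
 * Returns 0 if the scratch array cannot be allocated. */
static uint32_t reachable_buckets(uint32_t buckets, uint32_t mix)
{
	unsigned char *seen = calloc(buckets, 1);
	uint32_t count = 0;

	if (!seen)
		return 0;
	for (uint32_t port = 0; port < 65536; port++) {
		uint32_t h = hashfn(port, mix, buckets - 1);

		if (!seen[h]) {
			seen[h] = 1;
			count++;
		}
	}
	free(seen);
	return count;
}
```

      With 2^17 buckets, only 65536 (half) are ever reachable from one netns,
      so 64K is the useful maximum.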
      
      Also, the min size is 128.  We use a bitmap to search for an available
      port in udp_lib_get_port().  To keep the bitmap on the stack and not
      fire the CONFIG_FRAME_WARN error at build time, we round up the table
      size to 128.
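      The next block works through the arithmetic behind that minimum. It
      assumes the bitmap in udp_lib_get_port() holds one bit per port that can
      share a slot, i.e. 65536 / table_size bits. The array is sized at
      compile time, so it must cover the smallest allowed table. The helper
      names are hypothetical.

```c
#include <stdint.h>

#define PORT_SPACE 65536u /* assumption: the full 16-bit port range */

/* Ports that can land in one hash slot when the port space is spread over
 * `table_size` slots; the bitmap needs one bit for each. */
static uint32_t ports_per_slot(uint32_t table_size)
{
	return PORT_SPACE / table_size;
}

/* Stack bytes needed for that bitmap. */
static uint32_t bitmap_bytes(uint32_t table_size)
{
	return ports_per_slot(table_size) / 8;
}
```

      A 128-bucket floor keeps the bitmap at 64 bytes. Allowing a 16-bucket
      table would need 512 bytes of stack.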
      
      The sysctl usage is the same as with TCP:
      
        $ dmesg | cut -d ' ' -f 6- | grep "UDP hash"
        UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)
      
        # sysctl net.ipv4.udp_hash_entries
        net.ipv4.udp_hash_entries = 65536  # can be changed by uhash_entries
      
        # sysctl net.ipv4.udp_child_hash_entries
        net.ipv4.udp_child_hash_entries = 0  # disabled by default
      
        # ip netns add test1
        # ip netns exec test1 sysctl net.ipv4.udp_hash_entries
        net.ipv4.udp_hash_entries = -65536  # share the global table
      
        # sysctl -w net.ipv4.udp_child_hash_entries=100
        net.ipv4.udp_child_hash_entries = 100
      
        # ip netns add test2
        # ip netns exec test2 sysctl net.ipv4.udp_hash_entries
        net.ipv4.udp_hash_entries = 128  # own a per-netns table with 2^n buckets
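      The sketch below shows how the values in this session relate, under two
      assumptions. First, the table size is raised to the 128 minimum or
      rounded up to a power of two (100 becomes 128). Second, a negative
      udp_hash_entries inside a netns means it shares the global table. The
      names and the clamping are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERNET_MIN_ENTRIES 128u   /* minimum described above */
#define PERNET_MAX_ENTRIES 65536u /* 64K maximum described above */

/* Round v (v >= 1) up to the next power of two. */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Buckets a newly created netns would own for a given
 * udp_child_hash_entries value; 0 means "share the global table". */
static uint32_t child_table_size(uint32_t child_entries)
{
	if (child_entries == 0)
		return 0;
	if (child_entries > PERNET_MAX_ENTRIES)
		child_entries = PERNET_MAX_ENTRIES; /* clamped here for simplicity */
	if (child_entries < PERNET_MIN_ENTRIES)
		return PERNET_MIN_ENTRIES;
	return roundup_pow2(child_entries);
}

/* Interprets udp_hash_entries as read inside a netns. */
static bool shares_global_table(long udp_hash_entries)
{
	return udp_hash_entries < 0;
}
```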
      
      We could optimise the hash table lookup/iteration further by removing
      the netns comparison for the per-netns one in the future.  Also, we
      could optimise the sparse udp_hslot layout by putting it in udp_table.
      
      [0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/
      
      
      
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: nfp: update documentation · 1ec6360d
      Walter Heymans authored
      
      The NFP documentation is updated to include information about Corigine,
      and the new NFP3800 chips. The 'Acquiring Firmware' section is updated
      with new information about where to find firmware.
      
      Two new sections are added to expand the coverage of the documentation.
      The new sections include:
      - Devlink Info
      - Configure Device
      
      Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
      Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20221115090834.738645-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  17. Nov 10, 2022
  18. Nov 08, 2022
  19. Nov 05, 2022
  20. Oct 28, 2022
  21. Oct 25, 2022
  22. Oct 19, 2022
  23. Oct 11, 2022
  24. Oct 06, 2022
  25. Oct 04, 2022
    • ethtool: add interface to interact with Ethernet Power Equipment · 18ff0bcd
      Oleksij Rempel authored
      
      Add an interface to support Power Sourcing Equipment (PSE). At this
      stage, it provides a generic way to address all variants of PSE devices
      as defined in IEEE 802.3-2018, but supports only the objects specified
      for IEEE 802.3-2018 104.4 PoDL Power Sourcing Equipment (PSE).
      
      Currently supported and mandatory objects are:
      IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
      IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
      IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl
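      The next block is a hypothetical model of how the three objects relate:
      two read-only states and one control. The enum names and values are
      illustrative and are not the kernel UAPI. Only the statuses that appear
      in the ethtool session later in this message are included.

```c
#include <string.h>

/* Hypothetical model of the PoDL PSE objects; not the kernel UAPI. */
enum podl_admin_state {
	PODL_ADMIN_DISABLED,
	PODL_ADMIN_ENABLED,
};

/* Subset of power detection statuses (others omitted for brevity). */
enum podl_pw_status {
	PODL_PW_DISABLED,
	PODL_PW_DELIVERING,
};

struct podl_pse {
	enum podl_admin_state admin_state; /* aPoDLPSEAdminState (read-only) */
	enum podl_pw_status pw_status;     /* aPoDLPSEPowerDetectionStatus (read-only) */
};

/* acPoDLPSEAdminControl: the only writable object. It changes the admin
 * state; the detection status is reported by hardware, not set here. */
static void podl_admin_control(struct podl_pse *pse, int enable)
{
	pse->admin_state = enable ? PODL_ADMIN_ENABLED : PODL_ADMIN_DISABLED;
}

/* Parses a command-line style control value. Returns 1 for "enable",
 * 0 for "disable", -1 otherwise (callers must check before applying). */
static int podl_parse_control(const char *arg)
{
	if (strcmp(arg, "enable") == 0)
		return 1;
	if (strcmp(arg, "disable") == 0)
		return 0;
	return -1;
}

static const char *podl_pw_status_str(enum podl_pw_status s)
{
	switch (s) {
	case PODL_PW_DISABLED:
		return "disabled";
	case PODL_PW_DELIVERING:
		return "delivering power";
	}
	return "unknown";
}
```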
      
      This is the minimal interface needed to control PSE on each separate
      Ethernet port, but it does not provide all mandatory objects specified
      in IEEE 802.3-2018.
      
      Since "PoDL PSE" and "PSE" have similar names but some different
      values, I decided not to merge them and to keep separate naming
      schemas. This should allow us to stay as close to the IEEE 802.3 spec
      as possible and avoid name conflicts in the future.
      
      This implementation is connected to PHYs instead of MACs because PSE
      auto-classification can potentially interfere with PHY auto-negotiation,
      so some extra PHY-related initialization may be needed.
      
      With a WIP version of ethtool, interaction with a PSE-capable link
      looks as follows:
      
      $ ip l
      ...
      5: t1l1@eth0: <BROADCAST,MULTICAST> ..
      ...
      
      $ ethtool --show-pse t1l1
      PSE attributs for t1l1:
      PoDL PSE Admin State: disabled
      PoDL PSE Power Detection Status: disabled
      
      $ ethtool --set-pse t1l1 podl-pse-admin-control enable
      $ ethtool --show-pse t1l1
      PSE attributs for t1l1:
      PoDL PSE Admin State: enabled
      PoDL PSE Power Detection Status: delivering power
      
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  26. Sep 23, 2022
    • net: phy: Add support for rate matching · 0c3e10cb
      Sean Anderson authored
      
      This adds support for rate matching (also known as rate adaptation) to
      the phy subsystem. The general idea is that the phy interface runs at
      one speed, and the MAC throttles the rate at which it sends packets to
      the link speed. There's a good overview of several techniques for
      achieving this at [1]. This patch adds support for three:
      pause-frame-based (such as in Aquantia phys), CRS-based (such as in
      10PASS-TS and 2BASE-TL), and open-loop-based (such as in 10GBASE-W).
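      Whatever the mechanism (pause frames, CRS, or open-loop pacing), the MAC
      must on average send no faster than the link. The next block is a
      simplified calculation of the idle time this implies per frame. It
      ignores the preamble, the inter-packet gap and other overhead, and the
      helper is hypothetical.

```c
#include <stdint.h>

/* Idle time (ns) to add after each frame so that frames sent at the
 * interface rate leave no faster than the link rate. Rates are in Mb/s;
 * bits * 1000 / Mbps gives nanoseconds. Framing overhead is ignored. */
static uint64_t pacing_idle_ns(uint64_t frame_bytes, uint64_t iface_mbps,
			       uint64_t link_mbps)
{
	uint64_t bits = frame_bytes * 8;
	uint64_t on_iface_ns = bits * 1000 / iface_mbps;
	uint64_t on_link_ns = bits * 1000 / link_mbps;

	return on_link_ns > on_iface_ns ? on_link_ns - on_iface_ns : 0;
}
```

      A 1250-byte frame takes 1000 ns on a 10 Gb/s interface but 10000 ns on
      a 1 Gb/s link. The MAC must therefore stay quiet for about 9000 ns after
      each such frame.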
      
      This patch makes a few assumptions and a few non-assumptions about the
      types of rate matching available. First, it assumes that different phys
      may use different forms of rate matching. Second, it assumes that phys
      can use rate matching for any of their supported link speeds (e.g. if a
      phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
      Third, it does not assume that all interface modes will use the same
      form of rate matching. Fourth, it does not assume that all phy devices
      will support rate matching (even if some do). Relaxing or strengthening
      these (non-)assumptions could result in a different API. For example, if
      all interface modes were assumed to use the same form of rate matching,
      then a bitmask of interface modes supporting rate matching would
      suffice.
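      The next block is a hypothetical model of the API shape these
      assumptions lead to. The query is keyed by both PHY and interface mode,
      and it may answer "none". The types and names are illustrative, not
      phylib's.

```c
/* Illustrative types only; not the kernel's phylib definitions. */
enum rm_mode { RM_NONE, RM_PAUSE, RM_CRS, RM_OPEN_LOOP };
enum iface_mode { IFACE_SGMII, IFACE_USXGMII, IFACE_10GBASER, IFACE_COUNT };

struct toy_phy {
	/* Third point: each interface mode may use a different technique.
	 * Fourth point: RM_NONE everywhere models a PHY without support. */
	enum rm_mode per_iface[IFACE_COUNT];
};

/* "Which rate matching, if any, does this PHY use on this interface mode?" */
static enum rm_mode toy_phy_rate_matching(const struct toy_phy *phy,
					  enum iface_mode mode)
{
	if (!phy || mode >= IFACE_COUNT)
		return RM_NONE;
	return phy->per_iface[mode];
}
```

      If every interface mode were assumed to use the same technique,
      per_iface could shrink to a single rm_mode plus a bitmask of interface
      modes.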
      
      For better visibility into the process, the current rate matching
      mode is exposed as part of the ethtool ksettings. For the moment, only
      read access is supported. I'm not sure what userspace might want to
      configure yet (disable it altogether, disable just one mode, specify the
      mode to use, etc.). For the moment, since only pause-based rate
      adaptation support is added in the next few commits, rate matching can
      be disabled altogether by adjusting the advertisement.
      
      802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and
      "rate matching" in clause 61 (10PASS-TS and 2BASE-TL). Aquantia also calls
      this feature "rate adaptation". I chose "rate matching" because it is
      shorter, and because Russell doesn't think "adaptation" is correct in this
      context.
      
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. Sep 22, 2022
    • net/smc: Unbind r/w buffer size from clcsock and make them tunable · 0227f058
      Tony Lu authored
      
      Currently, SMC uses smc->sk.sk_{rcv|snd}buf to size the send buffer
      and the RMB (remote memory buffer), and these sizes are taken from
      tcp_{w|r}mem of the clcsock.
      
      Buffer sizes taken from the TCP socket don't fit SMC well. SMC-R/-D
      generally needs larger buffers than TCP to reach higher performance,
      because it uses different underlying devices and paths.
      
      So this patch unbinds the buffer sizes from TCP and introduces two
      sysctl knobs to tune them independently. These knobs are per network
      namespace, so they also work for containers.
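      The next block is a small helper for inspecting such a knob from
      userspace. Knob paths such as /proc/sys/net/smc/wmem and
      /proc/sys/net/smc/rmem are assumptions here, as this message does not
      name them. Files under /proc/sys/net show the values for the reader's
      network namespace. The helper name is hypothetical.

```c
#include <stdio.h>

/* Reads one integer from a sysctl file, e.g. /proc/sys/net/smc/wmem
 * (assumed path). Returns 0 on success, -1 if the file cannot be opened
 * or does not start with an integer. */
static int read_sysctl_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```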
      
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>