  1. May 07, 2020
  2. Mar 27, 2020
    • net: dsa: configure the MTU for switch ports · bfcb8132
      Vladimir Oltean authored
      
      
      It is useful to be able to configure port policers on a switch to accept
      frames of various sizes:
      
      - Increase the MTU for better throughput from the default of 1500 if it
        is known that there is no 10/100 Mbps device in the network.
      - Decrease the MTU to limit the latency of high-priority frames under
        congestion, or work around various network segments that add extra
        headers to packets which can't be fragmented.
      
      For DSA slave ports, this is mostly a pass-through callback, called
      through the regular ndo ops and at probe time (to ensure consistency
      across all supported switches).
      
      The CPU port is called with an MTU equal to the largest configured MTU
      of the slave ports. The assumption is that the user might want to
      sustain a bidirectional conversation with a partner over any switch
      port.
      
      The DSA master is configured the same as the CPU port, plus the tagger
      overhead. Since the MTU is by definition L2 payload (sans Ethernet
      header), it is up to each individual driver to figure out if it needs to
      do anything special for its frame tags on the CPU port (it shouldn't
      except in special cases). So the MTU does not contain the tagger
      overhead on the CPU port.
      However the MTU of the DSA master, minus the tagger overhead, is used as
      a proxy for the MTU of the CPU port, which does not have a net device.
      This is to avoid uselessly calling the .change_mtu function on the CPU
      port when nothing should change.
      
      So it is safe to assume that the MTUs of the DSA master and the CPU port
      differ by exactly the tagger's overhead in bytes.
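
      The relationship between the three MTUs described above can be written
      down as plain arithmetic. The following is a minimal userspace sketch,
      not the kernel code: the function names cpu_port_mtu(), master_mtu() and
      cpu_mtu_from_master() are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): the CPU port is given the
 * largest MTU configured on any slave port. Returns 0 for an empty list.
 */
int cpu_port_mtu(const int *slave_mtus, size_t n)
{
    int largest = 0;

    for (size_t i = 0; i < n; i++)
        if (slave_mtus[i] > largest)
            largest = slave_mtus[i];
    return largest;
}

/* Hypothetical helper: the DSA master carries the CPU port MTU plus the tag. */
int master_mtu(int cpu_mtu, int tagger_overhead)
{
    return cpu_mtu + tagger_overhead;
}

/*
 * Hypothetical helper: since the CPU port has no net device, its MTU is
 * inferred from the master MTU by subtracting the tagger overhead.
 */
int cpu_mtu_from_master(int master, int tagger_overhead)
{
    return master - tagger_overhead;
}
```

      For instance, with slave MTUs of 1500, 9000 and 2000 and an assumed
      4-byte tag, the CPU port would use 9000 and the master 9004.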
      
      Some changes were made around dsa_master_set_mtu(), a function which has
      now been removed, for 2 reasons:
        - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
          do the same thing in DSA
        - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
      That is to say, there's no need for this function in DSA: we can safely
      call dev_set_mtu() directly, take the rtnl lock when necessary, and just
      propagate whatever errors get reported (since the user probably wants to
      be informed).
      
      Some inspiration (mainly in the MTU DSA notifier) was taken from a
      vaguely similar patch from Murali and Florian, who are credited as
      co-developers down below.
      
      Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
      Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
      Co-developed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Dec 28, 2019
    • net: dsa: Deny PTP on master if switch supports it · f685e609
      Vladimir Oltean authored
      It is possible to kill PTP on a DSA switch completely and absolutely,
      until a reboot, with a simple command:
      
      tcpdump -i eth2 -j adapter_unsynced
      
      where eth2 is the switch's DSA master.
      
      Why? Well, in short, the PTP API in place today is a bit rudimentary and
      relies on applications to retrieve the TX timestamps by polling the
      error queue and looking at the cmsg structure. But there is no timestamp
      identification of any sort (except whether it's HW or SW): you don't
      know how many more timestamps there are to come, which one this one is,
      whom it is from, etc. In other words, the SO_TIMESTAMPING API is
      fundamentally limited in that you can get a single HW timestamp from the
      stack.
      
      And the "-j adapter_unsynced" flag of tcpdump enables hardware
      timestamping.
      
      So let's imagine what happens when the DSA master decides it wants to
      deliver TX timestamps to the skb's socket too:
      - The timestamp that the user space sees is taken by the DSA master,
        whereas the RX timestamp will eventually be overwritten by the DSA
        switch. So the RX and TX timestamps will be in different time bases
        (aka garbage).
      - The user space applications have no way to deal with the second (real)
        TX timestamp finally delivered by the DSA switch, or even to know to
        wait for it.
      
      Take ptp4l from the linuxptp project, for example. This is its behavior
      after running tcpdump, before the patch:
      
      ptp4l[172]: [6469.594] Unexpected data on socket err queue:
      ptp4l[172]: [6469.693] rms    8 max   16 freq -21257 +/-  11 delay   748 +/-   0
      ptp4l[172]: [6469.711] Unexpected data on socket err queue:
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.721] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.838] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.848] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02
      ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f
      ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05
      ptp4l[172]: 0040 de 06 00 01
      ptp4l[172]: [6469.855] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: [6469.974] Unexpected data on socket err queue:
      ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02
      ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00
      ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd
      ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00
      
      The ptp4l program itself is heavily patched to show this (more details
      here [0]). Otherwise, by default it just hangs.
      
      On the other hand, with the DSA patch to disallow HW timestamping
      applied:
      
      tcpdump -i eth2 -j adapter_unsynced
      tcpdump: SIOCSHWTSTAMP failed: Device or resource busy
      
      So it is a fact of life that PTP timestamping on the DSA master is
      incompatible with timestamping on the switch MAC, at least with the
      current API. And if the switch supports PTP, taking the timestamps from
      the switch MAC is highly preferable anyway, due to the fact that those
      don't contain the queuing latencies of the switch. So just disallow PTP
      on the DSA master if there is any PTP-capable switch attached.
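
      The resulting policy can be modelled in a few lines. This is a
      simplified userspace sketch of the decision only, not the code of the
      patch: master_ioctl_check() is a hypothetical name, and treating
      SIOCGHWTSTAMP the same way as SIOCSHWTSTAMP is an assumption. The
      -EBUSY return value matches the "Device or resource busy" message
      shown above.

```c
#include <errno.h>
#include <stdbool.h>
#include <linux/sockios.h>   /* SIOCSHWTSTAMP, SIOCGHWTSTAMP */

/*
 * Hypothetical model of the check on the DSA master's ioctl path.
 * Returns -EBUSY when a hardware timestamping ioctl targets a master whose
 * attached switch is PTP-capable, and 0 when the request may be forwarded
 * to the master driver unchanged.
 */
int master_ioctl_check(bool switch_supports_ptp, unsigned int cmd)
{
    switch (cmd) {
    case SIOCGHWTSTAMP:   /* assumption: the "get" variant is denied too */
    case SIOCSHWTSTAMP:
        if (switch_supports_ptp)
            return -EBUSY;
        break;
    default:
        break;
    }
    return 0;
}
```

      Every other ioctl, and every ioctl on a master without a PTP-capable
      switch, passes through untouched in this model.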
      
      [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/
      
      
      
      Fixes: 0336369d ("net: dsa: forward hardware timestamping ioctls to switch driver")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Oct 24, 2019
    • net: core: add generic lockdep keys · ab92d68f
      Taehee Yoo authored
      
      
      Some interface types can be nested
      (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc.).
      These interface types should set a lockdep class because, without a
      lockdep class key, lockdep always warns about nonexistent circular locking.
      
      In the current code, these interfaces have their own lockdep class keys
      and manage them themselves, so there is a lot of duplicated code around
      /drivers/net and /net/.
      This patch adds new generic lockdep keys and some helper functions for them.
      
      This patch makes the following changes:
      a) Add lockdep class keys in struct net_device
         - qdisc_running, xmit, addr_list, qdisc_busylock
         - these keys are used as dynamic lockdep keys.
      b) When a net_device is allocated, lockdep keys are registered.
         - alloc_netdev_mqs()
      c) When a net_device is freed, lockdep keys are unregistered.
         - free_netdev()
      d) Add generic lockdep key helper functions
         - netdev_register_lockdep_key()
         - netdev_unregister_lockdep_key()
         - netdev_update_lockdep_key()
      e) Remove unnecessary generic lockdep macros and functions
      f) Remove unnecessary lockdep code from each interface.
      
      After this patch, interface modules no longer need to maintain
      their own lockdep keys.
      
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Aug 06, 2019
    • net: dsa: dump CPU port regs through master · 48e23311
      Vivien Didelot authored
      
      
      Merge the CPU port registers dump into the master interface registers
      dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs
      structures of the CPU port into the dump.
      
      drvinfo->regdump_len will contain the full data length, while regs->len
      will contain only the master interface registers dump length.
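
      The length bookkeeping can be sketched as follows. This is an
      illustration under assumptions, not the patch itself: the helper names
      are hypothetical, and the layout (master registers first, then the CPU
      port's ethtool_drvinfo, then its ethtool_regs header, then its register
      bytes) is assumed from the description above.

```c
#include <stddef.h>
#include <linux/ethtool.h>   /* struct ethtool_drvinfo, struct ethtool_regs */

/*
 * Hypothetical helper: offset at which the CPU port register bytes would
 * start, assuming the two nested structures directly follow the master's
 * own register dump of master_len bytes.
 */
size_t cpu_port_regs_offset(size_t master_len)
{
    return master_len + sizeof(struct ethtool_drvinfo) +
           sizeof(struct ethtool_regs);
}

/*
 * Hypothetical helper: full dump length, i.e. the value that would be
 * reported in drvinfo->regdump_len, while regs->len stays at master_len.
 */
size_t nested_regdump_len(size_t master_len, size_t cpu_port_len)
{
    return cpu_port_regs_offset(master_len) + cpu_port_len;
}
```

      A tool that only understands the master's registers can stop after
      regs->len bytes, while one that knows about the nesting can keep
      parsing up to the full length.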
      
      This allows, for example, dumping the CPU port registers on a ZII Dev
      C board like this:
      
          # ethtool -d eth1
          0x004:                                              0x00000000
          0x008:                                              0x0a8000aa
          0x010:                                              0x01000000
          0x014:                                              0x00000000
          0x024:                                              0xf0000102
          0x040:                                              0x6d82c800
          0x044:                                              0x00000020
          0x064:                                              0x40000000
          0x084: RCR (Receive Control Register)               0x47c00104
              MAX_FL (Maximum frame length)                   1984
              FCE (Flow control enable)                       0
              BC_REJ (Broadcast frame reject)                 0
              PROM (Promiscuous mode)                         0
              DRT (Disable receive on transmit)               0
              LOOP (Internal loopback)                        0
          0x0c4: TCR (Transmit Control Register)              0x00000004
              RFC_PAUSE (Receive frame control pause)         0
              TFC_PAUSE (Transmit frame control pause)        0
              FDEN (Full duplex enable)                       1
              HBC (Heartbeat control)                         0
              GTS (Graceful transmit stop)                    0
          0x0e4:                                              0x76735d6d
          0x0e8:                                              0x7e9e8808
          0x0ec:                                              0x00010000
          .
          .
          .
          88E6352  Switch Port Registers
          ------------------------------
          00: Port Status                            0x4d04
                Pause Enabled                        0
                My Pause                             1
                802.3 PHY Detected                   0
                Link Status                          Up
                Duplex                               Full
                Speed                                100 or 200 Mbps
                EEE Enabled                          0
                Transmitter Paused                   0
                Flow Control                         0
                Config Mode                          0x4
          01: Physical Control                       0x003d
                RGMII Receive Timing Control         Default
                RGMII Transmit Timing Control        Default
                200 BASE Mode                        100
                Flow Control's Forced value          0
                Force Flow Control                   0
                Link's Forced value                  Up
                Force Link                           1
                Duplex's Forced value                Full
                Force Duplex                         1
                Force Speed                          100 or 200 Mbps
          .
          .
          .
      
      Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. May 30, 2019
  7. Feb 05, 2019
    • net: dsa: Fix lockdep false positive splat · c8101f77
      Marc Zyngier authored
      
      
      Creating a macvtap on a DSA-backed interface results in the following
      splat when lockdep is enabled:
      
      [   19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
      [   23.041198] device lan0 entered promiscuous mode
      [   23.043445] device eth0 entered promiscuous mode
      [   23.049255]
      [   23.049557] ============================================
      [   23.055021] WARNING: possible recursive locking detected
      [   23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 #118 Not tainted
      [   23.066132] --------------------------------------------
      [   23.071598] ip/2861 is trying to acquire lock:
      [   23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38
      [   23.083693]
      [   23.083693] but task is already holding lock:
      [   23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
      [   23.096774]
      [   23.096774] other info that might help us debug this:
      [   23.103494]  Possible unsafe locking scenario:
      [   23.103494]
      [   23.109584]        CPU0
      [   23.112093]        ----
      [   23.114601]   lock(_xmit_ETHER);
      [   23.117917]   lock(_xmit_ETHER);
      [   23.121233]
      [   23.121233]  *** DEADLOCK ***
      [   23.121233]
      [   23.127325]  May be due to missing lock nesting notation
      [   23.127325]
      [   23.134315] 2 locks held by ip/2861:
      [   23.137987]  #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0
      [   23.146231]  #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
      [   23.153757]
      [   23.153757] stack backtrace:
      [   23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 #118
      [   23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
      [   23.172843] Call trace:
      [   23.175358]  dump_backtrace+0x0/0x188
      [   23.179116]  show_stack+0x14/0x20
      [   23.182524]  dump_stack+0xb4/0xec
      [   23.185928]  __lock_acquire+0x123c/0x1860
      [   23.190048]  lock_acquire+0xc8/0x248
      [   23.193724]  _raw_spin_lock_bh+0x40/0x58
      [   23.197755]  dev_set_rx_mode+0x1c/0x38
      [   23.201607]  dev_set_promiscuity+0x3c/0x50
      [   23.205820]  dsa_slave_change_rx_flags+0x5c/0x70
      [   23.210567]  __dev_set_promiscuity+0x148/0x1e0
      [   23.215136]  __dev_set_rx_mode+0x74/0x98
      [   23.219167]  dev_uc_add+0x54/0x70
      [   23.222575]  macvlan_open+0x170/0x1d0
      [   23.226336]  __dev_open+0xe0/0x160
      [   23.229830]  __dev_change_flags+0x16c/0x1b8
      [   23.234132]  dev_change_flags+0x20/0x60
      [   23.238074]  do_setlink+0x2d0/0xc50
      [   23.241658]  __rtnl_newlink+0x5f8/0x6e8
      [   23.245601]  rtnl_newlink+0x50/0x78
      [   23.249184]  rtnetlink_rcv_msg+0x360/0x4e0
      [   23.253397]  netlink_rcv_skb+0xe8/0x130
      [   23.257338]  rtnetlink_rcv+0x14/0x20
      [   23.261012]  netlink_unicast+0x190/0x210
      [   23.265043]  netlink_sendmsg+0x288/0x350
      [   23.269075]  sock_sendmsg+0x18/0x30
      [   23.272659]  ___sys_sendmsg+0x29c/0x2c8
      [   23.276602]  __sys_sendmsg+0x60/0xb8
      [   23.280276]  __arm64_sys_sendmsg+0x1c/0x28
      [   23.284488]  el0_svc_common+0xd8/0x138
      [   23.288340]  el0_svc_handler+0x24/0x80
      [   23.292192]  el0_svc+0x8/0xc
      
      This looks fairly harmless (no actual deadlock occurs), and is
      fixed in a similar way to c6894dec ("bridge: fix lockdep
      addr_list_lock false positive splat") by putting the addr_list_lock
      in its own lockdep class.
      
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. Jan 17, 2019
  9. Dec 09, 2018
  10. Dec 06, 2018
  11. Dec 01, 2018
  12. Apr 27, 2018
  13. Mar 04, 2018
  14. Nov 09, 2017
  15. Oct 01, 2017
  16. Sep 19, 2017