- May 28, 2020
-
-
Christoph Hellwig authored
Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space without going through a fake uaccess. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly enable timestamps instead of setting the SO_TIMESTAMP* sockopts from kernel space and going through a fake uaccess. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel space without going through a fake uaccess. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel space without going through a fake uaccess. The interface is simplified to only pass the seconds value, as that is the only thing needed at the moment. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_PRIORITY sockopt from kernel space without going through a fake uaccess. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_LINGER sockopt from kernel space with onoff set to true and a linger time of 0 without going through a fake uaccess. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Christoph Hellwig authored
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space without going through a fake uaccess. For this the iscsi target now has to formally depend on inet to avoid a mostly theoretical compile failure. For actual operation it already did depend on having ipv4 or ipv6 support. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Sagi Grimberg <sagi@grimberg.me> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Yuchung Cheng <ycheng@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 27, 2020
-
-
Christoph Hellwig authored
No users left. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jonas Falkevik authored
change typo in function name "nofity" to "notify" sctp_ulpevent_nofity_peer_addr_change -> sctp_ulpevent_notify_peer_addr_change Signed-off-by:
Jonas Falkevik <jonas.falkevik@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
This patch adds a field to the tls rx offload context which enables drivers to force a send_resync call. This field can be used by drivers to request a resync at the next possible tls record. It is beneficial for hardware that provides the resync sequence number asynchronously. In such cases, the packet that triggered the resync does not contain the information required for a resync. Instead, the driver requests resync for all the following TLS record until the asynchronous notification with the resync request TCP sequence arrives. A following series for mlx5e ConnectX-6DX TLS RX offload support will use this mechanism. Signed-off-by:
Boris Pismenny <borisp@mellanox.com> Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Reviewed-by:
Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by:
Saeed Mahameed <saeedm@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Cong Wang authored
With this tracepoint, we could know when qdisc's are created, especially those default qdisc's. Sample output: tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0 tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100 tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200 tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100 tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200 tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Cong Wang authored
Add two tracepoints for qdisc_reset() and qdisc_destroy() to track qdisc resetting and destroying. Sample output: tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0 tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0 Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Horatiu Vultur authored
This patch reworks the MRP netlink interface. Before, each attribute represented a binary structure which made it hard to be extended. Therefore update the MRP netlink interface such that each existing attribute to be a nested attribute which contains the fields of the binary structures. In this way the MRP netlink interface can be extended without breaking the backwards compatibility. It is also using strict checking for attributes under the MRP top attribute. Signed-off-by:
Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Allow the user to configure where on the cable the TDR data should be retrieved, in terms of first and last sample, and the step between samples. Also add the ability to ask for TDR data for just one pair. If this configuration is not provided, it defaults to 1-150m at 1m intervals for all pairs. Signed-off-by:
Andrew Lunn <andrew@lunn.ch> v3: Move the TDR configuration into a structure Add a range check on step Use NL_SET_ERR_MSG_ATTR() when appropriate Move TDR configuration into a nest Document attributes in the request Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Add helpers for returning raw TDR helpers in netlink messages. Signed-off-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Add the generic parts of the code used to trigger a cable test and return raw TDR data. Any PHY driver which support this must implement the new driver op. Signed-off-by:
Andrew Lunn <andrew@lunn.ch> v2 Update nxp-tja11xx for API change. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Some Ethernet PHYs can return the raw time domain reflectromatry data. Add the attributes to allow this data to be requested and returned via netlink ethtool. Signed-off-by:
Andrew Lunn <andrew@lunn.ch> v2: m -> cm Report what the PHY actually used for start/stop/step. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 26, 2020
-
-
Russell King authored
There is a recurring pattern throughout some of the PHY code converting a devad and regnum to our packed clause 45 representation. Rather than having this scattered around the code, let's put a common translation function in mdio.h, and provide some register accessors. Convert the phylib core, phylink, bcm87xx and cortina to use these. Signed-off-by:
Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Reviewed-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Guillaume Nault authored
With struct flow_dissector_key_mpls now recording the first FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of these LSEs independently. In order to avoid creating new netlink attributes for every possible depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute that contains the list of LSEs to match. Each LSE is represented by another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains the attributes representing the depth and the MPLS fields to match at this depth (label, TTL, etc.). For each MPLS field, the mask is always set to all-ones, as this is what the original API did. We could allow user configurable masks in the future if there is demand for more flexibility. The new API also allows to only specify an LSE depth. In that case, Flower only verifies that the MPLS label stack depth is greater or equal to the provided depth (that is, an LSE exists at this depth). Filters that only match on one (or more) fields of the first LSE are dumped using the old netlink attributes, to avoid confusing user space programs that don't understand the new API. Signed-off-by:
Guillaume Nault <gnault@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Guillaume Nault authored
The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably better do a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by:
Guillaume Nault <gnault@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Yuval Basson authored
In older FW versions the completion flag was treated as the ack flag in edpm messages. Expose the FW option of setting which mode the QP is in by adding a flag to the qedr <-> qed API. Flag is added for backward compatibility with libqedr. This flag will be set by qedr after determining whether the libqedr is using the updated version. Fixes: f1093940 ("qed: Add support for QP verbs") Signed-off-by:
Yuval Basson <yuval.bason@marvell.com> Signed-off-by:
Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 23, 2020
-
-
Bartosz Golaszewski authored
Provide devm_register_netdev() - a device resource managed variant of register_netdev(). This new helper will only work for net_device structs that are also already managed by devres. Signed-off-by:
Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
On e.g. m68k, the alignment of 32-bit values is only 2 bytes, leading to the following: ./include/linux/avf/virtchnl.h:147:36: warning: division by zero [-Wdiv-by-zero] { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:577:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ VIRTCHNL_CHECK_STRUCT_LEN(272, virtchnl_filter); ^~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:577:32: error: enumerator value for ‘virtchnl_static_assert_virtchnl_filter’ is not an integer constant VIRTCHNL_CHECK_STRUCT_LEN(272, virtchnl_filter); ^~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:147:53: note: in definition of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:147:36: warning: division by zero [-Wdiv-by-zero] { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:619:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_pf_event); ^~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:619:31: error: enumerator value for ‘virtchnl_static_assert_virtchnl_pf_event’ is not an integer constant VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_pf_event); ^~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:147:53: note: in definition of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:147:36: warning: division by zero [-Wdiv-by-zero] { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:640:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_iwarp_qv_info); ^~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:640:31: error: enumerator value for ‘virtchnl_static_assert_virtchnl_iwarp_qv_info’ is not an integer constant VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_iwarp_qv_info); ^~~~~~~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:147:53: note: in definition of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:147:36: warning: division by zero [-Wdiv-by-zero] { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ ./include/linux/avf/virtchnl.h:647:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_iwarp_qvlist_info); ^~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:647:31: error: enumerator value for ‘virtchnl_static_assert_virtchnl_iwarp_qvlist_info’ is not an integer constant VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_iwarp_qvlist_info); ^~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/avf/virtchnl.h:147:53: note: in definition of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } ^ Fix this by adding explicit padding to structures with holes. Reported-by:
<noreply@ellerman.id.au> Signed-off-by:
Geert Uytterhoeven <geert@linux-m68k.org> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Eran Ben Elisha authored
When driver is reloading during recovery flow, it can't get new commands till command interface is up again. Otherwise we may get to null pointer trying to access non initialized command structures. Add cmdif state to avoid processing commands while cmdif is not ready. Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by:
Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by:
Moshe Shemesh <moshe@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Eran Ben Elisha authored
After driver creates (via FW command) an EQ for commands, the driver will be informed on new commands completion by EQE. However, due to a race in driver's internal command mode metadata update, some new commands will still be miss-handled by driver as if we are in polling mode. Such commands can get two non forced completion, leading to already freed command entry access. CREATE_EQ command, that maps EQ to the command queue must be posted to the command queue while it is empty and no other command should be posted. Add SW mechanism that once the CREATE_EQ command is about to be executed, all other commands will return error without being sent to the FW. Allow sending other commands only after successfully changing the driver's internal command mode metadata. We can safely return error to all other commands while creating the command EQ, as all other commands might be sent from the user/application during driver load. Application can rerun them later after driver's load was finished. Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by:
Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by:
Moshe Shemesh <moshe@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Moshe Shemesh authored
When FW response to commands is very slow and all command entries in use are waiting for completion we can have a race where commands can get timeout before they get out of the queue and handled. Timeout completion on uninitialized command will cause releasing command's buffers before accessing it for initialization and then we will get NULL pointer exception while trying access it. It may also cause releasing buffers of another command since we may have timeout completion before even allocating entry index for this command. Add entry handling completion to avoid this race. Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by:
Moshe Shemesh <moshe@mellanox.com> Signed-off-by:
Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
- May 22, 2020
-
-
Eli Cohen authored
Add netif_is_bareudp() so the device can be identified as a bareudp one. Signed-off-by:
Eli Cohen <eli@mellanox.com> Reviewed-by:
Roi Dayan <roid@mellanox.com> Reviewed-by:
Eli Britstein <elibr@mellanox.com> Reviewed-by:
Paul Blakey <paulb@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com>
-
Horatiu Vultur authored
Remove the variable mrp_ring_state from switchdev_attr because is not used anywhere. The ring state is set using SWITCHDEV_OBJ_ID_RING_STATE_MRP. Fixes: c284b545 ("switchdev: mrp: Extend switchdev API to offload MRP") Acked-by:
Ivan Vecera <ivecera@redhat.com> Signed-off-by:
Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Edward Cree authored
Make FLOW_ACTION_HW_STATS_DONT_CARE be all bits, rather than none, so that drivers and __flow_action_hw_stats_check can use simple bitwise checks. Pre-fill all actions with DONT_CARE in flow_rule_alloc(), rather than relying on implicit semantics of zero from kzalloc, so that callers which don't configure action stats themselves (i.e. netfilter) get the correct behaviour by default. Only the kernel's internal API semantics change; the TC uAPI is unaffected. v4: move DONT_CARE setting to flow_rule_alloc() for robustness and simplicity. v3: set DONT_CARE in nft and ct offload. v2: rebased on net-next, removed RFC tags. Signed-off-by:
Edward Cree <ecree@solarflare.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This patch adds nexthop add/del notifiers. To be used by vxlan driver in a later patch. Could possibly be used by switchdev drivers in the future. Signed-off-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Todays vxlan mac fdb entries can point to multiple remote ips (rdsts) with the sole purpose of replicating broadcast-multicast and unknown unicast packets to those remote ips. E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (E-VPN multihoming is analogous to multi-homed LAG implementations, but with the inter-switch peerlink replaced with a vxlan tunnel). In other words it needs support for mac ecmp. Furthermore, for faster convergence, E-VPN multihoming needs the ability to update fdb ecmp nexthops independent of the fdb entries. New route nexthop API is perfect for this usecase. This patch extends the vxlan fdb code to take a nexthop id pointing to an ecmp nexthop group. Changes include: - New NDA_NH_ID attribute for fdbs - Use the newly added fdb nexthop groups - makes vxlan rdsts and nexthop handling code mutually exclusive - since this is a new use-case and the requirement is for ecmp nexthop groups, the fdb add and update path checks that the nexthop is really an ecmp nexthop group. This check can be relaxed in the future, if we want to introduce replication fdb nexthop groups and allow its use in lieu of current rdst lists. - fdb update requests with nexthop id's only allowed for existing fdb's that have nexthop id's - learning will not override an existing fdb entry with nexthop group - I have wrapped the switchdev offload code around the presence of rdst [1] E-VPN RFC https://tools.ietf.org/html/rfc7432 [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365 [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf Includes a null check fix in vxlan_xmit from Nikolay v2 - Fixed build issue: Reported-by:
kbuild test robot <lkp@intel.com> Signed-off-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
This patch introduces ecmp nexthops and nexthop groups for mac fdb entries. In subsequent patches this is used by the vxlan driver fdb entries. The use case is E-VPN multihoming [1,2,3] which requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (This is analogous to a multi-homed LAG but over vxlan). Changes include new nexthop flag NHA_FDB for nexthops referenced by fdb entries. These nexthops only have ip. This patch includes appropriate checks to avoid routes referencing such nexthops. example: $ip nexthop add id 12 via 172.16.1.2 fdb $ip nexthop add id 13 via 172.16.1.3 fdb $ip nexthop add id 102 group 12/13 fdb $bridge fdb add 02:02:00:00:00:13 dev vxlan1000 nhid 101 self [1] E-VPN https://tools.ietf.org/html/rfc7432 [2] E-VPN VxLAN: https://tools.ietf.org/html/rfc8365 [3] LPC talk with mention of nexthop groups for L2 ecmp http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf v4 - fixed uninitialized variable reported by kernel test robot Reported-by:
kernel test robot <rong.a.chen@intel.com> Signed-off-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by:
David Ahern <dsahern@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Björn Töpel authored
In order to reduce the number of function calls, the struct xsk_buff_pool definition is moved to xsk_buff_pool.h. The functions xp_get_dma(), xp_dma_sync_for_cpu(), xp_dma_sync_for_device(), xp_validate_desc() and various helper functions are explicitly inlined. Further, move xp_get_handle() and xp_release() to xsk.c, to allow for the compiler to perform inlining. rfc->v1: Make sure xp_validate_desc() is inlined for Tx perf. (Maxim) Signed-off-by:
Björn Töpel <bjorn.topel@intel.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-15-bjorn.topel@gmail.com
-
Björn Töpel authored
There are no users of MEM_TYPE_ZERO_COPY. Remove all corresponding code, including the "handle" member of struct xdp_buff. rfc->v1: Fixed spelling in commit message. (Björn) Signed-off-by:
Björn Töpel <bjorn.topel@intel.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-13-bjorn.topel@gmail.com
-
Björn Töpel authored
In order to simplify AF_XDP zero-copy enablement for NIC driver developers, a new AF_XDP buffer allocation API is added. The implementation is based on a single core (single producer/consumer) buffer pool for the AF_XDP UMEM. A buffer is allocated using the xsk_buff_alloc() function, and returned using xsk_buff_free(). If a buffer is disassociated with the pool, e.g. when a buffer is passed to an AF_XDP socket, a buffer is said to be released. Currently, the release function is only used by the AF_XDP internals and not visible to the driver. Drivers using this API should register the XDP memory model with the new MEM_TYPE_XSK_BUFF_POOL type. The API is defined in net/xdp_sock_drv.h. The buffer type is struct xdp_buff, and follows the lifetime of regular xdp_buffs, i.e. the lifetime of an xdp_buff is restricted to a NAPI context. In other words, the API is not replacing xdp_frames. In addition to introducing the API and implementations, the AF_XDP core is migrated to use the new APIs. rfc->v1: Fixed build errors/warnings for m68k and riscv. (kbuild test robot) Added headroom/chunk size getter. (Maxim/Björn) v1->v2: Swapped SoBs. (Maxim) v2->v3: Initialize struct xdp_buff member frame_sz. (Björn) Add API to query the DMA address of a frame. (Maxim) Do DMA sync for CPU till the end of the frame to handle possible growth (frame_sz). (Maxim) Signed-off-by:
Björn Töpel <bjorn.topel@intel.com> Signed-off-by:
Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-6-bjorn.topel@gmail.com
-
Björn Töpel authored
Move the XSK_NEXT_PG_CONTIG_{MASK,SHIFT}, and XDP_UMEM_USES_NEED_WAKEUP defines from xdp_sock.h to the AF_XDP internal xsk.h file. Also, start using the BIT{,_ULL} macro instead of explicit shifts. Signed-off-by:
Björn Töpel <bjorn.topel@intel.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-5-bjorn.topel@gmail.com
-
Magnus Karlsson authored
Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it more clear for NIC driver implementors to know what functions to use for zero-copy support. v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub) Signed-off-by:
Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com
-
Björn Töpel authored
The XSKMAP is partly implemented by net/xdp/xsk.c. Move xskmap.c from kernel/bpf/ to net/xdp/, which is the logical place for AF_XDP related code. Also, move AF_XDP struct definitions, and function declarations only used by AF_XDP internals into net/xdp/xsk.h. Signed-off-by:
Björn Töpel <bjorn.topel@intel.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-3-bjorn.topel@gmail.com
-
Björn Töpel authored
Calculating the "data_hard_end" for an XDP buffer coming from AF_XDP zero-copy mode, the return value of xsk_umem_xdp_frame_sz() is added to "data_hard_start". Currently, the chunk size of the UMEM is returned by xsk_umem_xdp_frame_sz(). This is not correct, if the fixed UMEM headroom is non-zero. Fix this by returning the chunk_size without the UMEM headroom. Fixes: 2a637c5b ("xdp: For Intel AF_XDP drivers add XDP frame_sz") Signed-off-by:
Björn Töpel <bjorn.topel@intel.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200520192103.355233-2-bjorn.topel@gmail.com
-