- Aug 05, 2019
-
-
Nikolay Aleksandrov authored
Most of the bridge device's vlan init bugs come from the fact that its default pvid is created at the wrong time, way too early in ndo_init() before the device is even assigned an ifindex. It introduces a bug when the bridge's dev_addr is added as fdb during the initial default pvid creation the notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0] and is wrong. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. It makes much more sense to send a notification *after* the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail or to receive it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush from br_vlan_flush() since that case can no longer happen. At NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything depending why it was called (if called due to NETDEV_REGISTER error it'll still be == 1, otherwise it could be any value changed during the device life time). For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER v3: send the correct v2 patch with all changes (stub should return 0) v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in the br_vlan_bridge_event stub when bridge vlans are disabled [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by:
michael-dev <michael-dev@fami-braun.de> Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid") Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN triggers a fallback to TCP if the socket is in state SMC_INIT. But if a nonblocking connect is already started, fallback to TCP is no longer possible, even though the socket may still be in state SMC_INIT. And if a nonblocking connect is already started, a listen() call does not make sense. Reported-by:
<syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com> Fixes: 50717a37 ("net/smc: nonblocking connect rework") Signed-off-by:
Ursula Braun <ubraun@linux.ibm.com> Signed-off-by:
Karsten Graul <kgraul@linux.ibm.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ursula Braun authored
The setsockopts options TCP_NODELAY and TCP_CORK may schedule the tx worker. Make sure the socket is not yet moved into SMC_CLOSED state (for instance by a shutdown SHUT_RDWR call). Reported-by:
<syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com> Reported-by:
<syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com> Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Signed-off-by:
Ursula Braun <ubraun@linux.ibm.com> Signed-off-by:
Karsten Graul <kgraul@linux.ibm.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David Ahern authored
The nexthop path in rt6_update_exception_stamp_rt needs to call rcu_read_unlock if it fails to find a fib6_nh match rather than just returning. Fixes: e659ba31 ("ipv6: Handle all fib6_nh in a nexthop in exception handling") Signed-off-by:
David Ahern <dsahern@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Looks like we were slightly overzealous with the shutdown() cleanup. Even though the sock->sk_state can reach CLOSED again, socket->state will not got back to SS_UNCONNECTED once connections is ESTABLISHED. Meaning we will see EISCONN if we try to reconnect, and EINVAL if we try to listen. Only listen sockets can be shutdown() and reused, but since ESTABLISHED sockets can never be re-connected() or used for listen() we don't need to try to clean up the ULP state early. Fixes: 32857cf5 ("net/tls: fix transition through disconnect with close") Signed-off-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
When generic-XDP was moved to a later processing step by commit 458bf2f2 ("net: core: support XDP generic on stacked devices.") a regression was introduced when using bpf_xdp_adjust_head. The issue is that after this commit the skb->network_header is now changed prior to calling generic XDP and not after. Thus, if the header is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header also need to be updated again. Fix by calling skb_reset_network_header(). Fixes: 458bf2f2 ("net: core: support XDP generic on stacked devices.") Reported-by:
Brandon Cazander <brandon.cazander@multapplied.net> Signed-off-by:
Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Dmytro Linkin authored
Currently init call of all actions (except ipt) init their 'parm' structure as a direct pointer to nla data in skb. This leads to race condition when some of the filter actions were initialized successfully (and were assigned with idr action index that was written directly into nla data), but then were deleted and retried (due to following action module missing or classifier-initiated retry), in which case action init code tries to insert action to idr with index that was assigned on previous iteration. During retry the index can be reused by another action that was inserted concurrently, which causes unintended action sharing between filters. To fix described race condition, save action idr index to temporary stack-allocated variable instead on nla data. Fixes: 0190c1d4 ("net: sched: atomically check-allocate action") Signed-off-by:
Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by:
Vlad Buslov <vladbu@mellanox.com> Acked-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Aug 03, 2019
-
-
Dexuan Cui authored
There is a race condition for an established connection that is being closed by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the 'remove_sock' is false): 1 for the initial value; 1 for the sk being in the bound list; 1 for the sk being in the connected list; 1 for the delayed close_work. After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may* decrease the refcnt to 3. Concurrently, hvs_close_connection() runs in another thread: calls vsock_remove_sock() to decrease the refcnt by 2; call sock_put() to decrease the refcnt to 0, and free the sk; next, the "release_sock(sk)" may hang due to use-after-free. In the above, after hvs_release() finishes, if hvs_close_connection() runs faster than "__vsock_release() -> sock_put(sk)", then there is not any issue, because at the beginning of hvs_close_connection(), the refcnt is still 4. The issue can be resolved if an extra reference is taken when the connection is established. Fixes: a9eeb998 ("hv_sock: Add support for delayed close") Signed-off-by:
Dexuan Cui <decui@microsoft.com> Reviewed-by:
Sunil Muthuswamy <sunilmut@microsoft.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Aug 01, 2019
-
-
Taras Kondratiuk authored
Commit 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit") broke older tipc tools that use compat interface (e.g. tipc-config from tipcutils package): % tipc-config -p operation not supported The commit started to reject TIPC netlink compat messages that do not have attributes. It is too restrictive because some of such messages are valid (they don't need any arguments): % grep 'tx none' include/uapi/linux/tipc_config.h #define TIPC_CMD_NOOP 0x0000 /* tx none, rx none */ #define TIPC_CMD_GET_MEDIA_NAMES 0x0002 /* tx none, rx media_name(s) */ #define TIPC_CMD_GET_BEARER_NAMES 0x0003 /* tx none, rx bearer_name(s) */ #define TIPC_CMD_SHOW_PORTS 0x0006 /* tx none, rx ultra_string */ #define TIPC_CMD_GET_REMOTE_MNG 0x4003 /* tx none, rx unsigned */ #define TIPC_CMD_GET_MAX_PORTS 0x4004 /* tx none, rx unsigned */ #define TIPC_CMD_GET_NETID 0x400B /* tx none, rx unsigned */ #define TIPC_CMD_NOT_NET_ADMIN 0xC001 /* tx none, rx none */ This patch relaxes the original fix and rejects messages without arguments only if such arguments are expected by a command (reg_type is non zero). Fixes: 2753ca5d ("tipc: fix uninit-value in tipc_nl_compat_doit") Cc: stable@vger.kernel.org Signed-off-by:
Taras Kondratiuk <takondra@cisco.com> Acked-by:
Ying Xue <ying.xue@windriver.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 31, 2019
-
-
Nikolay Aleksandrov authored
When permanent entries were introduced by the commit below, they were exempt from timing out and thus igmp leave wouldn't affect them unless fast leave was enabled on the port which was added before permanent entries existed. It shouldn't matter if fast leave is enabled or not if the user added a permanent entry it shouldn't be deleted on igmp leave. Before: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show $ After: $ echo 1 > /sys/class/net/eth4/brport/multicast_fast_leave $ bridge mdb add dev br0 port eth4 grp 229.1.1.1 permanent $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent < join and leave 229.1.1.1 on eth4 > $ bridge mdb show dev br0 port eth4 grp 229.1.1.1 permanent Fixes: ccb1c31a ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 30, 2019
-
-
Arnd Bergmann authored
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see 29e73269 ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by:
Guillaume Nault <g.nault@alphalink.fr> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
Our test suite somtimes provokes the following crash: Description of problem: [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 [ 1092.605072] PGD 0 P4D 0 [ 1092.607620] Oops: 0000 [#1] SMP PTI [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1 [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018 [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc] [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7 [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282 [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045 [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000 [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000 [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a [ 1092.692614] FS: 0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000 [ 1092.700702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0 [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1092.727843] PKRU: 55555554 [ 1092.730556] Call Trace: [ 1092.733010] <IRQ> [ 1092.735034] tipc_sk_filter_rcv+0x7ca/0xb80 [tipc] [ 1092.739828] ? __kmalloc_node_track_caller+0x1cb/0x290 [ 1092.744974] ? dev_hard_start_xmit+0xa5/0x210 [ 1092.749332] tipc_sk_rcv+0x389/0x640 [tipc] [ 1092.753519] tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc] [ 1092.758224] tipc_rcv+0x57a/0xf20 [tipc] [ 1092.762154] ? ktime_get_real_ts64+0x40/0xe0 [ 1092.766432] ? tpacket_rcv+0x50/0x9f0 [ 1092.770098] tipc_l2_rcv_msg+0x4a/0x70 [tipc] [ 1092.774452] __netif_receive_skb_core+0xb62/0xbd0 [ 1092.779164] ? enqueue_entity+0xf6/0x630 [ 1092.783084] ? kmem_cache_alloc+0x158/0x1c0 [ 1092.787272] ? __build_skb+0x25/0xd0 [ 1092.790849] netif_receive_skb_internal+0x42/0xf0 [ 1092.795557] napi_gro_receive+0xba/0xe0 [ 1092.799417] mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core] [ 1092.804564] mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core] [ 1092.809536] mlx5e_napi_poll+0xb2/0xce0 [mlx5_core] [ 1092.814415] ? __wake_up_common_lock+0x89/0xc0 [ 1092.818861] net_rx_action+0x149/0x3b0 [ 1092.822616] __do_softirq+0xe3/0x30a [ 1092.826193] irq_exit+0x100/0x110 [ 1092.829512] do_IRQ+0x85/0xd0 [ 1092.832483] common_interrupt+0xf/0xf [ 1092.836147] </IRQ> [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0 [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000 [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940 [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78 [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000 [ 1092.905196] ? cpuidle_enter_state+0x92/0x2a0 [ 1092.909555] do_idle+0x236/0x280 [ 1092.912785] cpu_startup_entry+0x6f/0x80 [ 1092.916715] start_secondary+0x1a7/0x200 [ 1092.920642] secondary_startup_64+0xb7/0xc0 [...] The reason is that the skb list tipc_socket::mc_method.deferredq only is initialized for connectionless sockets, while nothing stops arriving multicast messages from being filtered by connection oriented sockets, with subsequent access to the said list. We fix this by initializing the list unconditionally at socket creation. This eliminates the crash, while the message still is dropped further down in tipc_sk_filter_rcv() as it should be. Reported-by:
Li Shuang <shuali@redhat.com> Signed-off-by:
Jon Maloy <jon.maloy@ericsson.com> Reviewed-by:
Xin Long <lucien.xin@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David Howells authored
Fix the fact that a notification isn't sent to the recvmsg side to indicate a call failed when sendmsg() fails to transmit a DATA packet with the error ENETUNREACH, EHOSTUNREACH or ECONNREFUSED. Without this notification, the afs client just sits there waiting for the call to complete in some manner (which it's not now going to do), which also pins the rxrpc call in place. This can be seen if the client has a scope-level IPv6 address, but not a global-level IPv6 address, and we try and transmit an operation to a server's IPv6 address. Looking in /proc/net/rxrpc/calls shows completed calls just sat there with an abort code of RX_USER_ABORT and an error code of -ENETUNREACH. Fixes: c54e43d7 ("rxrpc: Fix missing start of call timeout") Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Marc Dionne <marc.dionne@auristor.com> Reviewed-by:
Jeffrey Altman <jaltman@auristor.com>
-
David Howells authored
There is a potential deadlock in rxrpc_peer_keepalive_dispatch() whereby rxrpc_put_peer() is called with the peer_hash_lock held, but if it reduces the peer's refcount to 0, rxrpc_put_peer() calls __rxrpc_put_peer() - which the tries to take the already held lock. Fix this by providing a version of rxrpc_put_peer() that can be called in situations where the lock is already held. The bug may produce the following lockdep report: ============================================ WARNING: possible recursive locking detected 5.2.0-next-20190718 #41 Not tainted -------------------------------------------- kworker/0:3/21678 is trying to acquire lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: __rxrpc_put_peer /net/rxrpc/peer_object.c:415 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_put_peer+0x2d3/0x6a0 /net/rxrpc/peer_object.c:435 but task is already holding lock: 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: spin_lock_bh /./include/linux/spinlock.h:343 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_dispatch /net/rxrpc/peer_event.c:378 [inline] 00000000aa5eecdf (&(&rxnet->peer_hash_lock)->rlock){+.-.}, at: rxrpc_peer_keepalive_worker+0x6b3/0xd02 /net/rxrpc/peer_event.c:430 Fixes: 330bdcfa ("rxrpc: Fix the keepalive generator [ver #2]") Reported-by:
<syzbot+72af434e4b3417318f84@syzkaller.appspotmail.com> Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Marc Dionne <marc.dionne@auristor.com> Reviewed-by:
Jeffrey Altman <jaltman@auristor.com>
-
Johannes Berg authored
Revert this for now, it has been reported multiple times that it completely breaks connectivity on various devices. Cc: stable@vger.kernel.org Fixes: 8dbb000e ("mac80211: set NETIF_F_LLTX when using intermediate tx queues") Reported-by:
Jean Delvare <jdelvare@suse.de> Reported-by:
Peter Lebbing <peter@digitalbrains.com> Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Florian Westphal authored
ebtables doesn't include the base chain policies in the rule count, so we need to add them manually when we call into the x_tables core to allocate space for the comapt offset table. This lead syzbot to trigger: WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649 xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649 Reported-by:
<syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com> Fixes: 2035f3ff ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present") Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- Jul 29, 2019
-
-
Enrico Weigelt authored
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by:
Enrico Weigelt <info@metux.net> Acked-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jozsef Kadlecsik authored
Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result ipset <version>: Broken LIST kernel message: missing DATA part! error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue. Reported-by:
Shijie Luo <luoshijie1@huawei.com> Signed-off-by:
Jozsef Kadlecsik <kadlec@netfilter.org>
-
Stefano Brivio authored
In commit 8cc4ccf5 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the KADT functions for sets matching on MAC addreses the copy of source or destination MAC address depending on the configured match. This was done correctly for hash:mac, but for hash:ip,mac and bitmap:ip,mac, copying and pasting the same code block presents an obvious problem: in these two set types, the MAC address is the second dimension, not the first one, and we are actually selecting the MAC address depending on whether the first dimension (IP address) specifies source or destination. Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags. This way, mixing source and destination matches for the two dimensions of ip,mac set types works as expected. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 192.0.2.1/24 dev veth1 ip -net A addr add 192.0.2.2/24 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16 ip netns exec A ipset add test_bitmap 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP ip netns exec A ipset create test_hash hash:ip,mac ip netns exec A ipset add test_hash 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset correctly matches a test packet: # ping -c1 192.0.2.2 >/dev/null # echo $? 0 Reported-by:
Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf5 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
Jozsef Kadlecsik <kadlec@netfilter.org>
-
Stefano Brivio authored
In commit 8cc4ccf5 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the KADT check that prevents matching on destination MAC addresses for hash:mac sets, but forgot to remove the same check for hash:ip,mac set. Drop this check: functionality is now commented in man pages and there's no reason to restrict to source MAC address matching anymore. Reported-by:
Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf5 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
Jozsef Kadlecsik <kadlec@netfilter.org>
-
Jiri Pirko authored
Commit aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") introduced a possibility to hit a BUG in case device is returning back to init_net and two following conditions are met: 1) dev->ifindex value is used in a name of another "dev%d" device in init_net. 2) dev->name is used by another device in init_net. Under real life circumstances this is hard to get. Therefore this has been present happily for over 10 years. To reproduce: $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip netns add ns1 $ ip -n ns1 link add dummy1ns1 type dummy $ ip -n ns1 link add dummy2ns1 type dummy $ ip link set enp0s2 netns ns1 $ ip -n ns1 link set enp0s2 name dummy0 [ 100.858894] virtio_net virtio0 dummy0: renamed from enp0s2 $ ip link add dev4 type dummy $ ip -n ns1 a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff 3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff 4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff $ ip netns del ns1 [ 158.717795] default_device_exit: failed to move dummy0 to init_net: -17 [ 158.719316] ------------[ cut here ]------------ [ 158.720591] kernel BUG at net/core/dev.c:9824! [ 158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI [ 158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18 [ 158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 158.727508] Workqueue: netns cleanup_net [ 158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.750638] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.752944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 158.762758] Call Trace: [ 158.763882] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.766148] ? devlink_nl_cmd_set_doit+0x520/0x520 [ 158.768034] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.769870] ops_exit_list.isra.0+0xa8/0x150 [ 158.771544] cleanup_net+0x446/0x8f0 [ 158.772945] ? unregister_pernet_operations+0x4a0/0x4a0 [ 158.775294] process_one_work+0xa1a/0x1740 [ 158.776896] ? pwq_dec_nr_in_flight+0x310/0x310 [ 158.779143] ? do_raw_spin_lock+0x11b/0x280 [ 158.780848] worker_thread+0x9e/0x1060 [ 158.782500] ? process_one_work+0x1740/0x1740 [ 158.784454] kthread+0x31b/0x420 [ 158.786082] ? __kthread_create_on_node+0x3f0/0x3f0 [ 158.788286] ret_from_fork+0x3a/0x50 [ 158.789871] ---[ end trace defd6c657c71f936 ]--- [ 158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.829899] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.834923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fix this by checking if a device with the same name exists in init_net and fallback to original code - dev%d to allocate name - in case it does. This was found using syzkaller. Fixes: aca51397 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") Signed-off-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
Mark switch cases where we are expecting to fall through. This patch fixes the following warnings: net/iucv/af_iucv.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 537:3, 519:6, 2246:6, 510:6 Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Reported-by:
Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init() This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it. Reported-by:
<syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com> Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid") Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jia-Ju Bai authored
In dequeue_func(), there is an if statement on line 74 to check whether skb is NULL: if (skb) When skb is NULL, it is used on line 77: prefetch(&skb->end); Thus, a possible null-pointer dereference may occur. To fix this bug, skb->end is used when skb is not NULL. This bug is found by a static analysis tool STCheck written by us. Fixes: 76e3cc12 ("codel: Controlled Delay AQM") Signed-off-by:
Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by:
Jiri Pirko <jiri@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Brian Norris authored
In a very similar spirit to commit c470bdc1 ("mac80211: don't WARN on bad WMM parameters from buggy APs"), an AP may not transmit a fully-formed WMM IE. For example, it may miss or repeat an Access Category. The above loop won't catch that and will instead leave one of the four ACs zeroed out. This triggers the following warning in drv_conf_tx() wlan0: invalid CW_min/CW_max: 0/0 and it may leave one of the hardware queues unconfigured. If we detect such a case, let's just print a warning and fall back to the defaults. Tested with a hacked version of hostapd, intentionally corrupting the IEs in hostapd_eid_wmm(). Cc: stable@vger.kernel.org Signed-off-by:
Brian Norris <briannorris@chromium.org> Link: https://lore.kernel.org/r/20190726224758.210953-1-briannorris@chromium.org Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
- Jul 27, 2019
-
-
Jia-Ju Bai authored
In rds_rdma_cm_event_handler_cmn(), there are some if statements to check whether conn is NULL, such as on lines 65, 96 and 112. But conn is not checked before being used on line 108: trans->cm_connect_complete(conn, event); and on lines 140-143: rdsdebug("DISCONNECT event - dropping connection " "%pI6c->%pI6c\n", &conn->c_laddr, &conn->c_faddr); rds_conn_drop(conn); Thus, possible null-pointer dereferences may occur. To fix these bugs, conn is checked before being used. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by:
Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by:
Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 26, 2019
-
-
Haishuang Yan authored
ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which can cause a possible use-after-free accessing iph/ipv6h pointer since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb. Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets") Signed-off-by:
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Manikanta Pubbisetty authored
Commit 33d915d9 ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices") has introduced a change which allows 4addr operation on crypto controlled devices (ex: ath10k). This change has inadvertently impacted the interface combinations logic on such devices. General rule is that software interfaces like AP/VLAN should not be listed under supported interface combinations and should not be considered during validation of these combinations; because of the aforementioned change, AP/VLAN interfaces(if present) will be checked against interfaces supported by the device and blocks valid interface combinations. Consider a case where an AP and AP/VLAN are up and running; when a second AP device is brought up on the same physical device, this AP will be checked against the AP/VLAN interface (which will not be part of supported interface combinations of the device) and blocks second AP to come up. Add a new API cfg80211_iftype_allowed() to fix the problem, this API works for all devices with/without SW crypto control. Signed-off-by:
Manikanta Pubbisetty <mpubbise@codeaurora.org> Fixes: 33d915d9 ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices") Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.org Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Haishuang Yan authored
We need the same checks introduced by commit cb9f1b78 ("ip: validate header length on virtual device xmit") for ipip tunnel. Fixes: cb9f1b78 ("ip: validate header length on virtual device xmit") Signed-off-by:
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 25, 2019
-
-
Cong Wang authored
act_ife at least requires TCA_IFE_PARMS, so we have to bail out when there is no attribute passed in. Reported-by:
<syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com> Fixes: ef6980b6 ("introduce IFE action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
The label is used just once and the code it points at is not reused, no point in keeping it. Signed-off-by:
Phil Sutter <phil@nwl.cc> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Phil Sutter authored
nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in situations where required data is missing leads to unexpected behaviour with inverted checks like so: | meta iifname != eth0 accept This rule will never match if there is no input interface (or it is not known) which is not intuitive and, what's worse, breaks consistency of iptables-nft with iptables-legacy. Fix this by falling back to placing a value in dreg which never matches (avoiding accidental matches), i.e. zero for interface index and an empty string for interface name. Signed-off-by:
Phil Sutter <phil@nwl.cc> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- Jul 24, 2019
-
-
Cong Wang authored
sock_efree() releases the sock refcnt, if we don't hold this refcnt when setting skb->destructor to it, the refcnt would not be balanced. This leads to several bug reports from syzbot. I have checked other users of sock_efree(), all of them hold the sock refcnt. Fixes: c8c8218e ("netrom: fix a memory leak in nr_rx_frame()") Reported-and-tested-by:
<syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com> Reported-and-tested-by:
<syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com> Reported-and-tested-by:
<syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com> Reported-and-tested-by:
<syzbot+a34e5f3d0300163f0c87@syzkaller.appspotmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
Some functions in the datapath code are factored out so that each one has a stack frame smaller than 1024 bytes with gcc. However, when compiling with clang, the functions are inlined more aggressively and combined again so we get net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=] Marking both get_flow_actions() and ovs_nla_init_match_and_action() as 'noinline_for_stack' gives us the same behavior that we see with gcc, and no warning. Note that this does not mean we actually use less stack, as the functions call each other, and we still get three copies of the large 'struct sw_flow_key' type on the stack. The comment tells us that this was previously considered safe, presumably since the netlink parsing functions are called with a known backchain that does not also use a lot of stack space. Fixes: 9cc9a5cb ("datapath: Avoid using stack larger than 1024.") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Haishuang Yan authored
Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull() which may change skb->data, so we need to re-load ipv6h at the right place. Fixes: 898b2979 ("ip6_gre: Refactor ip6gre xmit codes") Cc: William Tu <u9012063@gmail.com> Signed-off-by:
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by:
William Tu <u9012063@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Pavel Machek authored
Cleanup testing for error condition. Signed-off-by:
Pavel Machek <pavel@denx.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
YueHaibing authored
This patch add error path for cgw_module_init to avoid possible crash if some error occurs. Fixes: c1aabdf3 ("can-gw: add netlink based CAN routing") Signed-off-by:
YueHaibing <yuehaibing@huawei.com> Acked-by:
Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by:
Marc Kleine-Budde <mkl@pengutronix.de>
-
- Jul 23, 2019
-
-
Eric Dumazet authored
It is possible we reach bpf_convert_ctx_access() with si->dst_reg == si->src_reg Therefore, we need to load BPF_REG_AX before eventually mangling si->src_reg. syzbot generated this x86 code : 3: 55 push %rbp 4: 48 89 e5 mov %rsp,%rbp 7: 48 81 ec 00 00 00 00 sub $0x0,%rsp // Might be avoided ? e: 53 push %rbx f: 41 55 push %r13 11: 41 56 push %r14 13: 41 57 push %r15 15: 6a 00 pushq $0x0 17: 31 c0 xor %eax,%eax 19: 48 8b bf c0 00 00 00 mov 0xc0(%rdi),%rdi 20: 44 8b 97 bc 00 00 00 mov 0xbc(%rdi),%r10d 27: 4c 01 d7 add %r10,%rdi 2a: 48 0f b7 7f 06 movzwq 0x6(%rdi),%rdi // Crash 2f: 5b pop %rbx 30: 41 5f pop %r15 32: 41 5e pop %r14 34: 41 5d pop %r13 36: 5b pop %rbx 37: c9 leaveq 38: c3 retq Fixes: d9ff286a ("bpf: allow BPF programs access skb_shared_info->gso_segs field") Signed-off-by:
Eric Dumazet <edumazet@google.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
- Jul 22, 2019
-
-
John Fastabend authored
When a map free is called and in parallel a socket is closed we have two paths that can potentially reset the socket prot ops, the bpf close() path and the map free path. This creates a problem with which prot ops should be used from the socket closed side. If the map_free side completes first then we want to call the original lowest level ops. However, if the tls path runs first we want to call the sockmap ops. Additionally there was no locking around prot updates in TLS code paths so the prot ops could be changed multiple times once from TLS path and again from sockmap side potentially leaving ops pointed at either TLS or sockmap when psock and/or tls context have already been destroyed. To fix this race first only update ops inside callback lock so that TLS, sockmap and lowest level all agree on prot state. Second and a ULP callback update() so that lower layers can inform the upper layer when they are being removed allowing the upper layer to reset prot ops. This gets us close to allowing sockmap and tls to be stacked in arbitrary order but will save that patch for *next trees. v4: - make sure we don't free things for device; - remove the checks which swap the callbacks back only if TLS is at the top. Reported-by:
<syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com> Fixes: 02c558b2 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by:
John Fastabend <john.fastabend@gmail.com> Signed-off-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by:
Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net>
-
John Fastabend authored
Sockmap does not currently support adding sockets after TLS has been enabled. There never was a real use case for this so it was never added. But, we lost the test for ULP at some point so add it here and fail the socket insert if TLS is enabled. Future work could make sockmap support this use case but fixup the bug here. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by:
John Fastabend <john.fastabend@gmail.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net>
-