  1. Jan 15, 2020
  2. Jan 14, 2020
    • xprtrdma: Fix oops in Receive handler after device removal · 671c450b
      Chuck Lever authored
      
      
      Since v5.4, a device removal occasionally triggered this oops:
      
      Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
      Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
      Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
      Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
      Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
      Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
      Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
      Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
      Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
      Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
      Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
      Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
      Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
      Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
      Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
      Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
      Dec  2 17:13:53 manet kernel: Call Trace:
      Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
      Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
      Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
      Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
      Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
      
      The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
      is still pointing to the old ib_device, which has been freed. The
      only way that is possible is if this rpcrdma_rep was not destroyed
      by rpcrdma_ia_remove.
      
      Debugging showed that was indeed the case: this rpcrdma_rep was
      still in use by a completing RPC at the time of the device removal,
      and thus wasn't on the rep free list. So, it was not found by
      rpcrdma_reps_destroy().
      
      The fix is to introduce a list of all rpcrdma_reps so that they all
      can be found when a device is removed. That list is used to perform
      only regbuf DMA unmapping, replacing that call to
      rpcrdma_reps_destroy().
      
      Meanwhile, to prevent corruption of this list, I've moved the
      destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
      rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
      not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
      protecting the rb_all_reps list.
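
      A minimal userspace sketch of the idea, under stated assumptions: struct rep, struct buffer, rep_create(), rep_put(), unmap_free_only() and unmap_all() are illustrative stand-ins, not the xprtrdma code. Every rep is linked on an "all" list when it is created, so a device-removal walk also reaches reps that are in use and therefore absent from the free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rpcrdma_rep; not the kernel structure. */
struct rep {
	struct rep *next_all;   /* links every rep ever created */
	struct rep *next_free;  /* links only idle reps */
	bool dma_mapped;        /* models the regbuf DMA mapping */
};

/* Hypothetical stand-in for the buffer pool that owns both lists. */
struct buffer {
	struct rep *all_reps;
	struct rep *free_reps;
};

/* Register a new rep on the "all" list; it starts out in use. */
static void rep_create(struct buffer *buf, struct rep *rep)
{
	rep->dma_mapped = true;
	rep->next_free = NULL;
	rep->next_all = buf->all_reps;
	buf->all_reps = rep;
}

/* Return an idle rep to the free list. */
static void rep_put(struct buffer *buf, struct rep *rep)
{
	rep->next_free = buf->free_reps;
	buf->free_reps = rep;
}

/* Walk only the free list: misses reps held by a completing RPC. */
static int unmap_free_only(struct buffer *buf)
{
	int n = 0;

	for (struct rep *r = buf->free_reps; r; r = r->next_free) {
		if (r->dma_mapped) {
			r->dma_mapped = false;
			n++;
		}
	}
	return n;
}

/* Walk the "all" list: reaches in-use reps as well. */
static int unmap_all(struct buffer *buf)
{
	int n = 0;

	for (struct rep *r = buf->all_reps; r; r = r->next_all) {
		if (r->dma_mapped) {
			r->dma_mapped = false;
			n++;
		}
	}
	return n;
}
```

      In this model a rep that was never put back on the free list stays mapped after unmap_free_only() and is only reached by unmap_all(), which mirrors the failure described above. The sketch ignores locking; the commit relies on rpcrdma_xprt_drain() for that.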
      
      Fixes: b0b227f0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      671c450b
    • xprtrdma: Fix completion wait during device removal · 13cb886c
      Chuck Lever authored
      
      
      I've found that on occasion, "rmmod <dev>" will hang if an NFS
      mount is under load.
      
      Ensure that ri_remove_done is initialized only just before the
      transport is woken up to force a close. This avoids the completion
      possibly getting initialized again while the CM event handler is
      waiting for a wake-up.
      
      Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from under an NFS mount")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      13cb886c
    • xprtrdma: Fix create_qp crash on device unload · b32b9ed4
      Chuck Lever authored
      
      
      On device re-insertion, the RDMA device driver crashes trying to set
      up a new QP:
      
      Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
      Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
      Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
      Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
      Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
      Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G        W         5.4.0 #852
      Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
      Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
      Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
      Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
      Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
      Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
      Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
      Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
      Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
      Nov 27 16:32:06 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
      Nov 27 16:32:06 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
      Nov 27 16:32:06 manet kernel: Call Trace:
      Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
      Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
      Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
      Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
      Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
      
      The fix is to copy the qp_init_attr struct that was just created by
      rpcrdma_ep_create() instead of using the one from the previous
      connection instance.
      
      Fixes: 98ef77d1 ("xprtrdma: Send Queue size grows after a reconnect")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      b32b9ed4
  3. Jan 08, 2020
  4. Jan 07, 2020
    • vlan: vlan_changelink() should propagate errors · eb8ef2a3
      Eric Dumazet authored
      
      
      Both vlan_dev_change_flags() and vlan_dev_set_egress_priority()
      can return an error. vlan_changelink() should not ignore them.
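
      The pattern can be sketched as follows. This is an illustration with hypothetical stubs (set_flags(), set_egress_priority(), changelink_ignoring() and changelink_propagating() are invented names), not the code in net/8021q.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stubs standing in for vlan_dev_change_flags() and
 * vlan_dev_set_egress_priority(); each returns 0 or a negative errno. */
static int set_flags(int fail)
{
	return fail ? -EINVAL : 0;
}

static int set_egress_priority(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Buggy shape: return values are dropped, so the caller sees success. */
static int changelink_ignoring(int fail_flags, int fail_prio)
{
	set_flags(fail_flags);
	set_egress_priority(fail_prio);
	return 0;
}

/* Fixed shape: stop at the first failure and hand it to the caller. */
static int changelink_propagating(int fail_flags, int fail_prio)
{
	int err;

	err = set_flags(fail_flags);
	if (err)
		return err;

	err = set_egress_priority(fail_prio);
	if (err)
		return err;

	return 0;
}
```

      With the first shape a failed helper is invisible to the netlink caller; with the second, the caller receives the helper's errno.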
      
      Fixes: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb8ef2a3
    • vlan: fix memory leak in vlan_dev_set_egress_priority · 9bbd917e
      Eric Dumazet authored
      
      
      There are a few cases where the ndo_uninit() handler might not be
      called if an error happens while the device is being initialized.
      
      Since vlan_newlink() calls vlan_changelink() before
      trying to register the netdevice, we need to make sure
      vlan_dev_uninit() has been called at least once,
      or we might leak allocated memory.
      
      BUG: memory leak
      unreferenced object 0xffff888122a206c0 (size 32):
        comm "syz-executor511", pid 7124, jiffies 4294950399 (age 32.240s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 61 73 00 00 00 00 00 00 00 00  ......as........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000000eb3bb85>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<000000000eb3bb85>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<000000000eb3bb85>] slab_alloc mm/slab.c:3320 [inline]
          [<000000000eb3bb85>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000007b99f620>] kmalloc include/linux/slab.h:556 [inline]
          [<000000007b99f620>] vlan_dev_set_egress_priority+0xcc/0x150 net/8021q/vlan_dev.c:194
          [<000000007b0cb745>] vlan_changelink+0xd6/0x140 net/8021q/vlan_netlink.c:126
          [<0000000065aba83a>] vlan_newlink+0x135/0x200 net/8021q/vlan_netlink.c:181
          [<00000000fb5dd7a2>] __rtnl_newlink+0x89a/0xb80 net/core/rtnetlink.c:3305
          [<00000000ae4273a1>] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3363
          [<00000000decab39f>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000accba4ee>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<00000000319fe20f>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000d51938dc>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000d51938dc>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000e539ac79>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<000000006250c27e>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<000000006250c27e>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<00000000e2a156d1>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008c87466e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000110e3054>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000d71077c8>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000d71077c8>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000d71077c8>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bbd917e
  5. Jan 06, 2020
    • sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY · be7a7729
      Xin Long authored
      
      
      This patch fixes a memory leak: nothing frees cmd->obj.chunk for an
      unprocessed SCTP_CMD_REPLY. The issue occurs when processing a cmd
      fails while there are still SCTP_CMD_REPLY cmds on the cmd seq with
      an allocated chunk in cmd->obj.chunk.
      
      Fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
      the cmd seq when any cmd returns an error. While at it, also remove
      the 'nomem' label.
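
      A minimal userspace model of that cleanup, assuming simplified stand-in types: struct cmd, enum cmd_verb, chunk_free() and run_cmds() are hypothetical names, not the SCTP state machine code. For simplicity only non-reply commands fail in this model, and handing a reply chunk off on the normal path is modelled as a free.

```c
#include <assert.h>
#include <stdlib.h>

enum cmd_verb { CMD_OTHER, CMD_REPLY };

/* Hypothetical stand-in for one entry of the command sequence. */
struct cmd {
	enum cmd_verb verb;
	void *chunk;   /* allocated only for CMD_REPLY */
	int fails;     /* non-zero: processing this command fails */
};

static int chunks_freed;

static void chunk_free(void *chunk)
{
	free(chunk);
	chunks_freed++;
}

/* Process commands in order; on failure, release the chunks of the
 * reply commands that were queued behind the failing one. */
static int run_cmds(struct cmd *cmds, int n)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		if (cmds[i].fails) {
			err = -1;
			break;
		}
		if (cmds[i].verb == CMD_REPLY) {
			/* Normal path: the reply consumes its chunk. */
			chunk_free(cmds[i].chunk);
			cmds[i].chunk = NULL;
		}
	}

	if (err) {
		/* Error path: walk what is left on the sequence. */
		for (i++; i < n; i++) {
			if (cmds[i].verb == CMD_REPLY) {
				chunk_free(cmds[i].chunk);
				cmds[i].chunk = NULL;
			}
		}
	}
	return err;
}
```

      Without the error-path loop, the chunk of any reply queued after the failing command would never be released in this model.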
      
      Reported-by: <syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be7a7729
    • tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error · a7869e5f
      Ying Xue authored
      
      
      syzbot found the following crash on:
      =====================================================
      BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
      BUG: KMSAN: uninit-value in nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
      BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
      CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1c9/0x220 lib/dump_stack.c:118
        kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
        __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
        __nlmsg_parse include/net/netlink.h:661 [inline]
        nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
        __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
        tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x444179
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
      R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
        kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
        kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
        slab_alloc_node mm/slub.c:2774 [inline]
        __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
        __kmalloc_reserve net/core/skbuff.c:141 [inline]
        __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
        alloc_skb include/linux/skbuff.h:1049 [inline]
        nlmsg_new include/net/netlink.h:888 [inline]
        tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      =====================================================
      
      The complaint above occurred because the memory region pointed to by
      the attrbuf variable was not initialized. To eliminate this warning,
      use kcalloc() rather than kmalloc_array() to allocate memory for attrbuf.
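
      A userspace analogue of the change, offered as a sketch: calloc() plays the role of kcalloc() here, and attr_table_alloc() and attr_table_count() are hypothetical helpers, not TIPC code. It assumes a platform where an all-zero pointer is NULL, which holds on the usual Linux targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a table of n attribute pointers,
 * zero-filled so every slot reads as NULL until a parser fills it. */
static void **attr_table_alloc(size_t n)
{
	/* calloc() zeroes the memory, as kcalloc() does in the kernel;
	 * malloc() (like kmalloc_array()) leaves it uninitialized. */
	return calloc(n, sizeof(void *));
}

/* Count the slots that have been filled in. Reading the slots is
 * only well-defined because the table started out zeroed. */
static size_t attr_table_count(void **tbl, size_t n)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++) {
		if (tbl[i] != NULL)
			used++;
	}
	return used;
}
```

      If the table came from an uninitialized allocation instead, the loop in attr_table_count() would read indeterminate values, which is the class of access KMSAN reports.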
      
      Reported-by: <syzbot+b1fd2bf2c89d8407e15f@syzkaller.appspotmail.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7869e5f
    • netfilter: flowtable: add nf_flowtable_time_stamp · fb46f1b7
      Pablo Neira Ayuso authored
      
      
      This patch adds nf_flowtable_time_stamp and updates the existing code to
      use it.
      
      This patch is also implicitly fixing up hardware statistic fetching via
      nf_flow_offload_stats() where casting to u32 is missing. Use
      nf_flow_timeout_delta() to fix this.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: wenxu <wenxu@ucloud.cn>
      fb46f1b7
  6. Jan 05, 2020
    • net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue · ce57785b
      Carl Huang authored
      
      
      The len passed to skb_put_padto() is wrong: it needs to include the
      length of hdr.
      
      In qrtr_node_enqueue(), the local variable size_t len is assigned
      skb->len, and then skb_push(skb, sizeof(*hdr)) increases skb->len by
      sizeof(*hdr), so len no longer equals skb->len after the push.
      
      The purpose of skb_put_padto(skb, ALIGN(len, 4)) is to pad the end
      of the skb's data if skb->len is not aligned to 4, but it uses len
      instead of skb->len. At that line, skb->len is 32 bytes
      (sizeof(*hdr)) more than len. For example, if len is 3 bytes, then
      skb->len is 35 bytes (3 + 32) and ALIGN(len, 4) is 4 bytes, so
      __skb_put_padto() does nothing because the check size(35) < len(4)
      fails. The correct value is 36 (sizeof(*hdr) + ALIGN(len, 4) =
      32 + 4): then the check size(35) < len(36) passes and
      __skb_put_padto() adds 1 byte to the end of the skb's data, which
      is the intended behavior.
      
      function of skb_push:
      void *skb_push(struct sk_buff *skb, unsigned int len)
      {
      	skb->data -= len;
      	skb->len  += len;
      	if (unlikely(skb->data < skb->head))
      		skb_under_panic(skb, len, __builtin_return_address(0));
      	return skb->data;
      }
      
      function of skb_put_padto
      static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
      {
      	return __skb_put_padto(skb, len, true);
      }
      
      function of __skb_put_padto
      static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
      				  bool free_on_error)
      {
      	unsigned int size = skb->len;
      
      	if (unlikely(size < len)) {
      		len -= size;
      		if (__skb_pad(skb, len, free_on_error))
      			return -ENOMEM;
      		__skb_put(skb, len);
      	}
      	return 0;
      }
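
      The arithmetic in the example can be checked with a small userspace sketch. ALIGN_UP(), pad_added(), target_old() and target_new() are hypothetical helpers written for this illustration; they model only the length check, not the skb handling.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two); gives the
 * same result as the kernel's ALIGN() for the values used here. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Bytes of padding a __skb_put_padto()-style check would append: it
 * pads only when the current length is below the requested target. */
static size_t pad_added(size_t skb_len, size_t target)
{
	return skb_len < target ? target - skb_len : 0;
}

/* Target as described for the old code: header length left out. */
static size_t target_old(size_t payload_len)
{
	return ALIGN_UP(payload_len, 4);
}

/* Target described as correct: header plus aligned payload. */
static size_t target_new(size_t payload_len, size_t hdr_len)
{
	return hdr_len + ALIGN_UP(payload_len, 4);
}
```

      With a 3-byte payload and a 32-byte header, target_old() yields 4 and no padding is added to the 35-byte skb, while target_new() yields 36 and one byte of padding is added.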
      
      Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
      Signed-off-by: Wen Gong <wgong@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce57785b
    • netfilter: nf_tables: unbind callbacks from flowtable destroy path · 5acab914
      Pablo Neira Ayuso authored
      
      
      Callback unbinding needs to be done after nf_flow_table_free(),
      otherwise entries are not removed from the hardware.
      
      Update nft_unregister_flowtable_net_hooks() to call
      nf_unregister_net_hook() instead since the commit/abort paths do not
      deal with the callback unbinding anymore.
      
      Add a comment to nft_flowtable_event() to clarify that
      flow_offload_netdev_event() already removes the entries before the
      callback unbinding.
      
      Fixes: 8bb69f3b ("netfilter: nf_tables: add flowtable offload control plane")
      Fixes: ff4bf2f4 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: wenxu <wenxu@ucloud.cn>
      5acab914
    • netfilter: nf_flow_table_offload: fix the nat port mangle. · 73327d47
      wenxu authored
      
      
      The shift applied to the 32-bit word that defines the port number
      depends on the flow direction.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Fixes: 7acd9378 ("netfilter: nf_flow_table_offload: Correct memcpy size for flow_overload_mangle()")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      73327d47
    • netfilter: nf_flow_table_offload: check the status of dst_neigh · f31ad71c
      wenxu authored
      
      
      It is better to read the dst_neigh under neigh->lock and check that
      its nud_state is NUD_VALID. If no neigh entry existed previously, the
      lookup creates one that is not NUD_VALID, with a 00:00:00:00:00:00 MAC.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      f31ad71c
    • netfilter: nf_flow_table_offload: fix incorrect ethernet dst address · 1b67e506
      wenxu authored
      
      
      Ethernet destination for original traffic takes the source ethernet address
      in the reply direction. For reply traffic, this takes the source
      ethernet address of the original direction.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1b67e506
    • netfilter: nft_flow_offload: fix underflow in flowtable reference counter · 8ca79606
      wenxu authored
      
      
      The .deactivate and .activate interfaces already deal with the reference
      counter. Otherwise, this results in spurious "Device is busy" errors.
      
      Fixes: a3c90f7a ("netfilter: nf_tables: flow offload expression")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      8ca79606
  7. Jan 03, 2020
  8. Jan 02, 2020
    • tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK · c9655008
      Pengcheng Yang authored
      
      
      When we receive a D-SACK, where the sequence number satisfies:
      	undo_marker <= start_seq < end_seq <= prior_snd_una
      we consider this a valid D-SACK, and tcp_is_sackblock_valid()
      returns true. This D-SACK is then discarded as "old stuff",
      but the variable first_sack_index is not marked as negative
      in tcp_sacktag_write_queue().
      
      If this D-SACK also carries a SACK that needs to be processed
      (for example, the previous SACK segment was lost), this SACK
      will be treated as a D-SACK in the following processing of
      tcp_sacktag_write_queue(), which will eventually lead to
      incorrect updates of undo_retrans and reordering.
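
      The sequence-number condition above can be sketched in userspace as follows. seq_before() is written in the style of the kernel's wraparound-safe before() helper, and dsack_is_old_stuff() is a hypothetical name for this illustration; neither is the code in tcp_sacktag_write_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit sequence numbers:
 * the signed difference is negative when a precedes b. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* True when undo_marker <= start_seq < end_seq <= prior_snd_una,
 * i.e. the block lies entirely in already-acknowledged data. */
static bool dsack_is_old_stuff(uint32_t undo_marker, uint32_t start_seq,
			       uint32_t end_seq, uint32_t prior_snd_una)
{
	return !seq_before(start_seq, undo_marker) &&
	       seq_before(start_seq, end_seq) &&
	       !seq_before(prior_snd_una, end_seq);
}
```

      A block matching this predicate is the case the commit message describes: it is skipped as "old stuff", so later processing must not keep treating the first remaining block as the D-SACK.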
      
      Fixes: fd6dad61 ("[TCP]: Earlier SACK block verification & simplify access to them")
      Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9655008
  9. Dec 31, 2019
    • hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() · 04b69426
      Taehee Yoo authored
      
      
      hsr slave interfaces don't have a debugfs directory.
      So, hsr_debugfs_rename() shouldn't be called when an hsr slave
      interface name is changed.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          ip link set dummy0 name ap
      
      Splat looks like:
      [21071.899367][T22666] ap: renamed from dummy0
      [21071.914005][T22666] ==================================================================
      [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666
      [21071.926941][T22666]
      [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240
      [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [21071.935094][T22666] Call Trace:
      [21071.935867][T22666]  dump_stack+0x96/0xdb
      [21071.936687][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.937774][T22666]  print_address_description.constprop.5+0x1be/0x360
      [21071.939019][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940081][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940949][T22666]  __kasan_report+0x12a/0x16f
      [21071.941758][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.942674][T22666]  kasan_report+0xe/0x20
      [21071.943325][T22666]  hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.944187][T22666]  hsr_netdev_notify+0x1fe/0x9b0 [hsr]
      [21071.945052][T22666]  ? __module_text_address+0x13/0x140
      [21071.945897][T22666]  notifier_call_chain+0x90/0x160
      [21071.946743][T22666]  dev_change_name+0x419/0x840
      [21071.947496][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.948600][T22666]  ? netdev_adjacent_rename_links+0x280/0x280
      [21071.949577][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.950672][T22666]  ? lock_downgrade+0x6e0/0x6e0
      [21071.951345][T22666]  ? do_setlink+0x811/0x2ef0
      [21071.951991][T22666]  do_setlink+0x811/0x2ef0
      [21071.952613][T22666]  ? is_bpf_text_address+0x81/0xe0
      [ ... ]
      
      Reported-by: <syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com>
      Fixes: 4c2d5e33 ("hsr: rename debugfs file when interface name is changed")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04b69426
    • net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti authored
      
      
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantic of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, like it was done on older Linux kernels.
      This fixes situations where walk() was wrongly detecting a non-empty
      filter, like it happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic into a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      if no IDRs have been allocated.
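
      The callback arrangement can be sketched like this, under stated assumptions: struct proto, struct proto_ops, flower_like_delete_empty() and proto_check_delete() are hypothetical stand-ins, and nr_filters replaces the IDR check. It is not the net/sched implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct proto;

/* Hypothetical ops table: delete_empty is optional. */
struct proto_ops {
	bool (*delete_empty)(struct proto *tp);
};

/* Hypothetical stand-in for a classifier instance. */
struct proto {
	const struct proto_ops *ops;
	bool deleting;
	int nr_filters;   /* stands in for "IDRs have been allocated" */
};

/* Lockless-style filter: deletable only when it holds no filters. */
static bool flower_like_delete_empty(struct proto *tp)
{
	tp->deleting = (tp->nr_filters == 0);
	return tp->deleting;
}

/* Generic check: use the filter's own callback when it has one;
 * otherwise no concurrent insertion is possible, so mark it deleting. */
static bool proto_check_delete(struct proto *tp)
{
	if (tp->ops->delete_empty)
		return tp->ops->delete_empty(tp);

	tp->deleting = true;
	return true;
}
```

      Filters without the callback skip the emptiness test entirely, which matches the point above that most classifiers do not need a full walk() on delete.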
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5b72a08
    • tcp: Fix highest_sack and highest_sack_seq · 85369750
      Cambda Zhu authored
      
      
      From commit 50895b9d ("tcp: highest_sack fix"), the logic about
      setting tp->highest_sack to the head of the send queue was removed.
      Of course the logic is error prone, but it is logical. Before we
      remove the pointer to the highest sack skb and use the seq instead,
      we need to set tp->highest_sack to NULL when there is no skb after
      the last sack, and then replace NULL with the real skb when a new skb
      is inserted into the rtx queue, because NULL means the highest sack
      seq is tp->snd_nxt. If tp->highest_sack is NULL and new data is sent,
      the next ACK with sack option will increase tp->reordering unexpectedly.
      
      This patch sets tp->highest_sack to the tail of the rtx queue if
      it's NULL and new data is sent. The patch keeps the rule that the
      highest_sack can only be maintained by sack processing, except for
      this one case.
      
      Fixes: 50895b9d ("tcp: highest_sack fix")
      Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85369750
  10. Dec 30, 2019
    • netfilter: arp_tables: init netns pointer in xt_tgchk_param struct · 1b789577
      Florian Westphal authored
      
      
      We get a crash when the target's checkentry function tries to make
      use of the network namespace pointer for arptables.
      
      When the net pointer got added back in 2010, only ip/ip6/ebtables were
      changed to initialize it, so arptables has this set to NULL.
      
      This isn't a problem for normal arptables because no existing
      arptables target has a checkentry function that makes use of par->net.
      
      However, direct users of the setsockopt interface can provide any
      target they want as long as it's registered for ARP or UNSPEC protocols.
      
      syzkaller managed to send a semi-valid arptables rule for RATEEST target
      which is enough to trigger NULL deref:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      RIP: xt_rateest_tg_checkentry+0x11d/0xb40 net/netfilter/xt_RATEEST.c:109
      [..]
       xt_check_target+0x283/0x690 net/netfilter/x_tables.c:1019
       check_target net/ipv4/netfilter/arp_tables.c:399 [inline]
       find_check_entry net/ipv4/netfilter/arp_tables.c:422 [inline]
       translate_table+0x1005/0x1d70 net/ipv4/netfilter/arp_tables.c:572
       do_replace net/ipv4/netfilter/arp_tables.c:977 [inline]
       do_arpt_set_ctl+0x310/0x640 net/ipv4/netfilter/arp_tables.c:1456
      
      Fixes: add67461 ("netfilter: add struct net * to target parameters")
      Reported-by: <syzbot+d7358a458d8a81aee898@syzkaller.appspotmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1b789577
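The failure mode in the arp_tables commit above comes down to a parameter struct whose `net` member is never set. The following is a minimal user-space sketch, not the kernel code: `toy_net`, `toy_tgchk_param`, `toy_checkentry()` and the two `make_param_*()` helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct xt_tgchk_param. */
struct toy_net {
    int id;
};

struct toy_tgchk_param {
    struct toy_net *net;
    const char *table;
};

/* Stand-in for a target checkentry hook that relies on par->net.
 * The kernel would dereference a NULL pointer here; the model
 * returns -1 instead so the difference can be observed safely. */
int toy_checkentry(const struct toy_tgchk_param *par)
{
    if (par->net == NULL)
        return -1;
    return par->net->id;
}

/* Members omitted from a designated initializer are zeroed,
 * so .net ends up NULL when it is not listed. */
struct toy_tgchk_param make_param_old(const char *table)
{
    struct toy_tgchk_param par = { .table = table };
    return par;
}

/* Passing the namespace in and initializing .net avoids the NULL. */
struct toy_tgchk_param make_param_fixed(struct toy_net *net, const char *table)
{
    struct toy_tgchk_param par = { .net = net, .table = table };
    return par;
}
```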
  11. Dec 28, 2019
    • Shmulik Ladkani's avatar
      net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device · 70cf3dc7
      Shmulik Ladkani authored
      
      
      There's no skb_pull performed when a mirred action is set at egress of a
      mac device, with a target device/action that expects skb->data to point
      at the network header.
      
      As a result, either the target device is erroneously given an skb with
      data pointing to the mac (egress case), or the net stack receives the
      skb with data pointing to the mac (ingress case).
      
      E.g.:
       # tc qdisc add dev eth9 root handle 1: prio
       # tc filter add dev eth9 parent 1: prio 9 protocol ip handle 9 basic \
         action mirred egress redirect dev tun0
      
       (tun0 is a tun device. Result: tun0 erroneously gets the eth header
        instead of the iph)
      
      Revise the push/pull logic of tcf_mirred_act() to not rely on the
      skb_at_tc_ingress() vs tcf_mirred_act_wants_ingress() comparison, as it
      does not cover all "pull" cases.
      
      Instead, calculate whether the required action on the target device
      requires the data to point at the network header, and compare this to
      whether skb->data points to network header - and make the push/pull
      adjustments as necessary.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Shmulik Ladkani <sladkani@proofpoint.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70cf3dc7
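The revised push/pull decision described in the act_mirred commit above can be sketched as a user-space model. This is an illustration under assumptions, not the patch itself: `toy_skb`, `target_expects_nh()` and `fix_data_pointer()` are hypothetical names, and the rule "the target expects the network header when the action targets ingress or the device is not mac_header_xmit" is an assumed reading of the message. The model also assumes that data not at the network header sits at the mac header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: offsets into one buffer. */
struct toy_skb {
    size_t mac_off;   /* start of the mac (link-layer) header */
    size_t net_off;   /* start of the network header */
    size_t data_off;  /* where skb->data currently points */
};

/* Assumed rule: ingress targets and non mac_header_xmit devices
 * want data to point at the network header. */
bool target_expects_nh(bool wants_ingress, bool mac_header_xmit)
{
    return wants_ingress || !mac_header_xmit;
}

/* Compare where data points with what the target expects and adjust.
 * Returns the adjustment: positive = pull, negative = push, 0 = none. */
long fix_data_pointer(struct toy_skb *skb, bool expects_nh)
{
    bool at_nh = (skb->data_off == skb->net_off);
    size_t mac_len;

    assert(skb->net_off >= skb->mac_off);
    mac_len = skb->net_off - skb->mac_off;

    if (at_nh == expects_nh)
        return 0;                  /* already where the target wants it */
    if (expects_nh) {
        skb->data_off += mac_len;  /* pull: skip over the mac header */
        return (long)mac_len;
    }
    skb->data_off -= mac_len;      /* push: expose the mac header again */
    return -(long)mac_len;
}
```

For the eth9 to tun0 example in the message, the skb is caught at egress with data at the mac header, the action targets egress, and tun0 is not a mac_header_xmit device, so the model pulls the 14-byte Ethernet header before handing the skb over.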
  12. Dec 26, 2019
    • Eric Dumazet's avatar
      net_sched: sch_fq: properly set sk->sk_pacing_status · bb3d0b8b
      Eric Dumazet authored
      
      
      If fq_classify() recycles a struct fq_flow because
      a socket structure has been reallocated, we do not
      set sk->sk_pacing_status immediately, but later if the
      flow becomes detached.
      
      This means that any flow requiring pacing (BBR, or SO_MAX_PACING_RATE)
      might fall back to TCP internal pacing, which requires a per-socket
      high resolution timer, and therefore more cpu cycles.
      
      Fixes: 218af599 ("tcp: internal implementation for pacing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb3d0b8b
    • Taehee Yoo's avatar
      hsr: reset network header when supervision frame is created · 3ed0a1d5
      Taehee Yoo authored
      
      
      The supervision frame is an L2 frame.
      When a supervision frame is created, the hsr module doesn't set the
      network header. If a tap routine is enabled, dev_queue_xmit_nit() is
      called and it checks network_header. If the network_header pointer
      wasn't set (or is invalid), it resets network_header and warns.
      To avoid this unnecessary warning message, network_header needs to be
      reset when the frame is created.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
          tcpdump -nei veth0
      
      Splat looks like:
      [  175.852292][    C3] protocol 88fb is buggy, dev veth0
      
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ed0a1d5