Skip to content
  1. Aug 22, 2020
    • Nikolay Aleksandrov's avatar
      net: nexthop: don't allow empty NHA_GROUP · eeaac363
      Nikolay Aleksandrov authored
      
      
      Currently the nexthop code will use an empty NHA_GROUP attribute, but it
      requires at least 1 entry in order to function properly. Otherwise we
      end up derefencing null or random pointers all over the place due to not
      having any nh_grp_entry members allocated, nexthop code relies on having at
      least the first member present. Empty NHA_GROUP doesn't make any sense so
      just disallow it.
      Also add a WARN_ON for any future users of nexthop_create_group().
      
       BUG: kernel NULL pointer dereference, address: 0000000000000080
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] SMP
       CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       RIP: 0010:fib_check_nexthop+0x4a/0xaa
       Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
       RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
       RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
       RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
       RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
       R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
       R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
       FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
       Call Trace:
        fib_create_info+0x64d/0xaf7
        fib_table_insert+0xf6/0x581
        ? __vma_adjust+0x3b6/0x4d4
        inet_rtm_newroute+0x56/0x70
        rtnetlink_rcv_msg+0x1e3/0x20d
        ? rtnl_calcit.isra.0+0xb8/0xb8
        netlink_rcv_skb+0x5b/0xac
        netlink_unicast+0xfa/0x17b
        netlink_sendmsg+0x334/0x353
        sock_sendmsg_nosec+0xf/0x3f
        ____sys_sendmsg+0x1a0/0x1fc
        ? copy_msghdr_from_user+0x4c/0x61
        ___sys_sendmsg+0x63/0x84
        ? handle_mm_fault+0xa39/0x11b5
        ? sockfd_lookup_light+0x72/0x9a
        __sys_sendmsg+0x50/0x6e
        do_syscall_64+0x54/0xbe
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f10dacc0bb7
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
       RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
       RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
       RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
       R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
       Modules linked in:
       CR2: 0000000000000080
      
      CC: David Ahern <dsahern@gmail.com>
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Reported-by: default avatar <syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eeaac363
  2. Aug 20, 2020
  3. Aug 19, 2020
  4. Aug 18, 2020
  5. Aug 17, 2020
  6. Aug 16, 2020
  7. Aug 15, 2020
  8. Aug 14, 2020
  9. Aug 13, 2020
    • Tonghao Zhang's avatar
      net: openvswitch: introduce common code for flushing flows · 1f3a090b
      Tonghao Zhang authored
      
      
      To avoid some issues, for example RCU usage warning and double free,
      we should flush the flows under ovs_lock. This patch refactors
      table_instance_destroy and introduces table_instance_flow_flush
      which can be invoked by __dp_destroy or ovs_flow_tbl_flush.
      
      Fixes: 50b0e61b ("net: openvswitch: fix possible memleak on destroy flow-table")
      Reported-by: default avatarJohan Knöös <jknoos@google.com>
      Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
      
      
      Signed-off-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f3a090b
    • John Ogness's avatar
      af_packet: TPACKET_V3: fix fill status rwlock imbalance · 88fd1cb8
      John Ogness authored
      
      
      After @blk_fill_in_prog_lock is acquired there is an early out vnet
      situation that can occur. In that case, the rwlock needs to be
      released.
      
      Also, since @blk_fill_in_prog_lock is only acquired when @tp_version
      is exactly TPACKET_V3, only release it on that exact condition as
      well.
      
      And finally, add sparse annotation so that it is clearer that
      prb_fill_curr_block() and prb_clear_blk_fill_status() are acquiring
      and releasing @blk_fill_in_prog_lock, respectively. sparse is still
      unable to understand the balance, but the warnings are now on a
      higher level that make more sense.
      
      Fixes: 632ca50f ("af_packet: TPACKET_V3: replace busy-wait loop")
      Signed-off-by: default avatarJohn Ogness <john.ogness@linutronix.de>
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88fd1cb8
    • John Fastabend's avatar
      bpf: sock_ops sk access may stomp registers when dst_reg = src_reg · 84f44df6
      John Fastabend authored
      
      
      Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the
      src_reg = dst_reg when reading the sk field of a sock_ops struct we
      generate xlated code,
      
        53: (61) r9 = *(u32 *)(r9 +28)
        54: (15) if r9 == 0x0 goto pc+3
        56: (79) r9 = *(u64 *)(r9 +0)
      
      This stomps on the r9 reg to do the sk_fullsock check and then when
      reading the skops->sk field instead of the sk pointer we get the
      sk_fullsock. To fix use similar pattern noted in the previous fix
      and use the temp field to save/restore a register used to do
      sk_fullsock check.
      
      After the fix the generated xlated code reads,
      
        52: (7b) *(u64 *)(r9 +32) = r8
        53: (61) r8 = *(u32 *)(r9 +28)
        54: (15) if r9 == 0x0 goto pc+3
        55: (79) r8 = *(u64 *)(r9 +32)
        56: (79) r9 = *(u64 *)(r9 +0)
        57: (05) goto pc+1
        58: (79) r8 = *(u64 *)(r9 +32)
      
      Here r9 register was in-use so r8 is chosen as the temporary register.
      In line 52 r8 is saved in temp variable and at line 54 restored in case
      fullsock != 0. Finally we handle fullsock == 0 case by restoring at
      line 58.
      
      This adds a new macro SOCK_OPS_GET_SK it is almost possible to merge
      this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a
      bit more confusing than just adding a new macro despite a bit of
      duplicating code.
      
      Fixes: 1314ef56 ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower
      84f44df6
    • John Fastabend's avatar
      bpf: sock_ops ctx access may stomp registers in corner case · fd09af01
      John Fastabend authored
      I had a sockmap program that after doing some refactoring started spewing
      this splat at me:
      
      [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      [...]
      [18610.807359] Call Trace:
      [18610.807370]  ? 0xffffffffc114d0d5
      [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
      [18610.807391]  tcp_connect+0x895/0xd50
      [18610.807400]  tcp_v4_connect+0x465/0x4e0
      [18610.807407]  __inet_stream_connect+0xd6/0x3a0
      [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
      [18610.807417]  inet_stream_connect+0x3b/0x60
      [18610.807425]  __sys_connect+0xed/0x120
      
      After some debugging I was able to build this simple reproducer,
      
       __section("sockops/reproducer_bad")
       int bpf_reproducer_bad(struct bpf_sock_ops *skops)
       {
              volatile __maybe_unused __u32 i = skops->snd_ssthresh;
              return 0;
       }
      
      And along the way noticed that below program ran without splat,
      
      __section("sockops/reproducer_good")
      int bpf_reproducer_good(struct bpf_sock_ops *skops)
      {
              volatile __maybe_unused __u32 i = skops->snd_ssthresh;
              volatile __maybe_unused __u32 family;
      
              compiler_barrier();
      
              family = skops->family;
              return 0;
      }
      
      So I decided to check out the code we generate for the above two
      programs and noticed each generates the BPF code you would expect,
      
      0000000000000000 <bpf_reproducer_bad>:
      ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
             0:       r1 = *(u32 *)(r1 + 96)
             1:       *(u32 *)(r10 - 4) = r1
      ;       return 0;
             2:       r0 = 0
             3:       exit
      
      0000000000000000 <bpf_reproducer_good>:
      ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
             0:       r2 = *(u32 *)(r1 + 96)
             1:       *(u32 *)(r10 - 4) = r2
      ;       family = skops->family;
             2:       r1 = *(u32 *)(r1 + 20)
             3:       *(u32 *)(r10 - 8) = r1
      ;       return 0;
             4:       r0 = 0
             5:       exit
      
      So we get reasonable assembly, but still something was causing the null
      pointer dereference. So, we load the programs and dump the xlated version
      observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
      translated by the skops access helpers.
      
      int bpf_reproducer_bad(struct bpf_sock_ops * skops):
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         0: (61) r1 = *(u32 *)(r1 +28)
         1: (15) if r1 == 0x0 goto pc+2
         2: (79) r1 = *(u64 *)(r1 +0)
         3: (61) r1 = *(u32 *)(r1 +2340)
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         4: (63) *(u32 *)(r10 -4) = r1
      ; return 0;
         5: (b7) r0 = 0
         6: (95) exit
      
      int bpf_reproducer_good(struct bpf_sock_ops * skops):
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         0: (61) r2 = *(u32 *)(r1 +28)
         1: (15) if r2 == 0x0 goto pc+2
         2: (79) r2 = *(u64 *)(r1 +0)
         3: (61) r2 = *(u32 *)(r2 +2340)
      ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
         4: (63) *(u32 *)(r10 -4) = r2
      ; family = skops->family;
         5: (79) r1 = *(u64 *)(r1 +0)
         6: (69) r1 = *(u16 *)(r1 +16)
      ; family = skops->family;
         7: (63) *(u32 *)(r10 -8) = r1
      ; return 0;
         8: (b7) r0 = 0
         9: (95) exit
      
      Then we look at lines 0 and 2 above. In the good case we do the zero
      check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
      into the bpf_sock_ops check and we can confirm that is the 'struct
      sock *sk' pointer field. But, in the bad case,
      
         0: (61) r1 = *(u32 *)(r1 +28)
         1: (15) if r1 == 0x0 goto pc+2
         2: (79) r1 = *(u64 *)(r1 +0)
      
      Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
      line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,
      
      [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
      
      The 0x01 makes sense because that is exactly the fullsock value. And
      its not a valid dereference so we splat.
      
      To fix we need to guard the case when a program is doing a sock_ops field
      access with src_reg == dst_reg. This is already handled in the load case
      where the ctx_access handler uses a tmp register being careful to
      store the old value and restore it. To fix the get case test if
      src_reg == dst_reg and in this case do the is_fullsock test in the
      temporary register. Remembering to restore the temporary register before
      writing to either dst_reg or src_reg to avoid smashing the pointer into
      the struct holding the tmp variable.
      
      Adding this inline code to test_tcpbpf_kern will now be generated
      correctly from,
      
        9: r2 = *(u32 *)(r2 + 96)
      
      to xlated code,
      
        12: (7b) *(u64 *)(r2 +32) = r9
        13: (61) r9 = *(u32 *)(r2 +28)
        14: (15) if r9 == 0x0 goto pc+4
        15: (79) r9 = *(u64 *)(r2 +32)
        16: (79) r2 = *(u64 *)(r2 +0)
        17: (61) r2 = *(u32 *)(r2 +2348)
        18: (05) goto pc+1
        19: (79) r9 = *(u64 *)(r2 +32)
      
      And in the normal case we keep the original code, because really this
      is an edge case. From this,
      
        9: r2 = *(u32 *)(r6 + 96)
      
      to xlated code,
      
        22: (61) r2 = *(u32 *)(r6 +28)
        23: (15) if r2 == 0x0 goto pc+2
        24: (79) r2 = *(u64 *)(r6 +0)
        25: (61) r2 = *(u32 *)(r2 +2348)
      
      So three additional instructions if dst == src register, but I scanned
      my current code base and did not see this pattern anywhere so should
      not be a big deal. Further, it seems no one else has hit this or at
      least reported it so it must a fairly rare pattern.
      
      Fixes: 9b1f3d6e
      
       ("bpf: Refactor sock_ops_convert_ctx_access")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower
      fd09af01
    • Florian Westphal's avatar
      netfilter: nf_tables: free chain context when BINDING flag is missing · 59136aa3
      Florian Westphal authored
      
      
      syzbot found a memory leak in nf_tables_addchain() because the chain
      object is not free'd correctly on error.
      
      Fixes: d0e2c7de ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
      Reported-by: default avatar <syzbot+c99868fde67014f7e9f5@syzkaller.appspotmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      59136aa3
    • Florian Westphal's avatar
      netfilter: avoid ipv6 -> nf_defrag_ipv6 module dependency · 2404b73c
      Florian Westphal authored
      
      
      nf_ct_frag6_gather is part of nf_defrag_ipv6.ko, not ipv6 core.
      
      The current use of the netfilter ipv6 stub indirections  causes a module
      dependency between ipv6 and nf_defrag_ipv6.
      
      This prevents nf_defrag_ipv6 module from being removed because ipv6 can't
      be unloaded.
      
      Remove the indirection and always use a direct call.  This creates a
      depency from nf_conntrack_bridge to nf_defrag_ipv6 instead:
      
      modinfo nf_conntrack
      depends:        nf_conntrack,nf_defrag_ipv6,bridge
      
      .. and nf_conntrack already depends on nf_defrag_ipv6 anyway.
      
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2404b73c
Loading