  1. Mar 27, 2020
  2. Mar 12, 2020
    • taprio: Fix sending packets without dequeueing them · b09fe70e
      Vinicius Costa Gomes authored
      
      
      There was a bug that was causing packets to be sent to the driver
      without first calling dequeue() on the "child" qdisc. And the KASAN
      report below shows that sending a packet without calling dequeue()
      leads to bad results.
      
      The problem is that, when checking the last "child" qdisc, we do not
      reset the returned skb to NULL, so it can be sent to the driver. Once
      sent, the skb may be freed while, in some situations, a reference to
      it is still held by the child qdisc, because it was never dequeued.
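A minimal userspace sketch of the bug pattern, under simplified assumptions: `struct sketch_pkt`, `struct sketch_child`, `child_peek()`, `child_dequeue()` and `sketch_taprio_dequeue()` are hypothetical stand-ins, not the taprio source. It shows why a pointer obtained by peeking at a child must not be returned unless that packet was actually dequeued from it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff and a child Qdisc. */
struct sketch_pkt {
    int id;
};

struct sketch_child {
    struct sketch_pkt *items[8];
    int head, tail;   /* FIFO: items[head..tail) are queued */
    bool gate_open;   /* may this queue transmit in the current schedule entry? */
};

static struct sketch_pkt *child_peek(struct sketch_child *c)
{
    return c->head == c->tail ? NULL : c->items[c->head];
}

static struct sketch_pkt *child_dequeue(struct sketch_child *c)
{
    return c->head == c->tail ? NULL : c->items[c->head++];
}

/* Walk the children; hand a packet to the caller only after dequeuing it. */
static struct sketch_pkt *sketch_taprio_dequeue(struct sketch_child *children, int n)
{
    struct sketch_pkt *skb = NULL;

    for (int i = 0; i < n; i++) {
        skb = child_peek(&children[i]);
        if (!skb)
            continue;
        if (!children[i].gate_open)
            continue;   /* skb still points at a packet owned by children[i] */
        return child_dequeue(&children[i]);
    }

    /* The buggy pattern returned skb here, so a packet peeked from the last
     * child (gate closed) reached the caller while still queued. */
    skb = NULL;
    return skb;
}
```

With the reset in place, a closed gate on the last child yields NULL and the packet stays owned by its queue.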
      
      The crash log looks like this:
      
      [   19.937538] ==================================================================
      [   19.938300] BUG: KASAN: use-after-free in taprio_dequeue_soft+0x620/0x780
      [   19.938968] Read of size 4 at addr ffff8881128628cc by task swapper/1/0
      [   19.939612]
      [   19.939772] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc3+ #97
      [   19.940397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qe4
      [   19.941523] Call Trace:
      [   19.941774]  <IRQ>
      [   19.941985]  dump_stack+0x97/0xe0
      [   19.942323]  print_address_description.constprop.0+0x3b/0x60
      [   19.942884]  ? taprio_dequeue_soft+0x620/0x780
      [   19.943325]  ? taprio_dequeue_soft+0x620/0x780
      [   19.943767]  __kasan_report.cold+0x1a/0x32
      [   19.944173]  ? taprio_dequeue_soft+0x620/0x780
      [   19.944612]  kasan_report+0xe/0x20
      [   19.944954]  taprio_dequeue_soft+0x620/0x780
      [   19.945380]  __qdisc_run+0x164/0x18d0
      [   19.945749]  net_tx_action+0x2c4/0x730
      [   19.946124]  __do_softirq+0x268/0x7bc
      [   19.946491]  irq_exit+0x17d/0x1b0
      [   19.946824]  smp_apic_timer_interrupt+0xeb/0x380
      [   19.947280]  apic_timer_interrupt+0xf/0x20
      [   19.947687]  </IRQ>
      [   19.947912] RIP: 0010:default_idle+0x2d/0x2d0
      [   19.948345] Code: 00 00 41 56 41 55 65 44 8b 2d 3f 8d 7c 7c 41 54 55 53 0f 1f 44 00 00 e8 b1 b2 c5 fd e9 07 00 3
      [   19.950166] RSP: 0018:ffff88811a3efda0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
      [   19.950909] RAX: 0000000080000000 RBX: ffff88811a3a9600 RCX: ffffffff8385327e
      [   19.951608] RDX: 1ffff110234752c0 RSI: 0000000000000000 RDI: ffffffff8385262f
      [   19.952309] RBP: ffffed10234752c0 R08: 0000000000000001 R09: ffffed10234752c1
      [   19.953009] R10: ffffed10234752c0 R11: ffff88811a3a9607 R12: 0000000000000001
      [   19.953709] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
      [   19.954408]  ? default_idle_call+0x2e/0x70
      [   19.954816]  ? default_idle+0x1f/0x2d0
      [   19.955192]  default_idle_call+0x5e/0x70
      [   19.955584]  do_idle+0x3d4/0x500
      [   19.955909]  ? arch_cpu_idle_exit+0x40/0x40
      [   19.956325]  ? _raw_spin_unlock_irqrestore+0x23/0x30
      [   19.956829]  ? trace_hardirqs_on+0x30/0x160
      [   19.957242]  cpu_startup_entry+0x19/0x20
      [   19.957633]  start_secondary+0x2a6/0x380
      [   19.958026]  ? set_cpu_sibling_map+0x18b0/0x18b0
      [   19.958486]  secondary_startup_64+0xa4/0xb0
      [   19.958921]
      [   19.959078] Allocated by task 33:
      [   19.959412]  save_stack+0x1b/0x80
      [   19.959747]  __kasan_kmalloc.constprop.0+0xc2/0xd0
      [   19.960222]  kmem_cache_alloc+0xe4/0x230
      [   19.960617]  __alloc_skb+0x91/0x510
      [   19.960967]  ndisc_alloc_skb+0x133/0x330
      [   19.961358]  ndisc_send_ns+0x134/0x810
      [   19.961735]  addrconf_dad_work+0xad5/0xf80
      [   19.962144]  process_one_work+0x78e/0x13a0
      [   19.962551]  worker_thread+0x8f/0xfa0
      [   19.962919]  kthread+0x2ba/0x3b0
      [   19.963242]  ret_from_fork+0x3a/0x50
      [   19.963596]
      [   19.963753] Freed by task 33:
      [   19.964055]  save_stack+0x1b/0x80
      [   19.964386]  __kasan_slab_free+0x12f/0x180
      [   19.964830]  kmem_cache_free+0x80/0x290
      [   19.965231]  ip6_mc_input+0x38a/0x4d0
      [   19.965617]  ipv6_rcv+0x1a4/0x1d0
      [   19.965948]  __netif_receive_skb_one_core+0xf2/0x180
      [   19.966437]  netif_receive_skb+0x8c/0x3c0
      [   19.966846]  br_handle_frame_finish+0x779/0x1310
      [   19.967302]  br_handle_frame+0x42a/0x830
      [   19.967694]  __netif_receive_skb_core+0xf0e/0x2a90
      [   19.968167]  __netif_receive_skb_one_core+0x96/0x180
      [   19.968658]  process_backlog+0x198/0x650
      [   19.969047]  net_rx_action+0x2fa/0xaa0
      [   19.969420]  __do_softirq+0x268/0x7bc
      [   19.969785]
      [   19.969940] The buggy address belongs to the object at ffff888112862840
      [   19.969940]  which belongs to the cache skbuff_head_cache of size 224
      [   19.971202] The buggy address is located 140 bytes inside of
      [   19.971202]  224-byte region [ffff888112862840, ffff888112862920)
      [   19.972344] The buggy address belongs to the page:
      [   19.972820] page:ffffea00044a1800 refcount:1 mapcount:0 mapping:ffff88811a2bd1c0 index:0xffff8881128625c0 compo0
      [   19.973930] flags: 0x8000000000010200(slab|head)
      [   19.974388] raw: 8000000000010200 ffff88811a2ed650 ffff88811a2ed650 ffff88811a2bd1c0
      [   19.975151] raw: ffff8881128625c0 0000000000190013 00000001ffffffff 0000000000000000
      [   19.975915] page dumped because: kasan: bad access detected
      [   19.976461] page_owner tracks the page as allocated
      [   19.976946] page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NO)
      [   19.978332]  prep_new_page+0x24b/0x330
      [   19.978707]  get_page_from_freelist+0x2057/0x2c90
      [   19.979170]  __alloc_pages_nodemask+0x218/0x590
      [   19.979619]  new_slab+0x9d/0x300
      [   19.979948]  ___slab_alloc.constprop.0+0x2f9/0x6f0
      [   19.980421]  __slab_alloc.constprop.0+0x30/0x60
      [   19.980870]  kmem_cache_alloc+0x201/0x230
      [   19.981269]  __alloc_skb+0x91/0x510
      [   19.981620]  alloc_skb_with_frags+0x78/0x4a0
      [   19.982043]  sock_alloc_send_pskb+0x5eb/0x750
      [   19.982476]  unix_stream_sendmsg+0x399/0x7f0
      [   19.982904]  sock_sendmsg+0xe2/0x110
      [   19.983262]  ____sys_sendmsg+0x4de/0x6d0
      [   19.983660]  ___sys_sendmsg+0xe4/0x160
      [   19.984032]  __sys_sendmsg+0xab/0x130
      [   19.984396]  do_syscall_64+0xe7/0xae0
      [   19.984761] page last free stack trace:
      [   19.985142]  __free_pages_ok+0x432/0xbc0
      [   19.985533]  qlist_free_all+0x56/0xc0
      [   19.985907]  quarantine_reduce+0x149/0x170
      [   19.986315]  __kasan_kmalloc.constprop.0+0x9e/0xd0
      [   19.986791]  kmem_cache_alloc+0xe4/0x230
      [   19.987182]  prepare_creds+0x24/0x440
      [   19.987548]  do_faccessat+0x80/0x590
      [   19.987906]  do_syscall_64+0xe7/0xae0
      [   19.988276]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   19.988775]
      [   19.988930] Memory state around the buggy address:
      [   19.989402]  ffff888112862780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   19.990111]  ffff888112862800: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      [   19.990822] >ffff888112862880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   19.991529]                                               ^
      [   19.992081]  ffff888112862900: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
      [   19.992796]  ffff888112862980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Reported-by: Michael Schmidt <michael.schmidt@eti.uni-siegen.de>
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: Andre Guedes <andre.guedes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: memcg: fix lockdep splat in inet_csk_accept() · 06669ea3
      Eric Dumazet authored
      
      
      Locking newsk while still holding the listener lock triggered
      a lockdep splat [1]
      
      We can simply move the memcg code after we release the listener lock,
      as this can also help if multiple threads are sharing a common listener.
      
      Also fix a typo while reading socket sk_rmem_alloc.
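The ordering change can be sketched with a tiny lockdep-like model. Everything here is hypothetical (`struct sketch_sock`, `sketch_lock_sock()`, `sketch_release_sock()`, the `held[]` class counters); it only assumes, as the report shows, that the listener and the new socket share one lock class (`sk_lock-AF_INET6`), so taking the child's lock while holding the listener's looks like recursive locking.

```c
#include <stdbool.h>

/* Hypothetical model: listener and child share one lock class. */
struct sketch_sock {
    int lock_class;
    bool memcg_charged;
};

static int held[4];          /* locks of each class held by this task */
static bool recursive_splat; /* set when a held class is acquired again */

static void sketch_lock_sock(struct sketch_sock *sk)
{
    if (held[sk->lock_class] > 0)
        recursive_splat = true;   /* "possible recursive locking detected" */
    held[sk->lock_class]++;
}

static void sketch_release_sock(struct sketch_sock *sk)
{
    held[sk->lock_class]--;
}

static void sketch_memcg_charge(struct sketch_sock *newsk)
{
    sketch_lock_sock(newsk);
    newsk->memcg_charged = true;
    sketch_release_sock(newsk);
}

/* Before: memcg work ran while the listener lock was still held. */
static void sketch_accept_before(struct sketch_sock *listener, struct sketch_sock *newsk)
{
    sketch_lock_sock(listener);
    /* ... take newsk off the accept queue ... */
    sketch_memcg_charge(newsk);
    sketch_release_sock(listener);
}

/* After: release the listener first, then lock only the new socket. */
static void sketch_accept_after(struct sketch_sock *listener, struct sketch_sock *newsk)
{
    sketch_lock_sock(listener);
    /* ... take newsk off the accept queue ... */
    sketch_release_sock(listener);
    sketch_memcg_charge(newsk);
}
```

Besides avoiding the splat, releasing the listener earlier shortens the time other threads sharing that listener wait on its lock.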
      
      [1]
      WARNING: possible recursive locking detected
      5.6.0-rc3-syzkaller #0 Not tainted
      --------------------------------------------
      syz-executor598/9524 is trying to acquire lock:
      ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
      ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492
      
      but task is already holding lock:
      ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
      ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_INET6);
        lock(sk_lock-AF_INET6);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      1 lock held by syz-executor598/9524:
       #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
       #0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445
      
      stack backtrace:
      CPU: 0 PID: 9524 Comm: syz-executor598 Not tainted 5.6.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       print_deadlock_bug kernel/locking/lockdep.c:2370 [inline]
       check_deadlock kernel/locking/lockdep.c:2411 [inline]
       validate_chain kernel/locking/lockdep.c:2954 [inline]
       __lock_acquire.cold+0x114/0x288 kernel/locking/lockdep.c:3954
       lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4484
       lock_sock_nested+0xc5/0x110 net/core/sock.c:2947
       lock_sock include/net/sock.h:1541 [inline]
       inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492
       inet_accept+0xe9/0x7c0 net/ipv4/af_inet.c:734
       __sys_accept4_file+0x3ac/0x5b0 net/socket.c:1758
       __sys_accept4+0x53/0x90 net/socket.c:1809
       __do_sys_accept4 net/socket.c:1821 [inline]
       __se_sys_accept4 net/socket.c:1818 [inline]
       __x64_sys_accept4+0x93/0xf0 net/socket.c:1818
       do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4445c9
      Code: e8 0c 0d 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffc35b37608 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004445c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000306777 R09: 0000000000306777
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00000000004053d0 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: d752a498 ("net: memcg: late association of sock to memcg")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number · 26776253
      Paolo Lungaroni authored
      The Internet Assigned Numbers Authority (IANA) has recently assigned
      a protocol number value of 143 for Ethernet [1].
      
      Before this assignment, encapsulation mechanisms such as Segment Routing
      used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
      payload is an Ethernet frame.
      
      In this patch, we add the definition of the Ethernet protocol number to the
      kernel headers and update the SRv6 L2 tunnels to use it.
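A sketch of the protocol-number change, using the two values stated above. The macro and function names are local to this example and hypothetical; it also assumes, for illustration, that the updated decapsulation side expects only the new value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the commit message; names are local to this sketch. */
#define SKETCH_IPPROTO_NONE      59   /* IPv6-NoNxt, previously used for L2 payloads */
#define SKETCH_IPPROTO_ETHERNET 143   /* IANA-assigned "Ethernet" */

/* Next Header value an SRv6 L2 encapsulation writes into the outer IPv6 header. */
static uint8_t sketch_srv6_l2_next_header(void)
{
    return SKETCH_IPPROTO_ETHERNET;
}

/* Decap side (assumption): treat the payload as Ethernet only for 143. */
static bool sketch_srv6_payload_is_ethernet(uint8_t nexthdr)
{
    return nexthdr == SKETCH_IPPROTO_ETHERNET;
}
```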
      
      [1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
      
      
      
      Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
      Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Acked-by: Ahmed Abdelsalam <ahmed.abdelsalam@gssi.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed · a20f9970
      Andrew Lunn authored
      
      
      By default, DSA drivers should configure CPU and DSA ports to their
      maximum speed. In many configurations this is sufficient to make the
      link work.
      
      In some cases it is necessary to configure the link to run slower,
      e.g. because of limitations of the SoC it is connected to, or because
      back-to-back PHYs are used and the PHY needs to be driven in order to
      establish a link. In these cases, phylink is used.
      
      Only instantiate phylink if it is required. If there is neither a PHY
      nor fixed-link properties, phylink can upset a link which works in the
      default configuration.
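The decision rule reduces to a simple predicate. In this sketch `struct sketch_port_desc` and `sketch_port_needs_phylink()` are hypothetical; the two flags stand for what a device tree would typically express with a `phy-handle` property or a `fixed-link` node on the CPU/DSA port.

```c
#include <stdbool.h>

/* Hypothetical summary of a CPU/DSA port's firmware description. */
struct sketch_port_desc {
    bool has_phy_handle;   /* a PHY must be driven, e.g. back-to-back PHYs */
    bool has_fixed_link;   /* fixed-link properties, e.g. a slower forced speed */
};

/* Create phylink only when something needs it; otherwise keep the driver's
 * default (maximum speed) configuration untouched. */
static bool sketch_port_needs_phylink(const struct sketch_port_desc *d)
{
    return d->has_phy_handle || d->has_fixed_link;
}
```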
      
      Fixes: 0e279218 ("net: dsa: Use PHYLINK for the CPU/DSA ports")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/packet: tpacket_rcv: do not increment ring index on drop · 46e4c421
      Willem de Bruijn authored
      
      
      In one error case, tpacket_rcv drops packets after incrementing the
      ring producer index.
      
      If this happens, it does not update tp_status to TP_STATUS_USER and
      thus the reader is stalled for an iteration of the ring, causing out
      of order arrival.
      
      The only such error path is when virtio_net_hdr_from_skb fails due
      to encountering an unknown GSO type.
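A minimal model of the ordering rule: every check that can drop the packet runs before the producer index advances. `struct sketch_ring`, the slot states and `sketch_ring_rcv()` are hypothetical; `hdr_ok` stands in for the result of `virtio_net_hdr_from_skb()`. If the index moved first and the packet were then dropped, the skipped slot would never be handed to the reader, which is the stall described above.

```c
#include <stdbool.h>

#define SKETCH_RING_SLOTS 4

/* Slot ownership, loosely modeled on tp_status (names are hypothetical). */
enum sketch_slot_status { SLOT_KERNEL = 0, SLOT_USER = 1 };

struct sketch_ring {
    enum sketch_slot_status status[SKETCH_RING_SLOTS];
    int head;   /* producer index */
};

static bool sketch_ring_rcv(struct sketch_ring *r, bool hdr_ok)
{
    int slot = r->head;

    if (r->status[slot] != SLOT_KERNEL)
        return false;              /* ring full: drop, head unchanged */
    if (!hdr_ok)
        return false;              /* drop before the producer index moves */

    r->head = (r->head + 1) % SKETCH_RING_SLOTS;
    r->status[slot] = SLOT_USER;   /* publish the frame to the reader */
    return true;
}
```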
      
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: caif: Add lockdep expression to RCU traversal primitive · f9fc28a8
      Amol Grover authored
      
      
      caifdevs->list is traversed using list_for_each_entry_rcu()
      outside an RCU read-side critical section but under the
      protection of rtnl_mutex. Hence, add the corresponding lockdep
      expression to silence the following false-positive warning:
      
      [   10.868467] =============================
      [   10.869082] WARNING: suspicious RCU usage
      [   10.869817] 5.6.0-rc1-00177-g06ec0a154aae4 #1 Not tainted
      [   10.870804] -----------------------------
      [   10.871557] net/caif/caif_dev.c:115 RCU-list traversed in non-reader section!!
      
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Amol Grover <frextrite@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. Mar 11, 2020
  4. Mar 10, 2020
    • net/smc: cancel event worker during device removal · ece0d7bd
      Karsten Graul authored
      
      
      During IB device removal, cancel the event worker before the device
      structure is freed.
      
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Reported-by: syzbot+b297c6825752e7a07272@syzkaller.appspotmail.com
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface · 60380488
      Hangbin Liu authored
      
      
      Rafał found that, for a non-Ethernet interface, bringing the link down
      and up repeatedly slowly consumes memory.
      
      The reason is that ipv6_add_dev() adds the all-nodes/all-routers
      addresses to the multicast list. On link down, we call ipv6_mc_down(),
      which stores all multicast addresses via mld_add_delrec(). But on link
      up, we don't call ipv6_mc_up() for non-Ethernet interfaces to remove
      those records, so idev->mc_tomb keeps growing. The call stack looks like:
      
      addrconf_notify(NETDEV_REGISTER)
      	ipv6_add_dev
      		ipv6_dev_mc_inc(ff01::1)
      		ipv6_dev_mc_inc(ff02::1)
      		ipv6_dev_mc_inc(ff02::2)
      
      addrconf_notify(NETDEV_UP)
      	addrconf_dev_config
      		/* Alas, we support only Ethernet autoconfiguration. */
      		return;
      
      addrconf_notify(NETDEV_DOWN)
      	addrconf_ifdown
      		ipv6_mc_down
      			igmp6_group_dropped(ff02::2)
      				mld_add_delrec(ff02::2)
      			igmp6_group_dropped(ff02::1)
      			igmp6_group_dropped(ff01::1)
      
      After investigating, I couldn't find a rule that disables multicast on
      non-Ethernet interfaces. In RFC 2460, the link could be Ethernet, PPP,
      ATM, tunnels, etc. In IPv4, inetdev_event() doesn't check the device
      type when calling ip_mc_up(). Even for IPv6, we don't check the device
      type before calling ipv6_add_dev() and ipv6_dev_mc_inc() after
      registering a device.
      
      So I think it's OK to fix this memory consumption by calling
      ipv6_mc_up() for non-Ethernet interfaces.
      
      v2: Also check IFF_MULTICAST flag to make sure the interface supports
          multicast
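The growth of idev->mc_tomb can be reproduced with a counting model. All names are hypothetical, counts replace the real lists, and the Ethernet path is simplified to "always calls ipv6_mc_up()"; the `fixed` flag toggles the new behaviour for multicast-capable non-Ethernet devices.

```c
#include <stdbool.h>

/* Hypothetical per-interface state; counters stand in for the real lists. */
struct sketch_idev {
    bool is_ethernet;
    bool multicast;   /* IFF_MULTICAST set on the device */
    int groups;       /* ff01::1, ff02::1, ff02::2 joined by ipv6_add_dev() */
    int mc_tomb;      /* records parked by mld_add_delrec() */
};

/* NETDEV_DOWN: ipv6_mc_down() parks one tomb record per group. */
static void sketch_netdev_down(struct sketch_idev *d)
{
    d->mc_tomb += d->groups;
}

/* ipv6_mc_up(): the parked records are removed again. */
static void sketch_mc_up(struct sketch_idev *d)
{
    d->mc_tomb = 0;
}

/* NETDEV_UP, without (fixed == false) and with the fix. */
static void sketch_netdev_up(struct sketch_idev *d, bool fixed)
{
    if (d->is_ethernet || (fixed && d->multicast))
        sketch_mc_up(d);
    /* Unfixed non-Ethernet path: early return, mc_tomb keeps growing. */
}
```

Ten down/up cycles on a three-group non-Ethernet device leave 30 tomb records without the fix and none with it.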
      
      Reported-by: Rafał Miłecki <zajec5@gmail.com>
      Tested-by: Rafał Miłecki <zajec5@gmail.com>
      Fixes: 74235a25 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: memcg: late association of sock to memcg · d752a498
      Shakeel Butt authored
      
      
      If a TCP socket is allocated in IRQ context, or cloned in IRQ context
      from an unassociated socket (i.e. one not associated with a memcg), it
      will remain unassociated for its whole life. Almost half of the TCP
      sockets on the system are created in IRQ context, so memory used by
      such sockets will not be accounted to the memcg.
      
      This issue is more widespread in cgroup v1, where network memory
      accounting is opt-in, but it can also happen in cgroup v2 if the source
      socket for the cloning was created in the root memcg.
      
      To fix the issue, associate the sockets at accept() time, in process
      context, and then force-charge the memory buffers already used and
      reserved by the socket.
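A minimal sketch of late association at accept() time. The types and `sketch_accept_associate()` are hypothetical; field names loosely mirror sk_rmem_alloc and sk_forward_alloc, and the sketch counts bytes, whereas the kernel charges memcg socket memory in pages.

```c
#include <stddef.h>

/* Hypothetical types; bytes are counted directly for simplicity. */
struct sketch_memcg {
    long charged;
};

struct sketch_sock {
    struct sketch_memcg *memcg;  /* NULL if created/cloned in IRQ context */
    long rmem_alloc;             /* receive memory already in use */
    long forward_alloc;          /* memory already reserved */
};

/* Runs from accept(), in process context, where the task's memcg is known. */
static void sketch_accept_associate(struct sketch_sock *newsk,
                                    struct sketch_memcg *current_memcg)
{
    if (newsk->memcg)
        return;                  /* already associated at creation time */
    newsk->memcg = current_memcg;
    /* force-charge what the socket already uses or has reserved */
    current_memcg->charged += newsk->rmem_alloc + newsk->forward_alloc;
}
```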
      
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cgroup, netclassid: periodically release file_lock on classid updating · 018d26fc
      Dmitry Yakunin authored
      
      
      In our production environment we have faced a problem: updating the
      classid of a cgroup with heavy tasks causes long freezes of the file
      tables of those tasks. By heavy tasks we mean tasks with many threads
      and open sockets (e.g. balancers). These freezes lead to an increased
      number of client timeouts.
      
      This patch implements the following logic to fix this issue: after
      iterating over 1000 file descriptors, the file table lock is released,
      providing a time gap for socket creation/deletion.
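The batching logic can be sketched as a loop that holds the lock for at most 1000 descriptors at a time. `struct sketch_files` and `sketch_update_classid()` are hypothetical; the lock is only counted here, with comments marking where the real file table lock would be taken and dropped.

```c
#include <stddef.h>

#define SKETCH_FDS_PER_LOCK 1000   /* batch size from the commit message */
#define SKETCH_MAX_FDS      5000

/* Hypothetical file table: one classid per open fd. */
struct sketch_files {
    unsigned int classid[SKETCH_MAX_FDS];
    size_t nr_fds;      /* must be <= SKETCH_MAX_FDS */
    int lock_holds;     /* how many times the table lock was taken */
};

static void sketch_update_classid(struct sketch_files *f, unsigned int classid)
{
    size_t fd = 0;

    while (fd < f->nr_fds) {
        size_t end = fd + SKETCH_FDS_PER_LOCK;

        if (end > f->nr_fds)
            end = f->nr_fds;

        f->lock_holds++;            /* take the file table lock */
        for (; fd < end; fd++)
            f->classid[fd] = classid;
        /* drop the lock: other threads may create/close sockets here */
    }
}
```

For 2500 descriptors the lock is taken three times (1000, 1000, then 500 fds).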
      
      Now the update is non-atomic, and a socket may be skipped using calls such as:
      
      dup2(oldfd, newfd);
      close(oldfd);
      
      But this case is not typical. Moreover, even before this patch a socket
      could be skipped by hiding its fd in a unix socket buffer.
      
      New sockets will be allocated with the updated classid because the
      cgroup state is updated before the file descriptor iteration starts.
      
      So in common cases this patch has no side effects.
      
      Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Mar 09, 2020
    • inet_diag: return classid for all socket types · 83f73c5b
      Dmitry Yakunin authored
      
      
      In commit 1ec17dbd ("inet_diag: fix reporting cgroup classid and
      fallback to priority") cgroup classid reporting was fixed. But this
      works only for TCP sockets, because for other socket types the icsk
      parameter can be NULL and the classid code path is skipped. This change
      moves classid handling to the inet_diag_msg_attrs_fill() function.
      
      Also, an inet_diag_msg_attrs_size() helper was added, and the addends in
      nlmsg_new() were reordered to match the order used in inet_sk_diag_fill().
      
      Fixes: 1ec17dbd ("inet_diag: fix reporting cgroup classid and fallback to priority")
      Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: fix uninit-value in __iptunnel_pull_header · 17c25caf
      Eric Dumazet authored
      
      
      syzbot found an interesting case of the kernel reading
      an uninit-value [1]
      
      The problem is in the handling of ETH_P_WCCP in gre_parse_header().
      
      We look at the byte following the GRE options to decide whether the
      options are four bytes longer.
      
      Use skb_header_pointer() so that we do not pull bytes if we find that
      no more bytes are needed.
      
      All callers of gre_parse_header() are properly using pskb_may_pull()
      anyway before proceeding to next header.
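A userspace analogue of the WCCP check. `sketch_header_pointer()` is a hypothetical stand-in for skb_header_pointer(): it bounds-checks and copies the requested bytes instead of pulling them into the linear area. `sketch_wccp_extra_len()` is also hypothetical; it peeks at the byte after the GRE header and reports 4 extra bytes unless that byte starts an IPv4 header (version nibble 4).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return `len` bytes at `off` (copied into `buf`) if present, else NULL. */
static const void *sketch_header_pointer(const uint8_t *pkt, size_t pkt_len,
                                         size_t off, size_t len, void *buf)
{
    if (off > pkt_len || len > pkt_len - off)
        return NULL;
    memcpy(buf, pkt + off, len);
    return buf;
}

/* Extra header length (0 or 4) after GRE for WCCP, or -1 if the byte is missing. */
static int sketch_wccp_extra_len(const uint8_t *pkt, size_t pkt_len, size_t off)
{
    uint8_t tmp;
    const uint8_t *val = sketch_header_pointer(pkt, pkt_len, off, sizeof(tmp), &tmp);

    if (!val)
        return -1;                          /* caller treats packet as malformed */
    return (*val & 0xF0) != 0x40 ? 4 : 0;   /* 0x4? means an IPv4 header follows */
}
```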
      
      [1]
      BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2303 [inline]
      BUG: KMSAN: uninit-value in __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
      CPU: 1 PID: 11784 Comm: syz-executor940 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       pskb_may_pull include/linux/skbuff.h:2303 [inline]
       __iptunnel_pull_header+0x30c/0xbd0 net/ipv4/ip_tunnel_core.c:94
       iptunnel_pull_header include/net/ip_tunnels.h:411 [inline]
       gre_rcv+0x15e/0x19c0 net/ipv6/ip6_gre.c:606
       ip6_protocol_deliver_rcu+0x181b/0x22c0 net/ipv6/ip6_input.c:432
       ip6_input_finish net/ipv6/ip6_input.c:473 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip6_input net/ipv6/ip6_input.c:482 [inline]
       ip6_mc_input+0xdf2/0x1460 net/ipv6/ip6_input.c:576
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:306
       __netif_receive_skb_one_core net/core/dev.c:5198 [inline]
       __netif_receive_skb net/core/dev.c:5312 [inline]
       netif_receive_skb_internal net/core/dev.c:5402 [inline]
       netif_receive_skb+0x66b/0xf20 net/core/dev.c:5461
       tun_rx_batched include/linux/skbuff.h:4321 [inline]
       tun_get_user+0x6aef/0x6f60 drivers/net/tun.c:1997
       tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write fs/read_write.c:483 [inline]
       __vfs_write+0xa5a/0xca0 fs/read_write.c:496
       vfs_write+0x44a/0x8f0 fs/read_write.c:558
       ksys_write+0x267/0x450 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __ia32_sys_write+0xdb/0x120 fs/read_write.c:620
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f62d99
      Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
      RSP: 002b:00000000fffedb2c EFLAGS: 00000217 ORIG_RAX: 0000000000000004
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020002580
      RDX: 0000000000000fca RSI: 0000000000000036 RDI: 0000000000000004
      RBP: 0000000000008914 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline]
       kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127
       kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82
       slab_alloc_node mm/slub.c:2793 [inline]
       __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4401
       __kmalloc_reserve net/core/skbuff.c:142 [inline]
       __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:210
       alloc_skb include/linux/skbuff.h:1051 [inline]
       alloc_skb_with_frags+0x18c/0xa70 net/core/skbuff.c:5766
       sock_alloc_send_pskb+0xada/0xc60 net/core/sock.c:2242
       tun_alloc_skb drivers/net/tun.c:1529 [inline]
       tun_get_user+0x10ae/0x6f60 drivers/net/tun.c:1843
       tun_chr_write_iter+0x1f2/0x360 drivers/net/tun.c:2026
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write fs/read_write.c:483 [inline]
       __vfs_write+0xa5a/0xca0 fs/read_write.c:496
       vfs_write+0x44a/0x8f0 fs/read_write.c:558
       ksys_write+0x267/0x450 fs/read_write.c:611
       __do_sys_write fs/read_write.c:623 [inline]
       __se_sys_write fs/read_write.c:620 [inline]
       __ia32_sys_write+0xdb/0x120 fs/read_write.c:620
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3c7/0x6e0 arch/x86/entry/common.c:410
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      
      Fixes: 95f5c64c ("gre: Move utility functions to common headers")
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. Mar 06, 2020
    • netfilter: nft_chain_nat: inet family is missing module ownership · 6a42cefb
      Pablo Neira Ayuso authored
      
      
      Set owner to THIS_MODULE, otherwise the nft_chain_nat module might be
      removed while there are still inet/nat chains in place.
      
      [  117.942096] BUG: unable to handle page fault for address: ffffffffa0d5e040
      [  117.942101] #PF: supervisor read access in kernel mode
      [  117.942103] #PF: error_code(0x0000) - not-present page
      [  117.942106] PGD 200c067 P4D 200c067 PUD 200d063 PMD 3dc909067 PTE 0
      [  117.942113] Oops: 0000 [#1] PREEMPT SMP PTI
      [  117.942118] CPU: 3 PID: 27 Comm: kworker/3:0 Not tainted 5.6.0-rc3+ #348
      [  117.942133] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [  117.942145] RIP: 0010:nf_tables_chain_destroy.isra.0+0x94/0x15a [nf_tables]
      [  117.942149] Code: f6 45 54 01 0f 84 d1 00 00 00 80 3b 05 74 44 48 8b 75 e8 48 c7 c7 72 be de a0 e8 56 e6 2d e0 48 8b 45 e8 48 c7 c7 7f be de a0 <48> 8b 30 e8 43 e6 2d e0 48 8b 45 e8 48 8b 40 10 48 85 c0 74 5b 8b
      [  117.942152] RSP: 0018:ffffc9000015be10 EFLAGS: 00010292
      [  117.942155] RAX: ffffffffa0d5e040 RBX: ffff88840be87fc2 RCX: 0000000000000007
      [  117.942158] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffffffffa0debe7f
      [  117.942160] RBP: ffff888403b54b50 R08: 0000000000001482 R09: 0000000000000004
      [  117.942162] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8883eda7e540
      [  117.942164] R13: dead000000000122 R14: dead000000000100 R15: ffff888403b3db80
      [  117.942167] FS:  0000000000000000(0000) GS:ffff88840e4c0000(0000) knlGS:0000000000000000
      [  117.942169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  117.942172] CR2: ffffffffa0d5e040 CR3: 00000003e4c52002 CR4: 00000000001606e0
      [  117.942174] Call Trace:
      [  117.942188]  nf_tables_trans_destroy_work.cold+0xd/0x12 [nf_tables]
      [  117.942196]  process_one_work+0x1d6/0x3b0
      [  117.942200]  worker_thread+0x45/0x3c0
      [  117.942203]  ? process_one_work+0x3b0/0x3b0
      [  117.942210]  kthread+0x112/0x130
      [  117.942214]  ? kthread_create_worker_on_cpu+0x40/0x40
      [  117.942221]  ret_from_fork+0x35/0x40
      
      nf_tables_chain_destroy() crashes on module_put() because the module is
      gone.
      
      Fixes: d164385e ("netfilter: nat: add inet family nat support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • mptcp: always include dack if possible. · 2398e399
      Paolo Abeni authored
      
      
      Currently a passive MPTCP socket can skip including the DACK option
      if the peer sends data before accept() completes.
      
      This happens because the msk 'can_ack' flag is set only after the
      accept() call.
      
      Such a missing DACK option may cause, as per the RFC, an unwanted
      fallback to TCP.
      
      This change addresses the issue using the key material
      available in the current subflow, if any, to create a suitable
      dack option when msk ack seq is not yet available.
      
      v1 -> v2:
       - advance the generated ack after the initial MPC packet
      
      Fixes: d22f4988 ("mptcp: process MP_CAPABLE data option")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: nfc: fix bounds checking bugs on "pipe" · a3aefbfe
      Dan Carpenter authored
      
      
      This is similar to commit 674d9de0 ("NFC: Fix possible memory
      corruption when handling SHDLC I-Frame commands") and commit d7ee81ad
      ("NFC: nci: Add some bounds checking in nci_hci_cmd_received()") which
      added range checks on "pipe".
      
      The "pipe" variable comes from skb->data[0] in nfc_hci_msg_rx_work().
      It's in the 0-255 range. We're using it as the array index into the
      hdev->pipes[] array, which has NFC_HCI_MAX_PIPES (128) members.
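The missing range check amounts to rejecting any index at or above the array size before using it. `struct sketch_pipe` and `sketch_lookup_pipe()` are hypothetical; the bound of 128 is the NFC_HCI_MAX_PIPES value quoted above, and the index is a byte taken from packet data, so it can be anywhere in 0-255.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PIPES 128   /* NFC_HCI_MAX_PIPES, per the commit message */

/* Hypothetical pipe table entry. */
struct sketch_pipe {
    uint8_t gate;
    uint8_t dest_host;
};

/* `pipe` comes from the first payload byte, so it spans 0..255. */
static struct sketch_pipe *sketch_lookup_pipe(struct sketch_pipe *pipes, uint8_t pipe)
{
    if (pipe >= SKETCH_MAX_PIPES)
        return NULL;           /* reject instead of indexing past the array */
    return &pipes[pipe];
}
```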
      
      Fixes: 118278f2 ("NFC: hci: Add pipes table to reference them with a tuple {gate, host}")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. Mar 05, 2020
    • netfilter: nf_tables: fix infinite loop when expr is not available · 1d305ba4
      Florian Westphal authored
      
      
      nft will loop forever if the kernel doesn't support an expression:
      
      1. nft_expr_type_get() appends the family specific name to the module list.
      2. -EAGAIN is returned to nfnetlink, nfnetlink calls abort path.
      3. abort path sets ->done to true and calls request_module for the
         expression.
      4. nfnetlink replays the batch, we end up in nft_expr_type_get() again.
      5. nft_expr_type_get attempts to append family-specific name. This
         one already exists on the list, so we continue
      6. nft_expr_type_get adds the generic expression name to the module
         list. -EAGAIN is returned, nfnetlink calls abort path.
      7. abort path encounters the family-specific expression which
         has 'done' set, so it gets removed.
      8. abort path requests the generic expression name, sets done to true.
      9. batch is replayed.
      
      If the expression could not be loaded, then we will end up back at 1),
      because the family-specific name got removed and the cycle starts again.
      
      Note that userspace can SIGKILL the nft process to stop the cycle, but
      the desired behaviour is to return an error after the generic expr name
      fails to load the expression.
      
      Fixes: eb014de4 ("netfilter: nf_tables: autoload modules from the abort path")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: dump NFTA_CHAIN_FLAGS attribute · d78008de
      Pablo Neira Ayuso authored
      
      
      The NFTA_CHAIN_FLAGS netlink attribute was missing when dumping
      basechain definitions.
      
      Fixes: c9626a2c ("netfilter: nf_tables: add hardware offload support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  8. Mar 04, 2020