Skip to content
  1. Jan 17, 2019
    • Gustavo A. R. Silva's avatar
      openvswitch: meter: Use struct_size() in kzalloc() · c5c3899d
      Gustavo A. R. Silva authored
      
      
      One of the more common cases of allocation size calculations is finding the
      size of a structure that has a zero-sized array at the end, along with memory
      for some number of elements for that array. For example:
      
      struct foo {
          int stuff;
          struct boo entry[];
      };
      
      instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can now
      use the new struct_size() helper:
      
      instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
      
      This code was detected with the help of Coccinelle.
      
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5c3899d
  2. Jan 16, 2019
  3. Jan 12, 2019
    • Paolo Abeni's avatar
      net: clear skb->tstamp in bridge forwarding path · 41d1c883
      Paolo Abeni authored
      
      
      Matteo reported forwarding issues inside the linux bridge,
      if the enslaved interfaces use the fq qdisc.
      
      Similar to commit 8203e2d8 ("net: clear skb->tstamp in
      forwarding paths"), we need to clear the tstamp field in
      the bridge forwarding path.
      
      Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.")
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Reported-and-tested-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41d1c883
    • Taehee Yoo's avatar
      net: bpfilter: disallow to remove bpfilter module while being used · 71a85084
      Taehee Yoo authored
      
      
      The bpfilter.ko module can be removed while functions of the bpfilter.ko
      are executing. so panic can occurred. in order to protect that, locks can
      be used. a bpfilter_lock protects routines in the
      __bpfilter_process_sockopt() but it's not enough because __exit routine
      can be executed concurrently.
      
      Now, the bpfilter_umh can not run in parallel.
      So, the module do not removed while it's being used and it do not
      double-create UMH process.
      The members of the umh_info and the bpfilter_umh_ops are protected by
      the bpfilter_umh_ops.lock.
      
      test commands:
         while :
         do
      	iptables -I FORWARD -m string --string ap --algo kmp &
      	modprobe -rv bpfilter &
         done
      
      splat looks like:
      [  298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b
      [  298.628512] #PF error: [normal kernel read fault]
      [  298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0
      [  298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154
      [  298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0
      [  298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6
      [  298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202
      [  298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000
      [  298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000
      [  298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000
      [  298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c
      [  298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000
      [  298.638859] FS:  00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  298.638859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0
      [  298.638859] Call Trace:
      [  298.638859]  ? mutex_lock_io_nested+0x1560/0x1560
      [  298.638859]  ? kasan_kmalloc+0xa0/0xd0
      [  298.638859]  ? kmem_cache_alloc+0x1c2/0x260
      [  298.638859]  ? __alloc_file+0x92/0x3c0
      [  298.638859]  ? alloc_empty_file+0x43/0x120
      [  298.638859]  ? alloc_file_pseudo+0x220/0x330
      [  298.638859]  ? sock_alloc_file+0x39/0x160
      [  298.638859]  ? __sys_socket+0x113/0x1d0
      [  298.638859]  ? __x64_sys_socket+0x6f/0xb0
      [  298.638859]  ? do_syscall_64+0x138/0x560
      [  298.638859]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  298.638859]  ? __alloc_file+0x92/0x3c0
      [  298.638859]  ? init_object+0x6b/0x80
      [  298.638859]  ? cyc2ns_read_end+0x10/0x10
      [  298.638859]  ? cyc2ns_read_end+0x10/0x10
      [  298.638859]  ? hlock_class+0x140/0x140
      [  298.638859]  ? sched_clock_local+0xd4/0x140
      [  298.638859]  ? sched_clock_local+0xd4/0x140
      [  298.638859]  ? check_flags.part.37+0x440/0x440
      [  298.638859]  ? __lock_acquire+0x4f90/0x4f90
      [  298.638859]  ? set_rq_offline.part.89+0x140/0x140
      [ ... ]
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71a85084
    • Taehee Yoo's avatar
      net: bpfilter: restart bpfilter_umh when error occurred · 61fbf593
      Taehee Yoo authored
      
      
      The bpfilter_umh will be stopped via __stop_umh() when the bpfilter
      error occurred.
      The bpfilter_umh() couldn't start again because there is no restart
      routine.
      
      The section of the bpfilter_umh_{start/end} is no longer .init.rodata
      because these area should be reused in the restart routine. hence
      the section name is changed to .bpfilter_umh.
      
      The bpfilter_ops->start() is restart callback. it will be called when
      bpfilter_umh is stopped.
      The stop bit means bpfilter_umh is stopped. this bit is set by both
      start and stop routine.
      
      Before this patch,
      Test commands:
         $ iptables -vnL
         $ kill -9 <pid of bpfilter_umh>
         $ iptables -vnL
         [  480.045136] bpfilter: write fail -32
         $ iptables -vnL
      
      All iptables commands will fail.
      
      After this patch,
      Test commands:
         $ iptables -vnL
         $ kill -9 <pid of bpfilter_umh>
         $ iptables -vnL
         $ iptables -vnL
      
      Now, all iptables commands will work.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61fbf593
    • Taehee Yoo's avatar
      net: bpfilter: use cleanup callback to release umh_info · 5b4cb650
      Taehee Yoo authored
      
      
      Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback
      to release members of the umh_info.
      This patch makes bpfilter_umh's cleanup routine to use the
      umh_info->cleanup callback.
      
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b4cb650
  4. Jan 10, 2019
    • Yuchung Cheng's avatar
      tcp: change txhash on SYN-data timeout · c5715b8f
      Yuchung Cheng authored
      
      
      Previously upon SYN timeouts the sender recomputes the txhash to
      try a different path. However this does not apply on the initial
      timeout of SYN-data (active Fast Open). Therefore an active IPv6
      Fast Open connection may incur one second RTO penalty to take on
      a new path after the second SYN retransmission uses a new flow label.
      
      This patch removes this undesirable behavior so Fast Open changes
      the flow label just like the regular connections. This also helps
      avoid falsely disabling Fast Open on the sender which triggers
      after two consecutive SYN timeouts on Fast Open.
      
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Reviewed-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5715b8f
    • Eric Dumazet's avatar
      ipv6: fix kernel-infoleak in ipv6_local_error() · 7d033c9f
      Eric Dumazet authored
      
      
      This patch makes sure the flow label in the IPv6 header
      forged in ipv6_local_error() is initialized.
      
      BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
      CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x173/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
       kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675
       kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
       _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
       copy_to_user include/linux/uaccess.h:177 [inline]
       move_addr_to_user+0x2e9/0x4f0 net/socket.c:227
       ___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284
       __sys_recvmsg net/socket.c:2327 [inline]
       __do_sys_recvmsg net/socket.c:2337 [inline]
       __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
       __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x457ec9
      Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
      RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4
      R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:219 [inline]
       kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439
       __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
       ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475
       udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335
       inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830
       sock_recvmsg_nosec net/socket.c:794 [inline]
       sock_recvmsg+0x1d1/0x230 net/socket.c:801
       ___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278
       __sys_recvmsg net/socket.c:2327 [inline]
       __do_sys_recvmsg net/socket.c:2337 [inline]
       __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
       __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
       kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
       kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
       kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2759 [inline]
       __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
       __kmalloc_reserve net/core/skbuff.c:137 [inline]
       __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
       alloc_skb include/linux/skbuff.h:998 [inline]
       ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334
       __ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311
       ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775
       udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384
       inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       __sys_sendto+0x8c4/0xac0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1796
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Bytes 4-7 of 28 are uninitialized
      Memory access of size 28 starts at ffff8881937bfce0
      Data copied to user address 0000000020000000
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7d033c9f
    • Konstantin Khlebnikov's avatar
      net/core/neighbour: tell kmemleak about hash tables · 85704cb8
      Konstantin Khlebnikov authored
      
      
      This fixes false-positive kmemleak reports about leaked neighbour entries:
      
      unreferenced object 0xffff8885c6e4d0a8 (size 1024):
        comm "softirq", pid 0, jiffies 4294922664 (age 167640.804s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 20 2c f3 83 ff ff ff ff  ........ ,......
          08 c0 ef 5f 84 88 ff ff 01 8c 7d 02 01 00 00 00  ..._......}.....
        backtrace:
          [<00000000748509fe>] ip6_finish_output2+0x887/0x1e40
          [<0000000036d7a0d8>] ip6_output+0x1ba/0x600
          [<0000000027ea7dba>] ip6_send_skb+0x92/0x2f0
          [<00000000d6e2111d>] udp_v6_send_skb.isra.24+0x680/0x15e0
          [<000000000668a8be>] udpv6_sendmsg+0x18c9/0x27a0
          [<000000004bd5fa90>] sock_sendmsg+0xb3/0xf0
          [<000000008227b29f>] ___sys_sendmsg+0x745/0x8f0
          [<000000008698009d>] __sys_sendmsg+0xde/0x170
          [<00000000889dacf1>] do_syscall_64+0x9b/0x400
          [<0000000081cdb353>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000005767ed39>] 0xffffffffffffffff
      
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      85704cb8
    • Willem de Bruijn's avatar
      ip: on queued skb use skb_header_pointer instead of pskb_may_pull · 4a06fa67
      Willem de Bruijn authored
      
      
      Commit 2efd4fca ("ip: in cmsg IP(V6)_ORIGDSTADDR call
      pskb_may_pull") avoided a read beyond the end of the skb linear
      segment by calling pskb_may_pull.
      
      That function can trigger a BUG_ON in pskb_expand_head if the skb is
      shared, which it is when when peeking. It can also return ENOMEM.
      
      Avoid both by switching to safer skb_header_pointer.
      
      Fixes: 2efd4fca ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4a06fa67
  5. Jan 09, 2019
  6. Jan 08, 2019
    • Ido Schimmel's avatar
      net: bridge: Fix VLANs memory leak · 27973793
      Ido Schimmel authored
      
      
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarPetr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      27973793
  7. Jan 07, 2019
  8. Jan 06, 2019
    • Masahiro Yamada's avatar
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada authored
      
      
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      e9666d10
  9. Jan 05, 2019
  10. Jan 04, 2019
    • Eric Dumazet's avatar
      ipv6: make icmp6_send() robust against null skb->dev · 8d933670
      Eric Dumazet authored
      
      
      syzbot was able to crash one host with the following stack trace :
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8625 Comm: syz-executor4 Not tainted 4.20.0+ #8
      RIP: 0010:dev_net include/linux/netdevice.h:2169 [inline]
      RIP: 0010:icmp6_send+0x116/0x2d30 net/ipv6/icmp.c:426
       icmpv6_send
       smack_socket_sock_rcv_skb
       security_sock_rcv_skb
       sk_filter_trim_cap
       __sk_receive_skb
       dccp_v6_do_rcv
       release_sock
      
      This is because a RX packet found socket owned by user and
      was stored into socket backlog. Before leaving RCU protected section,
      skb->dev was cleared in __sk_receive_skb(). When socket backlog
      was finally handled at release_sock() time, skb was fed to
      smack_socket_sock_rcv_skb() then icmp6_send()
      
      We could fix the bug in smack_socket_sock_rcv_skb(), or simply
      make icmp6_send() more robust against such possibility.
      
      In the future we might provide to icmp6_send() the net pointer
      instead of infering it.
      
      Fixes: d66a8acb ("Smack: Inform peer that IPv6 traffic has been blocked")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Piotr Sawicki <p.sawicki2@partner.samsung.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d933670
    • Stefano Brivio's avatar
      fou6: Prevent unbounded recursion in GUE error handler · 44039e00
      Stefano Brivio authored
      
      
      I forgot to deal with IPv6 in commit 11789039 ("fou: Prevent unbounded
      recursion in GUE error handler").
      
      Now syzbot reported what might be the same type of issue, caused by
      gue6_err(), that is, handling exceptions for direct UDP encapsulation in
      GUE (UDP-in-UDP) leads to unbounded recursion in the GUE exception
      handler.
      
      As it probably doesn't make sense to set up GUE this way, and it's
      currently not even possible to configure this, skip exception handling for
      UDP (or UDP-Lite) packets encapsulated in UDP (or UDP-Lite) packets with
      GUE on IPv6.
      
      Reported-by: default avatar <syzbot+4ad25edc7a33e4ab91e0@syzkaller.appspotmail.com>
      Reported-by: default avatarWillem de Bruijn <willemdebruijn.kernel@gmail.com>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Fixes: b8a51b38 ("fou, fou6: ICMP error handlers for FoU and GUE")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44039e00
    • Stefano Brivio's avatar
      fou: Prevent unbounded recursion in GUE error handler also with UDP-Lite · bc6e019b
      Stefano Brivio authored
      
      
      In commit 11789039 ("fou: Prevent unbounded recursion in GUE error
      handler"), I didn't take care of the case where UDP-Lite is encapsulated
      into UDP or UDP-Lite with GUE. From a syzbot report about a possibly
      similar issue with GUE on IPv6, I just realised the same thing might
      happen with a UDP-Lite inner payload.
      
      Also skip exception handling for inner UDP-Lite protocol.
      
      Fixes: 11789039 ("fou: Prevent unbounded recursion in GUE error handler")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc6e019b
    • Yi-Hung Wei's avatar
      openvswitch: Fix IPv6 later frags parsing · 41e4e2cd
      Yi-Hung Wei authored
      
      
      The previous commit fa642f08
      ("openvswitch: Derive IP protocol number for IPv6 later frags")
      introduces IP protocol number parsing for IPv6 later frags that can mess
      up the network header length calculation logic, i.e. nh_len < 0.
      However, the network header length calculation is mainly for deriving
      the transport layer header in the key extraction process which the later
      fragment does not apply.
      
      Therefore, this commit skips the network header length calculation to
      fix the issue.
      
      Reported-by: default avatarChris Mi <chrism@mellanox.com>
      Reported-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Fixes: fa642f08 ("openvswitch: Derive IP protocol number for IPv6 later frags")
      Signed-off-by: default avatarYi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41e4e2cd
    • David Rientjes's avatar
      net, skbuff: do not prefer skb allocation fails early · f8c468e8
      David Rientjes authored
      
      
      Commit dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by
      __GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in
      alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may
      directly reclaim.
      
      The previous behavior would require reclaim up to 1 << order pages for
      skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing,
      otherwise the allocations in alloc_skb() would loop in the page allocator
      looking for memory.  __GFP_RETRY_MAYFAIL makes both allocations failable
      under memory pressure, including for the HEAD allocation.
      
      This can cause, among many other things, write() to fail with ENOTCONN
      during RPC when under memory pressure.
      
      These allocations should succeed as they did previous to dcda9b04
      even if it requires calling the oom killer and additional looping in the
      page allocator to find memory.  There is no way to specify the previous
      behavior of __GFP_REPEAT, but it's unlikely to be necessary since the
      previous behavior only guaranteed that 1 << order pages would be reclaimed
      before failing for order > PAGE_ALLOC_COSTLY_ORDER.  That reclaim is not
      guaranteed to be contiguous memory, so repeating for such large orders is
      usually not beneficial.
      
      Removing the setting of __GFP_RETRY_MAYFAIL to restore the previous
      behavior, specifically not allowing alloc_skb() to fail for small orders
      and oom kill if necessary rather than allowing RPCs to fail.
      
      Fixes: dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic")
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f8c468e8
    • Arthur Gautier's avatar
      netlink: fixup regression in RTM_GETADDR · 7c1e8a38
      Arthur Gautier authored
      
      
      This commit fixes a regression in AF_INET/RTM_GETADDR and
      AF_INET6/RTM_GETADDR.
      
      Before this commit, the kernel would stop dumping addresses once the first
      skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error
      shouldn't be sent back to netlink_dump so the callback is kept alive. The
      userspace is expected to call back with a new empty skb.
      
      Changes from V1:
       - The error is not handled in netlink_dump anymore but rather in
         inet_dump_ifaddr and inet6_dump_addr directly as suggested by
         David Ahern.
      
      Fixes: d7e38611 ("net/ipv4: Put target net when address dump fails due to bad attributes")
      Fixes: 242afaa6 ("net/ipv6: Put target net when address dump fails due to bad attributes")
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: "David S . Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarArthur Gautier <baloo@gandi.net>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c1e8a38
    • Linus Torvalds's avatar
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds authored
      
      
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  11. Jan 03, 2019
  12. Jan 02, 2019
Loading