Skip to content
  1. Jul 27, 2015
  2. Jul 25, 2015
    • Julian Anastasov's avatar
      ipv4: consider TOS in fib_select_default · 2392debc
      Julian Anastasov authored
      
      
      fib_select_default considers alternative routes only when
      res->fi is for the first alias in res->fa_head. In the
      common case this can happen only when the initial lookup
      matches the first alias with highest TOS value. This
      prevents the alternative routes to require specific TOS.
      
      This patch solves the problem as follows:
      
      - routes that require specific TOS should be returned by
      fib_select_default only when TOS matches, as already done
      in fib_table_lookup. This rule implies that depending on the
      TOS we can have many different lists of alternative gateways
      and we have to keep the last used gateway (fa_default) in first
      alias for the TOS instead of using single tb_default value.
      
      - as the aliases are ordered by many keys (TOS desc,
      fib_priority asc), we restrict the possible results to
      routes with matching TOS and lowest metric (fib_priority)
      and routes that match any TOS, again with lowest metric.
      
      For example, packet with TOS 8 can not use gw3 (not lowest
      metric), gw4 (different TOS) and gw6 (not lowest metric),
      all other gateways can be used:
      
      tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
      tos 8 via gw2 metric 2
      tos 8 via gw3 metric 3
      tos 4 via gw4
      tos 0 via gw5
      tos 0 via gw6 metric 1
      
      Reported-by: default avatarHagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2392debc
  3. Jul 20, 2015
    • Pablo Neira Ayuso's avatar
      netfilter: fix netns dependencies with conntrack templates · 0838aa7f
      Pablo Neira Ayuso authored
      
      
      Quoting Daniel Borkmann:
      
      "When adding connection tracking template rules to a netns, f.e. to
      configure netfilter zones, the kernel will endlessly busy-loop as soon
      as we try to delete the given netns in case there's at least one
      template present, which is problematic i.e. if there is such bravery that
      the priviledged user inside the netns is assumed untrusted.
      
      Minimal example:
      
        ip netns add foo
        ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
        ip netns del foo
      
      What happens is that when nf_ct_iterate_cleanup() is being called from
      nf_conntrack_cleanup_net_list() for a provided netns, we always end up
      with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
      don't get a soft-lockup as we still have a schedule() point, but the
      serving CPU spins on 100% from that point onwards.
      
      Since templates are normally allocated with nf_conntrack_alloc(), we
      also bump net->ct.count. The issue why they are not yet nf_ct_put() is
      because the per netns .exit() handler from x_tables (which would eventually
      invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
      called in the dependency chain at a *later* point in time than the per
      netns .exit() handler for the connection tracker.
      
      This is clearly a chicken'n'egg problem: after the connection tracker
      .exit() handler, we've teared down all the connection tracking
      infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
      invoked at a later point in time during the netns cleanup, as that would
      lead to a use-after-free. At the same time, we cannot make x_tables depend
      on the connection tracker module, so that the xt_ct_tg_destroy() would
      be invoked earlier in the cleanup chain."
      
      Daniel confirms this has to do with the order in which modules are loaded or
      having compiled nf_conntrack as modules while x_tables built-in. So we have no
      guarantees regarding the order in which netns callbacks are executed.
      
      Fix this by allocating the templates through kmalloc() from the respective
      SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
      Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
      is marked as unlikely since conntrack templates are rarely allocated and only
      from the configuration plane path.
      
      Note that templates are not kept in any list to avoid further dependencies with
      nf_conntrack anymore, thus, the tmpl larval list is removed.
      
      Reported-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Tested-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      0838aa7f
  4. Jul 17, 2015
  5. Jul 16, 2015
  6. Jun 28, 2015
  7. Jun 24, 2015
    • Andy Gospodarek's avatar
      net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek authored
      
      
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, will report to userspace that a route is
      dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches, this actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      
      Signed-off-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarDinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: default avatarScott Feldman <sfeldma@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0eeb075f
    • Andy Gospodarek's avatar
      net: track link-status of ipv4 nexthops · 8a3d0316
      Andy Gospodarek authored
      
      
      Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
      reachable via an interface where carrier is off.  No action is taken,
      but additional flags are passed to userspace to indicate carrier status.
      
      This also includes a cleanup to fib_disable_ip to more clearly indicate
      what event made the function call to replace the more cryptic force
      option previously used.
      
      v2: Split out kernel functionality into 2 patches, this patch simply
      sets and clears new nexthop flag RTNH_F_LINKDOWN.
      
      v3: Cleanups suggested by Alex as well as a bug noticed in
      fib_sync_down_dev and fib_sync_up when multipath was not enabled.
      
      v5: Whitespace and variable declaration fixups suggested by Dave.
      
      v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
      all.
      
      Signed-off-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarDinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: default avatarScott Feldman <sfeldma@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8a3d0316
  8. Jun 23, 2015
    • Scott Feldman's avatar
      switchdev: rename vlan vid_start to vid_begin · 3e3a78b4
      Scott Feldman authored
      
      
      Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END.
      
      Signed-off-by: default avatarScott Feldman <sfeldma@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e3a78b4
    • Eric W. Biederman's avatar
      netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook · 8405a8ff
      Eric W. Biederman authored
      
      
      Add code to nf_unregister_hook to flush the nf_queue when a hook is
      unregistered.  This guarantees that the pointer that the nf_queue code
      retains into the nf_hook list will remain valid while a packet is
      queued.
      
      I tested what would happen if we do not flush queued packets and was
      trivially able to obtain the oops below.  All that was required was
      to stop the nf_queue listening process, to delete all of the nf_tables,
      and to awaken the nf_queue listening process.
      
      > BUG: unable to handle kernel paging request at 0000000100000001
      > IP: [<0000000100000001>] 0x100000001
      > PGD b9c35067 PUD 0
      > Oops: 0010 [#1] SMP
      > Modules linked in:
      > CPU: 0 PID: 519 Comm: lt-nfqnl_test Not tainted
      > task: ffff8800b9c8c050 ti: ffff8800ba9d8000 task.ti: ffff8800ba9d8000
      > RIP: 0010:[<0000000100000001>]  [<0000000100000001>] 0x100000001
      > RSP: 0018:ffff8800ba9dba40  EFLAGS: 00010a16
      > RAX: ffff8800bab48a00 RBX: ffff8800ba9dba90 RCX: ffff8800ba9dba90
      > RDX: ffff8800b9c10128 RSI: ffff8800ba940900 RDI: ffff8800bab48a00
      > RBP: ffff8800b9c10128 R08: ffffffff82976660 R09: ffff8800ba9dbb28
      > R10: dead000000100100 R11: dead000000200200 R12: ffff8800ba940900
      > R13: ffffffff8313fd50 R14: ffff8800b9c95200 R15: 0000000000000000
      > FS:  00007fb91fc34700(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 0000000100000001 CR3: 00000000babfb000 CR4: 00000000000007f0
      > Stack:
      >  ffffffff8206ab0f ffffffff82982240 ffff8800bab48a00 ffff8800b9c100a8
      >  ffff8800b9c10100 0000000000000001 ffff8800ba940900 ffff8800b9c10128
      >  ffffffff8206bd65 ffff8800bfb0d5e0 ffff8800bab48a00 0000000000014dc0
      > Call Trace:
      >  [<ffffffff8206ab0f>] ? nf_iterate+0x4f/0xa0
      >  [<ffffffff8206bd65>] ? nf_reinject+0x125/0x190
      >  [<ffffffff8206dee5>] ? nfqnl_recv_verdict+0x255/0x360
      >  [<ffffffff81386290>] ? nla_parse+0x80/0xf0
      >  [<ffffffff8206c42c>] ? nfnetlink_rcv_msg+0x13c/0x240
      >  [<ffffffff811b2fec>] ? __memcg_kmem_get_cache+0x4c/0x150
      >  [<ffffffff8206c2f0>] ? nfnl_lock+0x20/0x20
      >  [<ffffffff82068159>] ? netlink_rcv_skb+0xa9/0xc0
      >  [<ffffffff820677bf>] ? netlink_unicast+0x12f/0x1c0
      >  [<ffffffff82067ade>] ? netlink_sendmsg+0x28e/0x650
      >  [<ffffffff81fdd814>] ? sock_sendmsg+0x44/0x50
      >  [<ffffffff81fde07b>] ? ___sys_sendmsg+0x2ab/0x2c0
      >  [<ffffffff810e8f73>] ? __wake_up+0x43/0x70
      >  [<ffffffff8141a134>] ? tty_write+0x1c4/0x2a0
      >  [<ffffffff81fde9f4>] ? __sys_sendmsg+0x44/0x80
      >  [<ffffffff823ff8d7>] ? system_call_fastpath+0x12/0x6a
      > Code:  Bad RIP value.
      > RIP  [<0000000100000001>] 0x100000001
      >  RSP <ffff8800ba9dba40>
      > CR2: 0000000100000001
      > ---[ end trace 08eb65d42362793f ]---
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8405a8ff
  9. Jun 21, 2015
  10. Jun 18, 2015
  11. Jun 16, 2015
  12. Jun 15, 2015
  13. Jun 14, 2015
    • Marcelo Ricardo Leitner's avatar
      sctp: fix ASCONF list handling · 2d45a02d
      Marcelo Ricardo Leitner authored
      
      
      ->auto_asconf_splist is per namespace and mangled by functions like
      sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
      
      Also, the call to inet_sk_copy_descendant() was backuping
      ->auto_asconf_list through the copy but was not honoring
      ->do_auto_asconf, which could lead to list corruption if it was
      different between both sockets.
      
      This commit thus fixes the list handling by using ->addr_wq_lock
      spinlock to protect the list. A special handling is done upon socket
      creation and destruction for that. Error handlig on sctp_init_sock()
      will never return an error after having initialized asconf, so
      sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
      will be take on sctp_close_sock(), before locking the socket, so we
      don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
      
      Instead of taking the lock on sctp_sock_migrate() for copying and
      restoring the list values, it's preferred to avoid rewritting it by
      implementing sctp_copy_descendant().
      
      Issue was found with a test application that kept flipping sysctl
      default_auto_asconf on and off, but one could trigger it by issuing
      simultaneous setsockopt() calls on multiple sockets or by
      creating/destroying sockets fast enough. This is only triggerable
      locally.
      
      Fixes: 9f7d653b ("sctp: Add Auto-ASCONF support (core).")
      Reported-by: default avatarJi Jianwen <jiji@redhat.com>
      Suggested-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Suggested-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d45a02d
  14. Jun 12, 2015
  15. Jun 11, 2015
  16. Jun 10, 2015
  17. Jun 09, 2015
  18. Jun 08, 2015
    • Samuel Ortiz's avatar
      NFC: Introduce vendor commands structures · 8115dd59
      Samuel Ortiz authored
      
      
      Together with inline routines to associate a vendor commands
      array with an NFC device.
      
      Vendor commands allow vendors to implement their very specific
      operations from driver code instead of adding new stack ops
      for non NFC generic commands.
      Vendors need to select their own unique IDs and use that as a
      namespace for defining sub commands.
      
      Signed-off-by: default avatarSamuel Ortiz <sameo@linux.intel.com>
      8115dd59
Loading