  1. Jan 16, 2016
    • batman-adv: Drop immediate orig_node free function · 42eff6a6
      Sven Eckelmann authored
      
      
      It is not allowed to free the memory of an object that is part of a list
      protected by RCU read-side critical sections without making sure that no
      other context is still accessing the object. This is usually done by
      removing all references to the object and then waiting until the RCU
      grace period is over, so that nobody can legitimately access it anymore.
      
      But the *_now functions ignore this completely. They free the object
      directly, even when a different context is still trying to access it.
      This has to be avoided, so these functions must be removed and all
      callers have to use batadv_orig_node_free_ref instead.
      
      Fixes: 72822225 ("batman-adv: Fix rcu_barrier() miss due to double call_rcu() in TT code")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
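      
      A minimal sketch (editor's illustration; the example_* names are
      hypothetical stand-ins, not the batadv code) of the reference-counted,
      RCU-deferred release pattern the message above requires, contrasted with
      an immediate free:
      
      #include <linux/kref.h>
      #include <linux/list.h>
      #include <linux/rcupdate.h>
      #include <linux/slab.h>
      
      struct example_node {
              struct hlist_node list;         /* lives on an RCU-protected list */
              struct kref refcount;
              struct rcu_head rcu;
      };
      
      static void example_node_release(struct kref *ref)
      {
              struct example_node *node;
      
              node = container_of(ref, struct example_node, refcount);
              /* Defer the kfree() until a grace period has elapsed, so RCU
               * readers that still hold a pointer remain safe. */
              kfree_rcu(node, rcu);
      }
      
      /* Correct: the object is freed only after the last reference is gone
       * AND the RCU grace period has passed. */
      static void example_node_free_ref(struct example_node *node)
      {
              kref_put(&node->refcount, example_node_release);
      }
      
      /* Broken "_now"-style variant: frees immediately, even though an RCU
       * reader in another context may still be dereferencing the object. */
      static void example_node_free_now(struct example_node *node)
      {
              kfree(node);    /* use-after-free hazard for concurrent readers */
      }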
    • batman-adv: Drop immediate batadv_hard_iface free function · b4d922cf
      Sven Eckelmann authored
      
      
      It is not allowed to free the memory of an object that is part of a list
      protected by RCU read-side critical sections without making sure that no
      other context is still accessing the object. This is usually done by
      removing all references to the object and then waiting until the RCU
      grace period is over, so that nobody can legitimately access it anymore.
      
      But the *_now functions ignore this completely. They free the object
      directly, even when a different context is still trying to access it.
      This has to be avoided, so these functions must be removed and all
      callers have to use batadv_hardif_free_ref instead.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
    • batman-adv: Drop immediate neigh_ifinfo free function · ae3e1e36
      Sven Eckelmann authored
      
      
      It is not allowed to free the memory of an object that is part of a list
      protected by RCU read-side critical sections without making sure that no
      other context is still accessing the object. This is usually done by
      removing all references to the object and then waiting until the RCU
      grace period is over, so that nobody can legitimately access it anymore.
      
      But the *_now functions ignore this completely. They free the object
      directly, even when a different context is still trying to access it.
      This has to be avoided, so these functions must be removed and all
      callers have to use batadv_neigh_ifinfo_free_ref instead.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
    • batman-adv: Drop immediate batadv_hardif_neigh_node free function · f6389692
      Sven Eckelmann authored
      
      
      It is not allowed to free the memory of an object that is part of a list
      protected by RCU read-side critical sections without making sure that no
      other context is still accessing the object. This is usually done by
      removing all references to the object and then waiting until the RCU
      grace period is over, so that nobody can legitimately access it anymore.
      
      But the *_now functions ignore this completely. They free the object
      directly, even when a different context is still trying to access it.
      This has to be avoided, so these functions must be removed and all
      callers have to use batadv_hardif_neigh_free_ref instead.
      
      Fixes: cef63419 ("batman-adv: add list of unique single hop neighbors per hard-interface")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
    • batman-adv: Drop immediate batadv_neigh_node free function · 2baa753c
      Sven Eckelmann authored
      
      
      It is not allowed to free the memory of an object that is part of a list
      protected by RCU read-side critical sections without making sure that no
      other context is still accessing the object. This is usually done by
      removing all references to the object and then waiting until the RCU
      grace period is over, so that nobody can legitimately access it anymore.
      
      But the *_now functions ignore this completely. They free the object
      directly, even when a different context is still trying to access it.
      This has to be avoided, so these functions must be removed and all
      callers have to use batadv_neigh_node_free_ref instead.
      
      Fixes: 89652331 ("batman-adv: split tq information in neigh_node struct")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
    • batman-adv: Drop immediate batadv_orig_ifinfo free function · deed9660
      Sven Eckelmann authored
      
      
      It is not allowed to free the memory of an object that is part of a list
      protected by RCU read-side critical sections without making sure that no
      other context is still accessing the object. This is usually done by
      removing all references to the object and then waiting until the RCU
      grace period is over, so that nobody can legitimately access it anymore.
      
      But the *_now functions ignore this completely. They free the object
      directly, even when a different context is still trying to access it.
      This has to be avoided, so these functions must be removed and all
      callers have to use batadv_orig_ifinfo_free_ref instead.
      
      Fixes: 7351a482 ("batman-adv: split out router from orig_node")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
    • batman-adv: Avoid recursive call_rcu for batadv_nc_node · 44e8e7e9
      Sven Eckelmann authored
      
      
      The batadv_nc_node_free_ref function uses call_rcu to delay the free of
      the batadv_nc_node object until all already started RCU read-side
      critical sections have finished. This makes sure that no context is
      still trying to access the object that is about to be removed. But
      batadv_nc_node also holds a reference to orig_node, which must be
      dropped as well.
      
      The orig_node reference was dropped in the call_rcu callback
      batadv_nc_node_free_rcu, but it should actually be dropped in the
      batadv_nc_node_release function to avoid nested call_rcu calls. This
      matters because rcu_barrier (e.g. in batadv_softif_free or batadv_exit)
      does not treat the inner call_rcu as relevant for its completion: the
      barrier would most likely be queued before the callback of the first
      call_rcu has run, and the caller of rcu_barrier would then continue
      before the inner call_rcu callback has finished.
      
      Fixes: d56b1705 ("batman-adv: network coding - detect coding nodes and remove these after timeout")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
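      
      A minimal sketch (editor's illustration; the example_* names are
      hypothetical stand-ins for the nc_node/orig_node pair) of the change
      described above: the parent reference is dropped in the release
      function, before call_rcu, instead of inside the RCU callback, so
      rcu_barrier() only ever has to wait for a single level of callbacks:
      
      #include <linux/kref.h>
      #include <linux/rcupdate.h>
      #include <linux/slab.h>
      
      struct example_parent;                          /* refcounted elsewhere */
      void example_parent_free_ref(struct example_parent *parent);
      
      struct example_child {
              struct example_parent *parent;          /* holds a reference */
              struct kref refcount;
              struct rcu_head rcu;
      };
      
      static void example_child_free_rcu(struct rcu_head *rcu)
      {
              struct example_child *child;
      
              child = container_of(rcu, struct example_child, rcu);
              /* The old layout dropped the parent reference here, which could
               * queue a second call_rcu from inside this callback; a pending
               * rcu_barrier() does not wait for such a nested callback. */
              kfree(child);
      }
      
      static void example_child_release(struct kref *ref)
      {
              struct example_child *child;
      
              child = container_of(ref, struct example_child, refcount);
              /* New layout: drop the parent reference before queuing the RCU
               * callback, so only one call_rcu level is ever outstanding. */
              example_parent_free_ref(child->parent);
              call_rcu(&child->rcu, example_child_free_rcu);
      }
      
      static void example_child_free_ref(struct example_child *child)
      {
              kref_put(&child->refcount, example_child_release);
      }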
    • batman-adv: Avoid recursive call_rcu for batadv_bla_claim · 63b39927
      Sven Eckelmann authored
      
      
      The batadv_claim_free_ref function uses call_rcu to delay the free of
      the batadv_bla_claim object until all already started RCU read-side
      critical sections have finished. This makes sure that no context is
      still trying to access the object that is about to be removed. But
      batadv_bla_claim also holds a reference to backbone_gw, which must be
      dropped as well.
      
      The backbone_gw reference was dropped in the call_rcu callback
      batadv_claim_free_rcu, but it should actually be dropped in the
      batadv_claim_release function to avoid nested call_rcu calls. This
      matters because rcu_barrier (e.g. in batadv_softif_free or batadv_exit)
      does not treat the inner call_rcu as relevant for its completion: the
      barrier would most likely be queued before the callback of the first
      call_rcu has run, and the caller of rcu_barrier would then continue
      before the inner call_rcu callback has finished.
      
      Fixes: 23721387 ("batman-adv: add basic bridge loop avoidance code")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <a@unstable.cc>
  2. Jan 15, 2016
    • bridge: fix lockdep addr_list_lock false positive splat · c6894dec
      Nikolay Aleksandrov authored
      
      
      After promisc mode management was introduced, a bridge device can call
      dev_set_promiscuity from its ndo_change_rx_flags() callback, which in
      turn can be invoked after the bridge's addr_list_lock has already been
      taken (e.g. by dev_uc_add). This causes a false positive lockdep splat,
      because the port interfaces' addr_list_lock is taken by
      br_manage_promisc() while the bridge's addr_list_lock is still held.
      To remove the false positive, introduce a custom addr_list_lock class
      for bridge devices and set it on bridge init.
      A simple way to reproduce this is with the following:
      $ brctl addbr br0
      $ ip l add l br0 br0.100 type vlan id 100
      $ ip l set br0 up
      $ ip l set br0.100 up
      $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
      $ brctl addif br0 eth0
      Splat:
      [   43.684325] =============================================
      [   43.684485] [ INFO: possible recursive locking detected ]
      [   43.684636] 4.4.0-rc8+ #54 Not tainted
      [   43.684755] ---------------------------------------------
      [   43.684906] brctl/1187 is trying to acquire lock:
      [   43.685047]  (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.685460]  but task is already holding lock:
      [   43.685618]  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.686015]  other info that might help us debug this:
      [   43.686316]  Possible unsafe locking scenario:
      
      [   43.686743]        CPU0
      [   43.686967]        ----
      [   43.687197]   lock(_xmit_ETHER);
      [   43.687544]   lock(_xmit_ETHER);
      [   43.687886] *** DEADLOCK ***
      
      [   43.688438]  May be due to missing lock nesting notation
      
      [   43.688882] 2 locks held by brctl/1187:
      [   43.689134]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
      [   43.689852]  #1:  (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
      [   43.690575] stack backtrace:
      [   43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
      [   43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
      [   43.691770]  ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
      [   43.692425]  ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
      [   43.693071]  ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
      [   43.693709] Call Trace:
      [   43.693931]  [<ffffffff81360ceb>] dump_stack+0x4b/0x70
      [   43.694199]  [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
      [   43.694483]  [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
      [   43.694789]  [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
      [   43.695064]  [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
      [   43.695340]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.695623]  [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
      [   43.695901]  [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
      [   43.696180]  [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
      [   43.696460]  [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
      [   43.696750]  [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
      [   43.697052]  [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
      [   43.697348]  [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
      [   43.697655]  [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
      [   43.697943]  [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
      [   43.698223]  [<ffffffff815072de>] dev_uc_add+0x5e/0x80
      [   43.698498]  [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
      [   43.698798]  [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
      [   43.699083]  [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
      [   43.699374]  [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
      [   43.699678]  [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
      [   43.699973]  [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
      [   43.700259]  [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
      [   43.700548]  [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
      [   43.700836]  [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
      [   43.701106]  [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
      [   43.701379]  [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
      [   43.701665]  [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
      [   43.701947]  [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
      [   43.702219]  [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
      [   43.702500]  [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
      [   43.702771]  [<ffffffff81243079>] SyS_ioctl+0x79/0x90
      [   43.703033]  [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Bridge list <bridge@lists.linux-foundation.org>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Fixes: 2796d0c6 ("bridge: Automatically manage port promiscuous mode.")
      Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
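      
      A minimal sketch (editor's illustration; the key and function names are
      hypothetical) of the fix described in this entry: the bridge device gets
      its own lockdep class for addr_list_lock, so taking a port's lock of the
      generic class underneath it is no longer reported as recursive locking:
      
      #include <linux/lockdep.h>
      #include <linux/netdevice.h>
      
      /* A dedicated class for bridge devices, distinct from the generic
       * _xmit_ETHER class used by the ports underneath. */
      static struct lock_class_key br_addr_list_lock_key;
      
      /* Called from the bridge's init/setup path, so the class is in place
       * before dev_uc_add() or dev_set_rx_mode() can nest the two locks. */
      static void br_set_lockdep_class(struct net_device *dev)
      {
              lockdep_set_class(&dev->addr_list_lock, &br_addr_list_lock_key);
      }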
    • net: sctp: Move sequence start handling into sctp_transport_get_idx() · fb331185
      Geert Uytterhoeven authored
      
      
      net/sctp/proc.c: In function ‘sctp_transport_get_idx’:
      net/sctp/proc.c:313: warning: ‘obj’ may be used uninitialized in this function
      
      This is currently a false positive, as all callers check for a zero
      offset first, and handle this case in the exact same way.
      
      Move the check and handling into sctp_transport_get_idx() to kill the
      compiler warning, and avoid future bugs.
      
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
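      
      A minimal sketch (editor's illustration with generic names, not the
      actual sctp code) of the pattern: the duplicated zero-offset special
      case moves out of the callers and into the accessor, so the compiler can
      see that the returned object is initialized on every path:
      
      #include <linux/seq_file.h>
      
      void *example_get_next(struct seq_file *seq, void *obj);  /* assumed iterator step */
      
      static void *example_get_idx(struct seq_file *seq, loff_t pos)
      {
              void *obj = SEQ_START_TOKEN;
      
              /* Previously every caller checked "if (!pos)" itself, and the
               * helper had a path on which 'obj' looked uninitialized to the
               * compiler, triggering the "may be used uninitialized" warning. */
              if (!pos)
                      return obj;
      
              while (pos-- && (obj = example_get_next(seq, obj)) != NULL)
                      ;
      
              return obj;
      }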
    • ipv6: update skb->csum when CE mark is propagated · 34ae6a1a
      Eric Dumazet authored
      
      
      When a tunnel decapsulates the outer header, it has to comply with
      RFC 6040 and, where applicable, propagate the CE mark into the inner
      header.
      
      It turns out IP6_ECN_set_ce() does not correctly update skb->csum for
      CHECKSUM_COMPLETE packets, triggering the infamous "hw csum failure"
      messages and stack traces.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
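      
      A minimal sketch (editor's illustration; a generic helper, not the exact
      IP6_ECN_set_ce() code) of keeping skb->csum consistent when a 32-bit
      header word is rewritten on a CHECKSUM_COMPLETE packet:
      
      #include <linux/skbuff.h>
      #include <net/checksum.h>
      
      static void example_rewrite_hdr_word(struct sk_buff *skb, __be32 *field,
                                           __be32 to)
      {
              __be32 from = *field;
      
              *field = to;
      
              /* With CHECKSUM_COMPLETE, skb->csum covers the packet data; if a
               * header word changes without this adjustment, later checksum
               * verification reports a bogus "hw csum failure". */
              if (skb->ip_summed == CHECKSUM_COMPLETE)
                      skb->csum = csum_add(csum_sub(skb->csum,
                                                    (__force __wsum)from),
                                           (__force __wsum)to);
      }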
    • sctp: support to lookup with ep+paddr in transport rhashtable · 65a5124a
      Xin Long authored
      
      
      Currently, when we sendmsg, we translate the ep to a laddr by selecting
      the first element of its address list, and then do a lookup for a
      transport.
      
      But sctp_hash_cmp() compares that laddr against the asoc's addr_list,
      which may be a subset of the ep's addr_list, so the chosen laddr may not
      be in it, making it impossible to find the transport.
      
      Fix this by using ep + paddr to look up transports in the hashtable. In
      sctp_hash_cmp, if .ep is set, check whether it equals asoc->ep;
      otherwise fall back to the laddr check.
      
      Fixes: d6c0256a ("sctp: add the rhashtable apis for sctp global transport hashtable")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reported-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
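      
      A minimal sketch (editor's illustration; simplified, hypothetical types
      rather than the kernel's) of the comparison rule described above: when
      the lookup key carries an endpoint, match on the endpoint; otherwise
      fall back to the local-address check:
      
      struct example_lookup_key {
              const void *ep;         /* endpoint pointer, NULL for laddr lookups */
              const void *laddr;      /* local address, used only when ep is NULL */
              const void *paddr;      /* peer address, always compared */
      };
      
      struct example_transport {
              const void *asoc_ep;    /* endpoint owning the association */
              const void *laddr;
              const void *paddr;
      };
      
      /* rhashtable-style compare: return 0 on match, non-zero otherwise. */
      static int example_hash_cmp(const struct example_lookup_key *key,
                                  const struct example_transport *t)
      {
              if (key->paddr != t->paddr)
                      return 1;
      
              /* ep + paddr lookup: compare the owning endpoint directly, so a
               * laddr missing from the asoc's (possibly smaller) address list
               * can no longer cause a spurious lookup failure. */
              if (key->ep)
                      return key->ep != t->asoc_ep;
      
              return key->laddr != t->laddr;
      }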
    • net: preserve IP control block during GSO segmentation · 9207f9d4
      Konstantin Khlebnikov authored
      
      
      skb_gso_segment() uses the skb control block during segmentation. This
      patch adds 32 bytes of room for the previous control block, which will
      be copied into all resulting segments.
      
      This fixes a kernel crash when fragmenting forwarded packets:
      fragmentation requires a valid IP CB in the skb for clearing IP options.
      The patch also removes the custom save/restore in the OVS code, which is
      now redundant.
      
      Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Link: http://lkml.kernel.org/r/CALYGNiP-0MZ-FExV2HutTvE9U-QQtkKSoE--KN=JQE5STYsjAA@mail.gmail.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
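      
      A minimal sketch (editor's illustration; names and the exact layout are
      hypothetical) of the idea described above: the segmentation-time scratch
      state lives behind the first 32 bytes of skb->cb, so the inner IP
      control block written by the forwarding path can be copied intact into
      every resulting segment:
      
      #include <linux/skbuff.h>
      #include <linux/string.h>
      
      /* Keep the first 32 bytes of skb->cb (the protocol CB, e.g. IPCB/IP6CB)
       * untouched during GSO; the GSO scratch area starts after that offset. */
      #define EXAMPLE_GSO_CB_OFFSET   32
      
      struct example_gso_cb {
              int     mac_offset;
              int     encap_level;
      };
      
      #define EXAMPLE_GSO_CB(skb) \
              ((struct example_gso_cb *)((skb)->cb + EXAMPLE_GSO_CB_OFFSET))
      
      static void example_copy_inner_cb(const struct sk_buff *gso_skb,
                                        struct sk_buff *segment)
      {
              /* Propagate the preserved protocol CB to the new segment. */
              memcpy(segment->cb, gso_skb->cb, EXAMPLE_GSO_CB_OFFSET);
      }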
    • mm: memcontrol: switch to the updated jump-label API · ef12947c
      Johannes Weiner authored
      
      
      According to <linux/jump_label.h> the direct use of struct static_key is
      deprecated.  Update the socket and slab accounting code accordingly.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reported-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
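      
      A minimal sketch (editor's illustration; the key name is hypothetical) of
      the migration from direct struct static_key usage to the updated
      jump-label API:
      
      #include <linux/jump_label.h>
      
      /* Deprecated style:
       *     struct static_key example_key = STATIC_KEY_INIT_FALSE;
       *     if (static_key_false(&example_key)) ...
       *     static_key_slow_inc(&example_key);
       *
       * Updated API: */
      static DEFINE_STATIC_KEY_FALSE(example_key);
      
      static inline bool example_enabled(void)
      {
              /* Compiles to a patched branch; no memory load in the fast path. */
              return static_branch_unlikely(&example_key);
      }
      
      static void example_enable(void)
      {
              static_branch_inc(&example_key);
      }
      
      static void example_disable(void)
      {
              static_branch_dec(&example_key);
      }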
    • mm: memcontrol: generalize the socket accounting jump label · 80e95fe0
      Johannes Weiner authored
      
      
      The unified hierarchy memory controller is going to use this jump label
      as well to control the networking callbacks.  Move it to the memory
      controller code and give it a more generic name.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: tcp_memcontrol: simplify linkage between socket and page counter · baac50bb
      Johannes Weiner authored
      
      
      There won't be any separate counters for socket memory consumed by
      protocols other than TCP in the future.  Remove the indirection and link
      sockets directly to their owning memory cgroup.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: tcp_memcontrol: sanitize tcp memory accounting callbacks · e805605c
      Johannes Weiner authored
      
      
      There won't be a tcp control soft limit, so integrating the memcg code
      into the global skmem limiting scheme complicates things unnecessarily.
      Replace this with simple and clear charge and uncharge calls--hidden
      behind a jump label--to account skb memory.
      
      Note that this is not purely aesthetic: as a result of shoehorning the
      per-memcg code into the same memory accounting functions that handle the
      global level, the old code would compare the per-memcg consumption
      against the smaller of the per-memcg limit and the global limit.  This
      allowed the total consumption of multiple sockets to exceed the global
      limit, as long as the individual sockets stayed within bounds.  After
      this change, the code will always compare the per-memcg consumption to
      the per-memcg limit, and the global consumption to the global limit, and
      thus close this loophole.
      
      Without a soft limit, the per-memcg memory pressure state in sockets is
      generally questionable.  However, we did it until now, so we continue to
      enter it when the hard limit is hit, and packets are dropped, to let
      other sockets in the cgroup know that they shouldn't grow their transmit
      windows, either.  However, keep it simple in the new callback model and
      leave memory pressure lazily when the next packet is accepted (as
      opposed to doing it synchronously when packets are processed).  When
      packets are dropped, network performance will already be in the toilet,
      so that should be a reasonable trade-off.
      
      As described above, consumption is now checked on the per-memcg level
      and the global level separately.  Likewise, memory pressure states are
      maintained on both the per-memcg level and the global level, and a
      socket is considered under pressure when either level asserts as much.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: tcp_memcontrol: simplify the per-memcg limit access · 80f23124
      Johannes Weiner authored
      
      
      tcp_memcontrol replicates the global sysctl_mem limit array per cgroup,
      but it only ever sets these entries to the value of the memory_allocated
      page_counter limit.  Use the latter directly.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: tcp_memcontrol: remove dead per-memcg count of allocated sockets · af95d7df
      Johannes Weiner authored
      
      
      The number of allocated sockets is used for calculations in the soft
      limit phase, where packets are accepted but the socket is under memory
      pressure.  Since there is no soft limit phase in tcp_memcontrol, and
      memory pressure is only entered when packets are already dropped, this
      is actually dead code.  Remove it.
      
      As this is the last user of parent_cg_proto(), remove that too.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label · 3d596f7b
      Johannes Weiner authored
      
      
      Move the jump-label from sock_update_memcg() and sock_release_memcg() to
      the callsite, and so eliminate those function calls when socket
      accounting is not enabled.
      
      This also eliminates the need for dummy functions because the calls will
      be optimized away if the Kconfig options are not enabled.
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
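      
      A minimal sketch (editor's illustration; the Kconfig symbol, key, and
      helper names are hypothetical stand-ins) of moving the jump-label check
      to the callsite, so the accounting helper is not called at all (and
      needs no dummy stub) when socket accounting is disabled:
      
      #include <linux/jump_label.h>
      #include <net/sock.h>
      
      #ifdef CONFIG_EXAMPLE_SOCK_ACCOUNTING
      extern struct static_key_false example_sockets_enabled_key;
      void example_sock_update_memcg(struct sock *sk);
      #endif
      
      static inline void example_sk_account(struct sock *sk)
      {
      #ifdef CONFIG_EXAMPLE_SOCK_ACCOUNTING
              /* The jump label lives at the callsite: when the key is off the
               * call is skipped, and when the config is off it vanishes. */
              if (static_branch_unlikely(&example_sockets_enabled_key))
                      example_sock_update_memcg(sk);
      #endif
      }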
    • memcg: do not allow to disable tcp accounting after limit is set · 9ee11ba4
      Vladimir Davydov authored
      
      
      There are two bits defined for cg_proto->flags - MEMCG_SOCK_ACTIVATED
      and MEMCG_SOCK_ACTIVE - both are set in tcp_update_limit, but the former
      is never cleared while the latter can be cleared by unsetting the limit.
      This makes it possible to disable tcp socket accounting for new sockets,
      after it was enabled, by writing -1 to memory.kmem.tcp.limit_in_bytes,
      while still guaranteeing that the memcg_socket_limit_enabled static key
      will be decremented on memcg destruction.
      
      This functionality looks dubious, because it is not clear what a use
      case would be.  By enabling tcp accounting a user accepts the price.  If
      they then find the performance degradation unacceptable, they can always
      restart their workload with tcp accounting disabled.  It does not seem
      there is any need to flip it while the workload is running.
      
      Besides, it contradicts how the kmem accounting API works: writing any
      value to memory.kmem.limit_in_bytes enables kmem accounting for the
      cgroup in question, after which it cannot be disabled. Therefore one
      might expect that writing -1 to memory.kmem.tcp.limit_in_bytes just
      enables socket accounting without limiting it, which might be useful by
      itself, but that isn't true.
      
      Since this API peculiarity is not documented anywhere, I propose to drop
      it. This allows the code to be simplified by dropping cg_proto->flags.
      
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      
      
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to the "account everything" approach
      and keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not, in fact, account
      everything).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
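      
      A minimal sketch (editor's illustration; the cache and object names are
      hypothetical) of the two ways an allocation is marked for memcg
      accounting, SLAB_ACCOUNT on a cache and __GFP_ACCOUNT on a single
      allocation:
      
      #include <linux/errno.h>
      #include <linux/gfp.h>
      #include <linux/init.h>
      #include <linux/slab.h>
      
      struct example_obj {
              int payload;
      };
      
      static struct kmem_cache *example_cachep;
      
      static int __init example_init(void)
      {
              /* Every object from this cache is charged to the allocating
               * task's memory cgroup. */
              example_cachep = kmem_cache_create("example_obj",
                                                 sizeof(struct example_obj), 0,
                                                 SLAB_ACCOUNT, NULL);
              return example_cachep ? 0 : -ENOMEM;
      }
      
      static void *example_alloc_buffer(size_t len)
      {
              /* One-off allocations opt in per call site instead. */
              return kmalloc(len, GFP_KERNEL | __GFP_ACCOUNT);
      }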