Skip to content
  1. Aug 05, 2009
  2. Aug 03, 2009
    • Eric Dumazet's avatar
      neigh: Convert garbage collection from softirq to workqueue · e4c4e448
      Eric Dumazet authored
      
      
      Current neigh_periodic_timer() function is fired by timer IRQ, and
      scans one hash bucket each round (very litle work in fact)
      
      As we are supposed to scan whole hash table in 15 seconds, this means
      neigh_periodic_timer() can be fired very often. (depending on the number
      of concurrent hash entries we stored in this table)
      
      Converting this to a workqueue permits scanning whole table, minimizing
      icache pollution, and firing this work every 15 seconds, independantly
      of hash table size.
      
      This 15 seconds delay is not a hard number, as work is a deferrable one.
      
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4c4e448
  3. Jul 27, 2009
  4. Jul 24, 2009
  5. Jul 20, 2009
  6. Jul 17, 2009
    • Eric Dumazet's avatar
      net: sock_copy() fixes · 4dc6dc71
      Eric Dumazet authored
      
      
      Commit e912b114
      (net: sk_prot_alloc() should not blindly overwrite memory)
      took care of not zeroing whole new socket at allocation time.
      
      sock_copy() is another spot where we should be very careful.
      We should not set refcnt to a non null value, until
      we are sure other fields are correctly setup, or
      a lockless reader could catch this socket by mistake,
      while not fully (re)initialized.
      
      This patch puts sk_node & sk_refcnt to the very beginning
      of struct sock to ease sock_copy() & sk_prot_alloc() job.
      
      We add appropriate smp_wmb() before sk_refcnt initializations
      to match our RCU requirements (changes to sock keys should
      be committed to memory before sk_refcnt setting)
      
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4dc6dc71
  7. Jul 13, 2009
  8. Jul 12, 2009
  9. Jul 10, 2009
    • Jiri Olsa's avatar
      net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa authored
      
      
      Adding memory barrier after the poll_wait function, paired with
      receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.
      
      Without the memory barrier, following race can happen.
      The race fires, when following code paths meet, and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there was no cache the code would work ok, since the wait_queue and
      rcv_nxt are opposit to each other.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
      endup calling schedule and sleep forever if there are no more data on the
      socket.
      
      Calls to poll_wait in following modules were ommited:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a57de0b4
  10. Jul 09, 2009
  11. Jul 08, 2009
  12. Jul 06, 2009
  13. Jun 27, 2009
  14. Jun 23, 2009
    • Herbert Xu's avatar
      net: Move rx skb_orphan call to where needed · d55d87fd
      Herbert Xu authored
      
      
      In order to get the tun driver to account packets, we need to be
      able to receive packets with destructors set.  To be on the safe
      side, I added an skb_orphan call for all protocols by default since
      some of them (IP in particular) cannot handle receiving packets
      destructors properly.
      
      Now it seems that at least one protocol (CAN) expects to be able
      to pass skb->sk through the rx path without getting clobbered.
      
      So this patch attempts to fix this properly by moving the skb_orphan
      call to where it's actually needed.  In particular, I've added it
      to skb_set_owner_[rw] which is what most users of skb->destructor
      call.
      
      This is actually an improvement for tun too since it means that
      we only give back the amount charged to the socket when the skb
      is passed to another socket that will also be charged accordingly.
      
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarOliver Hartkopp <olver@hartkopp.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d55d87fd
  15. Jun 18, 2009
  16. Jun 15, 2009
    • Vegard Nossum's avatar
      net: annotate struct sock bitfield · a98b65a3
      Vegard Nossum authored
      2009/2/24 Ingo Molnar <mingo@elte.hu>:
      > ok, this is the last warning i have from today's overnight -tip
      > testruns - a 32-bit system warning in sock_init_data():
      >
      > [    2.610389] NET: Registered protocol family 16
      > [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
      > [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
      > [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
      > [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
      > [    2.641038]          ^
      > [    2.643376]
      > [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
      > [    2.648003] EIP: 0060:[<c07141a1>] EFLAGS: 00010282 CPU: 0
      > [    2.652008] EIP is at sock_init_data+0xa1/0x190
      > [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
      > [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
      > [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      > [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
      > [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      > [    2.676003] DR6: ffff4ff0 DR7: 00000400
      > [    2.680002]  [<c07423e5>] __netlink_create+0x35/0xa0
      > [    2.684002]  [<c07443cc>] netlink_kernel_create+0x4c/0x140
      > [    2.688002]  [<c072755e>] rtnetlink_net_init+0x1e/0x40
      > [    2.696002]  [<c071b601>] register_pernet_operations+0x11/0x30
      > [    2.700002]  [<c071b72c>] register_pernet_subsys+0x1c/0x30
      > [    2.704002]  [<c0bf3c8c>] rtnetlink_init+0x4c/0x100
      > [    2.708002]  [<c0bf4669>] netlink_proto_init+0x159/0x170
      > [    2.712002]  [<c0101124>] do_one_initcall+0x24/0x150
      > [    2.716002]  [<c0bbf3c7>] do_initcalls+0x27/0x40
      > [    2.723201]  [<c0bbf3fc>] do_basic_setup+0x1c/0x20
      > [    2.728002]  [<c0bbfb8a>] kernel_init+0x5a/0xa0
      > [    2.732002]  [<c0103e47>] kernel_thread_helper+0x7/0x10
      > [    2.736002]  [<ffffffff>] 0xffffffff
      
      We fix this false positive by annotating the bitfield in struct
      sock.
      
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      a98b65a3
    • Vegard Nossum's avatar
      fe55f6d5
  17. Jun 12, 2009
  18. Jun 11, 2009
    • Timo Teras's avatar
      neigh: fix state transition INCOMPLETE->FAILED via Netlink request · 5ef12d98
      Timo Teras authored
      
      
      The current code errors out the INCOMPLETE neigh entry skb queue only from
      the timer if maximum probes have been attempted and there has been no reply.
      This also causes the transtion to FAILED state.
      
      However, the neigh entry can be also updated via Netlink to inform that the
      address is unavailable.  Currently, neigh_update() just stops the timers and
      leaves the pending skb's unreleased. This results that the clean up code in
      the timer callback is never called, preventing also proper garbage collection.
      
      This fixes neigh_update() to process the pending skb queue immediately if
      INCOMPLETE -> FAILED state transtion occurs due to a Netlink request.
      
      Signed-off-by: default avatarTimo Teras <timo.teras@iki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ef12d98
    • Eric Dumazet's avatar
      net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Eric Dumazet authored
      
      
      One of the problem with sock memory accounting is it uses
      a pair of sock_hold()/sock_put() for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on socket and might use a different
      cpu than transmit path or transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be in top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to decrement initial offset and atomicaly check if any packets
      are in flight.
      
      skb_set_owner_w() doesnt call sock_hold() anymore
      
      sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
      reached 0 to perform the final freeing.
      
      Drawback is that a skb->truesize error could lead to unfreeable sockets, or
      even worse, prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
      on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
      contention point. 5 % speedup on a UDP transmit workload (depends
      on number of flows), lowering TX completion cpu usage.
      
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2b85a34e
  19. Jun 10, 2009
    • Johannes Berg's avatar
      mac80211: do not pass PS frames out of mac80211 again · 8f77f384
      Johannes Berg authored
      
      
      In order to handle powersave frames properly we had needed
      to pass these out to the device queues again, and introduce
      the skb->requeue bit. This, however, also has unnecessary
      overhead by needing to 'clean up' already tried frames, and
      this clean-up code is also buggy when software encryption
      is used.
      
      Instead of sending the frames via the master netdev queue
      again, simply put them into the pending queue. This also
      fixes a problem where frames for that particular station
      could be reordered when some were still on the software
      queues and older ones are re-injected into the software
      queue after them.
      
      Signed-off-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      8f77f384
  20. Jun 09, 2009
  21. Jun 08, 2009
Loading