  1. Jun 28, 2006
  2. Jun 26, 2006
  3. Jun 25, 2006
  4. Jun 23, 2006
  5. Jun 22, 2006
    • Richard Purdie's avatar
      [PATCH] zlib_inflate: Upgrade library code to a recent version · 4f3865fb
      Richard Purdie authored
      Upgrade the zlib_inflate implementation in the kernel from a patched
      version 1.1.3/4 to a patched 1.2.3.
      
      The code in the kernel is about seven years old and I noticed that the
      external zlib library's inflate performance was significantly faster (~50%)
      than the code in the kernel on ARM (and faster again on x86_32).
      
      For comparison the newer deflate code is 20% slower on ARM and 50% slower
      on x86_32 but gives an approx 1% compression ratio improvement.  I don't
      consider this to be an improvement for kernel use so have no plans to
      change the zlib_deflate code.
      
      Various changes have been made to the zlib code in the kernel, the most
      significant being the extra functions/flush option used by ppp_deflate.
      This update reimplements the features PPP needs to ensure it continues to
      work.
      
      This code has been tested on ARM under both JFFS2 (with zlib compression
      enabled) and ppp_deflate and on x86_32.  JFFS2 sees an approx.  10% real
      world file read speed improvement.
      
      This patch also removes ZLIB_VERSION as it no longer has a correct value.
      We don't need version checks anyway as the kernel's module handling will
      take care of that for us.  This removal is also more in keeping with the
zlib author's wishes (http://www.zlib.net/zlib_faq.html#faq24) and I've
added a note to the zlib.h header that it's a modified version.
      
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Joern Engel <joern@wh.fh-wedel.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4f3865fb
  6. Jun 21, 2006
  7. Jun 05, 2006
  8. May 21, 2006
  9. May 12, 2006
  10. Apr 27, 2006
  11. Apr 21, 2006
    • David Woodhouse's avatar
      [RBTREE] Merge colour and parent fields of struct rb_node. · 55a98102
      David Woodhouse authored
      
      
      We only used a single bit for colour information, so having a whole
      machine word of space allocated for it was a bit wasteful. Instead,
      store it in the lowest bit of the 'parent' pointer, since that was
      always going to be aligned anyway.
      
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      55a98102
    • David Woodhouse's avatar
      [RBTREE] Remove dead code in rb_erase() · 1975e593
      David Woodhouse authored
      
      
      Observe rb_erase(), when the victim node 'old' has two children so
      neither of the simple cases at the beginning are taken.
      
      Observe that it effectively does an 'rb_next()' operation to find the
      next (by value) node in the tree. That is; we go to the victim's
      right-hand child and then follow left-hand pointers all the way
      down the tree as far as we can until we find the next node 'node'. We
      end up with 'node' being either the same immediate right-hand child of
      'old', or one of its descendants on the far left-hand side.
      
      For a start, we _know_ that 'node' has a parent. We can drop that check.
      
      We also know that if 'node's parent is 'old', then 'node' is the
      right-hand child of its parent. And that if 'node's parent is _not_
      'old', then 'node' is the left-hand child of its parent.
      
      So instead of checking for 'node->rb_parent == old' in one place and
      also checking 'node's heritage separately when we're trying to change
      its link from its parent, we can shuffle things around a bit and do
      it like this...
      
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      1975e593
  12. Apr 19, 2006
    • Tim Chen's avatar
      [PATCH] Kconfig.debug: Set DEBUG_MUTEX to off by default · cca57c5b
      Tim Chen authored
      
      
      DEBUG_MUTEX flag is on by default in current kernel configuration.
      
During performance testing, we saw that mutex debug functions like
mutex_debug_check_no_locks_freed (called by kfree()) are expensive, as they
walk a global list of memory areas under a mutex lock to do the
checking.  For benchmarks such as VolanoMark and Hackbench, we have seen
more than a 40% drop in performance on some platforms.  We suggest setting
DEBUG_MUTEX off by default, or at least doing so later once we feel that
the mutex changes in the current code have stabilized.
      
Signed-off-by: Tim Chen <tim.c.chen@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cca57c5b
  13. Apr 14, 2006
    • NeilBrown's avatar
      [PATCH] sysfs: Allow sysfs attribute files to be pollable · 4508a7a7
      NeilBrown authored
      
      
It works like this:
  Open the file.
  Read all the contents.
  Call poll, requesting POLLERR or POLLPRI (so select/exceptfds works).
  When poll returns, either:
     close the file and go to the top of the loop,
     or lseek to the start of the file and go back to the 'read'.
      
      Events are signaled by an object manager calling
         sysfs_notify(kobj, dir, attr);
      
      If the dir is non-NULL, it is used to find a subdirectory which
      contains the attribute (presumably created by sysfs_create_group).
      
This has a cost of one int per attribute, one wait_queue_head per kobject,
and one int per open file.
      
      The name "sysfs_notify" may be confused with the inotify
      functionality.  Maybe it would be nice to support inotify for sysfs
      attributes as well?
      
This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
to be pollable.
      
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      4508a7a7
  14. Apr 11, 2006
  15. Mar 30, 2006
  16. Mar 27, 2006
  17. Mar 26, 2006
    • Akinobu Mita's avatar
      [PATCH] bitops: hweight() speedup · f9b41929
      Akinobu Mita authored
      <linux@horizon.com> wrote:
      
This is an extremely well-known technique.  You can see a similar version that
uses a multiply for the last few steps at
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel which
refers to the "Software Optimization Guide for AMD Athlon 64 and Opteron
Processors",
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
section 8.6, "Efficient Implementation of Population-Count Function in
32-bit Mode", pages 179-180.
      
      It uses the name that I am more familiar with, "popcount" (population count),
      although "Hamming weight" also makes sense.
      
      Anyway, the proof of correctness proceeds as follows:
      
      	b = a - ((a >> 1) & 0x55555555);
      	c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
      	d = (c + (c >> 4)) & 0x0f0f0f0f;
      #if SLOW_MULTIPLY
	e = d + (d >> 8);
      	f = e + (e >> 16);
      	return f & 63;
      #else
      	/* Useful if multiply takes at most 4 cycles */
      	return (d * 0x01010101) >> 24;
      #endif
      
      The input value a can be thought of as 32 1-bit fields each holding their own
      hamming weight.  Now look at it as 16 2-bit fields.  Each 2-bit field a1..a0
      has the value 2*a1 + a0.  This can be converted into the hamming weight of the
      2-bit field a1+a0 by subtracting a1.
      
      That's what the (a >> 1) & mask subtraction does.  Since there can be no
      borrows, you can just do it all at once.
      
      Enumerating the 4 possible cases:
      
      0b00 = 0  ->  0 - 0 = 0
      0b01 = 1  ->  1 - 0 = 1
      0b10 = 2  ->  2 - 1 = 1
      0b11 = 3  ->  3 - 1 = 2
      
The next step consists of breaking up b (made of 16 2-bit fields) into
even and odd halves and adding them into 4-bit fields.  Since the largest
possible sum is 2+2 = 4, which will not fit into a 2-bit field, the 2-bit
fields have to be masked before they are added.
      
      After this point, the masking can be delayed.  Each 4-bit field holds a
      population count from 0..4, taking at most 3 bits.  These numbers can be added
      without overflowing a 4-bit field, so we can compute c + (c >> 4), and only
      then mask off the unwanted bits.
      
This produces d, a number made of 4 8-bit fields, each in the range 0..8.  From
this point, we can shift and add d multiple times without overflowing an 8-bit
field, and only do a final mask at the end.
      
The number to mask with has to be at least 63 (so that 32 won't be truncated),
      but can also be 128 or 255.  The x86 has a special encoding for signed
      immediate byte values -128..127, so the value of 255 is slower.  On other
      processors, a special "sign extend byte" instruction might be faster.
      
      On a processor with fast integer multiplies (Athlon but not P4), you can
      reduce the final few serially dependent instructions to a single integer
multiply.  Consider d to be 4 8-bit values d3, d2, d1 and d0, each in the
      range 0..8.  The multiply forms the partial products:
      
      	           d3 d2 d1 d0
      	        d3 d2 d1 d0
      	     d3 d2 d1 d0
      	+ d3 d2 d1 d0
      	----------------------
      	           e3 e2 e1 e0
      
      Where e3 = d3 + d2 + d1 + d0.   e2, e1 and e0 obviously cannot generate
      any carries.
      
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f9b41929