  1. Apr 30, 2007
    • cfq-iosched: rework the whole round-robin list concept · d9e7620e
      Jens Axboe authored
      
      
      Drawing on some inspiration from the CFS CPU scheduler design, overhaul
      the pending cfq_queue list management. Currently CFQ uses a
      doubly linked list per priority level for sorting and service decisions.
      Kill those lists and maintain an rbtree of cfq_queue's, sorted by when
      to service them.
      
      This unfortunately means that the ionice levels aren't as strong
      anymore; improving them is left for later work. We only scale the slice
      time now, not the number of times we service a queue. This means that
      latency is better (for all priority levels), but that the distinction
      between the highest and lower levels isn't as big.
      
      The diffstat speaks for itself.
      
       cfq-iosched.c |  363 +++++++++++++++++---------------------------------
       1 file changed, 125 insertions(+), 238 deletions(-)
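      
      The service list becomes a time-ordered rbtree. A minimal sketch of the
      insert path (simplified; rb_key stands in for the "when to service"
      deadline, and the names only loosely follow the patch):
      
      	#include <linux/rbtree.h>
      
      	struct cfq_queue {
      		struct rb_node rb_node;	/* anchor in the service tree */
      		unsigned long rb_key;	/* when to service this queue */
      	};
      
      	static void cfq_service_tree_add(struct rb_root *root,
      					 struct cfq_queue *cfqq)
      	{
      		struct rb_node **p = &root->rb_node, *parent = NULL;
      
      		while (*p) {
      			struct cfq_queue *__cfqq;
      
      			parent = *p;
      			__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
      			if (cfqq->rb_key < __cfqq->rb_key)
      				p = &(*p)->rb_left;
      			else
      				p = &(*p)->rb_right;
      		}
      		rb_link_node(&cfqq->rb_node, parent, p);
      		rb_insert_color(&cfqq->rb_node, root);
      	}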
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: minor updates · 1afba045
      Jens Axboe authored
      
      
      - Move the queue_new flag clear to when the queue is selected
      - Only select the non-first queue in cfq_get_best_queue() if there's
        a substantial difference between the best and the first.
      - Get rid of ->busy_rr
      - Only select a close cooperator if the current queue is known to take
        a while to "think".
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: development update · 6d048f53
      Jens Axboe authored
      
      
      - Implement logic for detecting cooperating processes, so we
        choose the best available queue whenever possible (a sketch of the
        "closeness" test follows this list).
      
      - Improve residual slice time accounting.
      
      - Remove dead code: we no longer see async requests coming in on
        sync queues. That part was removed a long time ago. That means
        that we can also remove the difference between cfq_cfqq_sync()
        and cfq_cfqq_class_sync(); they are now identical. And we can
        kill the on_dispatch array, just make it a counter.
      
      - Allow a process to go into the current list if it hasn't been
        serviced in this scheduler tick yet.
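      
      A rough sketch of the "closeness" test mentioned in the first item
      (illustrative threshold and names; struct request still had ->sector
      at this point):
      
      	/* "close" means within a 4MB seek (8192 sectors) of the last
      	 * completed sector */
      	#define CFQQ_CLOSE_THR	((sector_t)8192)
      
      	static inline int cfq_rq_close(sector_t last_pos, struct request *rq)
      	{
      		sector_t d = rq->sector > last_pos ? rq->sector - last_pos
      						   : last_pos - rq->sector;
      
      		return d <= CFQQ_CLOSE_THR;
      	}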
      
      Possible future improvements include caching the cfqq lookup
      in cfq_close_cooperator(), so we don't have to look it up twice.
      cfq_get_best_queue() should just reuse that last decision instead
      of doing it again.
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: improve preemption for cooperating tasks · 1e3335de
      Jens Axboe authored
      
      
      When testing the syslet async io approach, I discovered that CFQ
      sometimes didn't perform as well as expected. cfq_should_preempt()
      needs to better check for cooperating tasks, so fix that by allowing
      preemption of an equal priority queue if the recently queued request
      is as good a candidate for IO as the one we are currently waiting for.
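      
      The added check, schematically (names are illustrative, not the exact
      code from the patch):
      
      	/* absolute sector distance from the current head position */
      	static inline sector_t seek_dist(sector_t head, sector_t pos)
      	{
      		return pos > head ? pos - head : head - pos;
      	}
      
      	/*
      	 * in cfq_should_preempt(): let an equal-priority queue preempt
      	 * when its newly queued request is no farther from the head than
      	 * the request we are currently idling for
      	 */
      	if (seek_dist(last_pos, rq->sector) <=
      	    seek_dist(last_pos, waiting_rq->sector))
      		return 1;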
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. Apr 25, 2007
    • cfq-iosched: fix alias + front merge bug · 5044eed4
      Jens Axboe authored
      There's a really rare and obscure bug in CFQ that causes a crash in
      cfq_dispatch_insert() due to rq == NULL.  One example of the resulting
      oops is seen here:
      
      	http://lkml.org/lkml/2007/4/15/41
      
      Neil correctly diagnosed how this can happen: if two concurrent
      requests arrive with the exact same sector number (due to direct IO or
      aliasing between MD and the raw device access), the alias handling
      will add the request to the sortlist, but next_rq remains NULL.
      
      Read the more complete analysis at:
      
      	http://lkml.org/lkml/2007/4/25/57
      
      
      
      This looks like it requires md to trigger, even though it should
      potentially be possible to do with O_DIRECT (at least if you edit the
      kernel and doctor some of the unplug calls).
      
      The fix is to move the ->next_rq update to when we add a request to the
      rbtree. Then we remove the possibility for a request to exist in the
      rbtree, but not have ->next_rq correctly updated.
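      
      Paraphrased shape of the fix (cfq_choose_req() picks the better
      next-serve candidate of two requests; everything else elided):
      
      	static void cfq_add_rq_rb(struct request *rq)
      	{
      		struct cfq_queue *cfqq = RQ_CFQQ(rq);
      
      		/* ... insert rq into cfqq->sort_list, resolving aliases ... */
      
      		/*
      		 * update next_rq here, at rbtree insert time, so an
      		 * aliased request can never sit in the rbtree while
      		 * next_rq is still NULL
      		 */
      		cfqq->next_rq = cfq_choose_req(cfqq->cfqd, cfqq->next_rq, rq);
      	}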
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Apr 21, 2007
    • cfq-iosched: fix sequential write regression · a9938006
      Jens Axboe authored
      
      
      We have a 10-15% performance regression for sequential writes on TCQ/NCQ
      enabled drives in 2.6.21-rcX after the CFQ update went in.  It has been
      reported by Valerie Clement <valerie.clement@bull.net> and the Intel
      testing folks.  The regression is because of CFQ's now more aggressive
      queue control, limiting the depth available to the device.
      
      This patch fixes that regression by allowing a greater depth when only
      one queue is busy.  It has been tested to not impact sync-vs-async
      workloads too much - we still do a lot better than 2.6.20.
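      
      The relaxation, schematically (busy_queues is a real cfq_data field,
      but the patch's actual condition is more involved):
      
      	/*
      	 * with only one busy queue there is nobody to be unfair to,
      	 * so let the device see a deeper queue instead of throttling
      	 */
      	if (cfqd->busy_queues == 1)
      		max_dispatch = INT_MAX;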
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Mar 27, 2007
    • make elv_register() output atomic · 1ffb96c5
      Thibaut VARENE authored
      
      
      Booting 2.6.21-rc3-g45592145 I noticed the following on one of my
      machines in the bootlog:
      
      io scheduler noop registered<6>Time: jiffies clocksource has been installed.
      
      io scheduler deadline registered (default)
      
      Looking at block/elevator.c, it appears that elv_register() uses two
      consecutive printks in a non-atomic way, leading to the above glitch.
      The attached trivial patch fixes this issue by using a single printk.
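      
      The single-printk form looks roughly like this (assuming a "def"
      string that is either " (default)" or ""):
      
      	/* emit the whole line atomically so nothing can interleave */
      	printk(KERN_INFO "io scheduler %s registered%s\n",
      	       e->elevator_name, def);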
      
      Signed-off-by: Thibaut VARENE <varenet@parisc-linux.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: blk_max_pfn is sometimes wrong · f772b3d9
      Vasily Tarasov authored
      
      
      There is a small problem in handling page bounce.
      
      At the moment blk_max_pfn equals max_pfn, which is in fact not the
      maximum possible _number_ of a page frame, but the _count_ of page
      frames.  For example, for a 32-bit x86 node with 4GB of RAM,
      max_pfn = 0x100000, but the highest page frame number is 0xFFFFF.
      
      The request_queue structure has a member q->bounce_pfn, and the queue
      needs bounce pages for pages _above_ this limit.  This is handled by
      blk_queue_bounce(), where the following check is performed:
      
      	if (q->bounce_pfn >= blk_max_pfn)
      		return;
      
      Assume that a driver has set q->bounce_pfn to 0xFFFFF, but blk_max_pfn
      equals 0x100000.  In this situation the check above fails, and for
      each bio we always fall through to iterating over the pages tied to
      the bio.
      
      Note that for quite a big range of device drivers (ide, md, ...) this
      problem doesn't happen, because they use BLK_BOUNCE_ANY for bounce_pfn.
      BLK_BOUNCE_ANY is defined as blk_max_pfn << PAGE_SHIFT, so the check
      above doesn't fail.  But it does fail for drivers which obtain the
      required value from the device; for example sata_nv uses ATA_DMA_MASK
      or dev->dma_mask.
      
      I propose to use (max_pfn - 1) for blk_max_pfn, and the same for
      blk_max_low_pfn.  The patch also cleans up some checks related to
      bounce_pfn.
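      
      Worked through for the 4GB example (4KB pages):
      
      	/* 4GB / 4KB = 0x100000 page frames, numbered 0 .. 0xFFFFF */
      	blk_max_pfn = max_pfn - 1;	/* 0xFFFFF, the highest valid pfn */
      
      	/* a driver with q->bounce_pfn = 0xFFFFF now takes the early
      	 * return in blk_queue_bounce() instead of walking every bio */
      	if (q->bounce_pfn >= blk_max_pfn)
      		return;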
      
      Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  5. Feb 21, 2007
    • [PATCH] lockdep: annotate BLKPG_DEL_PARTITION · 6d740cd5
      Peter Zijlstra authored
      
      
      >=============================================
      >[ INFO: possible recursive locking detected ]
      >2.6.19-1.2909.fc7 #1
      >---------------------------------------------
      >anaconda/587 is trying to acquire lock:
      > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24
      >
      >but task is already holding lock:
      > (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24
      >
      >other info that might help us debug this:
      >1 lock held by anaconda/587:
      > #0:  (&bdev->bd_mutex){--..}, at: [<c05fb380>] mutex_lock+0x21/0x24
      >
      >stack backtrace:
      > [<c0405812>] show_trace_log_lvl+0x1a/0x2f
      > [<c0405db2>] show_trace+0x12/0x14
      > [<c0405e36>] dump_stack+0x16/0x18
      > [<c043bd84>] __lock_acquire+0x116/0xa09
      > [<c043c960>] lock_acquire+0x56/0x6f
      > [<c05fb1fa>] __mutex_lock_slowpath+0xe5/0x24a
      > [<c05fb380>] mutex_lock+0x21/0x24
      > [<c04d82fb>] blkdev_ioctl+0x600/0x76d
      > [<c04946b1>] block_ioctl+0x1b/0x1f
      > [<c047ed5a>] do_ioctl+0x22/0x68
      > [<c047eff2>] vfs_ioctl+0x252/0x265
      > [<c047f04e>] sys_ioctl+0x49/0x63
      > [<c0404070>] syscall_call+0x7/0xb
      
      Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment
      clarifying the bd_mutex locking, because I confused myself and initially
      thought the lock order was wrong too.
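      
      The annotation amounts to taking the inner lock with a lockdep
      subclass (sketch; bdevp is the partition's block_device, bdev the
      whole disk's):
      
      	/*
      	 * the partition's bd_mutex nests inside the whole disk's
      	 * bd_mutex; subclass 1 tells lockdep these are different
      	 * levels of the same lock class, not a self-deadlock
      	 */
      	mutex_lock_nested(&bdevp->bd_mutex, 1);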
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] rework reserved major handling · b446b60e
      Andrew Morton authored
      
      
      Several people have reported failures in dynamic major device number handling
      due to the recent changes in there to avoid handing out the local/experimental
      majors.
      
      Rolf reports that this is due to a gcc-4.1.0 bug.
      
      The patch refactors that code a lot in an attempt to provoke the compiler into
      behaving.
      
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Feb 09, 2007
    • [PATCH] md: fix various bugs with aligned reads in RAID5 · 387bb173
      Neil Brown authored
      
      
      It is possible for raid5 to be sent a bio that is too big for an
      underlying device.  So if it is a READ that we pass straight down to a
      device, it will fail and confuse RAID5.
      
      So in 'chunk_aligned_read' we check that the bio fits within the
      parameters for the target device, and if it doesn't fit, fall back on
      reading through the stripe cache and making lots of one-page requests.
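      
      Roughly the kind of fit check involved (2.6.20-era field names; the
      actual test in the patch differs in detail):
      
      	/* does the bio fit the target queue's limits? */
      	if (bio_sectors(bio) > q->max_sectors ||
      	    bio->bi_phys_segments > q->max_phys_segments)
      		return 0;	/* no: read through the stripe cache */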
      
      Note that this is the earliest time we can check against the device because
      earlier we don't have a lock on the device, so it could change underneath
      us.
      
      Also, the code for handling a retry through the cache when a read fails has
      not been tested and was badly broken.  This patch fixes that code.
      
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: "Kai" <epimetreus@fastmail.fm>
      Cc: <stable@suse.de>
      Cc: <org@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Jan 23, 2007
    • [PATCH] elevator: move clearing of unplug flag earlier · 95543179
      Linas Vepstas authored
      A flag was recently added to the elevator code to avoid
      performing an unplug when requests are being re-queued.
      The goal of this flag was to avoid a deep recursion that
      can occur when re-queueing requests after a SCSI device/host
      reset.  See http://lkml.org/lkml/2006/5/17/254
      
      
      
      However, that fix set the flag near the bottom of a case
      statement, where an earlier break (in an if statement) could
      jump out of the case without setting the flag.
      This patch sets the flag earlier in the case statement.
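      
      Schematically (elv_insert()'s requeue case, simplified):
      
      	case ELEVATOR_INSERT_REQUEUE:
      		/*
      		 * clear the unplug flag first, so the early break
      		 * further down can no longer skip it and re-trigger
      		 * the recursive unplug on requeue
      		 */
      		unplug_it = 0;
      		/* ... requeue handling, including a conditional break ... */
      		break;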
      
      I re-discovered the deep recursion recently during testing;
      I was told that it was a known problem, and the fix to it was
      in the kernel I was testing. Indeed it was ... but it didn't
      fix the bug. With the patch below, I no longer see the bug.
      
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. Jan 02, 2007
    • [PATCH] cfq-iosched: merging problem · ec8acb69
      Jens Axboe authored
      
      
      Two issues:
      
      - The final return 1 should be a return 0, otherwise comparing cfqq is
        a noop.
      
      - bio_sync() only checks the sync flag, while rq_is_sync() checks for
        both READ and sync. The latter is what we want. Expand the bio check
        to include reads, and relax the restriction to allow merging of async
        io into sync requests (see the sketch below).
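      
      The expanded bio-side check, roughly (helper name illustrative):
      
      	/* a bio counts as sync if it is a read or carries the sync
      	 * flag, mirroring what rq_is_sync() reports for a request */
      	static inline int cfq_bio_sync(struct bio *bio)
      	{
      		return bio_data_dir(bio) == READ || bio_sync(bio);
      	}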
      
      In the future we want to clean up the SYNC logic: right now it means
      both sync request (such as READ or O_DIRECT WRITE) and unplug-on-issue.
      Leave that for later.
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. Dec 20, 2006
    • [PATCH] cfq-iosched: don't allow sync merges across queues · da775265
      Jens Axboe authored
      
      
      Currently we allow any merge, even if the io originates from different
      processes. This can cause really bad starvation and unfairness if those
      ios happen to be synchronous (reads or direct writes).
      
      So add an allow_merge hook to the io scheduler ops, so an io scheduler
      can help decide whether a bio/process combination may be merged with an
      existing request.
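      
      The hook's shape in CFQ terms (sketch; cfq_queue_of() is a hypothetical
      stand-in for the io-context lookup the real code does):
      
      	static int cfq_allow_merge(request_queue_t *q, struct request *rq,
      				   struct bio *bio)
      	{
      		/*
      		 * refuse a sync merge unless the submitting process maps
      		 * to the same cfq_queue the request already belongs to
      		 */
      		if (!bio_sync(bio))
      			return 1;
      
      		return cfq_queue_of(current) == RQ_CFQQ(rq);
      	}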
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>