  1. Oct 04, 2009
    • Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98
      Jens Axboe authored
      
      
      This reverts commit a9327cac.
      
      Corrado Zoccolo <czoccolo@gmail.com> reports:
      
      "with 2.6.32-rc1 I started getting the following strange output from
      "iostat -kx 2":
      Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                10,70    0,00    3,16   15,75    0,00   70,38
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await       svctm      %util
      sda              18,22     0,00    0,67    0,01    14,77     0,02    43,94     0,01   10,53 39043915,03 2629219,87
      sdb              60,89     9,68   50,79    3,04  1724,43    50,52    65,95     0,70   13,06   488437,47 2629219,87
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 2,72    0,00    0,74    0,00    0,00   96,53
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00 100,00
      sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00 100,00
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 6,68    0,00    0,99    0,00    0,00   92,33
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00 100,00
      sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00 100,00
      
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 4,40    0,00    0,73    1,47    0,00   93,40
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00 100,00
      sdb               0,00     4,00    0,00    3,00     0,00    28,00    18,67     0,06   19,50 333,33 100,00
      
      Global values for service time and utilization are garbage. For
      interval values, utilization is always 100%, and service time is
      higher than normal.
      
      I bisected it down to:
      [a9327cac] Seperate read and write statistics of in_flight requests
      and verified that reverting just that commit indeed solves the issue
      on 2.6.32-rc1."
      
      So until this is debugged, revert the bad commit.
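      
      For reference, a minimal userspace sketch (not sysstat itself) of how
      iostat-style %util and svctm are derived from /proc/diskstats: %util comes
      from the io_ticks field, which the kernel only advances while its in_flight
      count is non-zero, so an in_flight counter that never drops back to zero
      pins %util at 100% and inflates svctm. The device name "sda" and the 250 ms
      sampling interval are arbitrary assumptions.
      
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      
      struct disk_sample { unsigned long long rd_ios, wr_ios, io_ticks; };
      
      static int sample(const char *dev, struct disk_sample *s)
      {
              char line[256], name[64];
              unsigned long long v[11];
              FILE *f = fopen("/proc/diskstats", "r");
      
              if (!f)
                      return -1;
              while (fgets(line, sizeof(line), f)) {
                      int n = sscanf(line,
                                     "%*u %*u %63s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                                     name, &v[0], &v[1], &v[2], &v[3], &v[4],
                                     &v[5], &v[6], &v[7], &v[8], &v[9], &v[10]);
                      if (n == 12 && strcmp(name, dev) == 0) {
                              s->rd_ios   = v[0]; /* field 1: reads completed     */
                              s->wr_ios   = v[4]; /* field 5: writes completed    */
                              s->io_ticks = v[9]; /* field 10: ms spent doing I/O */
                              fclose(f);
                              return 0;
                      }
              }
              fclose(f);
              return -1;
      }
      
      int main(void)
      {
              struct disk_sample a, b;
              const unsigned interval_ms = 250;
      
              if (sample("sda", &a))
                      return 1;
              usleep(interval_ms * 1000);
              if (sample("sda", &b))
                      return 1;
      
              double util = 100.0 * (b.io_ticks - a.io_ticks) / interval_ms;
              unsigned long long ios = (b.rd_ios - a.rd_ios) + (b.wr_ios - a.wr_ios);
              double svctm = ios ? (double)(b.io_ticks - a.io_ticks) / ios : 0.0;
      
              printf("util=%.1f%% svctm=%.2f ms\n", util, svctm);
              return 0;
      }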
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: don't delay async queue if it hasn't dispatched at all · e00c54c3
      Jens Axboe authored
      
      
      We cannot delay for the first dispatch of the async queue if it
      hasn't dispatched at all, since that could present a local user
      DoS attack vector using an app that just did slow timed sync reads
      while filling memory.
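      
      As an illustration only (this is not the actual cfq-iosched code), the rule
      the fix enforces can be sketched as: the async-queue ramp-up delay applies
      only once the queue has dispatched something, so a queue that has never
      dispatched cannot be starved by a stream of slow sync reads.
      
      /* Hypothetical helper; the depth/ramp-up details are simplified. */
      static int async_may_dispatch(unsigned int already_dispatched,
                                    unsigned int in_flight,
                                    unsigned int allowed_depth)
      {
              if (already_dispatched == 0)
                      return 1;                       /* first dispatch: never delay */
              return in_flight < allowed_depth;       /* otherwise honour the ramp-up */
      }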
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. Oct 03, 2009
  3. Oct 02, 2009
    • cfq-iosched: add a knob for desktop interactiveness · 1d223515
      Jens Axboe authored
      
      
      This is basically identical to what Vivek Goyal posted, but combined
      into one and labelled 'desktop' instead of 'fairness'. The goal
      is to continue to improve on the latency side of things as it relates
      to interactiveness, keeping the questionable bits under this sysfs
      tunable so it would be easy for throughput-only people to turn off.
      
      Apart from adding the interactive sysfs knob, it also adds the
      behavioural change of allowing slice idling even if the hardware
      does tagged command queuing.
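      
      A small userspace sketch of flipping the knob; the sysfs path below
      (/sys/block/sda/queue/iosched/desktop) is an assumption that depends on the
      device and on CFQ being the active scheduler.
      
      #include <stdio.h>
      
      int main(void)
      {
              const char *path = "/sys/block/sda/queue/iosched/desktop";
              FILE *f = fopen(path, "w");
      
              if (!f) {
                      perror(path);
                      return 1;
              }
              fputs("0\n", f);        /* 0 = throughput-oriented, 1 = desktop/latency mode */
              fclose(f);
              return 0;
      }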
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. Oct 01, 2009
    • Add a tracepoint for block request remapping · b0da3f0d
      Jun'ichi Nomura authored
      
      
      Since 2.6.31 now has request-based device-mapper, it's useful to have
      a tracepoint for request-remapping as well as bio-remapping.
      This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
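      
      A minimal userspace sketch of consuming the new tracepoint through the
      tracing debugfs interface; the event path (events/block/block_rq_remap) and
      the debugfs mount point are assumptions about the local setup.
      
      #include <stdio.h>
      
      int main(void)
      {
              const char *enable =
                      "/sys/kernel/debug/tracing/events/block/block_rq_remap/enable";
              FILE *f = fopen(enable, "w");
      
              if (!f) {
                      perror(enable);
                      return 1;
              }
              fputs("1\n", f);        /* start emitting remap events */
              fclose(f);
              /* remapped requests can then be read from .../tracing/trace */
              return 0;
      }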
      
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: allow large discard requests · 67efc925
      Christoph Hellwig authored
      
      
      Currently we set the bio size to the byte equivalent of the blocks to
      be trimmed when submitting the initial DISCARD ioctl.  That means it
      is subject to the max_hw_sectors limitation of the HBA which is
      much lower than the size of a DISCARD request we can support.
      Add a separate max_discard_sectors tunable to limit the size for discard
      requests.
      
      We limit the max discard request size in bytes to 32bit as that is the
      limit for bio->bi_size.  This could be much larger if we had a way to pass
      that information through the block layer.
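      
      Illustrative only (not the actual blkdev_issue_discard() code): how a
      submitter might chunk a large discard range against a per-queue
      max_discard_sectors limit, with UINT_MAX >> 9 standing in for the 32-bit
      bio->bi_size cap mentioned above.
      
      #include <limits.h>
      
      static unsigned int discard_chunk_sectors(unsigned long long nr_sects,
                                                unsigned int max_discard_sectors)
      {
              /* bi_size is a 32-bit byte count, so never exceed UINT_MAX bytes */
              unsigned int hard_cap = UINT_MAX >> 9;
      
              if (max_discard_sectors > hard_cap)
                      max_discard_sectors = hard_cap;
              return nr_sects < max_discard_sectors ?
                      (unsigned int)nr_sects : max_discard_sectors;
      }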
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: use normal I/O path for discard requests · c15227de
      Christoph Hellwig authored
      
      
      prepare_discard_fn() was being called in a place where memory allocation
      was effectively impossible.  This makes it inappropriate for all but
      the most trivial translations of Linux's DISCARD operation to the block
      command set.  Additionally adding a payload there makes the ownership
      of the bio backing unclear as it's now allocated by the device driver
      and not the submitter as usual.
      
      It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
      the queue supports discard operations or not.  blkdev_issue_discard now
      allocates a one-page, sector-length payload which is the right thing
      for the common ATA and SCSI implementations.
      
      The mtd implementation of prepare_discard_fn() is replaced with simply
      checking for the request being a discard.
      
      Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>, which
      handled the prepare_discard_fn change but not yet the different payload
      allocation.
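      
      From userspace this path is exercised through the BLKDISCARD ioctl; a minimal
      sketch follows. The device node /dev/sdX is a placeholder, and discarding
      destroys data, so this is for illustration only.
      
      #include <fcntl.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <sys/ioctl.h>
      #include <unistd.h>
      #include <linux/fs.h>
      
      int main(void)
      {
              uint64_t range[2] = { 0, 1ULL << 20 };  /* byte offset, byte length */
              int fd = open("/dev/sdX", O_WRONLY);
      
              if (fd < 0) {
                      perror("open");
                      return 1;
              }
              if (ioctl(fd, BLKDISCARD, &range) < 0)
                      perror("BLKDISCARD");
              close(fd);
              return 0;
      }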
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs · 48c0d4d4
      Zdenek Kabelac authored
      
      
      Add the missing blk_trace_remove_sysfs to pair with the blk_trace_init_sysfs
      introduced in commit 1d54ad6d.
      Also release the kobject in case request_fn is NULL.
      
      The problem was noticed via a kmemleak backtrace when some sysfs entries were
      not properly destroyed during device removal:
      
      unreferenced object 0xffff88001aa76640 (size 80):
        comm "lvcreate", pid 2120, jiffies 4294885144
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
          90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
        backtrace:
          [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
          [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
          [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
          [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
          [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
          [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
          [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
          [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
          [<ffffffff812447e4>] add_disk+0x94/0x160
          [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
          [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
          [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
          [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
          [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
          [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
          [<ffffffffffffffff>] 0xffffffffffffffff
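      
      The shape of the fix can be sketched as follows (hypothetical names, not the
      actual blk-sysfs code): whatever the register path creates must be torn down
      on every unregister path, including the early exit taken when the queue has
      no request_fn, otherwise the sysfs entries leak as in the report above.
      
      struct queue_sketch {
              int has_request_fn;
              int trace_sysfs_created;
      };
      
      static void register_queue_sketch(struct queue_sketch *q)
      {
              q->trace_sysfs_created = 1;     /* stands in for blk_trace_init_sysfs() */
      }
      
      static void unregister_queue_sketch(struct queue_sketch *q)
      {
              /* pair the init on every path, even when request_fn is NULL */
              if (q->trace_sysfs_created)
                      q->trace_sysfs_created = 0; /* stands in for blk_trace_remove_sysfs() */
      }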
      
      Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: Do not clamp max_hw_sectors for stacking devices · 5dee2477
      Martin K. Petersen authored
      
      
      Stacking devices do not have an inherent max_hw_sector limit.  Set the
      default to INT_MAX so we are bounded only by capabilities of the
      underlying storage.
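      
      Illustrative sketch (not the actual block-layer code): a stacking driver such
      as dm or md starts from an effectively unlimited max_hw_sectors and lets the
      limits of the underlying devices constrain it when the limits are stacked.
      
      #include <limits.h>
      
      struct queue_limits_sketch {
              unsigned int max_hw_sectors;
      };
      
      static void init_stacking_limits(struct queue_limits_sketch *lim)
      {
              lim->max_hw_sectors = INT_MAX;  /* bounded only by the devices below */
      }
      
      static void stack_limits(struct queue_limits_sketch *top,
                               const struct queue_limits_sketch *bottom)
      {
              if (bottom->max_hw_sectors < top->max_hw_sectors)
                      top->max_hw_sectors = bottom->max_hw_sectors;   /* take the minimum */
      }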
      
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: Set max_sectors correctly for stacking devices · 80ddf247
      Martin K. Petersen authored
      
      
      The topology changes unintentionally caused SAFE_MAX_SECTORS to be set
      for stacking devices.  Set the default limit to BLK_DEF_MAX_SECTORS and
      provide SAFE_MAX_SECTORS in blk_queue_make_request() for legacy hw
      drivers that depend on the old behavior.
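      
      A rough sketch of the split described above (names and values are
      approximations, not copied kernel code): normal request queues default to
      BLK_DEF_MAX_SECTORS, while blk_queue_make_request() keeps the conservative
      SAFE_MAX_SECTORS for legacy drivers that rely on it.
      
      #define SAFE_MAX_SECTORS_SKETCH         255     /* conservative legacy default */
      #define BLK_DEF_MAX_SECTORS_SKETCH      1024    /* normal default */
      
      struct request_queue_sketch {
              unsigned int max_sectors;
      };
      
      static void blk_set_default_limits_sketch(struct request_queue_sketch *q)
      {
              q->max_sectors = BLK_DEF_MAX_SECTORS_SKETCH;
      }
      
      static void blk_queue_make_request_sketch(struct request_queue_sketch *q)
      {
              /* legacy make_request-based drivers historically expect the safe value */
              q->max_sectors = SAFE_MAX_SECTORS_SKETCH;
      }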
      
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  5. Sep 19, 2009
  6. Sep 15, 2009
  7. Sep 14, 2009
  8. Sep 11, 2009
  9. Sep 01, 2009
    • block: Allow changing max_sectors_kb above the default 512 · c295fc05
      Nikanth Karthikesan authored
      
      
      The patch "block: Use accessor functions for queue limits"
      (ae03bf63) changed queue_max_sectors_store()
      to use blk_queue_max_sectors() instead of directly assigning the value.
      
      But blk_queue_max_sectors() differs in two ways:
      1. It sets both max_sectors_kb and max_hw_sectors_kb.
      2. It never allows max_sectors_kb to be raised above BLK_DEF_MAX_SECTORS. If a
      larger value is specified, max_hw_sectors is set to that value but
      max_sectors is capped at BLK_DEF_MAX_SECTORS.
      
      I am not sure whether blk_queue_max_sectors() itself should be changed, as it
      has behaved that way for a long time and there may be callers that depend on
      that behaviour.
      
      This patch simply reverts to directly assigning the value to max_sectors, as
      was done before.
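      
      A simplified sketch of the store path described above (not the actual
      queue_max_sectors_store()): the value is validated against the hardware limit
      and then assigned to max_sectors directly, so it may exceed the
      BLK_DEF_MAX_SECTORS clamp that blk_queue_max_sectors() would apply.
      
      #include <errno.h>
      
      static int queue_max_sectors_store_sketch(unsigned long max_sectors_kb,
                                                unsigned long max_hw_sectors_kb,
                                                unsigned int *max_sectors)
      {
              const unsigned long page_kb = 4;        /* assume 4 KiB pages */
      
              if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
                      return -EINVAL;                 /* outside the allowed window */
      
              *max_sectors = max_sectors_kb << 1;     /* KiB -> 512-byte sectors */
              return 0;
      }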
      
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  10. Aug 04, 2009