Skip to content
  1. Nov 08, 2013
  2. Oct 24, 2013
    • Bian Yu's avatar
      md: avoid deadlock when md_set_badblocks. · 905b0297
      Bian Yu authored
      
      
      When operate harddisk and hit errors, md_set_badblocks is called after
      scsi_restart_operations which already disabled the irq. but md_set_badblocks
      will call write_sequnlock_irq and enable irq. so softirq can preempt the
      current thread and that may cause a deadlock. I think this situation should
      use write_sequnlock_irqsave/irqrestore instead.
      
      I met the situation and the call trace is below:
      [  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
      [  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
      [  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
      [  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
      [  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
      [  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
      [  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
      [  638.933884] Call Trace:
      [  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
      [  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
      [  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
      [  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
      [  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
      [  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
      [  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
      [  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
      [  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
      [  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
      [  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
      [  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
      [  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
      [  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
      [  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
      [  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
      [  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
      [  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
      [  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
      [  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
      [  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
      [  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
      [  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
      [  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
      [  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
      [  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
      [  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
      [  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
      [  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
      [  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
      [  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
      [  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
      [  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
      [  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
      [  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
      [  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
      [  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
      [  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
      [  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
      [  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      [  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
      [  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      
      This bug was introduce in commit  2e8ac303
      (the first time rdev_set_badblock was call from interrupt context),
      so this patch is appropriate for 3.5 and subsequent kernels.
      
      Cc: <stable@vger.kernel.org> (3.5+)
      Signed-off-by: default avatarBian Yu <bianyu@kedacom.com>
      Reviewed-by: default avatarJianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      905b0297
  3. Aug 27, 2013
    • NeilBrown's avatar
      md: avoid deadlock when dirty buffers during md_stop. · 260fa034
      NeilBrown authored
      
      
      When the last process closes /dev/mdX sync_blockdev will be called so
      that all buffers get flushed.
      So if it is then opened for the STOP_ARRAY ioctl to be sent there will
      be nothing to flush.
      
      However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
      moments before some other process which was writing closes their file
      descriptor, then there won't be a 'last close' and the buffers might
      not get flushed.
      
      So do_md_stop() calls sync_blockdev().  However at this point it is
      holding ->reconfig_mutex.  So if the array is currently 'clean' then
      the writes from sync_blockdev() will not complete until the array
      can be marked dirty and that won't happen until some other thread
      can get ->reconfig_mutex.  So we deadlock.
      
      We need to move the sync_blockdev() call to before we take
      ->reconfig_mutex.
      However then some other thread could open /dev/mdX and write to it
      after we call sync_blockdev() and before we actually stop the array.
      This can leave dirty data in the page cache which is awkward.
      
      So introduce new flag MD_STILL_CLOSED.  Set it before calling
      sync_blockdev(), clear it if anyone does open the file, and abort the
      STOP_ARRAY attempt if it gets set before we lock against further
      opens.
      
      It is still possible to get problems if you open /dev/mdX, write to
      it, then issue the STOP_ARRAY ioctl.  Just don't do that.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      260fa034
    • NeilBrown's avatar
      md: Don't test all of mddev->flags at once. · 7a0a5355
      NeilBrown authored
      
      
      mddev->flags is mostly used to record if an update of the
      metadata is needed.  Sometimes the whole field is tested
      instead of just the important bits.  This makes it difficult
      to introduce more state bits.
      
      So replace all bare tests of mddev->flags with tests for the bits
      that actually need testing.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      7a0a5355
    • Dave Jones's avatar
      md: Fix apparent cut-and-paste error in super_90_validate · c9ad020f
      Dave Jones authored
      
      
      Setting a variable to itself probably wasn't the intention here.
      
      Signed-off-by: default avatarDave Jones <davej@fedoraproject.org>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      c9ad020f
    • NeilBrown's avatar
      md: fix safe_mode buglet. · 275c51c4
      NeilBrown authored
      
      
      Whe we set the safe_mode_timeout to a smaller value we trigger a timeout
      immediately - otherwise the small value might not be honoured.
      However if the previous timeout was 0 meaning "no timeout", we didn't.
      This would mean that no timeout happens until the next write completes,
      which could be a long time.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      275c51c4
    • NeilBrown's avatar
      md: don't call md_allow_write in get_bitmap_file. · 60559da4
      NeilBrown authored
      
      
      There is no really need as GFP_NOIO is very likely sufficient,
      and failure is not catastrophic.
      
      Calling md_allow_write here will convert a read-auto array to
      read/write which could be confusing when you are just performing
      a read operation.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      60559da4
  4. Jul 18, 2013
    • NeilBrown's avatar
      md: Remove recent change which allows devices to skip recovery. · 5024c298
      NeilBrown authored
      
      
      commit 7ceb17e8
          md: Allow devices to be re-added to a read-only array.
      
      allowed a bit more than just that.  It also allows devices to be added
      to a read-write array and to end up skipping recovery.
      
      This patch removes the offending piece of code pending a rewrite for a
      subsequent release.
      
      More specifically:
       If the array has a bitmap, then the device will still need a bitmap
       based resync ('saved_raid_disk' is set under different conditions
       is a bitmap is present).
       If the array doesn't have a bitmap, then this is correct as long as
       nothing has been written to the array since the metadata was checked
       by ->validate_super.  However there is no locking to ensure that there
       was no write.
      
      Bug was introduced in 3.10 and causes data corruption so
      patch is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: default avatarJoe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      5024c298
  5. Jun 26, 2013
    • Jonathan Brassow's avatar
      MD: Remember the last sync operation that was performed · c4a39551
      Jonathan Brassow authored
      
      
      MD:  Remember the last sync operation that was performed
      
      This patch adds a field to the mddev structure to track the last
      sync operation that was performed.  This is especially useful when
      it comes to what is recorded in mismatch_cnt in sysfs.  If the
      last operation was "data-check", then it reports the number of
      descrepancies found by the user-initiated check.  If it was a
      "repair" operation, then it is reporting the number of
      descrepancies repaired.  etc.
      
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      c4a39551
  6. Jun 13, 2013
  7. May 07, 2013
  8. Apr 30, 2013
    • NeilBrown's avatar
      md: bad block list should default to disabled. · 486adf72
      NeilBrown authored
      
      
      Maintenance of a bad-block-list currently defaults to 'enabled'
      and is then disabled when it cannot be supported.
      This is backwards and causes problem for dm-raid which didn't know
      to disable it.
      
      So fix the defaults, and only enabled for v1.x metadata which
      explicitly has bad blocks enabled.
      
      The problem with dm-raid has been present since badblock support was
      added in v3.1, so this patch is suitable for any -stable from 3.1
      onwards.
      
      Cc: stable@vger.kernel.org (3.1+)
      Reported-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      486adf72
  9. Apr 24, 2013
    • Jonathan Brassow's avatar
      MD: Export 'md_reap_sync_thread' function · a91d5ac0
      Jonathan Brassow authored
      
      
      MD: Export 'md_reap_sync_thread' function
      
      Make 'md_reap_sync_thread' available to other files, specifically dm-raid.c.
      - rename reap_sync_thread to md_reap_sync_thread
      - move the fn after md_check_recovery to match md.h declaration placement
      - export md_reap_sync_thread
      
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a91d5ac0
    • NeilBrown's avatar
      md: don't update metadata when stopping a read-only array. · b6d428c6
      NeilBrown authored
      
      
      read-only arrays should stay that way as much as possible.
      Updating the metadata - which could be triggered by a re-add
      while assembling the array metadata - should be avoided.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      b6d428c6
    • NeilBrown's avatar
      md: Allow devices to be re-added to a read-only array. · 7ceb17e8
      NeilBrown authored
      
      
      When assembling an array incrementally we might want to make
      it device available when "enough" devices are present, but maybe
      not "all" devices are present.
      If the remaining devices appear before the array is actually used,
      they should be added transparently.
      
      We do this by using the "read-auto" mode where the array acts like
      it is read-only until a write request arrives.
      
      Current an add-device request switches a read-auto array to active.
      This means that only one device can be added after the array is first
      made read-auto.  This isn't a problem for RAID5, but is not ideal for
      RAID6 or RAID10.
      Also we don't really want to switch the array to read-auto at all
      when re-adding a device as this doesn't really imply any change.
      
      So:
       - remove the "md_update_sb()" call from add_new_disk().  This isn't
         really needed as just adding a disk doesn't require a metadata
         update.  Instead, just set MD_CHANGE_DEVS.  This will effect a
         metadata update soon enough, once the array is not read-only.
      
       - Allow the ADD_NEW_DISK ioctl to succeed without activating a
         read-auto array, providing the MD_DISK_SYNC flag is set.
         In this case, the device will be rejected if it cannot be added
         with the correct device number, or has an incorrect event count.
      
       - Teach remove_and_add_spares() to be careful about adding spares
         when the array is read-only (or read-mostly) - only add devices
         that are thought to be in-sync, and only do it if the array is
         in-sync itself.
      
       - In md_check_recovery, use remove_and_add_spares in the read-only
         case, rather than open coding just the 'remove' part of it.
      
      Reported-by: default avatarMartin Wilck <mwilck@arcor.de>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      7ceb17e8
    • NeilBrown's avatar
      md: HOT_DISK_REMOVE shouldn't make a read-auto device active. · 3ea8929d
      NeilBrown authored
      
      
      If a fail device or a spare is removed from an array, there is
      not need to make the array 'active'.  If/when the array does become
      active for some other reason the metadata will be update to reflect
      the removal.
      If that never happens and the array is stopped while still read-auto,
      then there is no loss in forgetting the that the device had 'failed'.
      
      A read-only array will leave failed devices attached to
      the array personality, so we need to explicitly call
      remove_and_add_spares() to free it (clearing Blocked just
      like we do in store_slot()).
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      3ea8929d
    • NeilBrown's avatar
      md: use common code for all calls to ->hot_remove_disk() · 746d3207
      NeilBrown authored
      
      
      slot_store and remove_and_add_spares both call ->hot_remove_disk(),
      but with slightly different tests and consequences, which is
      at least untidy and might be buggy.
      
      So modify remove_and_add_spaces() so that it can be asked
      to remove a specific device, and call it from slot_store().
      
      We also clear the Blocked flag to ensure that doesn't prevent
      removal.  The purpose of Blocked is to prevent automatic removal
      by the kernel before an error is acknowledged.
      If the array is read/write then user-space would have not reason
      to remove a device unless it was known to be 'spare' or 'faulty' in
      which it would have already cleared the Blocked flag.
      If the array is read-only, the flag might still be blocked, but
      there is no harm in clearing the flag for read-only arrays.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      746d3207
    • NeilBrown's avatar
      md: never update metadata when array is read-only. · d87f064f
      NeilBrown authored
      
      
      Normally we don't even try to update the metadata if
      the array is read-only.  However future patches
      will increase the number of things that can happen on a read-only
      array, so it is safest to explicitly disable this.
      
      Every time that mddev->ro is set to 0, either
       - md_update_sb will be called again (at least if MD_CHANGE_DEVS
         is set) or
       - the mddev->thread is scheduled, which will also run
         md_update_sb if needed.
      
      So this is safe: if the array ever become read-write the
      metadata will be updated.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      d87f064f
  10. Mar 23, 2013
  11. Mar 20, 2013
    • Jonathan Brassow's avatar
      MD: Prevent sysfs operations on uninitialized kobjects · 90584fc9
      Jonathan Brassow authored
      
      
      MD: Prevent sysfs operations on uninitialized kobjects
      
      Device-mapper does not use sysfs; but when device-mapper is leveraging
      MD's RAID personalities, MD sometimes attempts to update sysfs.  This
      patch adds checks for 'mddev-kobj.sd' in sysfs_[un]link_rdev to ensure
      it is about to operate on something valid.  This patch also checks for
      'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'.
      Although 'sysfs_notify' already makes this check, doing so in
      'remove_and_add_spares' prevents an additional mutex operation.
      
      Signed-off-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      90584fc9
  12. Feb 28, 2013
  13. Feb 26, 2013
    • NeilBrown's avatar
      md: fix two bugs when attempting to resize RAID0 array. · a6468539
      NeilBrown authored
      
      
      You cannot resize a RAID0 array (in terms of making the devices
      bigger), but the code doesn't entirely stop you.
      So:
      
       disable setting of the available size on each device for
       RAID0 and Linear devices.  This must not change as doing so
       can change the effective layout of data.
      
       Make sure that the size that raid0_size() reports is accurate,
       but rounding devices sizes to chunk sizes.  As the device sizes
       cannot change now, this isn't so important, but it is best to be
       safe.
      
      Without this change:
        mdadm --grow /dev/md0 -z max
        mdadm --grow /dev/md0 -Z max
        then read to the end of the array
      
      can cause a BUG in a RAID0 array.
      
      These bugs have been present ever since it became possible
      to resize any device, which is a long time.  So the fix is
      suitable for any -stable kerenl.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a6468539
  14. Feb 21, 2013
  15. Dec 13, 2012
  16. Dec 11, 2012
  17. Nov 30, 2012
    • Lukas Czerner's avatar
      wait: add wait_event_lock_irq() interface · eed8c02e
      Lukas Czerner authored
      
      
      New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit
      moves the private wait_event_lock_irq() macro from MD to regular wait
      includes, introduces new macro wait_event_lock_irq_cmd() instead of using
      the old method with omitting cmd parameter which is ugly and makes a use
      of new macros in the MD. It also introduces the _interruptible_ variant.
      
      The use of new interface is when one have a special lock to protect data
      structures used in the condition, or one also needs to invoke "cmd"
      before putting it to sleep.
      
      All new macros are expected to be called with the lock taken. The lock
      is released before sleep and is reacquired afterwards. We will leave the
      macro with the lock held.
      
      Note to DM: IMO this should also fix theoretical race on waitqueue while
      using simultaneously wait_event_lock_irq() and wait_event() because of
      lack of locking around current state setting and wait queue removal.
      
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      eed8c02e
  18. Nov 19, 2012
  19. Oct 29, 2012
  20. Oct 11, 2012
    • NeilBrown's avatar
      md: refine reporting of resync/reshape delays. · 72f36d59
      NeilBrown authored
      
      
      If 'resync_max' is set to 0 (as is often done when starting a
      reshape, so the mdadm can remain in control during a sensitive
      period), and if the reshape request is initially delayed because
      another array using the same array is resyncing or reshaping etc,
      when user-space cannot easily tell when the delay changes from being
      due to a conflicting reshape, to being due to resync_max = 0.
      
      So introduce a new state: (curr_resync == 3) to reflect this, make
      sure it is visible both via /proc/mdstat and via the "sync_completed"
      sysfs attribute, and ensure that the event transition from one delay
      state to the other is properly notified.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      72f36d59
    • NeilBrown's avatar
      md: make sure manual changes to recovery checkpoint are saved. · db07d85e
      NeilBrown authored
      
      
      If you make an array bigger but suppress resync of the new region with
        mdadm --grow /dev/mdX --size=max --assume-clean
      
      then stop the array before anything is written to it, the effect of
      the "--assume-clean" is lost and the array will resync the new space
      when restarted.
      So ensure that we update the metadata in the case.
      
      Reported-by: default avatarSebastian Riemer <sebastian.riemer@profitbricks.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      db07d85e
    • NeilBrown's avatar
      md: writing to sync_action should clear the read-auto state. · 48c26ddc
      NeilBrown authored
      
      
      In some cases array are started in 'read-auto' state where in
      nothing gets written to any device until the array is written
      to.  The purpose of this is to make accidental auto-assembly
      of the wrong arrays less of a risk, and to allow arrays to be
      started to read suspend-to-disk images without actually changing
      anything (as might happen if the array were dirty and a
      resync seemed necessary).
      
      Explicitly writing the 'sync_action' for a read-auto array currently
      doesn't clear the read-auto state, so the sync action doesn't
      happen, which can be confusing.
      
      So allow any successful write to sync_action to clear any read-auto
      state.
      
      Reported-by: default avatarAlexander Kühn <alexander.kuehn@nagilum.de>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      48c26ddc
    • Jianpeng Ma's avatar
      Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid races · 7f7583d4
      Jianpeng Ma authored
      
      
      Now that multiple threads can handle stripes, it is safer to
      use an atomic64_t for resync_mismatches, to avoid update races.
      
      Signed-off-by: default avatarJianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      7f7583d4
Loading