Skip to content
  1. Nov 27, 2013
  2. Nov 23, 2013
    • Tejun Heo's avatar
      sysfs: use a separate locking class for open files depending on mmap · 027a485d
      Tejun Heo authored
      
      
      The following two commits implemented mmap support in the regular file
      path and merged bin file support into the regular path.
      
       73d97146 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
       3124eb16 ("sysfs: merge regular and bin file handling")
      
      After the merge, the following commands trigger a spurious lockdep
      warning.  "test-mmap-read" simply mmaps the file and dumps the
      content.
      
        $ cat /sys/block/sda/trace/act_mask
        $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.12.0-work+ #378 Not tainted
        -------------------------------------------------------
        test-mmap-read/567 is trying to acquire lock:
         (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&mm->mmap_sem){++++++}:
        ...
        -> #2 (sr_mutex){+.+.+.}:
        ...
        -> #1 (&bdev->bd_mutex){+.+.+.}:
        ...
        -> #0 (&of->mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
         &of->mutex --> sr_mutex --> &mm->mmap_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(sr_mutex);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex);
      
         *** DEADLOCK ***
      
        1 lock held by test-mmap-read/567:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        stack backtrace:
        CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
         ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
         0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
        Call Trace:
         [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
         [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
         [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
         [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
         [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
         [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
         [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
         [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
         [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
         [<ffffffff81008282>] SyS_mmap+0x22/0x30
         [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
      
      This happens because one file nests sr_mutex, which nests mm->mmap_sem
      under it, under of->mutex while mmap implementation naturally nests
      of->mutex under mm->mmap_sem.  The warning is false positive as
      of->mutex is per open-file and the two paths belong to two different
      files.  This warning didn't trigger before regular and bin file
      supports were merged because only bin file supported mmap and the
      other side of locking happened only on regular files which used
      equivalent but separate locking.
      
      It'd be best if we give separate locking classes per file but we can't
      easily do that.  Let's differentiate on ->mmap() for now.  Later we'll
      add explicit file operations struct and can add per-ops lockdep key
      there.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      027a485d
    • Mika Westerberg's avatar
      sysfs: handle duplicate removal attempts in sysfs_remove_group() · 54d71145
      Mika Westerberg authored
      
      
      Commit bcdde7e2 (sysfs: make __sysfs_remove_dir() recursive) changed
      the behavior so that directory removals will be done recursively. This
      means that the sysfs group might already be removed if its parent directory
      has been removed.
      
      The current code outputs warnings similar to following log snippet when it
      detects that there is no group for the given kobject:
      
       WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
       sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
       Modules linked in:
       CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
       Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
       Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
        ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
        ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
       Call Trace:
        [<ffffffff817daab1>] dump_stack+0x45/0x56
        [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
        [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
        [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
        [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
        [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
        [<ffffffff8142a0d0>] device_del+0x40/0x1b0
        [<ffffffff8142a24d>] device_unregister+0xd/0x20
        [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
        [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
        [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
        [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
        [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
        [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
        [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
        [<ffffffff8142a1b1>] device_del+0x121/0x1b0
        [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
        [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
        [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
        [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
        [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
        [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
        [<ffffffff8105db29>] worker_thread+0x119/0x390
        [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
        [<ffffffff81063a5d>] kthread+0xcd/0xf0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
      
      On this particular machine I see ~16 of these message during Thunderbolt
      hot-unplug.
      
      Fix this in similar way that was done for sysfs_remove_one() by checking
      if the parent directory has already been removed and bailing out early.
      
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54d71145
  3. Nov 22, 2013
    • Junxiao Bi's avatar
      configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi authored
      
      
      A race window in configfs, it starts from one dentry is UNHASHED and end
      before configfs_d_iput is called.  In this window, if a lookup happen,
      since the original dentry was UNHASHED, so a new dentry will be
      allocated, and then in configfs_attach_attr(), sd->s_dentry will be
      updated to the new dentry.  Then in configfs_d_iput(),
      BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->dentry still point
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix it, change configfs_d_iput to not update sd->s_dentry if
      sd->s_count > 2, that means there are another dentry is using the sd
      beside the one that is going to be put.  Use configfs_dirent_lock in
      configfs_attach_attr to sync with configfs_d_iput.
      
      With the following steps, you can reproduce the bug.
      
      1. enable ocfs2, this will mount configfs at /sys/kernel/config and
         fill configure in it.
      
      2. run the following script.
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      76ae281f
  4. Nov 21, 2013
  5. Nov 20, 2013
    • Phillip Lougher's avatar
      Squashfs: Check stream is not NULL in decompressor_multi.c · ed4f381e
      Phillip Lougher authored
      
      
      Fix static checker complaint that stream is not checked in
      squashfs_decompressor_destroy().
      
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      ed4f381e
    • Phillip Lougher's avatar
      Squashfs: Directly decompress into the page cache for file data · 0d455c12
      Phillip Lougher authored
      
      
      This introduces an implementation of squashfs_readpage_block()
      that directly decompresses into the page cache.
      
      This uses the previously added page handler abstraction to push
      down the necessary kmap_atomic/kunmap_atomic operations on the
      page cache buffers into the decompressors.  This enables
      direct copying into the page cache without using the slow
      kmap/kunmap calls.
      
      The code detects when multiple threads are racing in
      squashfs_readpage() to decompress the same block, and avoids
      this regression by falling back to using an intermediate
      buffer.
      
      This patch enhances the performance of Squashfs significantly
      when multiple processes are accessing the filesystem simultaneously
      because it not only reduces memcopying, but it more importantly
      eliminates the lock contention on the intermediate buffer.
      
      Using single-thread decompression.
      
              dd if=file1 of=/dev/null bs=4096 &
              dd if=file2 of=/dev/null bs=4096 &
              dd if=file3 of=/dev/null bs=4096 &
              dd if=file4 of=/dev/null bs=4096
      
      Before:
      
      629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
      
      After:
      
      629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      0d455c12
    • Phillip Lougher's avatar
      Squashfs: Restructure squashfs_readpage() · 5f55dbc0
      Phillip Lougher authored
      
      
      Restructure squashfs_readpage() splitting it into separate
      functions for datablocks, fragments and sparse blocks.
      
      Move the memcpying (from squashfs cache entry) implementation of
      squashfs_readpage_block into file_cache.c
      
      This allows different implementations to be supported.
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      5f55dbc0
    • Phillip Lougher's avatar
      Squashfs: Generalise paging handling in the decompressors · 846b730e
      Phillip Lougher authored
      
      
      Further generalise the decompressors by adding a page handler
      abstraction.  This adds helpers to allow the decompressors
      to access and process the output buffers in an implementation
      independant manner.
      
      This allows different types of output buffer to be passed
      to the decompressors, with the implementation specific
      aspects handled at decompression time, but without the
      knowledge being held in the decompressor wrapper code.
      
      This will allow the decompressors to handle Squashfs
      cache buffers, and page cache pages.
      
      This patch adds the abstraction and an implementation for
      the caches.
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      846b730e
    • Phillip Lougher's avatar
      Squashfs: add multi-threaded decompression using percpu variable · d208383d
      Phillip Lougher authored
      
      
      Add a multi-threaded decompression implementation which uses
      percpu variables.
      
      Using percpu variables has advantages and disadvantages over
      implementations which do not use percpu variables.
      
      Advantages:
        * the nature of percpu variables ensures decompression is
          load-balanced across the multiple cores.
        * simplicity.
      
      Disadvantages: it limits decompression to one thread per core.
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      d208383d
    • Minchan Kim's avatar
      squashfs: Enhance parallel I/O · cd59c2ec
      Minchan Kim authored
      
      
      Now squashfs have used for only one stream buffer for decompression
      so it hurts parallel read performance so this patch supports
      multiple decompressor to enhance performance parallel I/O.
      
      Four 1G file dd read on KVM machine which has 2 CPU and 4G memory.
      
      dd if=test/test1.dat of=/dev/null &
      dd if=test/test2.dat of=/dev/null &
      dd if=test/test3.dat of=/dev/null &
      dd if=test/test4.dat of=/dev/null &
      
      old : 1m39s -> new : 9s
      
      * From v1
        * Change comp_strm with decomp_strm - Phillip
        * Change/add comments - Phillip
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      cd59c2ec
    • Phillip Lougher's avatar
      Squashfs: Refactor decompressor interface and code · 9508c6b9
      Phillip Lougher authored
      
      
      The decompressor interface and code was written from
      the point of view of single-threaded operation.  In doing
      so it mixed a lot of single-threaded implementation specific
      aspects into the decompressor code and elsewhere which makes it
      difficult to seamlessly support multiple different decompressor
      implementations.
      
      This patch does the following:
      
      1.  It removes compressor_options parsing from the decompressor
          init() function.  This allows the decompressor init() function
          to be dynamically called to instantiate multiple decompressors,
          without the compressor options needing to be read and parsed each
          time.
      
      2.  It moves threading and all sleeping operations out of the
          decompressors.  In doing so, it makes the decompressors
          non-blocking wrappers which only deal with interfacing with
          the decompressor implementation.
      
      3. It splits decompressor.[ch] into decompressor generic functions
         in decompressor.[ch], and moves the single threaded
         decompressor implementation into decompressor_single.c.
      
      The result of this patch is Squashfs should now be able to
      support multiple decompressors by adding new decompressor_xxx.c
      files with specialised implementations of the functions in
      decompressor_single.c
      
      Signed-off-by: default avatarPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: default avatarMinchan Kim <minchan@kernel.org>
      9508c6b9
  6. Nov 19, 2013
  7. Nov 18, 2013
Loading