Skip to content
  1. Jan 03, 2014
  2. Jan 02, 2014
  3. Dec 12, 2013
    • Stanislav Kholmanskikh's avatar
      nfsd: revoking of suid/sgid bits after chown() in a consistent way · c4fa6d7c
      Stanislav Kholmanskikh authored
      There is an inconsistency in the handling of SUID/SGID file
      bits after chown() between NFS and other local file systems.
      
      Local file systems (for example, ext3, ext4, xfs, btrfs) revoke
      SUID/SGID bits after chown() on a regular file even if
      the owner/group of the file has not been changed:
      
      ~# touch file; chmod ug+s file; chmod u+x file
      ~# ls -l file
      -rwsr-Sr-- 1 root root 0 Dec  6 04:49 file
      ~# chown root file; ls -l file
      -rwxr-Sr-- 1 root root 0 Dec  6 04:49 file
      
      but NFS doesn't do that:
      
      ~# touch file; chmod ug+s file; chmod u+x file
      ~# ls -l file
      -rwsr-Sr-- 1 root root 0 Dec  6 04:49 file
      ~# chown root file; ls -l file
      -rwsr-Sr-- 1 root root 0 Dec  6 04:49 file
      
      NFS does that only if the owner/group has been changed:
      
      ~# touch file; chmod ug+s file; chmod u+x file
      ~# ls -l file
      -rwsr-Sr-- 1 root root 0 Dec  6 05:02 file
      ~# chown bin file; ls -l file
      -rwxr-Sr-- 1 bin root 0 Dec  6 05:02 file
      
      See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html
      
      
      
       "If the specified file is a regular file, one or more of
       the S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set,
       and the process has appropriate privileges, it is
       implementation-defined whether the set-user-ID and set-group-ID
       bits are altered."
      
      So both variants are acceptable by POSIX.
      
      This patch makes NFS to behave like local file systems.
      
      Signed-off-by: default avatarStanislav Kholmanskikh <stanislav.kholmanskikh@oracle.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      c4fa6d7c
  4. Dec 11, 2013
  5. Dec 06, 2013
  6. Dec 04, 2013
    • Helge Deller's avatar
      nfs: fix do_div() warning by instead using sector_div() · 3873d064
      Helge Deller authored
      
      
      When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
      shown below.  Fix this warning by instead using sector_div() which is provided
      by the kernel.h header file.
      
      fs/nfs/blocklayout/extents.c: In function ‘normalize’:
      include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
      nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
      fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
      include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
       extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
      
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      3873d064
    • Trond Myklebust's avatar
      NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery · f22e5edd
      Trond Myklebust authored
      
      
      Andy Adamson reports:
      
      The state manager is recovering expired state and recovery OPENs are being
      processed. If kswapd is pruning inodes at the same time, a deadlock can occur
      when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
      resultant layoutreturn gets an error that the state mangager is to handle,
      causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
      
      At the same time an open is waiting for the inode deletion to complete in
      __wait_on_freeing_inode.
      
      If the open is either the open called by the state manager, or an open from
      the same open owner that is holding the NFSv4 sequence id which causes the
      OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
      then the state is deadlocked with kswapd.
      
      The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
      We already know that layouts are dropped on all server reboots, and that
      it has to be coded to deal with the "forgetful client model" that doesn't
      send layoutreturns.
      
      Reported-by: default avatarAndy Adamson <andros@netapp.com>
      Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
      
      
      Signed-off-by: default avatarTrond Myklebust <Trond.Myklebust@primarydata.com>
      f22e5edd
  7. Dec 03, 2013
  8. Dec 02, 2013
    • Linus Torvalds's avatar
      vfs: fix subtle use-after-free of pipe_inode_info · b0d8d229
      Linus Torvalds authored
      
      
      The pipe code was trying (and failing) to be very careful about freeing
      the pipe info only after the last access, with a pattern like:
      
              spin_lock(&inode->i_lock);
              if (!--pipe->files) {
                      inode->i_pipe = NULL;
                      kill = 1;
              }
              spin_unlock(&inode->i_lock);
              __pipe_unlock(pipe);
              if (kill)
                      free_pipe_info(pipe);
      
      where the final freeing is done last.
      
      HOWEVER.  The above is actually broken, because while the freeing is
      done at the end, if we have two racing processes releasing the pipe
      inode info, the one that *doesn't* free it will decrement the ->files
      count, and unlock the inode i_lock, but then still use the
      "pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
      
      This is *very* hard to trigger in practice, since the race window is
      very small, and adding debug options seems to just hide it by slowing
      things down.
      
      Simon originally reported this way back in July as an Oops in
      kmem_cache_allocate due to a single bit corruption (due to the final
      "spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
      allocation that had re-used the free'd pipe-info), it's taken this long
      to figure out.
      
      Since the 'pipe->files' accesses aren't even protected by the pipe lock
      (we very much use the inode lock for that), the simple solution is to
      just drop the pipe lock early.  And since there were two users of this
      pattern, create a helper function for it.
      
      Introduced commit ba5bb147 ("pipe: take allocation and freeing of
      pipe_inode_info out of ->i_mutex").
      
      Reported-by: default avatarSimon Kirby <sim@hostway.ca>
      Reported-by: default avatarIan Applegate <ia@cloudflare.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org   # v3.10+
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b0d8d229
  9. Nov 29, 2013
  10. Nov 28, 2013
  11. Nov 27, 2013
  12. Nov 25, 2013
    • Steve French's avatar
      [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload · f19e84df
      Steve French authored
      
      
      Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
      of BTRFS_IOC_CLONE to avoid confusion about whether
      copy-on-write is required or optional for this operation.
      
      SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
      they both speed up copy by offloading the copy rather than
      passing many read and write requests back and forth and both have
      identical syntax (passing file handles), but for SMB2/SMB3
      CopyChunk the server is not required to use copy-on-write
      to make a copy of the file (although some do), and Christoph
      has commented that since CopyChunk does not require
      copy-on-write we should not reuse BTRFS_IOC_CLONE.
      
      This patch renames the ioctl to use a cifs specific IOCTL
      CIFS_IOCTL_COPYCHUNK.  This ioctl is particularly important
      for SMB2/SMB3 since large file copy over the network otherwise
      can be very slow, and with this is often more than 100 times
      faster putting less load on server and client.
      
      Note that if a copy syscall is ever introduced, depending on
      its requirements/format it could end up using one of the other
      three methods that CIFS/SMB2/SMB3 can do for copy offload,
      but this method is particularly useful for file copy
      and broadly supported (not just by Samba server).
      
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarDavid Disseldorp <ddiss@samba.org>
      f19e84df
  13. Nov 24, 2013
  14. Nov 23, 2013
    • Li Wang's avatar
      ceph: allocate non-zero page to fscache in readpage() · ff638b7d
      Li Wang authored
      
      
      ceph_osdc_readpages() returns number of bytes read, currently,
      the code only allocate full-zero page into fscache, this patch
      fixes this.
      
      Signed-off-by: default avatarLi Wang <liwang@ubuntukylin.com>
      Reviewed-by: default avatarMilosz Tanski <milosz@adfin.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      ff638b7d
    • Yan, Zheng's avatar
      ceph: wake up 'safe' waiters when unregistering request · fc55d2c9
      Yan, Zheng authored
      
      
      We also need to wake up 'safe' waiters if error occurs or request
      aborted. Otherwise sync(2)/fsync(2) may hang forever.
      
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarSage Weil <sage@inktank.com>
      fc55d2c9
    • Yan, Zheng's avatar
      ceph: cleanup aborted requests when re-sending requests. · eb1b8af3
      Yan, Zheng authored
      
      
      Aborted requests usually get cleared when the reply is received.
      If MDS crashes, no reply will be received. So we need to cleanup
      aborted requests when re-sending requests.
      
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarGreg Farnum <greg@inktank.com>
      Signed-off-by: default avatarSage Weil <sage@inktank.com>
      eb1b8af3
    • Yan, Zheng's avatar
      ceph: handle race between cap reconnect and cap release · 99a9c273
      Yan, Zheng authored
      
      
      When a cap get released while composing the cap reconnect message.
      We should skip queuing the release message if the cap hasn't been
      added to the cap reconnect message.
      
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      99a9c273
    • Yan, Zheng's avatar
      ceph: set caps count after composing cap reconnect message · 44c99757
      Yan, Zheng authored
      
      
      It's possible that some caps get released while composing the cap
      reconnect message.
      
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      44c99757
    • Yan, Zheng's avatar
      ceph: queue cap release in __ceph_remove_cap() · a096b09a
      Yan, Zheng authored
      
      
      call __queue_cap_release() in __ceph_remove_cap(), this avoids
      acquiring s_cap_lock twice.
      
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      a096b09a
    • Tejun Heo's avatar
      sysfs: use a separate locking class for open files depending on mmap · 027a485d
      Tejun Heo authored
      
      
      The following two commits implemented mmap support in the regular file
      path and merged bin file support into the regular path.
      
       73d97146 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
       3124eb16 ("sysfs: merge regular and bin file handling")
      
      After the merge, the following commands trigger a spurious lockdep
      warning.  "test-mmap-read" simply mmaps the file and dumps the
      content.
      
        $ cat /sys/block/sda/trace/act_mask
        $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.12.0-work+ #378 Not tainted
        -------------------------------------------------------
        test-mmap-read/567 is trying to acquire lock:
         (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&mm->mmap_sem){++++++}:
        ...
        -> #2 (sr_mutex){+.+.+.}:
        ...
        -> #1 (&bdev->bd_mutex){+.+.+.}:
        ...
        -> #0 (&of->mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
         &of->mutex --> sr_mutex --> &mm->mmap_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(sr_mutex);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex);
      
         *** DEADLOCK ***
      
        1 lock held by test-mmap-read/567:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        stack backtrace:
        CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
         ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
         0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
        Call Trace:
         [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
         [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
         [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
         [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
         [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
         [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
         [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
         [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
         [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
         [<ffffffff81008282>] SyS_mmap+0x22/0x30
         [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
      
      This happens because one file nests sr_mutex, which nests mm->mmap_sem
      under it, under of->mutex while mmap implementation naturally nests
      of->mutex under mm->mmap_sem.  The warning is false positive as
      of->mutex is per open-file and the two paths belong to two different
      files.  This warning didn't trigger before regular and bin file
      supports were merged because only bin file supported mmap and the
      other side of locking happened only on regular files which used
      equivalent but separate locking.
      
      It'd be best if we give separate locking classes per file but we can't
      easily do that.  Let's differentiate on ->mmap() for now.  Later we'll
      add explicit file operations struct and can add per-ops lockdep key
      there.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      027a485d
    • Mika Westerberg's avatar
      sysfs: handle duplicate removal attempts in sysfs_remove_group() · 54d71145
      Mika Westerberg authored
      
      
      Commit bcdde7e2 (sysfs: make __sysfs_remove_dir() recursive) changed
      the behavior so that directory removals will be done recursively. This
      means that the sysfs group might already be removed if its parent directory
      has been removed.
      
      The current code outputs warnings similar to following log snippet when it
      detects that there is no group for the given kobject:
      
       WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
       sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
       Modules linked in:
       CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
       Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
       Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
        ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
        ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
       Call Trace:
        [<ffffffff817daab1>] dump_stack+0x45/0x56
        [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
        [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
        [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
        [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
        [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
        [<ffffffff8142a0d0>] device_del+0x40/0x1b0
        [<ffffffff8142a24d>] device_unregister+0xd/0x20
        [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
        [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
        [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
        [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
        [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
        [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
        [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
        [<ffffffff8142a1b1>] device_del+0x121/0x1b0
        [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
        [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
        [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
        [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
        [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
        [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
        [<ffffffff8105db29>] worker_thread+0x119/0x390
        [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
        [<ffffffff81063a5d>] kthread+0xcd/0xf0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
      
      On this particular machine I see ~16 of these message during Thunderbolt
      hot-unplug.
      
      Fix this in similar way that was done for sysfs_remove_one() by checking
      if the parent directory has already been removed and bailing out early.
      
      Signed-off-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54d71145
  15. Nov 22, 2013
    • Junxiao Bi's avatar
      configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi authored
      
      
      A race window in configfs, it starts from one dentry is UNHASHED and end
      before configfs_d_iput is called.  In this window, if a lookup happen,
      since the original dentry was UNHASHED, so a new dentry will be
      allocated, and then in configfs_attach_attr(), sd->s_dentry will be
      updated to the new dentry.  Then in configfs_d_iput(),
      BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->dentry still point
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix it, change configfs_d_iput to not update sd->s_dentry if
      sd->s_count > 2, that means there are another dentry is using the sd
      beside the one that is going to be put.  Use configfs_dirent_lock in
      configfs_attach_attr to sync with configfs_d_iput.
      
      With the following steps, you can reproduce the bug.
      
      1. enable ocfs2, this will mount configfs at /sys/kernel/config and
         fill configure in it.
      
      2. run the following script.
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      76ae281f
  16. Nov 21, 2013
Loading