- Mar 05, 2014
-
-
Trond Myklebust authored
When nfs4_set_rw_stateid() can fails by returning EIO to indicate that the stateid is completely invalid, then it makes no sense to have it trigger a retry of the READ or WRITE operation. Instead, we should just have it fall through and attempt a recovery. This fixes an infinite loop in which the client keeps replaying the same bad stateid back to the server. Reported-by:
Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1393954269-3974-1-git-send-email-andros@netapp.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Mar 03, 2014
-
-
Trond Myklebust authored
The clean-up in commit 36281caa ended up removing a NULL pointer check that is needed in order to prevent an Oops in nfs_async_inode_return_delegation(). Reported-by:
"Yan, Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com Fixes: 36281caa (NFSv4: Further clean-ups of delegation stateid validation) Cc: stable@vger.kernel.org # 3.4+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Mar 01, 2014
-
-
Trond Myklebust authored
nfs4_release_lockowner needs to set the rpc_message reply to point to the nfs4_sequence_res in order to avoid another Oopsable situation in nfs41_assign_slot. Fixes: fbd4bfd1 (NFS: Add nfs4_sequence calls for RELEASE_LOCKOWNER) Cc: stable@vger.kernel.org # 3.12+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 19, 2014
-
-
Andy Adamson authored
Do not return an error when nfs4_copy_delegation_stateid succeeds. Signed-off-by:
Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com Fixes: ef1820f9 (NFSv4: Don't try to recover NFSv4 locks when...) Cc: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org # 3.12+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 17, 2014
-
-
Trond Myklebust authored
We need to use the same net namespace that was used to resolve the hostname and sockaddr arguments. Fixes: 32e62b7c (NFS: Add nfs4_update_server) Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 10, 2014
-
-
Trond Myklebust authored
Commit aa9c2669 (NFS: Client implementation of Labeled-NFS) introduces a performance regression. When nfs_zap_caches_locked is called, it sets the NFS_INO_INVALID_LABEL flag irrespectively of whether or not the NFS server supports security labels. Since that flag is never cleared, it means that all calls to nfs_revalidate_inode() will now trigger an on-the-wire GETATTR call. This patch ensures that we never set the NFS_INO_INVALID_LABEL unless the server advertises support for labeled NFS. It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it has successfully set the security label for the inode. Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode, which has nothing to do with labeled NFS. Reported-by:
Neil Brown <neilb@suse.de> Cc: stable@vger.kernel.org # 3.11+ Tested-by:
Neil Brown <neilb@suse.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 03, 2014
-
-
Trond Myklebust authored
nfs3_proc_setacls is used internally by the NFSv3 create operations to set the acl after the file has been created. If the operation fails because the server doesn't support acls, then it must return '0', not -EOPNOTSUPP. Reported-by:
Russell King <linux@arm.linux.org.uk> Link: http://lkml.kernel.org/r/20140201010328.GI15937@n2100.arm.linux.org.uk Cc: Christoph Hellwig <hch@lst.de> Tested-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Feb 01, 2014
-
-
Trond Myklebust authored
There may still be timers active on the session waitqueues. Make sure that we kill them before freeing the memory. Cc: stable@vger.kernel.org # 3.12+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
nfs41_wake_and_assign_slot() relies on the task->tk_msg.rpc_argp and task->tk_msg.rpc_resp always pointing to the session sequence arguments. nfs4_proc_open_confirm tries to pull a fast one by reusing the open sequence structure, thus causing corruption of the NFSv4 slot table. Cc: stable@vger.kernel.org # 3.12+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Noah Massey authored
nfs3_get_acl() tries to skip posix equivalent ACLs, but misinterprets the return value of posix_acl_equiv_mode(). Fix it. This is a regression introduced by "nfs: use generic posix ACL infrastructure for v3 Posix ACLs" CC: Christoph Hellwig <hch@infradead.org> CC: linux-nfs@vger.kernel.org CC: linux-fsdevel@vger.kernel.org Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 31, 2014
-
-
Malahal Naineni authored
Avoid returning incorrect acl mask attributes when the server doesn't support ACLs. Signed-off-by:
Malahal Naineni <malahal@us.ibm.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 30, 2014
-
-
Christoph Hellwig authored
Chris Mason reported a NULL pointer derefernence in generic_getxattr() that was due to sb->s_xattr being NULL. The reason is that the nfs #ifdef's for ACL support were misplaced, and the nfs3 inode operations had the xattr operation pointers set up, even though xattrs were not actually supported. As a result, the xattr code was being called without the infrastructure having been set up. Move the #ifdef's appropriately. Reported-and-tested-by:
Chris Mason <clm@fb.com> Acked-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 29, 2014
-
-
Trond Myklebust authored
It is now completely safe to call nfs41_sequence_free_slot with a NULL slot. Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Move the test for res->sr_slot == NULL out of the nfs41_sequence_free_slot helper and into the main function for efficiency. Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
The check for whether or not we sent an RPC call in nfs40_sequence_done is insufficient to decide whether or not we are holding a session slot, and thus should not be used to decide when to free that slot. This patch replaces the RPC_WAS_SENT() test with the correct test for whether or not slot == NULL. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # 3.12+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Andy Adamson authored
Fix a dynamic session slot leak where a slot is preallocated and I/O is resent through the MDS. Signed-off-by:
Andy Adamson <andros@netapp.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 28, 2014
-
-
Jeff Layton authored
If the setting of NFS_INO_INVALIDATING gets reordered to before the clearing of NFS_INO_INVALID_DATA, then another task may hit a race window where both appear to be clear, even though the inode's pages are still in need of invalidation. Fix this by adding the appropriate memory barriers. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Commit d529ef83 (NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces a potential race, since it doesn't test the value of nfsi->cache_validity and set the bitlock in nfsi->flags atomically. Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com> Cc: Jeff Layton <jlayton@redhat.com>
-
- Jan 27, 2014
-
-
Jeff Layton authored
The original printk() made sense when the GSSAPI codepaths were called only when sec=krb5* was explicitly requested. Now however, in many cases the nfs client will try to acquire GSSAPI credentials by default, even when it's not requested. Since we don't have a great mechanism to distinguish between the two cases, just turn the pr_warn into a dprintk instead. With this change we can also get rid of the ratelimiting. We do need to keep the EXPORT_SYMBOL(gssd_running) in place since auth_gss.ko needs it and sunrpc.ko provides it. We can however, eliminate the gssd_running call in the nfs code since that's a bit of a layering violation. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
There is a possible race in how the nfs_invalidate_mapping function is handled. Currently, we go and invalidate the pages in the file and then clear NFS_INO_INVALID_DATA. The problem is that it's possible for a stale page to creep into the mapping after the page was invalidated (i.e., via readahead). If another writer comes along and sets the flag after that happens but before invalidate_inode_pages2 returns then we could clear the flag without the cache having been properly invalidated. So, we must clear the flag first and then invalidate the pages. Doing this however, opens another race: It's possible to have two concurrent read() calls that end up in nfs_revalidate_mapping at the same time. The first one clears the NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping. Just before calling that though, the other task races in, checks the flag and finds it cleared. At that point, it trusts that the mapping is good and gets the lock on the page, allowing the read() to be satisfied from the cache even though the data is no longer valid. These effects are easily manifested by running diotest3 from the LTP test suite on NFS. That program does a series of DIO writes and buffered reads. The operations are serialized and page-aligned but the existing code fails the test since it occasionally allows a read to come out of the cache incorrectly. While mixing direct and buffered I/O isn't recommended, I believe it's possible to hit this in other ways that just use buffered I/O, though that situation is much harder to reproduce. The problem is that the checking/clearing of that flag and the invalidation of the mapping really need to be atomic. Fix this by serializing concurrent invalidations with a bitlock. At the same time, we also need to allow other places that check NFS_INO_INVALID_DATA to check whether we might be in the middle of invalidating the file, so fix up a couple of places that do that to look for the new NFS_INO_INVALIDATING flag. Doing this requires us to be careful not to set the bitlock unnecessarily, so this code only does that if it believes it will be doing an invalidation. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Malahal Naineni authored
Currently we support ACLs if the NFS server file system supports both ALLOW and DENY ACE types. This patch makes the Linux client work with ACLs even if the server supports only 'ALLOW' ACE type. Signed-off-by:
Malahal Naineni <malahal@us.ibm.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 26, 2014
-
-
Christoph Hellwig authored
This causes a small behaviour change in that we don't bother to set ACLs on file creation if the mode bit can express the access permissions fully, and thus behaving identical to local filesystems. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Christoph Hellwig authored
Rename the current posix_acl_created to __posix_acl_create and add a fully featured helper to set up the ACLs on file creation that uses get_acl(). Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Jan 23, 2014
-
-
Boaz Harrosh authored
An NFS4ERR_RECALLCONFLICT is returned by server from a GET_LAYOUT only when a Server Sent a RECALL do to that GET_LAYOUT, or the RECALL and GET_LAYOUT crossed on the wire. In any way this means we want to wait at most until in-flight IO is finished and the RECALL can be satisfied. So a proper wait here is more like 1/10 of a second, not 15 seconds like we have now. In case of a server bug we delay exponentially longer on each retry. Current code totally craps out performance of very large files on most pnfs-objects layouts, because of how the map changes when the file has grown into the next raid group. [Stable: This will patch back to 3.9. If there are earlier still maintained trees, please tell me I'll send a patch] CC: Stable Tree <stable@vger.kernel.org> Signed-off-by:
Boaz Harrosh <bharrosh@panasas.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 21, 2014
-
-
Weston Andros Adamson authored
cond_resched_lock(cinfo->lock) is called everywhere else while holding the cinfo->lock spinlock. Not holding this lock while calling transfer_commit_list in filelayout_recover_commit_reqs causes the BUG below. It's true that we can't hold this lock while calling pnfs_put_lseg, because that might try to lock the inode lock - which might be the same lock as cinfo->lock. To reproduce, mount a 2 DS pynfs server and run an O_DIRECT command that crosses a stripe boundary and is not page aligned, such as: dd if=/dev/zero of=/mnt/f bs=17000 count=1 oflag=direct BUG: sleeping function called from invalid context at linux/fs/nfs/nfs4filelayout.c:1161 in_atomic(): 0, irqs_disabled(): 0, pid: 27, name: kworker/0:1 2 locks held by kworker/0:1/27: #0: (events){.+.+.+}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5 #1: ((&dreq->work)){+.+...}, at: [<ffffffff810501d7>] process_one_work+0x175/0x3a5 CPU: 0 PID: 27 Comm: kworker/0:1 Not tainted 3.13.0-rc3-branch-dros_testing+ #21 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Workqueue: events nfs_direct_write_schedule_work [nfs] 0000000000000000 ffff88007a39bbb8 ffffffff81491256 ffff88007b87a130 ffff88007a39bbd8 ffffffff8105f103 ffff880079614000 ffff880079617d40 ffff88007a39bc20 ffffffffa011603e ffff880078988b98 0000000000000000 Call Trace: [<ffffffff81491256>] dump_stack+0x4d/0x66 [<ffffffff8105f103>] __might_sleep+0x100/0x105 [<ffffffffa011603e>] transfer_commit_list+0x94/0xf1 [nfs_layout_nfsv41_files] [<ffffffffa01160d6>] filelayout_recover_commit_reqs+0x3b/0x68 [nfs_layout_nfsv41_files] [<ffffffffa00ba53a>] nfs_direct_write_reschedule+0x9f/0x1d6 [nfs] [<ffffffff810705df>] ? mark_lock+0x1df/0x224 [<ffffffff8106e617>] ? trace_hardirqs_off_caller+0x37/0xa4 [<ffffffff8106e691>] ? trace_hardirqs_off+0xd/0xf [<ffffffffa00ba8f8>] nfs_direct_write_schedule_work+0x9d/0xb7 [nfs] [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5 [<ffffffff81050258>] process_one_work+0x1f6/0x3a5 [<ffffffff810501d7>] ? process_one_work+0x175/0x3a5 [<ffffffff8105187e>] worker_thread+0x149/0x1f5 [<ffffffff81051735>] ? rescuer_thread+0x28d/0x28d [<ffffffff81056d74>] kthread+0xd2/0xda [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61 [<ffffffff8149e66c>] ret_from_fork+0x7c/0xb0 [<ffffffff81056ca2>] ? __kthread_parkme+0x61/0x61 Signed-off-by:
Weston Andros Adamson <dros@primarydata.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 20, 2014
-
-
Weston Andros Adamson authored
If clp is new (cl_count = 1) and it matches another client in nfs4_discover_server_trunking, the nfs_put_client will free clp before ->cl_preserve_clid is set. Cc: stable@vger.kernel.org # 3.7+ Signed-off-by:
Weston Andros Adamson <dros@primarydata.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 19, 2014
-
-
Trond Myklebust authored
Both nfs41_walk_client_list and nfs40_walk_client_list expect the 'status' variable to be set to the value -NFS4ERR_STALE_CLIENTID if the loop fails to find a match. The problem is that the 'pos->cl_cons_state > NFS_CS_READY' changes the value of 'status', and sets it either to the value '0' (which indicates success), or to the value EINTR. Cc: stable@vger.kernel.org # 3.7.x: 7b1f1fd1: NFSv4/4.1: Fix bugs in Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 17, 2014
-
-
Scott Mayhew authored
We should always make sure the cached page is up-to-date when we're determining whether we can extend a write to cover the full page -- even if we've received a write delegation from the server. Commit c7559663 added logic to skip this check if we have a write delegation, which can lead to data corruption such as the following scenario if client B receives a write delegation from the NFS server: Client A: # echo 123456789 > /mnt/file Client B: # echo abcdefghi >> /mnt/file # cat /mnt/file 0�D0�abcdefghi Just because we hold a write delegation doesn't mean that we've read in the entire page contents. Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by:
Scott Mayhew <smayhew@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 13, 2014
-
-
Christoph Hellwig authored
Make sure to properly invalidate the pagecache before performing direct I/O, so that no stale pages are left around. This matches what the generic direct I/O code does. Also take the i_mutex over the direct write submission to avoid the lifelock vs truncate waiting for i_dio_count to decrease, and to avoid having the pagecache easily repopulated while direct I/O is in progrss. Again matching the generic direct I/O code. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
We'll need the i_mutex to prevent i_dio_count from incrementing while truncate is waiting for it to reach zero, and protects against having the pagecache repopulated after we flushed it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
Simple code cleanup to prepare for later fixes. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
Simple code cleanup to prepare for later fixes. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
i_dio_count is used to protect dio access against truncate. We want to make sure there are no dio reads pending either when doing a truncate. I suspect on plain NFS things might work even without this, but once we use a pnfs layout driver that access backing devices directly things will go bad without the proper synchronization. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
We need to have the I/O fully finished before telling the truncate code that we are done. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
nfs_file_direct_write only updates the inode size if it succeeded and returned the number of bytes written. But in the AIO case nfs_direct_wait turns the return value into -EIOCBQUEUED and we skip the size update. Instead the aio completion path should updated it, which this patch does. The implementation is a little hacky because there is no obvious way to find out we are called for a write in nfs_direct_complete. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Weston Andros Adamson authored
Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP by nfs4_stat_to_errno. This allows the client to mount v4.1 servers that don't support SECINFO_NO_NAME by falling back to the "guess and check" method of nfs4_find_root_sec. Signed-off-by:
Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # 3.1+ Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
nfs4_write_inode() must not be allowed to exit until the layoutcommit is done. That means that both NFS_INO_LAYOUTCOMMIT and NFS_INO_LAYOUTCOMMITTING have to be cleared. Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
If a LAYOUTCOMMIT is outstanding, then chances are that the metadata server may still be returning incorrect values for the change attribute, ctime, mtime and/or size. Just ignore those attributes for now, and wait for the LAYOUTCOMMIT rpc call to finish. Reported-by:
shaobingqing <shaobingqing@bwstor.com.cn> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-
- Jan 05, 2014
-
-
Toralf Förster authored
Signed-off-by:
Toralf Förster <toralf.foerster@gmx.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com>
-