- Dec 02, 2010
-
-
Frederic Weisbecker authored
reiserfs_acl_chmod() can be called by reiserfs_set_attr() and then take the reiserfs lock a second time. Thereafter it may call journal_begin() that definitely requires the lock not to be nested in order to release it before taking the journal mutex because the reiserfs lock depends on the journal mutex already. So, aviod nesting the lock in reiserfs_acl_chmod(). Reported-by:
Pawel Zawora <pzawora@gmail.com> Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com> Tested-by:
Pawel Zawora <pzawora@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.32.x+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 01, 2010
-
-
Dave Chinner authored
Recent tests writing lots of small files showed the flusher thread being CPU bound and taking a long time to do allocations on a debug kernel. perf showed this as the prime reason: samples pcnt function DSO _______ _____ ___________________________ _________________ 224648.00 36.8% xfs_error_test [kernel.kallsyms] 86045.00 14.1% xfs_btree_check_sblock [kernel.kallsyms] 39778.00 6.5% prandom32 [kernel.kallsyms] 37436.00 6.1% xfs_btree_increment [kernel.kallsyms] 29278.00 4.8% xfs_btree_get_rec [kernel.kallsyms] 27717.00 4.5% random32 [kernel.kallsyms] Walking btree blocks during allocation checking them requires each block (a cache hit, so no I/O) call xfs_error_test(), which then does a random32() call as the first operation. IOWs, ~50% of the CPU is being consumed just testing whether we need to inject an error, even though error injection is not active. Kill this overhead when error injection is not active by adding a global counter of active error traps and only calling into xfs_error_test when fault injection is active. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
Dave Chinner authored
When an inode has been marked stale because the cluster is being freed, we don't want to (re-)insert this inode into the AIL. There is a race condition where the cluster buffer may be unpinned before the inode is inserted into the AIL during transaction committed processing. If the buffer is unpinned before the inode item has been committed and inserted, then it is possible for the buffer to be released and hence processthe stale inode callbacks before the inode is inserted into the AIL. In this case, we then insert a clean, stale inode into the AIL which will never get removed by an IO completion. It will, however, get reclaimed and that triggers an assert in xfs_inode_free() complaining about freeing an inode still in the AIL. This race can be avoided by not moving stale inodes forward in the AIL during transaction commit completion processing. This closes the race condition by ensuring we never insert clean stale inodes into the AIL. It is safe to do this because a dirty stale inode, by definition, must already be in the AIL. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
Dave Chinner authored
There is an assumption in the parts of XFS that flushing a dirty file will make all the delayed allocation blocks disappear from an inode. That is, that after calling xfs_flush_pages() then ip->i_delayed_blks will be zero. This is an invalid assumption as we may have specualtive preallocation beyond EOF and they are recorded in ip->i_delayed_blks. A flush of the dirty pages of an inode will not change the state of these blocks beyond EOF, so a non-zero deeelalloc block count after a flush is valid. The bmap code has an invalid ASSERT() that needs to be removed, and the swapext code has a bug in that while it swaps the data forks around, it fails to swap the i_delayed_blks counter associated with the fork and hence can get the block accounting wrong. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
Dave Chinner authored
As reported by Nick Piggin, XFS is suffering from long pauses under highly concurrent workloads when hosted on ramdisks. The problem is that an inode buffer is stuck in the pinned state in memory and as a result either the inode buffer or one of the inodes within the buffer is stopping the tail of the log from being moved forward. The system remains in this state until a periodic log force issued by xfssyncd causes the buffer to be unpinned. The main problem is that these are stale buffers, and are hence held locked until the transaction/checkpoint that marked them state has been committed to disk. When the filesystem gets into this state, only the xfssyncd can cause the async transactions to be committed to disk and hence unpin the inode buffer. This problem was encountered when scaling the busy extent list, but only the blocking lock interface was fixed to solve the problem. Extend the same fix to the buffer trylock operations - if we fail to lock a pinned, stale buffer, then force the log immediately so that when the next attempt to lock it comes around, it will have been unpinned. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
Dave Chinner authored
Since the move to the new truncate sequence we call xfs_setattr to truncate down excessively instanciated blocks. As shown by the testcase in kernel.org BZ #22452 that doesn't work too well. Due to the confusion of the internal inode size, and the VFS inode i_size it zeroes data that it shouldn't. But full blown truncate seems like overkill here. We only instanciate delayed allocations in the write path, and given that we never released the iolock we can't have converted them to real allocations yet either. The only nasty case is pre-existing preallocation which we need to skip. We already do this for page discard during writeback, so make the delayed allocation block punching a generic function and call it from the failed write path as well as xfs_aops_discard_page. The callers are responsible for ensuring that partial blocks are not truncated away, and that they hold the ilock. Based on a fix originally from Christoph Hellwig. This version used filesystem blocks as the range unit. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
Oleg Nesterov authored
Note: this patch targets 2.6.37 and tries to be as simple as possible. That is why it adds more copy-and-paste horror into fs/compat.c and uglifies fs/exec.c, this will be cleanuped later. compat_copy_strings() plays with bprm->vma/mm directly and thus has two problems: it lacks the RLIMIT_STACK check and argv/envp memory is not visible to oom killer. Export acct_arg_size() and get_arg_page(), change compat_copy_strings() to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0) as do_execve() does. Add the fatal_signal_pending/cond_resched checks into compat_count() and compat_copy_strings(), this matches the code in fs/exec.c and certainly makes sense. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Brad Spengler published a local memory-allocation DoS that evades the OOM-killer (though not the virtual memory RLIMIT): http://www.grsecurity.net/~spender/64bit_dos.c execve()->copy_strings() can allocate a lot of memory, but this is not visible to oom-killer, nobody can see the nascent bprm->mm and take it into account. With this patch get_arg_page() increments current's MM_ANONPAGES counter every time we allocate the new page for argv/envp. When do_execve() succeds or fails, we change this counter back. Technically this is not 100% correct, we can't know if the new page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but I don't think this really matters and everything becomes correct once exec changes ->mm or fails. Reported-by:
Brad Spengler <spender@grsecurity.net> Reviewed-and-discussed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 30, 2010
-
-
Jeff Layton authored
The DFS referral parsing code does a memchr() call to find the '\\' delimiter that separates the hostname in the referral UNC from the sharename. It then uses that value to set the length of the hostname via pointer subtraction. Instead of subtracting the start of the hostname however, it subtracts the start of the UNC, which causes the code to pass in a hostname length that is 2 bytes too long. Regression introduced in commit 1a4240f4. Reported-and-Tested-by:
Robbert Kouprie <robbert@exx.nl> Signed-off-by:
Jeff Layton <jlayton@redhat.com> Cc: Wang Lei <wang840925@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: stable@kernel.org Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
Trond Myklebust authored
When comparing filehandles in the helper nfs_same_file(), we should not be using 'strncmp()': filehandles are not null terminated strings. Instead, we should just use the existing helper nfs_compare_fh(). Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Suresh Jayaraman authored
Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
Suresh Jayaraman authored
Currently, if CONFIG_CIFS_FSCACHE is set, fscache is enabled on files opened as read-only irrespective of the 'fsc' mount option. Fix this by enabling fscache only if 'fsc' mount option is specified explicitly. Remove an extraneous cFYI debug message and fix a typo while at it. Reported-by:
Jeff Layton <jlayton@redhat.com> Acked-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
Suresh Jayaraman authored
Currently, it is possible to specify 'fsc' mount option even if CONFIG_CIFS_FSCACHE has not been set. The option is being ignored silently while the user fscache functionality to work. Fix this by raising error when the CONFIG option is not set. Reported-by:
Jeff Layton <jlayton@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
Shirish Pargaonkar authored
Add extended attribute name system.cifs_acl Get/generate cifs/ntfs acl blob and hand over to the invoker however it wants to parse/process it under experimental configurable option CIFS_ACL. Do not get CIFS/NTFS ACL for xattr for attribute system.posix_acl_access Signed-off-by:
Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
Shirish Pargaonkar authored
Change the name of function mode_to_acl to mode_to_cifs_acl. Handle return code in functions mode_to_cifs_acl and cifs_acl_to_fattr. Signed-off-by:
Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
- Nov 29, 2010
-
-
Suresh Jayaraman authored
Only the callers check whether the invalid_mapping flag is set and not cifs_invalidate_mapping(). Signed-off-by:
Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by:
Steve French <sfrench@us.ibm.com>
-
Chris Mason authored
Fixes compile error Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
The new DIO bio splitting code has problems when the bio spans more than one ordered extent. This will happen as the generic DIO code merges our get_blocks calls together into a bigger single bio. This fixes things by walking forward in the ordered extent code finding all the overlapping ordered extents and completing them all at once. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Linus Torvalds authored
This avoids some include-file hell, and the function isn't really important enough to be inlined anyway. Reported-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 28, 2010
-
-
Linus Torvalds authored
And in particular, use it in 'pipe_fcntl()'. The other pipe functions do not need to use the 'careful' version, since they are only ever called for things that are already known to be pipes. The normal read/write/ioctl functions are called through the file operations structures, so if a file isn't a pipe, they'd never get called. But pipe_fcntl() is special, and called directly from the generic fcntl code, and needs to use the same careful function that the splice code is using. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
.. and change it to take the 'file' pointer instead of an inode, since that's what all users want anyway. The renaming is preparatory to exporting it to other users. The old 'pipe_info()' name was too generic and is already used elsewhere, so before making the function public we need to use a more specific name. Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Jones <davej@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Nov 27, 2010
-
-
Josef Bacik authored
There is a problem with how we use sget, it searches through the list of supers attached to the fs_type looking for a super with the same fs_devices as what we're trying to mount. This depends on sb->s_fs_info being filled, but we don't fill that in until we get to btrfs_fill_super, so we could hit supers on the fs_type super list that have a null s_fs_info. In order to fix that we need to go ahead and setup a blank root with a blank fs_info to hold fs_devices, that way our test will work out right and then we can set s_fs_info in btrfs_set_super, and then open_ctree will simply use our pre-allocated root and fs_info when setting everything up. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
There are two big problems currently with FIEMAP 1) We return extents for holes. This isn't supposed to happen, we just don't return extents for holes and then userspace interprets the lack of an extent as a hole. 2) We sometimes don't set FIEMAP_EXTENT_LAST properly. This is because we wait to see a EXTENT_FLAG_VACANCY flag on the em, but this won't happen if say we ask fiemap to map up to the last extent in a file, and there is nothing but holes up to the i_size. To fix this we need to lookup the last extent in this file and save the logical offset, so if we happen to try and map that extent we can be sure to set FIEMAP_EXTENT_LAST. With this patch we now pass xfstest 225, which we never have before. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Ian Kent authored
When mounting a btrfs file system btrfs_test_super() may attempt to use sb->s_fs_info, the btrfs root, of a super block that is going away and that has had the btrfs root set to NULL in its ->put_super(). But if the super block is going away it cannot be an existing super block so we can return false in this case. Signed-off-by:
Ian Kent <raven@themaw.net> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
Currently we fail xfstest 236 because we're not updating the inode ctime on link. This is a simple fix, and makes it so we pass 236 now. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
We have been failing xfstest 228 forever, because we don't check to make sure the new inode size is acceptable as far as RLIMIT is concerned. Just check to make sure it's ok to create a inode with this new size and error out if not. With this patch we now pass 228. Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
There is a typo in __btrfs_prealloc_file_range() where we set the i_size to actual_len/cur_offset, and then just set it to cur_offset again, and do the same with btrfs_ordered_update_i_size(). This fixes it back to keeping i_size in a local variable and then updating i_size properly. Tested this with xfs_io -F -f -c "falloc 0 1" -c "pwrite 0 1" foo stat'ing foo gives us a size of 1 instead of 4096 like it was. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- Nov 24, 2010
-
-
Frederic Weisbecker authored
reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe() to protect against reiserfs lock dependency. However this protection requires to have the reiserfs lock to be locked. This is the case if reiserfs_unpack() is called by reiserfs_ioctl but not from reiserfs_quota_on() when it tries to unpack tails of quota files. Fix the ordering of the two locks in reiserfs_unpack() to fix this issue. Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com> Reported-by:
Markus Gapp <markus.gapp@gmx.net> Reported-by:
Jan Kara <jack@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.36.x] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Naoya Horiguchi authored
Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512 pages.) But there is a corner case where walk_pmd_range() accidentally runs over a VMA associated with a hugetlbfs file. For example, when a process has mappings to VMAs as shown below: # cat /proc/<pid>/maps ... 3a58f6d000-3a58f72000 rw-p 00000000 00:00 0 7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0 7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0 7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614 /hugepages/test then pagemap_read() goes into walk_pmd_range() path and walks in the range 0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by walk_hugetlb_range(). Otherwise PMD for the hugepage is considered bad and cleared, which causes undesirable results. This patch fixes it by separating pagemap walk range into one PMD. Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Ken Sumrall authored
The attribute cache for a file was not being cleared when a file is opened with O_TRUNC. If the filesystem's open operation truncates the file ("atomic_o_trunc" feature flag is set) then the kernel should invalidate the cached st_mtime and st_ctime attributes. Also i_size should be explicitly be set to zero as it is used sometimes without refreshing the cache. Signed-off-by:
Ken Sumrall <ksumrall@android.com> Cc: Anfei <anfei.zhou@gmail.com> Cc: "Anand V. Avati" <avati@gluster.com> Signed-off-by:
Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Ryusuke Konishi authored
Fixes a typo: "uncommited" -> "uncommitted". Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
- Nov 23, 2010
-
-
Dan Carpenter authored
nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return NULL. Signed-off-by:
Dan Carpenter <error27@gmail.com> Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
- Nov 22, 2010
-
-
Trond Myklebust authored
Store the dirent->d_type in the struct nfs_cache_array_entry so that we can use it in getdents() calls. This fixes a regression with the new readdir code. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
It looks as if the array size calculation in MAX_READDIR_ARRAY does not take the alignment of struct nfs_cache_array_entry into account. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
We should ignore the errors from the filldir callback, and just interpret them as meaning we should exit, however we should definitely pass back ENOMEM errors. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Currently, uncached_readdir() is broken because if fails to handle the results from nfs_readdir_xdr_to_array() correctly. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
nfs_do_filldir() must always free desc->page when it is done, otherwise we end up leaking the page. Also remove unused variable 'dentry'. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Some servers are known to be buggy w.r.t. this. Deal with them... Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-
Trond Myklebust authored
Overflowing the buffer in the readdir ->decode_dirent() should not lead to a fatal error, but rather to an attempt to reread the record in question. Signed-off-by:
Trond Myklebust <Trond.Myklebust@netapp.com>
-