- Aug 05, 2020
-
-
Scott Mayhew authored
The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was removed by commit 6fbda89b. The absence of an error check allows writes to be continually queued up for a server that may no longer be able to handle them. Fix it by adding an error check using the generic error reporting functions. Fixes: 6fbda89b ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by:
Scott Mayhew <smayhew@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Aug 01, 2020
-
-
Scott Mayhew authored
nfs_wb_all() calls filemap_write_and_wait(), which uses filemap_check_errors() to determine the error to return. filemap_check_errors() only looks at the mapping->flags and will therefore only return either -ENOSPC or -EIO. To ensure that the correct error is returned on close(), nfs{,4}_file_flush() should call filemap_check_wb_err() which looks at the errseq value in mapping->wb_err without consuming it. Fixes: 6fbda89b ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by:
Scott Mayhew <smayhew@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Jul 30, 2020
-
-
Frank van der Linden authored
Caches should be small enough to discard them inline, so do that instead of using a work queue. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Jul 28, 2020
-
-
Colin Ian King authored
The variable result is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Jul 17, 2020
-
-
Olga Kornievskaia authored
It looks like this "else" is just a typo. It turns off nconnect for NFSv4.0 even though it works for every other version. Signed-off-by:
Olga Kornievskaia <kolga@netapp.com> Signed-off-by:
J. Bruce Fields <bfields@redhat.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
He Zhe authored
commit 0688e64b ("NFS: Allow signal interruption of NFS4ERR_DELAYed operations") introduces nfs4_delay_interruptible which also needs an _unsafe version to avoid the following call trace for the same reason explained in commit 416ad3c9 ("freezer: add unsafe versions of freezable helpers for NFS") CPU: 4 PID: 3968 Comm: rm Tainted: G W 5.8.0-rc4 #1 Hardware name: Marvell OcteonTX CN96XX board (DT) Call trace: dump_backtrace+0x0/0x1dc show_stack+0x20/0x30 dump_stack+0xdc/0x150 debug_check_no_locks_held+0x98/0xa0 nfs4_delay_interruptible+0xd8/0x120 nfs4_handle_exception+0x130/0x170 nfs4_proc_rmdir+0x8c/0x220 nfs_rmdir+0xa4/0x360 vfs_rmdir.part.0+0x6c/0x1b0 do_rmdir+0x18c/0x210 __arm64_sys_unlinkat+0x64/0x7c el0_svc_common.constprop.0+0x7c/0x110 do_el0_svc+0x24/0xa0 el0_sync_handler+0x13c/0x1b8 el0_sync+0x158/0x180 Signed-off-by:
He Zhe <zhe.he@windriver.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Jul 13, 2020
-
-
Frank van der Linden authored
Implement client side caching for NFSv4.2 extended attributes. The cache is a per-inode hashtable, with name/value entries. There is one special entry for the listxattr cache. NFS inodes have a pointer to a cache structure. The cache structure is allocated on demand, freed when the cache is invalidated. Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE) are collected by a separate shrinker, and freed more aggressively than others. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Now that all the lower level code is there to make the RPC calls, hook it in to the xattr handlers and the listxattr entry point, to make them available. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Implement the extended attribute procedures for NFSv4.2 extended attribute support (RFC 8276). Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Make the buf_to_pages_noslab function available to the rest of the NFS code. Rename it to nfs4_buf_to_pages_noslab to be consistent. This will be used later in the NFSv4.2 xattr code. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Define the NFS_INO_INVALID_XATTR flag, to be used for the NFSv4.2 xattr cache, and use it where appropriate. No functional change as yet. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Until now, change attributes in change_info form were only returned by directory operations. However, they are also used for the RFC 8276 extended attribute operations, which work on both directories and regular files. Modify update_changeattr to deal: * Rename it to nfs4_update_changeattr and make it non-static. * Don't always use INO_INVALID_DATA, this isn't needed for a directory that only had its extended attributes changed by us. * Existing callers now always pass in INO_INVALID_DATA. For the current callers of this function, behavior is unchanged. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
RFC 8276 defines separate ACCESS bits for extended attribute checking. Query them in nfs_do_access and opendata. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
The only consumer of nfs_access_get_cached_rcu and nfs_access_cached calls these static functions in order to first try RCU access, and then locked access. Combine them in to a single function, and call that. Make this function available to the rest of the NFS code. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Define the argument and response structures that will be used for RFC 8276 extended attribute RPC calls, and implement the necessary functions to encode/decode the extended attribute operations. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Query the server for extended attribute support, and record it as the NFS_CAP_XATTR flag in the server capabilities. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Frank van der Linden authored
Set limits for extended attributes (attribute value size and listxattr buffer size), based on the fs-independent limits (XATTR_*_MAX). Define the maximum XDR sizes for the RFC 8276 XATTR operations. In the case of operations that carry a larger payload (SETXATTR, GETXATTR, LISTXATTR), these exclude that payload, which is added as separate pages, like other operations do. Define, much like for read and write operations, the maximum overhead sizes for get/set/listxattr, and use them to limit the maximum payload size for those operations, in combination with the channel attributes. Signed-off-by:
Frank van der Linden <fllinden@amazon.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Make sure we specify the layout segment range when calculating the mirror count. In theory, that number could depend on the range to which we're writing. Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Both nfs_pageio_reset_read_mds() and nfs_pageio_reset_write_mds() do call pnfs_generic_pg_cleanup() for us. Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If the application uses the AT_STATX_DONT_SYNC flag after doing readdir(), then we should still mark the parent inode as seeing a readdirplus hit. That ensures that we continue to use readdirplus in the 'ls -l' type of workflow to do fast lookups of the dentries. Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com>
-
- Jul 12, 2020
-
-
Pavel Begunkov authored
59960b9d ("io_uring: fix lazy work init") tried to fix missing io_req_init_async(), but left out work.flags and hash. Do it earlier. Fixes: 7cdaf587 ("io_uring: avoid whole io_wq_work copy for requests completed inline") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Ensure to set msg.msg_name for the async portion of send/recvmsg, as the header copy will copy to/from it. Cc: stable@vger.kernel.org # v5.5+ Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 10, 2020
-
-
Jens Axboe authored
We currently account the memory after the exit work has been run, but that leaves a gap where a process has closed its ring and until the memory has been accounted as freed. If the memlocked ulimit is borderline, then that can introduce spurious setup errors returning -ENOMEM because the free work hasn't been run yet. Account this as freed when we close the ring, as not to expose a tiny gap where setting up a new ring can fail. Fixes: 85faa7b8 ("io_uring: punt final io_ring_ctx wait-and-free to workqueue") Cc: stable@vger.kernel.org # v5.7 Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Yang Yingliang authored
I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0x607eeac06e78 (size 8): comm "test", pid 295, jiffies 4294735835 (age 31.745s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000932632e6>] percpu_ref_init+0x2a/0x1b0 [<0000000092ddb796>] __io_uring_register+0x111d/0x22a0 [<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480 [<00000000591b89a6>] do_syscall_64+0x56/0xa0 [<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Call percpu_ref_exit() on error path to avoid refcount memleak. Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update") Cc: stable@vger.kernel.org Reported-by:
Hulk Robot <hulkci@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 09, 2020
-
-
Christoph Hellwig authored
btrfs implements the iter_write op and thus can use the more efficient iov_iter based splice implementation. For now falling back to the less efficient default is pretty harmless, but I have a pending series that removes the default, and thus would cause btrfs to not support splice at all. Reported-by:
Andy Lavr <andy.lavr@gmail.com> Tested-by:
Andy Lavr <andy.lavr@gmail.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Josef Bacik authored
While debugging a patch that I wrote I was hitting use-after-free panics when accessing block groups on unmount. This turned out to be because in the nocow case if we bail out of doing the nocow for whatever reason we need to call btrfs_dec_nocow_writers() if we called the inc. This puts our block group, but a few error cases does if (nocow) { btrfs_dec_nocow_writers(); goto error; } unfortunately, error is error: if (nocow) btrfs_dec_nocow_writers(); so we get a double put on our block group. Fix this by dropping the error cases calling of btrfs_dec_nocow_writers(), as it's handled at the error label now. Fixes: 762bf098 ("btrfs: improve error handling in run_delalloc_nocow") CC: stable@vger.kernel.org # 5.4+ Reviewed-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Steve French authored
To 2.28 Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Ronnie Sahlberg authored
Don't leak a reference to tlink during the NOTIFY ioctl Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com> Reviewed-by:
Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org> # v5.6+
-
Yang Yingliang authored
I got a memleak report when doing some fuzz test: BUG: memory leak unreferenced object 0xffff888113e02300 (size 488): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`....... backtrace: [<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 BUG: memory leak unreferenced object 0xffff8881152dd5e0 (size 16): comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s) hex dump (first 16 bytes): 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline] [<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline] [<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440 [<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106 [<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151 [<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193 [<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233 [<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline] [<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74 [<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720 [<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 [<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 If io_sqe_file_register() failed, we need put the file that get by fget() to avoid the memleak. Fixes: c3a31e60 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Cc: stable@vger.kernel.org Reported-by:
Hulk Robot <hulkci@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Xiaoguang Wang authored
For those applications which are not willing to use io_uring_enter() to reap and handle cqes, they may completely rely on liburing's io_uring_peek_cqe(), but if cq ring has overflowed, currently because io_uring_peek_cqe() is not aware of this overflow, it won't enter kernel to flush cqes, below test program can reveal this bug: static void test_cq_overflow(struct io_uring *ring) { struct io_uring_cqe *cqe; struct io_uring_sqe *sqe; int issued = 0; int ret = 0; do { sqe = io_uring_get_sqe(ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); break;; } ret = io_uring_submit(ring); if (ret <= 0) { if (ret != -EBUSY) fprintf(stderr, "sqe submit failed: %d\n", ret); break; } issued++; } while (ret > 0); assert(ret == -EBUSY); printf("issued requests: %d\n", issued); while (issued) { ret = io_uring_peek_cqe(ring, &cqe); if (ret) { if (ret != -EAGAIN) { fprintf(stderr, "peek completion failed: %s\n", strerror(ret)); break; } printf("left requets: %d\n", issued); continue; } io_uring_cqe_seen(ring, cqe); issued--; printf("left requets: %d\n", issued); } } int main(int argc, char *argv[]) { int ret; struct io_uring ring; ret = io_uring_queue_init(16, &ring, 0); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } test_cq_overflow(&ring); return 0; } To fix this issue, export cq overflow status to userspace by adding new IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly. Signed-off-by:
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jul 08, 2020
-
-
Steve French authored
We should not be logging a warning repeatedly on change notify. CC: Stable <stable@vger.kernel.org> # v5.6+ Signed-off-by:
Steve French <stfrench@microsoft.com> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com>
-
Christoph Hellwig authored
Fold it into the two callers. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Consolidate the two in-kernel read helpers to make upcoming changes easier. The only difference are the missing call to rw_verify_area in kernel_read, and an access_ok check that doesn't make sense for kernel buffers to start with. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
This is the counterpart to __kernel_write, and skip the rw_verify_area call compared to kernel_read. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Fold it into the two callers. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Consolidate the two in-kernel write helpers to make upcoming changes easier. The only difference are the missing call to rw_verify_area in kernel_write, and an access_ok check that doesn't make sense for kernel buffers to start with. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Add a WARN_ON_ONCE if the file isn't actually open for write. This matches the check done in vfs_write, but actually warn warns as a kernel user calling write on a file not opened for writing is a pretty obvious programming error. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
This is a very special interface that skips sb_writes protection, and not used by modules anymore. Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
While pipes don't really need sb_writers projection, __kernel_write is an interface better kept private, and the additional rw_verify_area does not hurt here. Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Ian Kent <raven@themaw.net>
-
Christoph Hellwig authored
__kernel_write doesn't take a sb_writers references, which we need here. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
David Howells <dhowells@redhat.com>
-