- Apr 20, 2018
-
-
Steve French authored
RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> CC: Colin Ian King <colin.king@canonical.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com> Reported-by:
Eryu Guan <eguan@redhat.com> Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Cc: stable@vger.kernel.org
-
Long Li authored
When sending through SMB Direct, also dump the packet in SMB send path. Also fixed a typo in debug message. Signed-off-by:
Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com>
-
- Apr 19, 2018
-
-
Long Li authored
When sending the last iov that breaks into smaller buffers to fit the transfer size, it's necessary to check if this is the last iov. If this is the latest iov, stop and proceed to send pages. Signed-off-by:
Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com>
-
- Apr 17, 2018
-
-
Souptick Joarder authored
Use new return type vm_fault_t for page_mkwrite handler. Signed-off-by:
Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by:
Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Gustavo A. R. Silva authored
The current code null checks variable err_buf, which is always null when it is checked, hence utf16_path is free'd and the function returns -ENOENT everytime it is called, making it impossible for the execution path to reach the following code: err_buf = err_iov.iov_base; Fix this by null checking err_iov.iov_base instead of err_buf. Also, notice that err_buf no longer needs to be initialized to NULL. Addresses-Coverity-ID: 1467876 ("Logically dead code") Fixes: 2d636199e400 ("cifs: Change SMB2_open to return an iov for the error parameter") Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
- Apr 14, 2018
-
-
Alexey Dobriyan authored
If module removes proc directory while another process pins it by chdir'ing to it, then subsequent recreation of proc entry and all entries down the tree will not be visible to any process until pinning process unchdir from directory and unpins everything. Steps to reproduce: proc_mkdir("aaa", NULL); proc_create("aaa/bbb", ...); chdir("/proc/aaa"); remove_proc_entry("aaa/bbb", NULL); remove_proc_entry("aaa", NULL); proc_mkdir("aaa", NULL); # inaccessible because "aaa" dentry still points # to the original "aaa". proc_create("aaa/bbb", ...); Fix is to implement ->d_revalidate and ->d_delete. Link: http://lkml.kernel.org/r/20180312201938.GA4871@avx2 Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 13, 2018
-
-
Qu Wenruo authored
When looping btrfs/074 with many cpus (>= 8), it's possible to trigger kernel warning due to first key verification: [ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210 [ 4239.523830] Modules linked in: [ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210 [ 4239.527101] Call Trace: [ 4239.527251] read_tree_block+0x42/0x70 [ 4239.527434] read_node_slot+0xd2/0x110 [ 4239.527632] push_leaf_right+0xad/0x1b0 [ 4239.527809] split_leaf+0x4ea/0x700 [ 4239.527988] ? leaf_space_used+0xbc/0xe0 [ 4239.528192] ? btrfs_set_lock_blocking_rw+0x99/0xb0 [ 4239.528416] btrfs_search_slot+0x8cc/0xa40 [ 4239.528605] btrfs_insert_empty_items+0x71/0xc0 [ 4239.528798] __btrfs_run_delayed_refs+0xa98/0x1680 [ 4239.529013] btrfs_run_delayed_refs+0x10b/0x1b0 [ 4239.529205] btrfs_commit_transaction+0x33/0xaf0 [ 4239.529445] ? start_transaction+0xa8/0x4f0 [ 4239.529630] btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0 [ 4239.529833] btrfs_check_data_free_space+0x54/0xa0 [ 4239.530045] btrfs_delalloc_reserve_space+0x25/0x70 [ 4239.531907] btrfs_direct_IO+0x233/0x3d0 [ 4239.532098] generic_file_direct_write+0xcb/0x170 [ 4239.532296] btrfs_file_write_iter+0x2bb/0x5f4 [ 4239.532491] aio_write+0xe2/0x180 [ 4239.532669] ? lock_acquire+0xac/0x1e0 [ 4239.532839] ? __might_fault+0x3e/0x90 [ 4239.533032] do_io_submit+0x594/0x860 [ 4239.533223] ? do_io_submit+0x594/0x860 [ 4239.533398] SyS_io_submit+0x10/0x20 [ 4239.533560] ? SyS_io_submit+0x10/0x20 [ 4239.533729] do_syscall_64+0x75/0x1d0 [ 4239.533979] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 4239.534182] RIP: 0033:0x7f8519741697 The problem here is, at btree_read_extent_buffer_pages() we don't have acquired read/write lock on that extent buffer, only basic info like level/bytenr is reliable. So race condition leads to such false alert. However in current call site, it's impossible to acquire proper lock without race window. To fix the problem, we only verify first key for committed tree blocks (whose generation is no larger than fs_info->last_trans_committed), so the content of such tree blocks will not change and there is no need to get read/write lock. Reported-by:
Nikolay Borisov <nborisov@suse.com> Fixes: 581c1760 ("btrfs: Validate child tree block's level and first key") Signed-off-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Ronnie Sahlberg authored
Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Ronnie Sahlberg authored
Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Ronnie Sahlberg authored
Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Ronnie Sahlberg authored
and get rid of some more calls to get_rfc1002_length() Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Steve French authored
More cleanup of use of hardcoded 4 byte RFC1001 field size Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by:
Ronnie Sahlberg <lsahlber@redhat.com>
-
- Apr 12, 2018
-
-
Ronnie Sahlberg authored
Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Ronnie Sahlberg authored
and get rid of some get_rfc1002_length() in smb2 Signed-off-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Steve French authored
SMB3.11 crypto and hash contexts were not being checked strictly enough. Add parsing and validity checking for the security contexts in the SMB3.11 negotiate response. Signed-off-by:
Steve French <smfrench@gmail.com> CC: Stable <stable@vger.kernel.org> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Steve French authored
The length checking for SMB3.11 negotiate request includes "negotiate contexts" which caused a buffer validation problem and a confusing warning message on SMB3.11 mount e.g.: SMB2 server sent bad RFC1001 len 236 not 170 Fix the length checking for SMB3.11 negotiate to account for the new negotiate context so that we don't log a warning on SMB3.11 mount by default but do log warnings if lengths returned by the server are incorrect. CC: Stable <stable@vger.kernel.org> Signed-off-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Aurelien Aptel <aaptel@suse.com> Reviewed-by:
Pavel Shilovsky <pshilov@microsoft.com>
-
Bob Peterson authored
This patch simply fixes some comments and the gfs2-glocks.txt file: Places where i_rwsem was called i_mutex, and adding i_rw_mutex. Signed-off-by:
Bob Peterson <rpeterso@redhat.com>
-
Andreas Gruenbacher authored
Function rhashtable_walk_peek is problematic because there is no guarantee that the glock previously returned still exists; when that key is deleted, rhashtable_walk_peek can end up returning a different key, which will cause an inconsistent glock dump. Fix this by keeping track of the current glock in the seq file iterator functions instead. Signed-off-by:
Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by:
Bob Peterson <rpeterso@redhat.com>
-
David Sterba authored
Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Unify the include protection macros to match the file names. Signed-off-by:
David Sterba <dsterba@suse.com>
-
Filipe Manana authored
Currently if we allocate extents beyond an inode's i_size (through the fallocate system call) and then fsync the file, we log the extents but after a power failure we replay them and then immediately drop them. This behaviour happens since about 2009, commit c71bf099 ("Btrfs: Avoid orphan inodes cleanup while replaying log"), because it marks the inode as an orphan instead of dropping any extents beyond i_size before replaying logged extents, so after the log replay, and while the mount operation is still ongoing, we find the inode marked as an orphan and then perform a truncation (drop extents beyond the inode's i_size). Because the processing of orphan inodes is still done right after replaying the log and before the mount operation finishes, the intention of that commit does not make any sense (at least as of today). However reverting that behaviour is not enough, because we can not simply discard all extents beyond i_size and then replay logged extents, because we risk dropping extents beyond i_size created in past transactions, for example: add prealloc extent beyond i_size fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode transaction commit add another prealloc extent beyond i_size fsync - triggers the fast fsync path power failure In that scenario, we would drop the first extent and then replay the second one. To fix this just make sure that all prealloc extents beyond i_size are logged, and if we find too many (which is far from a common case), fallback to a full transaction commit (like we do when logging regular extents in the fast fsync path). Trivial reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo $ sync $ xfs_io -c "falloc -k 256K 1M" /mnt/foo $ xfs_io -c "fsync" /mnt/foo <power failure> # mount to replay log $ mount /dev/sdb /mnt # at this point the file only has one extent, at offset 0, size 256K A test case for fstests follows soon, covering multiple scenarios that involve adding prealloc extents with previous shrinking truncates and without such truncates. Fixes: c71bf099 ("Btrfs: Avoid orphan inodes cleanup while replaying log") Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Liu Bo authored
Currently if some fatal errors occur, like all IO get -EIO, resources would be cleaned up when a) transaction is being committed or b) BTRFS_FS_STATE_ERROR is set However, in some rare cases, resources may be left alone after transaction gets aborted and umount may run into some ASSERT(), e.g. ASSERT(list_empty(&block_group->dirty_list)); For case a), in btrfs_commit_transaciton(), there're several places at the beginning where we just call btrfs_end_transaction() without cleaning up resources. For case b), it is possible that the trans handle doesn't have any dirty stuff, then only trans hanlde is marked as aborted while BTRFS_FS_STATE_ERROR is not set, so resources remain in memory. This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that all resources won't stay in memory after umount. Signed-off-by:
Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Amir Goldstein authored
With mount option "xino=on", mounter declares that there are enough free high bits in underlying fs to hold the layer fsid. If overlayfs does encounter underlying inodes using the high xino bits reserved for layer fsid, a warning will be emitted and the original inode number will be used. The mount option name "xino" goes after a similar meaning mount option of aufs, but in overlayfs case, the mapping is stateless. An example for a use case of "xino=on" is when upper/lower is on an xfs filesystem. xfs uses 64bit inode numbers, but it currently never uses the upper 8bit for inode numbers exposed via stat(2) and that is not likely to change in the future without user opting-in for a new xfs feature. The actual number of unused upper bit is much larger and determined by the xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That means that for all practical purpose, there are enough unused bits in xfs inode numbers for more than OVL_MAX_STACK unique fsid's. Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode numbers are allocated sequentially since boot, so they will practially never use the high inode number bits. For compatibility with applications that expect 32bit inodes, the feature can be disabled with "xino=off". The option "xino=auto" automatically detects underlying filesystem that use 32bit inodes and enables the feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of the same name, determine if the default mode for overlayfs mount is "xino=auto" or "xino=off". Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, relax non-samefs constraint for consistent d_ino and always iterate non-merge dir using ovl_fill_real() actor so we can remap lower inode numbers to unique lower fs range. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, set i_ino value to the same value as st_ino for nfsd readdirplus validator. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
Instead of allocating an anonymous bdev per lower layer, allocate one anonymous bdev per every unique lower fs that is different than upper fs. Every unique lower fs is assigned an fsid > 0 and the number of unique lower fs are stored in ofs->numlowerfs. The assigned fsid is stored in the lower layer struct and will be used also for inode number multiplexing. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
A helper for ovl_getattr() to map the values of st_dev and st_ino according to constant st_ino rules. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Miklos Szeredi authored
No need to mess with an alias, the upperdentry can be retrieved directly from the overlay inode. Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Miklos Szeredi authored
Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Vivek Goyal authored
Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Vivek Goyal authored
Certain properties in ovl_lookup_data should be set only for the last element of the path. IOW, if we are calling ovl_lookup_single() for an absolute redirect, then d->is_dir and d->opaque do not make much sense for intermediate path elements. Instead set them only if dentry being lookup is last path element. As of now we do not seem to be making use of d->opaque if it is set for a path/dentry in lower. But just define the semantics so that future code can make use of this assumption. Signed-off-by:
Vivek Goyal <vgoyal@redhat.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Vivek Goyal authored
If we are looking in last layer, then there should not be any need to process redirect. redirect information is used only for lookup in next lower layer and there is no more lower layer to look into. So no need to process redirects. IOW, ignore redirects on lowest layer. Signed-off-by:
Vivek Goyal <vgoyal@redhat.com> Reviewed-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
When decoding a lower file handle, we need to check if lower file was copied up and indexed and if it has a whiteout index, we need to check if this is an unlinked but open non-dir before returning -ESTALE. To find out if this is an unlinked but open non-dir we need to lookup an overlay inode in inode cache by lower inode and that requires decoding the lower file handle before looking in inode cache. Before this change, if the lower inode turned out to be a directory, we may have paid an expensive cost to reconnect that lower directory for nothing. After this change, we start by decoding a disconnected lower dentry and using the lower inode for looking up an overlay inode in inode cache. If we find overlay inode and dentry in cache, we avoid the index lookup overhead. If we don't find an overlay inode and dentry in cache, then we only need to decode a connected lower dentry in case the lower dentry is a non-indexed directory. The xfstests group overlay/exportfs tests decoding overlayfs file handles after drop_caches with different states of the file at encode and decode time. Overall the tests in the group call ovl_lower_fh_to_d() 89 times to decode a lower file handle. Before this change, the tests called ovl_get_index_fh() 75 times and reconnect_one() 61 times. After this change, the tests call ovl_get_index_fh() 70 times and reconnect_one() 59 times. The 2 cases where reconnect_one() was avoided are cases where a non-upper directory file handle was encoded, then the directory removed and then file handle was decoded. To demonstrate the affect on decoding file handles with hot inode/dentry cache, the drop_caches call in the tests was disabled. Without drop_caches, there are no reconnect_one() calls at all before or after the change. Before the change, there are 75 calls to ovl_get_index_fh(), exactly as the case with drop_caches. After the change, there are only 10 calls to ovl_get_index_fh(). Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
On lookup of non directory, we try to decode the origin file handle stored in upper inode. The origin file handle is supposed to be decoded to a disconnected non-dir dentry, which is fine, because we only need the lower inode of a copy up origin. However, if the origin file handle somehow turns out to be a directory we pay the expensive cost of reconnecting the directory dentry, only to get a mismatch file type and drop the dentry. Optimize this case by explicitly opting out of reconnecting the dentry. Opting-out of reconnect is done by passing a NULL acceptable callback to exportfs_decode_fh(). While the case described above is a strange corner case that does not really need to be optimized, the API added for this optimization will be used by a following patch to optimize a more common case of decoding an overlayfs file handle. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the exportfs function ovl_encode_inode_fh() and change the latter to ovl_encode_fh() to match the exportfs method name. Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency. Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
For broken hardlinks, we do not return lower st_ino, so we should also not return lower pseudo st_dev. Fixes: a0c5ad30 ("ovl: relax same fs constraint for constant st_ino") Cc: <stable@vger.kernel.org> #v4.15 Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Amir Goldstein authored
As of now if we encounter an opaque dir while looking for a dentry, we set d->last=true. This means that there is no need to look further in any of the lower layers. This works fine as long as there are no redirets or relative redircts. But what if there is an absolute redirect on the children dentry of opaque directory. We still need to continue to look into next lower layer. This patch fixes it. Here is an example to demonstrate the issue. Say you have following setup. upper: /redirect (redirect=/a/b/c) lower1: /a/[b]/c ([b] is opaque) (c has absolute redirect=/a/b/d/) lower0: /a/b/d/foo Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d. Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look into lower0 because children c has an absolute redirect. Following is a reproducer. Watch me make foo disappear: $ mkdir lower middle upper work work2 merged $ mkdir lower/origin $ touch lower/origin/foo $ mount -t overlay none merged/ \ -olowerdir=lower,upperdir=middle,workdir=work2 $ mkdir merged/pure $ mv merged/origin merged/pure/redirect $ umount merged $ mount -t overlay none merged/ \ -olowerdir=middle:lower,upperdir=upper,workdir=work $ mv merged/pure/redirect merged/redirect Now you see foo inside a twice redirected merged dir: $ ls merged/redirect foo $ umount merged $ mount -t overlay none merged/ \ -olowerdir=middle:lower,upperdir=upper,workdir=work After mount cycle you don't see foo inside the same dir: $ ls merged/redirect During middle layer lookup, the opaqueness of middle/pure is left in the lookup state and then middle/pure/redirect is wrongly treated as opaque. Fixes: 02b69b28 ("ovl: lookup redirects") Cc: <stable@vger.kernel.org> #v4.10 Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Vivek Goyal <vgoyal@redhat.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
Vivek Goyal authored
d->last signifies that this is the last layer we are looking into and there is no more. And that means this allows for some optimzation opportunities during lookup. For example, in ovl_lookup_single() we don't have to check for opaque xattr of a directory is this is the last layer we are looking into (d->last = true). But knowing for sure whether we are looking into last layer can be very tricky. If redirects are not enabled, then we can look at poe->numlower and figure out if the lookup we are about to is last layer or not. But if redircts are enabled then it is possible poe->numlower suggests that we are looking in last layer, but there is an absolute redirect present in found element and that redirects us to a layer in root and that means lookup will continue in lower layers further. For example, consider following. /upperdir/pure (opaque=y) /upperdir/pure/foo (opaque=y,redirect=/bar) /lowerdir/bar In this case pure is "pure upper". When we look for "foo", that time poe->numlower=0. But that alone does not mean that we will not search for a merge candidate in /lowerdir. Absolute redirect changes that. IOW, d->last should not be set just based on poe->numlower if redirects are enabled. That can lead to setting d->last while it should not have and that means we will not check for opaque xattr while we should have. So do this. - If redirects are not enabled, then continue to rely on poe->numlower information to determine if it is last layer or not. - If redirects are enabled, then set d->last = true only if this is the last layer in root ovl_entry (roe). Suggested-by:
Amir Goldstein <amir73il@gmail.com> Reviewed-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Vivek Goyal <vgoyal@redhat.com> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Fixes: 02b69b28 ("ovl: lookup redirects") Cc: <stable@vger.kernel.org> #v4.10
-