Commits · cde6f9ab10c6099f5e8daed898a43562b9911b70 · jan.koester / Linux

Dec 25, 2016

ktime: Get rid of ktime_equal() · 1f3a8e49

Thomas Gleixner authored Dec 25, 2016



No point in going through loops and hoops instead of just comparing the
values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

1f3a8e49

ktime: Cleanup ktime_set() usage · 8b0e1953

Thomas Gleixner authored Dec 25, 2016

ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

8b0e1953

ktime: Get rid of the union · 2456e855

Thomas Gleixner authored Dec 25, 2016



ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
variant for 32-bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
became completely pointless.

Get rid of the union and just keep ktime_t as a simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

2456e855

Dec 24, 2016

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

Linus Torvalds authored Dec 24, 2016



This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

Dec 23, 2016

ufs: fix function declaration for ufs_truncate_blocks · f698cccb

Jeff Layton authored Dec 20, 2016



sparse says:

    fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static?

Note that the forward declaration in the file is already marked static.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f698cccb

fs: exec: apply CLOEXEC before changing dumpable task flags · 613cc2b6

Aleksa Sarai authored Dec 21, 2016



If you have a process that has set itself to be non-dumpable, and it
then undergoes exec(2), any CLOEXEC file descriptors it has open are
"exposed" during a race window between the dumpable flags of the process
being reset for exec(2) and CLOEXEC being applied to the file
descriptors. This can be exploited by a process by attempting to access
/proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.

The race in question is after set_dumpable has been called (for get_link,
though the trace is basically the same for readlink):

[vfs]
-> proc_pid_link_inode_operations.get_link
   -> proc_pid_get_link
      -> proc_fd_access_allowed
         -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);

This will return 0 during the race window, and CLOEXEC file descriptors
will still be open during this window because do_close_on_exec has not
been called yet. As a result, the ordering of these calls should be
reversed to avoid this race window.

This is of particular concern to container runtimes, where joining a
PID namespace with file descriptors referring to the host filesystem
can result in security issues (since PR_SET_DUMPABLE doesn't protect
against access of CLOEXEC file descriptors -- file descriptors which may
reference filesystem objects the container shouldn't have access to).

Cc: dev@opencontainers.org
Cc: <stable@vger.kernel.org> # v3.2+
Reported-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

613cc2b6

seq_file: reset iterator to first record for zero offset · e522751d

Tomasz Majchrzak authored Nov 29, 2016



If a kernfs file is empty on the first read, successive read operations
using the same file descriptor will return no data, even when data is
available. The default kernfs 'seq_next' implementation advances the iterator
position even when the next object is not there. Kernfs 'seq_start' for
following requests will not return an iterator, as the position is already on
the second object.

This defect makes it impossible to monitor badblocks sysfs files from MD RAID.
They are initially empty, but if data appears at some stage, userspace is
not able to read it.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e522751d

vfs: fix isize/pos/len checks for reflink & dedupe · 22725ce4

Darrick J. Wong authored Dec 19, 2016



Strengthen the checking of pos/len vs. i_size, clarify the return values
for the clone prep function, and remove pointless code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22725ce4

move aio compat to fs/aio.c · c00d2c7e

Al Viro authored Dec 20, 2016



... and fix the minor buglet in compat io_submit() - native one
kills ioctx as cleanup when put_user() fails.  Get rid of
bogus compat_... in !CONFIG_AIO case, while we are at it - they
should simply fail with ENOSYS, same as for native counterparts.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c00d2c7e

Dec 22, 2016

befs: add NFS export support · ac632f5b

Luis de Bethencourt authored Nov 04, 2016



Implement mandatory export_operations, so it is possible to export befs via
nfs.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

ac632f5b

befs: remove trailing whitespaces · e60f749b

Luis de Bethencourt authored Nov 10, 2016



Removing all trailing whitespaces in befs.

I was skeptical about tainting the history with this, but whitespace changes
can be ignored by using 'git blame -w' and 'git log -w'.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

e60f749b

befs: remove signatures from comments · 50b00fc4

Luis de Bethencourt authored Aug 14, 2016



No idea why some comments have signatures. These predate git. Removing them
since they add noise and no information.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

50b00fc4

befs: fix style issues in header files · 12ecb38d

Luis de Bethencourt authored Aug 14, 2016



Fixing checkpatch.pl issues in befs header files:
WARNING: Missing a blank line after declarations
+       befs_inode_addr iaddr;
+       iaddr.allocation_group = blockno >> BEFS_SB(sb)->ag_shift;

WARNING: space prohibited between function name and open parenthesis '('
+       return BEFS_SB(sb)->block_size / sizeof (befs_disk_inode_addr);

ERROR: "foo * bar" should be "foo *bar"
+                   const char *key, befs_off_t * value);

ERROR: Macros with complex values should be enclosed in parentheses
+#define PACKED __attribute__ ((__packed__))

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

12ecb38d

befs: fix style issues in linuxvfs.c · 62b80719

Luis de Bethencourt authored Aug 14, 2016



Fix the following type of checkpatch.pl issues:
WARNING: line over 80 characters
+static struct dentry *befs_lookup(struct inode *, struct dentry *, unsigned int);

ERROR: code indent should use tabs where possible
+        if (!bi)$

WARNING: please, no spaces at the start of a line
+        if (!bi)$

WARNING: labels should not be indented
+      unacquire_bh:

WARNING: space prohibited between function name and open parenthesis '('
+                                             sizeof (struct befs_inode_info),

WARNING: braces {} are not necessary for single statement blocks
+       if (!*out) {
+               return -ENOMEM;
+       }

WARNING: Block comments use a trailing */ on a separate line
+        * in special cases */

WARNING: Missing a blank line after declarations
+               int token;
+               if (!*p)

ERROR: do not use assignment in if condition
+       if (!(bh = sb_bread(sb, sb_block))) {

ERROR: space prohibited after that open parenthesis '('
+       if( befs_sb->num_blocks > ~((sector_t)0) ) {

ERROR: space prohibited before that close parenthesis ')'
+       if( befs_sb->num_blocks > ~((sector_t)0) ) {

ERROR: space required before the open parenthesis '('
+       if( befs_sb->num_blocks > ~((sector_t)0) ) {

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

62b80719

befs: fix typos in linuxvfs.c · 1ca7087e

Luis de Bethencourt authored Aug 14, 2016

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

1ca7087e

befs: fix style issues in io.c · 4c7df645

Luis de Bethencourt authored Aug 14, 2016



Fixing the two following checkpatch.pl issues:
ERROR: trailing whitespace
+ * Based on portions of file.c and inode.c $

WARNING: labels should not be indented
+      error:

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

4c7df645

befs: fix style issues in inode.c · 85a06b30

Luis de Bethencourt authored Aug 14, 2016



Fixing the following checkpatch.pl errors and warning:
ERROR: trailing whitespace
+ * $

WARNING: Block comments use * on subsequent lines
+/*
+       Validates the correctness of the befs inode

ERROR: "foo * bar" should be "foo *bar"
+befs_check_inode(struct super_block *sb, befs_inode * raw_inode,

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

85a06b30

befs: fix style issues in debug.c · a83179a8

Luis de Bethencourt authored Aug 14, 2016



Fix all checkpatch.pl errors and warnings in debug.c:
ERROR: trailing whitespace
+ * $

WARNING: Missing a blank line after declarations
+       va_list args;
+       va_start(args, fmt);

ERROR: "foo * bar" should be "foo *bar"
+befs_dump_inode(const struct super_block *sb, befs_inode * inode)

ERROR: "foo * bar" should be "foo *bar"
+befs_dump_super_block(const struct super_block *sb, befs_super_block * sup)

ERROR: "foo * bar" should be "foo *bar"
+befs_dump_small_data(const struct super_block *sb, befs_small_data * sd)

WARNING: line over 80 characters
+befs_dump_index_entry(const struct super_block *sb, befs_disk_btree_super * super)

ERROR: "foo * bar" should be "foo *bar"
+befs_dump_index_entry(const struct super_block *sb, befs_disk_btree_super * super)

ERROR: "foo * bar" should be "foo *bar"
+befs_dump_index_node(const struct super_block *sb, befs_btree_nodehead * node)

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>

a83179a8

Dec 21, 2016

splice: reinstate SIGPIPE/EPIPE handling · 52bce911

Linus Torvalds authored Dec 21, 2016



Commit 8924feff ("splice: lift pipe_lock out of splice_to_pipe()")
caused a regression when there were no more readers left on a pipe that
was being spliced into: rather than the expected SIGPIPE and -EPIPE
return value, the writer would end up waiting forever for space to free
up (which obviously was not going to happen with no readers around).

Fixes: 8924feff ("splice: lift pipe_lock out of splice_to_pipe()")
Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org>
Debugged-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org   # v4.9
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

52bce911

Dec 19, 2016

NFSv4: Retry the DELEGRETURN if the embedded GETATTR is rejected with EACCES · 8ac2b422

Trond Myklebust authored Dec 19, 2016

If our DELEGRETURN RPC call is rejected with an EACCES error, then we should
remove the GETATTR call from the compound RPC and retry.
This could potentially happen when there is a conflict between an
ACL denying attribute reads and our use of SP4_MACH_CRED.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8ac2b422

NFS: Retry the CLOSE if the embedded GETATTR is rejected with EACCES · f07d4a31

Trond Myklebust authored Dec 19, 2016

If our CLOSE RPC call is rejected with an EACCES error, then we should
remove the GETATTR call from the compound RPC and retry.
This could potentially happen when there is a conflict between an
ACL denying attribute reads and our use of SP4_MACH_CRED.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f07d4a31

NFSv4: Place the GETATTR operation before the CLOSE · d8d84983

Trond Myklebust authored Dec 19, 2016

In order to benefit from the DENY share lock protection, we should
put the GETATTR operation before the CLOSE. Otherwise, we might race
with a Windows machine that thinks it is now safe to modify the file.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d8d84983

NFSv4: Also ask for attributes when downgrading to a READ-only state · 9413a1a1

Trond Myklebust authored Dec 19, 2016



If we're downgrading from a READ+WRITE mode to a READ-only mode, then
ask for cache consistency attributes so that we avoid the revalidation
in nfs_close_context().

Fixes: 3947b74d ("NFSv4: Don't request a GETATTR on open_downgrade.")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9413a1a1

NFS: Don't abuse NFS_INO_REVAL_FORCED in nfs_post_op_update_inode_locked() · a5f925bc

Trond Myklebust authored Dec 19, 2016

The NFS_INO_REVAL_FORCED flag now really only has meaning for the
case when we've just been handed a delegation for a file that was already
cached, and we're unsure about that cache.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a5f925bc

pNFS: Return RW layouts on OPEN_DOWNGRADE · e71708d4

Trond Myklebust authored Nov 21, 2016

If the client holds no more writeable open state, and does not hold a
write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE.

We do this only for writes, since some layout drivers may require you to
also hold a read layout if you are doing a R/W workload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e71708d4

NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE · b6808145

Trond Myklebust authored Nov 20, 2016

While we do not need to return the RW layout when downgrading from a
read/write open state to read-only, we might want to do so in order
to reduce the burden on the metadata server so that it does not need
to check for changed data when responding to GETATTR requests.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b6808145

NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID · 86cfb041

NeilBrown authored Dec 19, 2016

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up. nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree. As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing. The request is then retried, with a
predictable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned. For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner. With this the indefinite retries no longer happen.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

86cfb041

NFSv4: ensure __nfs4_find_lock_state returns consistent result. · 3f8f2548

NeilBrown authored Dec 19, 2016



If a file has both flock locks and OFD locks, then it is possible that
two different nfs4 lock states could apply to file accesses from a
single process.

It is not possible to know, efficiently, which one is "correct".
Presumably the state which represents a lock that covers the region
undergoing IO would be the "correct" one to use, but finding that has
a non-trivial cost and would provide minuscule value.

Currently we just return whichever is first in the list, which could
result in inconsistent behaviour if an application ever put itself in
this position.  As consistent behaviour is preferable (when perfectly
correct behaviour is not available), change the search to return a
consistent result in this circumstance.
Specifically: if there is both a flock and OFD lock state, always return
the flock one.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3f8f2548

NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. · cfd278c2

NeilBrown authored Dec 19, 2016



Various places assume that if nfs4_fl_prepare_ds() returns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessarily true in the case when the process received a fatal
signal while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have
been marked unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d5 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cfd278c2

pNFS/flexfiles: delete deviceid, don't mark inactive · 1c48cee8

Weston Andros Adamson authored Dec 14, 2016

Instead of marking a device inactive, remove it from the cache entirely.

Flexfiles has a way to report errors back to the server, so we don't want
to stop devices from being tried again for 120 seconds.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1c48cee8

NFS: Clean up nfs_attribute_timeout() · 187e593d

Trond Myklebust authored Dec 16, 2016



It can be made static.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

187e593d

NFS: Remove unused function nfs_revalidate_inode_rcu() · 3f642a13

Trond Myklebust authored Dec 16, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3f642a13

NFS: Fix and clean up the access cache validity checking · 21c3ba7e

Trond Myklebust authored Dec 16, 2016



The access cache needs to check whether or not the mode bits, ownership,
or ACL has changed or the cache has timed out.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

21c3ba7e

NFS: Only look at the change attribute cache state in nfs_weak_revalidate() · 9cdd1d3f

Trond Myklebust authored Dec 16, 2016



Just like in nfs_check_verifier(), we want to use
nfs_mapping_need_revalidate_inode() to check that our knowledge of the
change attribute is up to date.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9cdd1d3f

NFS: Clean up cache validity checking · 61540bf6

Trond Myklebust authored Dec 08, 2016



Consolidate the open-coded checking of NFS_I(inode)->cache_validity
into a couple of helper functions.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

61540bf6

NFS: Don't revalidate the file on close if we hold a delegation · 58ff4184

Trond Myklebust authored Dec 16, 2016



If we're holding a delegation, we can skip sending the close-to-open
GETATTR until we're returning that delegation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

58ff4184

NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN · 0bc2c9b4

Trond Myklebust authored Dec 16, 2016



DELEGRETURN will always carry a reference to the inode except when
the latter is being freed, so let's ensure that we always use that
inode information to ensure close-to-open cache consistency, even
when the DELEGRETURN call is asynchronous.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0bc2c9b4

NFSv4: Update the attribute cache info in update_changeattr · e603a4c1

Trond Myklebust authored Dec 16, 2016



If we successfully updated the change attribute, we should timestamp the
cache. While we do know that the other attributes are not completely up
to date, we have the NFS_INO_INVALID_ATTR flag that lets us know that,
so it is valid to say that the cache has not timed out.
We can also clear NFS_INO_REVAL_PAGECACHE, since our change attribute
is now known to be valid.

Conversely, if the change attribute did not match, we should make sure to
also revalidate the access and ACL caches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e603a4c1

quota: Fix bogus warning in dquot_disable() · 2700e606

Jan Kara authored Dec 19, 2016



dquot_disable() was warning when sb_has_quota_loaded() was true when
invalidating page cache for quota files. The thinking behind this
warning was that we must have raced with somebody else turning quotas on
and this should not happen because all places modifying quota state must
hold s_umount exclusively now. However sb_has_quota_loaded() can also be
true at this point when we are just suspending quotas on remount
read-only. Just restore the behavior to the situation before commit
c3b00446 ("quota: Remove dqonoff_mutex") which introduced the
warning.

The code in dquot_disable() can be further simplified with the new
locking of quota state changes; however, let's leave that to a separate
commit that can get more testing exposure.

Fixes: c3b00446
Signed-off-by: Jan Kara <jack@suse.cz>

2700e606

Dec 16, 2016

reorganize do_make_slave() · 5235d448

Al Viro authored Nov 20, 2016



Make sure that clone_mnt() never returns a mount with MNT_SHARED in
flags, but without a valid ->mnt_group_id.  That allows us to demystify
do_make_slave() quite a bit, among other things.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5235d448