Skip to content
  1. Apr 06, 2018
  2. Apr 04, 2018
    • David Howells's avatar
      fscache: Attach the index key and aux data to the cookie · 402cb8dd
      David Howells authored
      
      
      Attach copies of the index key and auxiliary data to the fscache cookie so
      that:
      
       (1) The callbacks to the netfs for this stuff can be eliminated.  This
           can simplify things in the cache as the information is still
           available, even after the cache has relinquished the cookie.
      
       (2) Simplifies the locking requirements of accessing the information as we
           don't have to worry about the netfs object going away on us.
      
       (3) The cache can do lazy updating of the coherency information on disk.
           As long as the cache is flushed before reboot/poweroff, there's no
           need to update the coherency info on disk every time it changes.
      
       (4) Cookies can be hashed or put in a tree as the index key is easily
           available.  This allows:
      
           (a) Checks for duplicate cookies can be made at the top fscache layer
           	 rather than down in the bowels of the cache backend.
      
           (b) Caching can be added to a netfs object that has a cookie if the
           	 cache is brought online after the netfs object is allocated.
      
      A certain amount of space is made in the cookie for inline copies of the
      data, but if it won't fit there, extra memory will be allocated for it.
      
      The downside of this is that live cache operation requires more memory.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarAnna Schumaker <anna.schumaker@netapp.com>
      Tested-by: default avatarSteve Dickson <steved@redhat.com>
      402cb8dd
    • David Howells's avatar
      fscache: Add tracepoints · a18feb55
      David Howells authored
      
      
      Add some tracepoints to fscache:
      
       (*) fscache_cookie - Tracks a cookie's usage count.
      
       (*) fscache_netfs - Logs registration of a network filesystem, including
           the pointer to the cookie allocated.
      
       (*) fscache_acquire - Logs cookie acquisition.
      
       (*) fscache_relinquish - Logs cookie relinquishment.
      
       (*) fscache_enable - Logs enablement of a cookie.
      
       (*) fscache_disable - Logs disablement of a cookie.
      
       (*) fscache_osm - Tracks execution of states in the object state machine.
      
      and cachefiles:
      
       (*) cachefiles_ref - Tracks a cachefiles object's usage count.
      
       (*) cachefiles_lookup - Logs result of lookup_one_len().
      
       (*) cachefiles_mkdir - Logs result of vfs_mkdir().
      
       (*) cachefiles_create - Logs result of vfs_create().
      
       (*) cachefiles_unlink - Logs calls to vfs_unlink().
      
       (*) cachefiles_rename - Logs calls to vfs_rename().
      
       (*) cachefiles_mark_active - Logs an object becoming active.
      
       (*) cachefiles_wait_active - Logs a wait for an old object to be
           destroyed.
      
       (*) cachefiles_mark_inactive - Logs an object becoming inactive.
      
       (*) cachefiles_mark_buried - Logs the burial of an object.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      a18feb55
    • David Howells's avatar
      fscache, cachefiles: Fix checker warnings · bfa3837e
      David Howells authored
      
      
      Fix a couple of checker warnings in fscache and cachefiles:
      
       (1) fscache_n_op_requeue is never used, so get rid of it.
      
       (2) cachefiles_uncache_page() is passed in a lock that it releases, so
           this needs annotating.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      bfa3837e
  3. Feb 11, 2018
    • Linus Torvalds's avatar
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      
      
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      
      Scripted-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  4. Nov 27, 2017
  5. Nov 16, 2017
    • Mel Gorman's avatar
      mm: remove __GFP_COLD · 453f85d4
      Mel Gorman authored
      As the page free path makes no distinction between cache hot and cold
      pages, there is no real useful ordering of pages in the free list that
      allocation requests can take advantage of.  Juding from the users of
      __GFP_COLD, it is likely that a number of them are the result of copying
      other sites instead of actually measuring the impact.  Remove the
      __GFP_COLD parameter which simplifies a number of paths in the page
      allocator.
      
      This is potentially controversial but bear in mind that the size of the
      per-cpu pagelists versus modern cache sizes means that the whole per-cpu
      list can often fit in the L3 cache.  Hence, there is only a potential
      benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
      even worse when THP is taken into account which has little or no chance
      of getting a cache-hot page as the per-cpu list is bypassed and the
      zeroing of multiple pages will thrash the cache anyway.
      
      The truncate microbenchmarks are not shown as this patch affects the
      allocation path and not the free path.  A page fault microbenchmark was
      tested but it showed no sigificant difference which is not surprising
      given that the __GFP_COLD branches are a miniscule percentage of the
      fault path.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
      
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      453f85d4
    • Mel Gorman's avatar
      mm, pagevec: remove cold parameter for pagevecs · 86679820
      Mel Gorman authored
      Every pagevec_init user claims the pages being released are hot even in
      cases where it is unlikely the pages are hot.  As no one cares about the
      hotness of pages being released to the allocator, just ditch the
      parameter.
      
      No performance impact is expected as the overhead is marginal.  The
      parameter is removed simply because it is a bit stupid to have a useless
      parameter copied everywhere.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
      
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      86679820
  6. Nov 02, 2017
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      
      
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  7. Jul 17, 2017
    • David Howells's avatar
      VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) · bc98a42c
      David Howells authored
      
      
      Firstly by applying the following with coccinelle's spatch:
      
      	@@ expression SB; @@
      	-SB->s_flags & MS_RDONLY
      	+sb_rdonly(SB)
      
      to effect the conversion to sb_rdonly(sb), then by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(!sb_rdonly(SB)) && A
      	+!sb_rdonly(SB) && A
      	|
      	-A != (sb_rdonly(SB))
      	+A != sb_rdonly(SB)
      	|
      	-A == (sb_rdonly(SB))
      	+A == sb_rdonly(SB)
      	|
      	-!(sb_rdonly(SB))
      	+!sb_rdonly(SB)
      	|
      	-A && (sb_rdonly(SB))
      	+A && sb_rdonly(SB)
      	|
      	-A || (sb_rdonly(SB))
      	+A || sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) != A
      	+sb_rdonly(SB) != A
      	|
      	-(sb_rdonly(SB)) == A
      	+sb_rdonly(SB) == A
      	|
      	-(sb_rdonly(SB)) && A
      	+sb_rdonly(SB) && A
      	|
      	-(sb_rdonly(SB)) || A
      	+sb_rdonly(SB) || A
      	)
      
      	@@ expression A, B, SB; @@
      	(
      	-(sb_rdonly(SB)) ? 1 : 0
      	+sb_rdonly(SB)
      	|
      	-(sb_rdonly(SB)) ? A : B
      	+sb_rdonly(SB) ? A : B
      	)
      
      to remove left over excess bracketage and finally by applying:
      
      	@@ expression A, SB; @@
      	(
      	-(A & MS_RDONLY) != sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
      	|
      	-(A & MS_RDONLY) == sb_rdonly(SB)
      	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
      	)
      
      to make comparisons against the result of sb_rdonly() (which is a bool)
      work correctly.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      bc98a42c
  8. Jun 20, 2017
    • Ingo Molnar's avatar
      sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming · 2055da97
      Ingo Molnar authored
      
      
      So I've noticed a number of instances where it was not obvious from the
      code whether ->task_list was for a wait-queue head or a wait-queue entry.
      
      Furthermore, there's a number of wait-queue users where the lists are
      not for 'tasks' but other entities (poll tables, etc.), in which case
      the 'task_list' name is actively confusing.
      
      To clear this all up, name the wait-queue head and entry list structure
      fields unambiguously:
      
      	struct wait_queue_head::task_list	=> ::head
      	struct wait_queue_entry::task_list	=> ::entry
      
      For example, this code:
      
      	rqw->wait.task_list.next != &wait->task_list
      
      ... is was pretty unclear (to me) what it's doing, while now it's written this way:
      
      	rqw->wait.head.next != &wait->entry
      
      ... which makes it pretty clear that we are iterating a list until we see the head.
      
      Other examples are:
      
      	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
      	list_for_each_entry(wq, &fence->wait.task_list, task_list) {
      
      ... where it's unclear (to me) what we are iterating, and during review it's
      hard to tell whether it's trying to walk a wait-queue entry (which would be
      a bug), while now it's written as:
      
      	list_for_each_entry_safe(pos, next, &x->head, entry) {
      	list_for_each_entry(wq, &fence->wait.head, entry) {
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2055da97
    • Ingo Molnar's avatar
      sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> · 5dd43ce2
      Ingo Molnar authored
      
      
      The wait_bit*() types and APIs are mixed into wait.h, but they
      are a pretty orthogonal extension of wait-queues.
      
      Furthermore, only about 50 kernel files use these APIs, while
      over 1000 use the regular wait-queue functionality.
      
      So clean up the main wait.h by moving the wait-bit functionality
      out of it, into a separate .h and .c file:
      
        include/linux/wait_bit.h  for types and APIs
        kernel/sched/wait_bit.c   for the implementation
      
      Update all header dependencies.
      
      This reduces the size of wait.h rather significantly, by about 30%.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5dd43ce2
    • Ingo Molnar's avatar
      sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar authored
      
      
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ac6424b9
  9. Mar 02, 2017
  10. Oct 08, 2016
  11. Sep 27, 2016
    • David Howells's avatar
      cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] · a818101d
      David Howells authored
      
      
      An NULL-pointer dereference happens in cachefiles_mark_object_inactive()
      when it tries to read i_blocks so that it can tell the cachefilesd daemon
      how much space it's making available.
      
      The problem is that cachefiles_drop_object() calls
      cachefiles_mark_object_inactive() after calling cachefiles_delete_object()
      because the object being marked active staves off attempts to (re-)use the
      file at that filename until after it has been deleted.  This means that
      d_inode is NULL by the time we come to try to access it.
      
      To fix the problem, have the caller of cachefiles_mark_object_inactive()
      supply the number of blocks freed up.
      
      Without this, the following oops may occur:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      ...
      CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
      Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
      RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
      RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
      RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
      R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
      R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
      FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
       ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
       ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
      Call Trace:
       [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
       [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
       [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff810a605b>] process_one_work+0x17b/0x470
       [<ffffffff810a6e96>] worker_thread+0x126/0x410
       [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
       [<ffffffff810ae64f>] kthread+0xcf/0xe0
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff81695418>] ret_from_fork+0x58/0x90
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
      
      The oopsing code shows:
      
      	callq  0xffffffff810af6a0 <wake_up_bit>
      	mov    0xf8(%r12),%rax
      	mov    0x30(%rax),%rax
      	mov    0x98(%rax),%rax   <---- oops here
      	lock add %rax,0x130(%rbx)
      
      where this is:
      
      	d_backing_inode(object->dentry)->i_blocks
      
      Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd)
      Reported-by: default avatarJianhong Yin <jiyin@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a818101d
    • Miklos Szeredi's avatar
      fs: rename "rename2" i_op to "rename" · 2773bf00
      Miklos Szeredi authored
      
      
      Generated patch:
      
      sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
      sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      2773bf00
    • Miklos Szeredi's avatar
      vfs: remove unused i_op->rename · 18fc84da
      Miklos Szeredi authored
      
      
      No in-tree uses remain.
      
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      18fc84da
  12. Aug 03, 2016
    • David Howells's avatar
      cachefiles: Fix race between inactivating and culling a cache object · db20a892
      David Howells authored
      
      
      There's a race between cachefiles_mark_object_inactive() and
      cachefiles_cull():
      
       (1) cachefiles_cull() can't delete a backing file until the cache object
           is marked inactive, but as soon as that's the case it's fair game.
      
       (2) cachefiles_mark_object_inactive() marks the object as being inactive
           and *only then* reads the i_blocks on the backing inode - but
           cachefiles_cull() might've managed to delete it by this point.
      
      Fix this by making sure cachefiles_mark_object_inactive() gets any data it
      needs from the backing inode before deactivating the object.
      
      Without this, the following oops may occur:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      ...
      CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
      Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
      RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
      RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
      RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
      R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
      R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
      FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
       ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
       ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
      Call Trace:
       [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
       [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
       [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff810a605b>] process_one_work+0x17b/0x470
       [<ffffffff810a6e96>] worker_thread+0x126/0x410
       [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
       [<ffffffff810ae64f>] kthread+0xcf/0xe0
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff81695418>] ret_from_fork+0x58/0x90
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
      
      The oopsing code shows:
      
      	callq  0xffffffff810af6a0 <wake_up_bit>
      	mov    0xf8(%r12),%rax
      	mov    0x30(%rax),%rax
      	mov    0x98(%rax),%rax   <---- oops here
      	lock add %rax,0x130(%rbx)
      
      where this is:
      
      	d_backing_inode(object->dentry)->i_blocks
      
      Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd)
      Reported-by: default avatarJianhong Yin <jiyin@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      db20a892
  13. Jun 01, 2016
  14. May 29, 2016
  15. Apr 04, 2016
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      
      
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  16. Feb 01, 2016
    • David Howells's avatar
      CacheFiles: Provide read-and-reset release counters for cachefilesd · a5b3a80b
      David Howells authored
      
      
      Provide read-and-reset objects- and blocks-released counters for cachefilesd
      to use to work out whether there's anything new that can be culled.
      
      One of the problems cachefilesd has is that if all the objects in the cache
      are pinned by inodes lying dormant in the kernel inode cache, there isn't
      anything for it to cull.  In such a case, it just spins around walking the
      filesystem tree and scanning for something to cull.  This eats up a lot of
      CPU time.
      
      By telling cachefilesd if there have been any releases, the daemon can
      sleep until there is the possibility of something to do.
      
      cachefilesd finds this information by the following means:
      
       (1) When the control fd is read, the kernel presents a list of values of
           interest.  "freleased=N" and "breleased=N" are added to this list to
           indicate the number of files released and number of blocks released
           since the last read call.  At this point the counters are reset.
      
       (2) POLLIN is signalled if the number of files released becomes greater
           than 0.
      
      Note that by 'released' it just means that the kernel has released its
      interest in those files for the moment, not necessarily that the files
      should be deleted from the cache.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarSteve Dickson <steved@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a5b3a80b
  17. Jan 22, 2016
    • Al Viro's avatar
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      
      
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  18. Jan 04, 2016
  19. Nov 17, 2015
  20. Nov 11, 2015
    • David Howells's avatar
      FS-Cache: Handle a write to the page immediately beyond the EOF marker · 102f4d90
      David Howells authored
      
      
      Handle a write being requested to the page immediately beyond the EOF
      marker on a cache object.  Currently this gets an assertion failure in
      CacheFiles because the EOF marker is used there to encode information about
      a partial page at the EOF - which could lead to an unknown blank spot in
      the file if we extend the file over it.
      
      The problem is actually in fscache where we check the index of the page
      being written against store_limit.  store_limit is set to the number of
      pages that we're allowed to store by fscache_set_store_limit() - which
      means it's one more than the index of the last page we're allowed to store.
      The problem is that we permit writing to a page with an index _equal_ to
      the store limit - when we should reject that case.
      
      Whilst we're at it, change the triggered assertion in CacheFiles to just
      return -ENOBUFS instead.
      
      The assertion failure looks something like this:
      
      CacheFiles: Assertion failed
      1000 < 7b1 is false
      ------------[ cut here ]------------
      kernel BUG at fs/cachefiles/rdwr.c:962!
      ...
      RIP: 0010:[<ffffffffa02c9e83>]  [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
      
      Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754fb (at least)
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      102f4d90
    • NeilBrown's avatar
      cachefiles: perform test on s_blocksize when opening cache file. · 95201a40
      NeilBrown authored
      
      
      cachefiles requires that s_blocksize in the cache is not greater than
      PAGE_SIZE, and performs the check every time a block is accessed.
      
      Move the test to the place where the file is "opened", where other
      file-validity tests are performed.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      95201a40
  21. Nov 07, 2015
  22. Apr 15, 2015
  23. Feb 24, 2015
  24. Feb 22, 2015
    • David Howells's avatar
      Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · ce40fa78
      David Howells authored
      
      
      Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
      thereof) in cachefiles:
      
       (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
           it doesn't want to deal with automounts in its cache.
      
       (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
           cachefiles.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      ce40fa78
    • David Howells's avatar
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells authored
      
      
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided we the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
  25. Nov 19, 2014
  26. Oct 13, 2014
    • David Howells's avatar
      CacheFiles: Fix incorrect test for in-memory object collision · a30efe26
      David Howells authored
      
      
      When CacheFiles cache objects are in use, they have in-memory representations,
      as defined by the cachefiles_object struct.  These are kept in a tree rooted in
      the cache and indexed by dentry pointer (since there's a unique mapping between
      object index key and dentry).
      
      Collisions can occur between a representation already in the tree and a new
      representation being set up because it takes time to dispose of an old
      representation - particularly if it must be unlinked or renamed.
      
      When such a collision occurs, cachefiles_mark_object_active() is meant to check
      to see if the old, already-present representation is in the process of being
      discarded (ie. FSCACHE_OBJECT_IS_LIVE is not set on it) - and, if so, wait for
      the representation to be removed (ie. CACHEFILES_OBJECT_ACTIVE is then
      cleared).
      
      However, the test for whether the old representation is still live is checking
      the new object - which always will be live at this point.  This leads to an
      oops looking like:
      
      	CacheFiles: Error: Unexpected object collision
      	object: OBJ1b354
      	objstate=LOOK_UP_OBJECT fl=8 wbusy=2 ev=0[0]
      	ops=0 inp=0 exc=0
      	parent=ffff88053f5417c0
      	cookie=ffff880538f202a0 [pr=ffff8805381b7160 nd=ffff880509c6eb78 fl=27]
      	key=[8] '2490000000000000'
      	xobject: OBJ1a600
      	xobjstate=DROP_OBJECT fl=70 wbusy=2 ev=0[0]
      	xops=0 inp=0 exc=0
      	xparent=ffff88053f5417c0
      	xcookie=ffff88050f4cbf70 [pr=ffff8805381b7160 nd=          (null) fl=12]
      	------------[ cut here ]------------
      	kernel BUG at fs/cachefiles/namei.c:200!
      	...
      	Workqueue: fscache_object fscache_object_work_func [fscache]
      	...
      	RIP: ... cachefiles_walk_to_object+0x7ea/0x860 [cachefiles]
      	...
      	Call Trace:
      	 [<ffffffffa04dadd8>] ? cachefiles_lookup_object+0x58/0x100 [cachefiles]
      	 [<ffffffffa01affe9>] ? fscache_look_up_object+0xb9/0x1d0 [fscache]
      	 [<ffffffffa01afc4d>] ? fscache_parent_ready+0x2d/0x80 [fscache]
      	 [<ffffffffa01b0672>] ? fscache_object_work_func+0x92/0x1f0 [fscache]
      	 [<ffffffff8107e82b>] ? process_one_work+0x16b/0x400
      	 [<ffffffff8107fc16>] ? worker_thread+0x116/0x380
      	 [<ffffffff8107fb00>] ? manage_workers.isra.21+0x290/0x290
      	 [<ffffffff81085edc>] ? kthread+0xbc/0xe0
      	 [<ffffffff81085e20>] ? flush_kthread_worker+0x80/0x80
      	 [<ffffffff81502d0c>] ? ret_from_fork+0x7c/0xb0
      	 [<ffffffff81085e20>] ? flush_kthread_worker+0x80/0x80
      
      Reported-by: default avatarManuel Schölling <manuel.schoelling@gmx.de>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      a30efe26
  27. Oct 09, 2014
  28. Sep 30, 2014
    • David Howells's avatar
      CacheFiles: Handle object being killed before being set up · a3b7c004
      David Howells authored
      
      
      If a cache object gets killed whilst in the process of being set up - for
      instance if the netfs relinquishes the cookie that the object is associated
      with - then the object's state machine will transit to the DROP_OBJECT state
      without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.
      
      This is a problem for CacheFiles because cachefiles_drop_object() assumes that
      object->dentry will be set upon reaching the DROP_OBJECT state and has an
      ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
      set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
      they fail).
      
      To fix this, just make the dentry cleanup in cachefiles_drop_object()
      conditional on the dentry actually being set and remove the assertion.
      
      	CacheFiles: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at .../fs/cachefiles/namei.c:425!
      	...
      	Workqueue: fscache_object fscache_object_work_func [fscache]
      	...
      	RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
      	...
      	Call Trace:
      	 [<ffffffffa043280f>] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
      	 [<ffffffffa02ac511>] ? fscache_drop_object+0xd1/0x1d0 [fscache]
      	 [<ffffffffa02ac697>] ? fscache_object_work_func+0x87/0x210 [fscache]
      	 [<ffffffff81080635>] ? process_one_work+0x155/0x450
      	 [<ffffffff81081c44>] ? worker_thread+0x114/0x370
      	 [<ffffffff81081b30>] ? manage_workers.isra.21+0x2c0/0x2c0
      	 [<ffffffff81087fcc>] ? kthread+0xbc/0xe0
      	 [<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
      	 [<ffffffff8150638c>] ? ret_from_fork+0x7c/0xb0
      	 [<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
      
      Reported-by: default avatarManuel Schölling <manuel.schoelling@gmx.de>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarSteve Dickson <steved@redhat.com>
      a3b7c004
  29. Sep 26, 2014
  30. Sep 17, 2014
    • David Howells's avatar
      CacheFiles: Handle rename2 · e2cf1f1c
      David Howells authored
      
      
      Not all filesystems now provide the rename i_op - ext4 for one - but rather
      provide the rename2 i_op.  CacheFiles checks that the filesystem has rename
      and so will reject ext4 now with EPERM:
      
      	CacheFiles: Failed to register: -1
      
      Fix this by checking for rename2 as an alternative.  The call to vfs_rename()
      actually handles selection of the appropriate function, so we needn't worry
      about that.
      
      Turning on debugging shows:
      
      	[cachef] ==> cachefiles_get_directory(,,cache)
      	[cachef] subdir -> ffff88000b22b778 positive
      	[cachef] <== cachefiles_get_directory() = -1 [check]
      
      where -1 is EPERM.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      e2cf1f1c
Loading