  1. May 14, 2018
    • afs: Fix refcounting in callback registration · d4a96bec
      David Howells authored
      
      
      The refcounting on afs_cb_interest struct objects in
      afs_register_server_cb_interest() is wrong as it uses the server list
      entry's callback interest pointer without regard for the fact that it
      might be replaced at any time and the object thrown away.
      
      Fix this by:
      
       (1) Put a lock on the afs_server_list struct that can be used to
           mediate access to the callback interest pointers in the servers array.
      
       (2) Keep a ref on the callback interest that we get from the entry.
      
       (3) Drop the old reference held by vnode->cb_interest if we replace
           the pointer.
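      A minimal userspace sketch of the three steps, assuming a pthread mutex in
      place of the kernel lock and a bare atomic counter in place of the real
      refcount. All struct and function names here are hypothetical, not the
      kafs API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a callback interest: just a refcount. */
struct cb_interest {
	atomic_int usage;
};

/* Hypothetical server-list entry guarded by its own lock (step 1). */
struct server_list {
	pthread_mutex_t lock;
	struct cb_interest *cb_interest;   /* the list owns one ref */
};

struct vnode {
	struct cb_interest *cb_interest;   /* the vnode owns one ref */
};

static struct cb_interest *alloc_cb_interest(void)
{
	struct cb_interest *cbi = malloc(sizeof(*cbi));

	assert(cbi != NULL);
	atomic_init(&cbi->usage, 1);
	return cbi;
}

static struct cb_interest *get_cb_interest(struct cb_interest *cbi)
{
	atomic_fetch_add(&cbi->usage, 1);
	return cbi;
}

static void put_cb_interest(struct cb_interest *cbi)
{
	if (cbi && atomic_fetch_sub(&cbi->usage, 1) == 1)
		free(cbi);
}

/* Replace the entry's pointer under the lock; the list's old ref is dropped. */
static void set_entry_cb_interest(struct server_list *sl, struct cb_interest *cbi)
{
	struct cb_interest *old;

	pthread_mutex_lock(&sl->lock);
	old = sl->cb_interest;
	sl->cb_interest = cbi;             /* takes over the caller's ref */
	pthread_mutex_unlock(&sl->lock);
	put_cb_interest(old);
}

/* Take our own ref while the lock pins the pointer (step 2), then drop the
 * ref the vnode held on whatever it pointed at before (step 3). */
static void register_cb_interest(struct vnode *vnode, struct server_list *sl)
{
	struct cb_interest *cbi = NULL, *old;

	pthread_mutex_lock(&sl->lock);
	if (sl->cb_interest)
		cbi = get_cb_interest(sl->cb_interest);
	pthread_mutex_unlock(&sl->lock);

	old = vnode->cb_interest;
	vnode->cb_interest = cbi;
	put_cb_interest(old);
}
```

      Because the vnode holds its own reference, the entry can be repointed at
      any time without freeing an object the vnode still uses.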
      
      Fixes: c435ee34 ("afs: Overhaul the callback handling")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix giving up callbacks on server destruction · f2686b09
      David Howells authored
      
      
      When a server record is destroyed, we want to send a message to the server
      telling it that we're giving up all the callbacks it has promised us.
      
      Apply two fixes to this:
      
       (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
           callback from that server.  We assume this to be the case if we
           performed at least one successful FS operation on that server.
      
       (2) Send it to the address last used for that server rather than always
           picking the first address in the list (which might be unreachable).
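      A sketch of the two rules, assuming a server record that tracks whether any
      FS operation succeeded and which address was used last. The fields and
      helpers are hypothetical illustrations, not the kafs structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical server record; field names are illustrative only. */
struct server {
	bool cb_promised;          /* set after a successful FS operation */
	size_t nr_addrs;
	const char *addrs[4];
	size_t last_used;          /* index of the address that last worked */
};

/* Record a successful FS operation made via addrs[addr_index]. */
static void note_successful_op(struct server *s, size_t addr_index)
{
	assert(addr_index < s->nr_addrs);
	s->cb_promised = true;     /* rule 1: assume we now hold a callback */
	s->last_used = addr_index; /* rule 2: remember the reachable address */
}

/* Where to send FS.GiveUpAllCallBacks on destruction, or NULL to skip it. */
static const char *give_up_target(const struct server *s)
{
	if (!s->cb_promised)
		return NULL;
	return s->addrs[s->last_used];
}
```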
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix address list parsing · 01fd79e6
      David Howells authored
      
      
      The parsing of port specifiers in the address list obtained from the DNS
      resolution upcall doesn't work as in4_pton() and in6_pton() will fail on
      encountering an unexpected delimiter (in this case, the '+' marking the
      port number).  However, in*_pton() accepts only a single delimiter character.
      
      Fix this by finding the delimiter in advance and not relying on in*_pton()
      to find the end of the address for us.
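      A userspace sketch of the "find the delimiter first" approach, assuming
      entries separated by ',' with an optional "+port" suffix. inet_pton()
      stands in for in4_pton()/in6_pton(), and the helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

struct parsed_addr {
	int family;               /* AF_INET or AF_INET6 */
	unsigned char addr[16];   /* large enough for either family */
	unsigned port;            /* 0 when there is no "+port" suffix */
};

/* Parse one "addr[+port]" entry of length len. The '+' is located up
 * front, so inet_pton() only ever sees the bare address. */
static int parse_entry(const char *p, size_t len, struct parsed_addr *out)
{
	char buf[INET6_ADDRSTRLEN];
	const char *plus = memchr(p, '+', len);
	size_t alen = plus ? (size_t)(plus - p) : len;

	if (alen == 0 || alen >= sizeof(buf))
		return -1;
	memcpy(buf, p, alen);
	buf[alen] = '\0';          /* inet_pton() needs a terminated string */
	out->family = memchr(buf, ':', alen) ? AF_INET6 : AF_INET;
	if (inet_pton(out->family, buf, out->addr) != 1)
		return -1;

	out->port = 0;
	if (plus) {
		char pbuf[6], *end;
		size_t plen = len - alen - 1;
		unsigned long v;

		if (plen == 0 || plen >= sizeof(pbuf))
			return -1;
		memcpy(pbuf, plus + 1, plen);
		pbuf[plen] = '\0';
		v = strtoul(pbuf, &end, 10);
		if (*end != '\0' || v > 65535)
			return -1;
		out->port = (unsigned)v;
	}
	return 0;
}

/* Split on ',' and parse each entry; returns the count or -1 on error. */
static int parse_list(const char *s, struct parsed_addr *out, int max)
{
	int n = 0;

	assert(max > 0);
	while (*s) {
		const char *comma = strchr(s, ',');
		size_t len = comma ? (size_t)(comma - s) : strlen(s);

		if (n == max || parse_entry(s, len, &out[n]) < 0)
			return -1;
		n++;
		s += len;
		if (*s == ',')
			s++;
	}
	return n;
}
```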
      
      Fixes: 8b2a464c ("afs: Add an address list concept")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix directory page locking · b61f7dcf
      David Howells authored
      
      
      The afs directory loading code (primarily afs_read_dir()) locks all the
      pages that hold a directory's content blob to defend against
      getdents/getdents races and getdents/lookup races where the competitors
      issue conflicting reads on the same data.  As the reads will complete
      consecutively, they may retrieve different versions of the data and
      one may overwrite the data that the other is busy parsing.
      
      Fix this by not locking the pages at all, but rather by turning the
      validation lock into an rwsem and getting an exclusive lock on it whilst
      reading the data or validating the attributes and a shared lock whilst
      parsing the data.  Sharing the attribute validation lock should be fine as
      the data fetch will retrieve the attributes also.
      
      The individual page locks aren't needed at all as the only place they're
      being used is to serialise data loading.
      
      Without this patch, the:
      
       	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
      		...
      	}
      
      part of afs_read_dir() may be skipped, leaving the pages unlocked when we
      hit the success: clause - in which case we try to unlock the not-locked
      pages, leading to the following oops:
      
        page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
        flags: 0xfffe000001004(referenced|private)
        raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
        raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
        page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
        page->mem_cgroup:ffff98156b27c000
        ------------[ cut here ]------------
        kernel BUG at mm/filemap.c:1205!
        ...
        RIP: 0010:unlock_page+0x43/0x50
        ...
        Call Trace:
         afs_dir_iterate+0x789/0x8f0 [kafs]
         ? _cond_resched+0x15/0x30
         ? kmem_cache_alloc_trace+0x166/0x1d0
         ? afs_do_lookup+0x69/0x490 [kafs]
         ? afs_do_lookup+0x101/0x490 [kafs]
         ? key_default_cmp+0x20/0x20
         ? request_key+0x3c/0x80
         ? afs_lookup+0xf1/0x340 [kafs]
         ? __lookup_slow+0x97/0x150
         ? lookup_slow+0x35/0x50
         ? walk_component+0x1bf/0x490
         ? path_lookupat.isra.52+0x75/0x200
         ? filename_lookup.part.66+0xa0/0x170
         ? afs_end_vnode_operation+0x41/0x60 [kafs]
         ? __check_object_size+0x9c/0x171
         ? strncpy_from_user+0x4a/0x170
         ? vfs_statx+0x73/0xe0
         ? __do_sys_newlstat+0x39/0x70
         ? __x64_sys_getdents+0xc9/0x140
         ? __x64_sys_getdents+0x140/0x140
         ? do_syscall_64+0x5b/0x160
         ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: f3ddee8d ("afs: Fix directory handling")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
  2. May 12, 2018
  3. May 10, 2018
  4. May 03, 2018
    • bdi: Fix oops in wb_workfn() · b8b78495
      Jan Kara authored
      
      
      Syzbot has reported that it can hit a NULL pointer dereference in
      wb_workfn() due to wb->bdi->dev being NULL. This indicates that
      wb_workfn() was called for an already unregistered bdi, which should not
      happen because wb_shutdown(), called from bdi_unregister(), should make sure
      all pending writeback work items are completed before the bdi is unregistered.
      Except that wb_workfn() itself can requeue the work with:
      
      	mod_delayed_work(bdi_wq, &wb->dwork, 0);
      
      and if this happens while wb_shutdown() is waiting in:
      
      	flush_delayed_work(&wb->dwork);
      
      the dwork can get executed after wb_shutdown() has finished and
      bdi_unregister() has cleared wb->bdi->dev.
      
      Make wb_workfn() requeue the work with wb_wakeup(), which takes all the
      necessary precautions against racing with bdi unregistration.
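      A single-threaded userspace model of the requeue rule, assuming the
      precaution amounts to a "registered" flag that is checked under the same
      lock shutdown uses to clear it. The struct and functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical writeback context; both flags are protected by work_lock. */
struct wb_ctx {
	pthread_mutex_t work_lock;
	bool registered;
	bool queued;      /* stand-in for a pending delayed work item */
	int runs;         /* how many times the work function ran */
};

/* Requeue only while still registered, deciding under work_lock. */
static void wb_requeue(struct wb_ctx *wb)
{
	pthread_mutex_lock(&wb->work_lock);
	if (wb->registered)
		wb->queued = true;
	pthread_mutex_unlock(&wb->work_lock);
}

/* Work function that, like wb_workfn(), may want to run again. */
static void wb_work(struct wb_ctx *wb)
{
	wb->runs++;
	wb_requeue(wb);   /* never an unconditional "queue it again" */
}

/* Clear "registered" first, then flush: run pending work until none is
 * left. Any requeue attempted during the flush is refused. */
static void wb_shutdown_sketch(struct wb_ctx *wb)
{
	pthread_mutex_lock(&wb->work_lock);
	wb->registered = false;
	pthread_mutex_unlock(&wb->work_lock);

	for (;;) {
		bool pending;

		pthread_mutex_lock(&wb->work_lock);
		pending = wb->queued;
		wb->queued = false;
		pthread_mutex_unlock(&wb->work_lock);
		if (!pending)
			break;
		wb_work(wb);
	}
	assert(!wb->queued);
}
```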
      
      CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      CC: Tejun Heo <tj@kernel.org>
      Fixes: 839a8e86
      Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. May 02, 2018
    • xfs: cap the length of deduplication requests · 021ba8e9
      Darrick J. Wong authored
      
      
      Since deduplication potentially has to read in all the pages in both
      files in order to compare the contents, cap the deduplication request
      length at MAX_RW_COUNT/2 (roughly 1GB) so that we have /some/ upper bound
      on the request length and can't just lock up the kernel forever.  Found
      by running generic/304 after commit 1ddae54555b62 ("common/rc: add
      missing 'local' keywords").
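      A sketch of the cap, assuming 4 KiB pages and modelling MAX_RW_COUNT as
      INT_MAX rounded down to a page boundary. The SKETCH_* names are
      hypothetical, and trimming (rather than rejecting) an oversized request is
      an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed 4 KiB pages; MAX_RW_COUNT modelled as INT_MAX & ~(page - 1). */
#define SKETCH_PAGE_SIZE      4096ULL
#define SKETCH_MAX_RW_COUNT   ((uint64_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_MAX_DEDUPE_LEN (SKETCH_MAX_RW_COUNT / 2)

/* Both files' pages are read for comparison, so bound the request length. */
static uint64_t cap_dedupe_len(uint64_t len)
{
	return len > SKETCH_MAX_DEDUPE_LEN ? SKETCH_MAX_DEDUPE_LEN : len;
}
```

      Under these assumptions the cap works out to 1073739776 bytes, just under
      1 GiB.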
      
      Reported-by: <matorola@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    • Btrfs: send, fix missing truncate for inode with prealloc extent past eof · a6aa10c7
      Filipe Manana authored
      
      
      An incremental send operation can miss a truncate operation when an inode
      has an increased size in the send snapshot and a prealloc extent beyond
      its size.
      
      Consider the following scenario where a necessary truncate operation is
      missing in the incremental send stream:
      
      1) In the parent snapshot an inode has a size of 1282957 bytes and it has
         no prealloc extents beyond its size;
      
      2) In the send snapshot it has a size of 5738496 bytes and has a new
         extent at offset 1884160 (length of 106496 bytes) and a prealloc
         extent beyond eof at offset 6729728 (and a length of 339968 bytes);
      
      3) When processing the prealloc extent, at offset 6729728, we end up at
         send.c:send_write_or_clone() and set the @len variable to a value of
         18446744073708560384 (the unsigned 64-bit result of 5738496 - 6729728,
         which wraps around) because @offset plus the original @len value is
         larger than the inode's size (6729728 + 339968 > 5738496). We then
         call send_extent_data() with that @offset and @len, which in turn
         calls send_write(), which then calls fill_read_buf(). Because the
         offset passed to fill_read_buf() is greater than the inode's i_size,
         this function returns 0 immediately, which makes send_write() and
         send_extent_data() do nothing and return immediately as well. When
         we get back to send.c:send_write_or_clone() we adjust the value
         of sctx->cur_inode_next_write_offset to @offset plus @len, which
         corresponds to 6729728 + 18446744073708560384 = 5738496 (modulo 2^64),
         which is precisely the size of the inode in the send snapshot;
      
      4) Later, in send.c:finish_inode_if_needed(), we determine that
         we don't need to issue a truncate operation because the value of
         sctx->cur_inode_next_write_offset corresponds to the inode's new
         size, 5738496 bytes. This is wrong because the last write operation
         that was issued started at offset 1884160 with a length of 106496
         bytes, so the correct value for sctx->cur_inode_next_write_offset
         should be 1990656 (1884160 + 106496), so that a truncate operation
         with a value of 5738496 bytes would have been sent to insert a
         trailing hole at the destination.
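      The arithmetic in steps 3 and 4 can be reproduced with 64-bit unsigned
      integers. This sketch assumes the clamp has the form "if offset + len >
      i_size then len = i_size - offset" and uses hypothetical helper names. The
      skip test models the fix described next; using >= (an extent starting
      exactly at i_size has no data to send either) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp len to the inode size; wraps modulo 2^64 when offset > i_size. */
static uint64_t clamp_len(uint64_t offset, uint64_t len, uint64_t i_size)
{
	if (offset + len > i_size)
		len = i_size - offset;
	return len;
}

/* Skip extents that start at or beyond i_size, so next_write_offset keeps
 * the end of the last real write. Returns 1 if a write was "sent". */
static int send_extent(uint64_t offset, uint64_t len, uint64_t i_size,
		       uint64_t *next_write_offset)
{
	if (offset >= i_size)
		return 0;
	len = clamp_len(offset, len, i_size);
	*next_write_offset = offset + len;
	return 1;
}
```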
      
      So fix the issue by making send.c:send_write_or_clone() not attempt
      to send write or clone operations for extents that start beyond the
      inode's size, since such attempts do nothing but waste time by
      calling helper functions and allocating path structures, and send
      currently has no fallocate command in order to create prealloc extents
      at the destination (either beyond a file's eof or not).
      
      The issue was found running the test btrfs/007 from fstests using a seed
      value of 1524346151 for fsstress.
      
      Reported-by: Gu, Jinxiang <gujx@cn.fujitsu.com>
      Fixes: ffa7c429 ("Btrfs: send, do not issue unnecessary truncate operations")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Take trans lock before access running trans in check_delayed_ref · 998ac6d2
      ethanwu authored
      
      
      In a previous patch,
      "Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist",
      we avoid starting a btrfs transaction and instead get this information
      directly from fs_info->running_transaction.

      When accessing running_transaction in check_delayed_ref, there's a
      chance that the current transaction will be freed by a transaction
      commit after the NULL pointer check on running_transaction has passed.

      Looking at all the other places that use fs_info->running_transaction,
      they are either protected by trans_lock or hold the transaction
      themselves.
      
      Fix this by using trans_lock and increasing the use_count.
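      A userspace sketch of the "read under the lock, then pin with a reference"
      pattern, assuming a pthread mutex in place of trans_lock and an atomic
      counter in place of use_count. The structs and functions are hypothetical
      stand-ins, not the btrfs code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct transaction {
	atomic_int use_count;
};

struct fs_info {
	pthread_mutex_t trans_lock;
	struct transaction *running_transaction;
};

static void put_transaction(struct transaction *t)
{
	if (atomic_fetch_sub(&t->use_count, 1) == 1)
		free(t);
}

/* Read running_transaction and pin it while holding trans_lock, so a
 * concurrent commit cannot free it between the NULL check and the use. */
static struct transaction *grab_running_transaction(struct fs_info *fs)
{
	struct transaction *t;

	pthread_mutex_lock(&fs->trans_lock);
	t = fs->running_transaction;
	if (t)
		atomic_fetch_add(&t->use_count, 1);
	pthread_mutex_unlock(&fs->trans_lock);
	return t;          /* caller must put_transaction() when done */
}

/* Commit side: unpublish under the lock, then drop the published ref. */
static void commit_transaction(struct fs_info *fs)
{
	struct transaction *t;

	pthread_mutex_lock(&fs->trans_lock);
	t = fs->running_transaction;
	fs->running_transaction = NULL;
	pthread_mutex_unlock(&fs->trans_lock);
	if (t)
		put_transaction(t);
}
```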
      
      Fixes: e4c3b2dc ("Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist")
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: ethanwu <ethanwu@synology.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. Apr 27, 2018
  7. Apr 26, 2018
  8. Apr 25, 2018
  9. Apr 24, 2018
  10. Apr 23, 2018
  11. Apr 21, 2018
  12. Apr 20, 2018
    • CIFS: fix typo in cifs_dbg · 596632de
      Aurelien Aptel authored
      
      
      Signed-off-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reported-by: Long Li <longli@microsoft.com>
    • cifs: do not allow creating sockets except with SMB1 posix extensions · 1d0cffa6
      Steve French authored
      
      
      RHBZ: 1453123
      
      Since at least the 3.10 kernel, and likely a lot earlier, we have
      not been able to create unix domain sockets in a cifs share
      when mounted using the SFU mount option (except when mounted
      with the cifs unix extensions, e.g. to Samba).
      Trying to create a socket, for example using the af_unix command from
      xfstests, will cause:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      
      Since no one uses or depends on being able to create unix domain sockets
      on a cifs share, the easiest fix to stop this vulnerability is simply
      not to allow creation of any special files other than char or block
      devices when sfu is used.
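      A sketch of the resulting mknod policy on an SFU mount, assuming the check
      is made on the file-type bits of the requested mode. The function name and
      the -EPERM return value are assumptions for illustration.

```c
#define _DEFAULT_SOURCE          /* for S_IFSOCK under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical SFU mknod policy: only char and block devices are created. */
static int sfu_mknod_allowed(mode_t mode)
{
	if (S_ISCHR(mode) || S_ISBLK(mode))
		return 0;
	return -EPERM;           /* sockets, FIFOs, etc. are refused */
}
```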
      
      Also updated Ronnie's patch to handle a tcon link leak and to address
      a buf leak noticed by Gustavo and Colin.
      
      Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      CC:  Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Reported-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Cc: stable@vger.kernel.org
    • cifs: smbd: Dump SMB packet when configured · ff30b89e
      Long Li authored
      
      
      When sending through SMB Direct, also dump the packet in the SMB send path.

      Also fixed a typo in a debug message.
      
      Signed-off-by: Long Li <longli@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
    • btrfs: print-tree: debugging output enhancement · c0872323
      Qu Wenruo authored
      
      
      This patch enhances the following things:
      
      - tree block header
        * add generation and owner output for node and leaf
      - node pointer generation output
      - allow btrfs_print_tree() to not follow nodes
        * just like btrfs-progs
      
      Please note that, although btrfs_print_tree() is not called by anyone
      right now, it is still a pretty useful function for debugging the kernel,
      so it is kept for later use.
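      A simplified sketch of the "don't follow nodes" option, assuming in-memory
      blocks with direct child pointers instead of on-disk block pointers. The
      struct and function are hypothetical and are not the kernel's print-tree
      code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tree block; level 0 is a leaf. */
struct tree_block {
	unsigned long long bytenr, generation, owner;
	int level;
	int nr_children;
	const struct tree_block *children[4];
	unsigned long long child_gen[4];   /* generation in each node pointer */
};

/* Print the header (with generation and owner) and each node pointer's
 * generation; recurse into children only when follow is true.
 * Returns the number of blocks printed. */
static int print_tree(const struct tree_block *b, bool follow, FILE *out)
{
	int printed = 1;

	assert(b->nr_children <= 4);
	fprintf(out, "%s %llu level %d gen %llu owner %llu\n",
		b->level ? "node" : "leaf", b->bytenr, b->level,
		b->generation, b->owner);
	for (int i = 0; i < b->nr_children; i++) {
		fprintf(out, "\tptr %llu gen %llu\n",
			b->children[i]->bytenr, b->child_gen[i]);
		if (follow)
			printed += print_tree(b->children[i], true, out);
	}
	return printed;
}
```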
      
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Fix race condition between delayed refs and blockgroup removal · 5e388e95
      Nikolay Borisov authored
      
      
      When the delayed refs for a head are all run, cleanup_ref_head() is
      eventually called, which (in case of deletion) obtains a reference to
      the relevant btrfs_space_info struct by looking up the block group (bg)
      for the range. This is problematic because, when the last extent of a
      bg is deleted, a race window emerges between removal of that bg and the
      subsequent invocation of cleanup_ref_head(). This can leave the block
      group cache pointer NULL, causing either a NULL pointer dereference or
      an assertion failure.
      
      	task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000
      	RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs]
      	RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292
      	RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000
      	RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8
      	RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001
      	R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0
      	R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0
      	FS:  00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0
      	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      	Call Trace:
      	 __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs]
      	 btrfs_run_delayed_refs+0x68/0x250 [btrfs]
      	 btrfs_should_end_transaction+0x42/0x60 [btrfs]
      	 btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs]
      	 btrfs_evict_inode+0x4c6/0x5c0 [btrfs]
      	 evict+0xc6/0x190
      	 do_unlinkat+0x19c/0x300
      	 do_syscall_64+0x74/0x140
      	 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      	RIP: 0033:0x7fbf589c57a7
      
      To fix this, introduce a new flag, "is_system", in head_ref structs,
      which is populated at insertion time. This decouples the space_info
      lookup from the lookup of the possibly deleted bg.
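      A sketch of the idea, assuming the space_info category can be derived from
      data/system flags recorded when the head ref is inserted. The enum, struct
      and functions are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

enum space_kind { SPACE_DATA, SPACE_METADATA, SPACE_SYSTEM };

/* Hypothetical head ref: the flags are captured at insertion time, so the
 * cleanup path never has to look up a possibly removed block group. */
struct head_ref {
	unsigned long long bytenr;
	bool is_data;
	bool is_system;
};

static void init_head_ref(struct head_ref *h, unsigned long long bytenr,
			  bool is_data, bool is_system)
{
	assert(!(is_data && is_system));
	h->bytenr = bytenr;
	h->is_data = is_data;
	h->is_system = is_system;
}

/* Cleanup picks the space_info category from the stored flags alone. */
static enum space_kind cleanup_space_kind(const struct head_ref *h)
{
	if (h->is_data)
		return SPACE_DATA;
	return h->is_system ? SPACE_SYSTEM : SPACE_METADATA;
}
```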
      
      Fixes: d7eae340 ("Btrfs: rework delayed ref total_bytes_pinned accounting")
      CC: stable@vger.kernel.org # 4.14+
      Suggested-by: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion · a9e5b732
      David Howells authored
      In do_mount(), where the MS_* flags are converted to MNT_* flags,
      MS_RDONLY was accidentally converted to SB_RDONLY.
      
      Undo this change.
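      A sketch of the conversion step, using local constants with made-up SK_
      names and illustrative values so that the MS_* (mount(2) argument) and
      MNT_* (per-mount) families stay visibly distinct. It is not the do_mount()
      code.

```c
#include <assert.h>

/* Illustrative values, local to this sketch. */
#define SK_MS_RDONLY    0x1
#define SK_MS_NOSUID    0x2
#define SK_MNT_NOSUID   0x01
#define SK_MNT_READONLY 0x40

/* Test MS_* bits from the caller, set the matching MNT_* bits. */
static int ms_to_mnt_flags(unsigned long ms_flags)
{
	int mnt_flags = 0;

	if (ms_flags & SK_MS_NOSUID)
		mnt_flags |= SK_MNT_NOSUID;
	if (ms_flags & SK_MS_RDONLY)   /* an MS_* flag, not a superblock flag */
		mnt_flags |= SK_MNT_READONLY;
	return mnt_flags;
}
```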
      
      Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • afs: Fix server record deletion · 66062592
      David Howells authored
      
      
      AFS server records get removed from the net->fs_servers tree when
      they're deleted, but not from the net->fs_addresses{4,6} lists, which
      can lead to an oops in afs_find_server() when a server record has been
      removed, for instance during rmmod.
      
      Fix this by deleting the record from the by-address lists before posting
      it for RCU destruction.
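      A sketch of the ordering the fix describes, assuming a record reachable
      from two singly linked indexes. free() stands in for posting the record
      for RCU destruction, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record linked into two indexes: one like net->fs_servers
 * and one like the by-address lists. */
struct server {
	unsigned addr;
	struct server *next_by_id;
	struct server *next_by_addr;
};

struct net {
	struct server *by_id;
	struct server *by_addr;
};

static void add_server(struct net *n, struct server *s)
{
	assert(s != NULL);
	s->next_by_id = n->by_id;
	n->by_id = s;
	s->next_by_addr = n->by_addr;
	n->by_addr = s;
}

static struct server *find_by_addr(const struct net *n, unsigned addr)
{
	for (struct server *s = n->by_addr; s; s = s->next_by_addr)
		if (s->addr == addr)
			return s;
	return NULL;
}

/* Unlink from *both* indexes before the memory goes away; skipping the
 * by-address unlink would leave a dangling pointer for find_by_addr(). */
static void destroy_server(struct net *n, struct server *s)
{
	struct server **pp;

	for (pp = &n->by_id; *pp; pp = &(*pp)->next_by_id)
		if (*pp == s) {
			*pp = s->next_by_id;
			break;
		}
	for (pp = &n->by_addr; *pp; pp = &(*pp)->next_by_addr)
		if (*pp == s) {
			*pp = s->next_by_addr;
			break;
		}
	free(s);
}
```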
      
      The reason this hasn't been noticed before is that the fileserver keeps
      probing the local cache manager, thereby keeping the server record
      alive, so the oops would only happen when a fileserver eventually gets
      bored and stops pinging or if the module gets rmmod'd and a call comes
      in from the fileserver during the window between the server records
      being destroyed and the socket being closed.
      
      The oops looks something like:
      
        BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
        ...
        Workqueue: kafsd afs_process_async_call [kafs]
        RIP: 0010:afs_find_server+0x271/0x36f [kafs]
        ...
        Call Trace:
         afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
         afs_deliver_to_call+0x1ee/0x5e8 [kafs]
         afs_process_async_call+0x5b/0xd0 [kafs]
         process_one_work+0x2c2/0x504
         worker_thread+0x1d4/0x2ac
         kthread+0x11f/0x127
         ret_from_fork+0x24/0x30
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Don't leak MNT_INTERNAL away from internal mounts · 16a34adb
      Al Viro authored
      
      
      We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
      their copies.  As it is, creating a deep stack of bindings of /proc/*/ns/*
      somewhere in a new namespace and exiting yields a stack overflow.
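      A sketch of the rule, assuming a copied mount inherits its flags from the
      original with the internal-only marker masked out. The SK_ constants,
      their values and the struct are hypothetical.

```c
#include <assert.h>

/* Illustrative values, local to this sketch. */
#define SK_MNT_READONLY 0x40
#define SK_MNT_INTERNAL 0x4000

struct mount_sketch {
	int mnt_flags;
};

/* A copy (e.g. a bind) keeps ordinary flags but never the marker that
 * belongs only to the kernel-created original. */
static void clone_mount_flags(struct mount_sketch *copy,
			      const struct mount_sketch *old)
{
	copy->mnt_flags = old->mnt_flags & ~SK_MNT_INTERNAL;
}
```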
      
      Cc: stable@kernel.org
      Reported-by: Alexander Aring <aring@mojatatu.com>
      Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: Alexander Aring <aring@mojatatu.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. Apr 19, 2018
  14. Apr 18, 2018
    • btrfs: fix unaligned access in readdir · 92d32170
      David Sterba authored
      
      
      The last update to readdir introduced a temporary buffer to store the
      emitted readdir data, but as file names have variable length, there
      are a lot of unaligned accesses.
      
      This was observed on a sparc64 machine:
      
        Kernel unaligned access at TPC[102f3080] btrfs_real_readdir+0x51c/0x718 [btrfs]
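      A userspace sketch of why variable-length records need unaligned-safe
      accessors, assuming a packed layout of a 64-bit value followed by a
      length-prefixed name. The helper names are hypothetical; the kernel's own
      equivalents are put_unaligned() and get_unaligned().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* memcpy-based accessors are safe on strict-alignment CPUs such as sparc64,
 * unlike dereferencing a misaligned (uint64_t *) directly. */
static void put_u64_unaligned(void *p, uint64_t v)
{
	memcpy(p, &v, sizeof(v));
}

static uint64_t get_u64_unaligned(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical packed entry: 8-byte ino, 1-byte name length, name bytes.
 * Variable-length names leave the next entry's ino misaligned. */
static size_t emit_entry(unsigned char *buf, uint64_t ino, const char *name)
{
	size_t len = strlen(name);

	assert(len <= 255);
	put_u64_unaligned(buf, ino);
	buf[8] = (unsigned char)len;
	memcpy(buf + 9, name, len);
	return 9 + len;
}
```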
      
      Fixes: 23b5ec74 ("btrfs: fix readdir deadlock with pagefault")
      CC: stable@vger.kernel.org # 4.14+
      Reported-and-tested-by: René Rebe <rene@exactcode.com>
      Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • ext4: set h_journal if there is a failure starting a reserved handle · b2569260
      Theodore Ts'o authored
      
      
      If ext4 tries to start a reserved handle via
      jbd2_journal_start_reserved(), and the journal has been aborted, this
      can result in a NULL pointer dereference.  This is because the fields
      h_journal and h_transaction in the handle structure share the same
      memory, via a union, so jbd2_journal_start_reserved() will clear
      h_journal before calling start_this_handle().  If this function fails
      due to an aborted handle, h_journal will still be NULL, and the call
      to jbd2_journal_free_reserved() will pass a NULL journal to
      sub_reserve_credits().
      
      This can be reproduced by running "kvm-xfstests -c dioread_nolock
      generic/475".
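      A userspace model of the shared-storage hazard and the fix named in the
      title, assuming trimmed-down types in which h_journal and h_transaction
      share a union. All structs and functions are hypothetical stand-ins for
      the jbd2 code.

```c
#include <assert.h>
#include <stddef.h>

struct journal { int reserved_credits; };
struct transaction { int tid; };

struct handle {
	union {                                /* shared storage */
		struct journal *h_journal;         /* reserved, not started */
		struct transaction *h_transaction; /* started */
	};
	int h_total_credits;
};

static struct transaction running_txn;

/* Stand-in for starting the handle: fails if the journal is aborted. */
static int start_handle_sketch(struct handle *h, int journal_aborted)
{
	if (journal_aborted)
		return -1;
	h->h_transaction = &running_txn;
	return 0;
}

/* Stand-in for freeing the reservation: needs a valid h_journal. */
static void free_reserved_sketch(struct handle *h)
{
	assert(h->h_journal != NULL);   /* the kernel would oops here instead */
	h->h_journal->reserved_credits -= h->h_total_credits;
}

static int start_reserved_sketch(struct handle *h, int journal_aborted)
{
	struct journal *journal = h->h_journal;

	h->h_journal = NULL;            /* cleared before starting the handle */
	if (start_handle_sketch(h, journal_aborted) < 0) {
		h->h_journal = journal;     /* the fix: restore it on failure */
		free_reserved_sketch(h);
		return -1;
	}
	return 0;
}
```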
      
      Cc: stable@kernel.org # 3.11
      Fixes: 8f7d89f3 ("jbd2: transaction reservation support")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • btrfs: Fix wrong btrfs_delalloc_release_extents parameter · 336a8bb8
      Qu Wenruo authored
      
      
      Commit 43b18595 ("btrfs: qgroup: Use separate meta reservation type
      for delalloc"), as merged into mainline, is not the latest version
      submitted to the mailing list in Dec 2017.

      It passes a fatally wrong @qgroup_free parameter, which results in
      increasing the qgroup metadata pertrans reserved space and causes a
      lot of early EDQUOT errors.
      
      Fix it by applying the correct diff on top of current branch.
      
      Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>