Skip to content
  1. Dec 20, 2010
  2. Dec 17, 2010
  3. Dec 16, 2010
  4. Dec 15, 2010
  5. Dec 14, 2010
    • Theodore Ts'o's avatar
      ext4: Turn off multiple page-io submission by default · 1449032b
      Theodore Ts'o authored
      
      
      Jon Nelson has found a test case which causes postgresql to fail with
      the error:
      
      psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
      
      Under memory pressure, it looks like part of a file can end up getting
      replaced by zero's.  Until we can figure out the cause, we'll roll
      back the change and use block_write_full_page() instead of
      ext4_bio_write_page().  The new, more efficient writing function can
      be used via the mount option mblk_io_submit, so we can test and fix
      the new page I/O code.
      
      To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
      memory such that the system just at the end of triggering writeback
      before running the following sql script:
      
      begin;
      create temporary table foo as select x as a, ARRAY[x] as b FROM
      generate_series(1, 10000000 ) AS x;
      create index foo_a_idx on foo (a);
      create index foo_b_idx on foo USING GIN (b);
      rollback;
      
      If the temporary table is created on a hard drive partition which is
      encrypted using dm_crypt, then under memory pressure, approximately
      30-40% of the time, pgsql will issue the above failure.
      
      This patch should fix this problem, and the problem will come back if
      the file system is mounted with the mblk_io_submit mount option.
      
      Reported-by: default avatarJon Nelson <jnelson@jamponi.net>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      1449032b
    • Chris Mason's avatar
      Btrfs: prevent RAID level downgrades when space is low · 83a50de9
      Chris Mason authored
      
      
      The extent allocator has code that allows us to fill
      allocations from any available block group, even if it doesn't
      match the raid level we've requested.
      
      This was put in because adding a new drive to a filesystem
      made with the default mkfs options actually upgrades the metadata from
      single spindle dup to full RAID1.
      
      But, the code also allows us to allocate from a raid0 chunk when we
      really want a raid1 or raid10 chunk.  This can cause big trouble because
      mkfs creates a small (4MB) raid0 chunk for data and metadata which then
      goes unused for raid1/raid10 installs.
      
      The allocator will happily wander in and allocate from that chunk when
      things get tight, which is not correct.
      
      The fix here is to make sure that we provide duplication when the
      caller has asked for it.  It does all the dups to be any raid level,
      which preserves the dup->raid1 upgrade abilities.
      
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      83a50de9
    • Chris Mason's avatar
      Btrfs: account for missing devices in RAID allocation profiles · cd02dca5
      Chris Mason authored
      
      
      When we mount in RAID degraded mode without adding a new device to
      replace the failed one, we can end up using the wrong RAID flags for
      allocations.
      
      This results in strange combinations of block groups (raid1 in a raid10
      filesystem) and corruptions when we try to allocate blocks from single
      spindle chunks on drives that are actually missing.
      
      The first device has two small 4MB chunks in it that mkfs creates and
      these are usually unused in a raid1 or raid10 setup.  But, in -o degraded,
      the allocator will fall back to these because the mask of desired raid groups
      isn't correct.
      
      The fix here is to count the missing devices as we build up the list
      of devices in the system.  This count is used when picking the
      raid level to make sure we continue using the same levels that were
      in place before we lost a drive.
      
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      cd02dca5
  6. Dec 13, 2010
    • Chris Mason's avatar
      Btrfs: EIO when we fail to read tree roots · 68433b73
      Chris Mason authored
      
      
      If we just get a plain IO error when we read tree roots, the code
      wasn't properly sending that error up the chain.  This allowed mounts to
      continue when they should failed, and allowed operations
      on partially setup root structs.  The end result was usually oopsen
      on spinlocks that hadn't been spun up correctly.
      
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      68433b73
  7. Dec 10, 2010
  8. Dec 09, 2010
  9. Dec 08, 2010
  10. Dec 07, 2010
    • Lino Sanfilippo's avatar
      fanotify: Dont try to open a file descriptor for the overflow event · fdbf3cee
      Lino Sanfilippo authored
      
      
      We should not try to open a file descriptor for the overflow event since this
      will always fail.
      
      Signed-off-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      fdbf3cee
    • Eric Paris's avatar
      fanotify: do not leak user reference on allocation failure · 26379198
      Eric Paris authored
      
      
      If fanotify_init is unable to allocate a new fsnotify group it will
      return but will not drop its reference on the associated user struct.
      Drop that reference on error.
      
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      26379198
    • Eric Paris's avatar
      inotify: stop kernel memory leak on file creation failure · a2ae4cc9
      Eric Paris authored
      
      
      If inotify_init is unable to allocate a new file for the new inotify
      group we leak the new group.  This patch drops the reference on the
      group on file allocation failure.
      
      Reported-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      cc: stable@kernel.org
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      a2ae4cc9
    • Lino Sanfilippo's avatar
      fanotify: on group destroy allow all waiters to bypass permission check · 09e5f14e
      Lino Sanfilippo authored
      
      
      When fanotify_release() is called, there may still be processes waiting for
      access permission. Currently only processes for which an event has already been
      queued into the groups access list will be woken up.  Processes for which no
      event has been queued will continue to sleep and thus cause a deadlock when
      fsnotify_put_group() is called.
      Furthermore there is a race allowing further processes to be waiting on the
      access wait queue after wake_up (if they arrive before clear_marks_by_group()
      is called).
      This patch corrects this by setting a flag to inform processes that the group
      is about to be destroyed and thus not to wait for access permission.
      
      [additional changelog from eparis]
      Lets think about the 4 relevant code paths from the PoV of the
      'operator' 'listener' 'responder' and 'closer'.  Where operator is the
      process doing an action (like open/read) which could require permission.
      Listener is the task (or in this case thread) slated with reading from
      the fanotify file descriptor.  The 'responder' is the thread responsible
      for responding to access requests.  'Closer' is the thread attempting to
      close the fanotify file descriptor.
      
      The 'operator' is going to end up in:
      fanotify_handle_event()
        get_response_from_access()
          (THIS BLOCKS WAITING ON USERSPACE)
      
      The 'listener' interesting code path
      fanotify_read()
        copy_event_to_user()
          prepare_for_access_response()
            (THIS CREATES AN fanotify_response_event)
      
      The 'responder' code path:
      fanotify_write()
        process_access_response()
          (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator')
      
      The 'closer':
      fanotify_release()
        (SUPPOSED TO CLEAN UP THE REST OF THIS MESS)
      
      What we have today is that in the closer we remove all of the
      fanotify_response_events and set a bit so no more response events are
      ever created in prepare_for_access_response().
      
      The bug is that we never wake all of the operators up and tell them to
      move along.  You fix that in fanotify_get_response_from_access().  You
      also fix other operators which haven't gotten there yet.  So I agree
      that's a good fix.
      [/additional changelog from eparis]
      
      [remove additional changes to minimize patch size]
      [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION]
      
      Signed-off-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      09e5f14e
    • Lino Sanfilippo's avatar
      fanotify: Dont allow a mask of 0 if setting or removing a mark · 1734dee4
      Lino Sanfilippo authored
      
      
      In mark_remove_from_mask() we destroy marks that have their event mask cleared.
      Thus we should not allow the creation of those marks in the first place.
      With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD.
      If so we return an error. Same for FAN_MARK_REMOVE since this does not have any
      effect.
      
      Signed-off-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      1734dee4
    • Lino Sanfilippo's avatar
      fanotify: correct broken ref counting in case adding a mark failed · fa218ab9
      Lino Sanfilippo authored
      
      
      If adding a mount or inode mark failed fanotify_free_mark() is called explicitly.
      But at this time the mark has already been put into the destroy list of the
      fsnotify_mark kernel thread. If the thread is too slow it will try to decrease
      the reference of a mark, that has already been freed by fanotify_free_mark().
      (If its fast enough it will only decrease the marks ref counter from 2 to 1 - note
      that the counter has been increased to 2 in add_mark() - which has practically no
      effect.)
      
      This patch fixes the ref counting by not calling free_mark() explicitly, but
      decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in
      case adding the mark has failed.
      
      Signed-off-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      fa218ab9
    • Lino Sanfilippo's avatar
      fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called · b1085ba8
      Lino Sanfilippo authored
      
      
      Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm()
      is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission
      checks, so a user can still disable permission checks by setting this flag
      in an open() call.
      This patch corrects this by unsetting the flag before fsnotify_perm is called.
      
      Signed-off-by: default avatarLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      b1085ba8
    • Eric Paris's avatar
      fanotify: deny permissions when no event was sent · ecf6f5e7
      Eric Paris authored
      
      
      If no event was sent to userspace we cannot expect userspace to respond to
      permissions requests.  Today such requests just hang forever. This patch will
      deny any permissions event which was unable to be sent to userspace.
      
      Reported-by: default avatarTvrtko Ursulin <tvrtko.ursulin@sophos.com>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      ecf6f5e7
    • Jeff Layton's avatar
      cifs: allow calling cifs_build_path_to_root on incomplete cifs_sb · 7d161b7f
      Jeff Layton authored
      
      
      It's possible that cifs_mount will call cifs_build_path_to_root on a
      newly instantiated cifs_sb. In that case, it's likely that the
      master_tlink pointer has not yet been instantiated.
      
      Fix this by having cifs_build_path_to_root take a cifsTconInfo pointer
      as well, and have the caller pass that in.
      
      Reported-and-Tested-by: default avatarRobbert Kouprie <robbert@exx.nl>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <sfrench@us.ibm.com>
      7d161b7f
Loading