  1. Dec 17, 2010
  2. Dec 15, 2010
  3. Dec 14, 2010
    • ext4: Turn off multiple page-io submission by default · 1449032b
      Theodore Ts'o authored
      
      
      Jon Nelson has found a test case which causes postgresql to fail with
      the error:
      
      psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
      
      Under memory pressure, it looks like part of a file can end up getting
      replaced by zeros.  Until we can figure out the cause, we'll roll
      back the change and use block_write_full_page() instead of
      ext4_bio_write_page().  The new, more efficient writing function can
      be used via the mount option mblk_io_submit, so we can test and fix
      the new page I/O code.
      
      To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
      memory such that the system is just at the point of triggering writeback
      before running the following sql script:
      
      begin;
      create temporary table foo as select x as a, ARRAY[x] as b FROM
      generate_series(1, 10000000 ) AS x;
      create index foo_a_idx on foo (a);
      create index foo_b_idx on foo USING GIN (b);
      rollback;
      
      If the temporary table is created on a hard drive partition which is
      encrypted using dm_crypt, then under memory pressure, approximately
      30-40% of the time, pgsql will issue the above failure.
      
      This patch should fix this problem, and the problem will come back if
      the file system is mounted with the mblk_io_submit mount option.
      
      Reported-by: Jon Nelson <jnelson@jamponi.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • Btrfs: prevent RAID level downgrades when space is low · 83a50de9
      Chris Mason authored
      
      
      The extent allocator has code that allows us to fill
      allocations from any available block group, even if it doesn't
      match the raid level we've requested.
      
      This was put in because adding a new drive to a filesystem
      made with the default mkfs options actually upgrades the metadata from
      single spindle dup to full RAID1.
      
      But, the code also allows us to allocate from a raid0 chunk when we
      really want a raid1 or raid10 chunk.  This can cause big trouble because
      mkfs creates a small (4MB) raid0 chunk for data and metadata which then
      goes unused for raid1/raid10 installs.
      
      The allocator will happily wander in and allocate from that chunk when
      things get tight, which is not correct.
      
      The fix here is to make sure that we provide duplication when the
      caller has asked for it.  It does allow the dups to be any raid level,
      which preserves the dup->raid1 upgrade abilities.
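
      The matching rule described above can be modeled roughly as follows.
      This is a minimal sketch in Python, not the kernel code: the flag
      values and the name chunk_is_acceptable are hypothetical and chosen
      only for illustration.

```python
# Hypothetical flag values for this sketch; they are not claimed to match
# the kernel's block group flag constants.
RAID0 = 1 << 0
RAID1 = 1 << 1
DUP = 1 << 2
RAID10 = 1 << 3

# Profiles that keep more than one copy of each block.
DUPLICATING = DUP | RAID1 | RAID10


def chunk_is_acceptable(chunk_flags: int, requested: int) -> bool:
    """Return True if a chunk with chunk_flags may satisfy the request."""
    # A chunk that carries every requested bit is always acceptable.
    if (chunk_flags & requested) == requested:
        return True
    # If duplication was requested, any duplicating chunk will do. This
    # keeps the dup -> raid1 upgrade path but rejects plain raid0 chunks.
    if (requested & DUPLICATING) and (chunk_flags & DUPLICATING):
        return True
    return False
```

      Under these assumptions a raid1 or raid10 request is never filled
      from a raid0 chunk, while a dup request may still be filled from a
      raid1 chunk.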
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: account for missing devices in RAID allocation profiles · cd02dca5
      Chris Mason authored
      
      
      When we mount in RAID degraded mode without adding a new device to
      replace the failed one, we can end up using the wrong RAID flags for
      allocations.
      
      This results in strange combinations of block groups (raid1 in a raid10
      filesystem) and corruptions when we try to allocate blocks from single
      spindle chunks on drives that are actually missing.
      
      The first device has two small 4MB chunks in it that mkfs creates and
      these are usually unused in a raid1 or raid10 setup.  But, in -o degraded,
      the allocator will fall back to these because the mask of desired raid groups
      isn't correct.
      
      The fix here is to count the missing devices as we build up the list
      of devices in the system.  This count is used when picking the
      raid level to make sure we continue using the same levels that were
      in place before we lost a drive.
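
      The effect of counting missing devices can be sketched as below.
      This is an illustrative Python model, not the kernel implementation:
      the flag values, the function name reduce_profile, and the device
      thresholds (raid1 needing two devices, raid10 needing four) are
      assumptions made for the example.

```python
# Hypothetical flag values for this sketch only.
RAID0 = 1 << 0
RAID1 = 1 << 1
DUP = 1 << 2
RAID10 = 1 << 3


def reduce_profile(flags: int, rw_devices: int, missing_devices: int = 0) -> int:
    """Drop RAID flags that the device count cannot support.

    Missing devices are counted alongside writable ones, so a degraded
    mount keeps the levels that were in use before the drive was lost.
    """
    num_devices = rw_devices + missing_devices
    # Assumed threshold: a single device cannot provide striping or mirroring.
    if num_devices == 1:
        flags &= ~(RAID0 | RAID1 | RAID10)
    # Assumed threshold: raid10 needs at least four devices.
    if num_devices < 4:
        flags &= ~RAID10
    return flags
```

      With one of two raid1 drives missing, reduce_profile(RAID1, 1, 1)
      keeps the raid1 flag, whereas ignoring the missing drive
      (reduce_profile(RAID1, 1)) would drop it and let the allocator fall
      back to single spindle chunks.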
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. Dec 13, 2010
    • Btrfs: EIO when we fail to read tree roots · 68433b73
      Chris Mason authored
      
      
      If we just get a plain IO error when we read tree roots, the code
      wasn't properly sending that error up the chain.  This allowed mounts to
      continue when they should have failed, and allowed operations
      on partially set up root structs.  The end result was usually oopsen
      on spinlocks that hadn't been spun up correctly.
      
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  5. Dec 10, 2010
  6. Dec 09, 2010
  7. Dec 08, 2010
  8. Dec 07, 2010
  9. Dec 06, 2010
  10. Dec 02, 2010
    • reiserfs: don't acquire lock recursively in reiserfs_acl_chmod · 238af875
      Frederic Weisbecker authored
      
      
      reiserfs_acl_chmod() can be called by reiserfs_set_attr() and then take
      the reiserfs lock a second time.  It may then call journal_begin(),
      which requires that the lock not be nested so that it can be released
      before taking the journal mutex, because the reiserfs lock already
      depends on the journal mutex.
      
      So, avoid nesting the lock in reiserfs_acl_chmod().
      
      Reported-by: Pawel Zawora <pzawora@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-by: Pawel Zawora <pzawora@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: <stable@kernel.org>		[2.6.32.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cifs: add attribute cache timeout (actimeo) tunable · 6d20e840
      Suresh Jayaraman authored
      
      
      Currently, the attribute cache timeout for CIFS is hardcoded to 1 second. This
      means that the client might have to issue a QPATHINFO/QFILEINFO call every 1
      second to verify if something has changed, which seems too expensive. On the
      other hand, if the timeout is hardcoded to a higher value, workloads that
      expect strict cache coherency might see unexpected results.
      
      Making the attribute cache timeout a tunable will allow us to make a tradeoff
      between performance and cache metadata correctness depending on the
      application/workload needs.
      
      Add 'actimeo' tunable that can be used to tune the attribute cache timeout.
      The default timeout is set to 1 second. Also, display actimeo option value in
      /proc/mounts.
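
      The tradeoff can be pictured with a small model of the freshness
      check. This is a hedged Python sketch rather than the CIFS client
      code: the function name attrs_need_revalidation and the use of
      floating-point seconds are assumptions; only the 1 second default
      comes from the text above.

```python
# Default timeout in seconds, as stated in the commit message.
DEFAULT_ACTIMEO = 1.0


def attrs_need_revalidation(last_refresh: float, now: float,
                            actimeo: float = DEFAULT_ACTIMEO) -> bool:
    """Return True when cached attributes are older than the timeout.

    A True result means the client would have to ask the server again;
    a larger actimeo trades cache coherency for fewer round trips.
    """
    return (now - last_refresh) >= actimeo
```

      In this model, attributes refreshed 1.5 seconds ago are stale with
      the default timeout but still considered valid when actimeo is
      raised to 30 seconds.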
      
      It appears to me that 'actimeo' and the proposed (but not yet merged)
      'strictcache' option cannot coexist, so care must be taken that we reset the
      other option if one of them is set.
      
      Changes since last post:
         - fix option parsing and handle possible values correctly
      
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
    • NFS: Fix a memory leak in nfs_readdir · 11de3b11
      Trond Myklebust authored
      
      
      We need to ensure that the entries in the nfs_cache_array get cleared
      when the page is removed from the page cache. To do so, we use the
      freepage address_space operation.
      
      Change nfs_readdir_clear_array to use kmap_atomic(), so that the
      function can be safely called from all contexts.
      
      Finally, modify the cache_page_release helper to call
      nfs_readdir_clear_array directly, when dealing with an anonymous
      page from 'uncached_readdir'.
      
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  11. Dec 01, 2010