- Jul 12, 2013
Kent Overstreet authored
The alloc kthread should've been using try_to_freeze() - and also there was the potential for the alloc kthread to get woken up after it had shut down, which would have been bad. Signed-off-by:
Kent Overstreet <kmo@daterainc.com>
-
Kent Overstreet authored
Part of the job of garbage collection is to add up however many sectors of live data it finds in each bucket, but that doesn't work very well if it doesn't reset GC_SECTORS_USED() when it starts. Whoops. This wouldn't have broken anything horribly, but allocation tries to preferentially reclaim buckets that are mostly empty and that's not gonna work with an incorrect GC_SECTORS_USED() value. Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
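The fix amounts to zeroing each bucket's counter before GC starts accumulating. A minimal userland sketch of that pattern (the struct and names are illustrative, not bcache's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a cache bucket; bcache's real structure differs. */
struct bucket {
	unsigned gc_sectors_used;	/* live sectors found by the last GC pass */
};

/* The fix: zero every per-bucket counter *before* accumulating, so a
 * pass never inherits stale totals from the previous run. */
static void gc_start(struct bucket *buckets, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buckets[i].gc_sectors_used = 0;
}

static void gc_mark(struct bucket *b, unsigned sectors)
{
	b->gc_sectors_used += sectors;	/* accumulate live data per bucket */
}
```

Without the reset, `gc_mark()` would add this pass's totals on top of the previous pass's, inflating exactly the statistic the allocator uses to pick mostly-empty buckets.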
-
Kent Overstreet authored
The journal replay code starts by finding something that looks like a valid journal entry, then it does a binary search over the unchecked region of the journal for the journal entries with the highest sequence numbers. Trouble is, the logic was wrong - journal_read_bucket() returns true if it found journal entries we need, but if the range of journal entries we're looking for loops around the end of the journal - in that case journal_read_bucket() could return true when it hadn't found the highest sequence number we'd seen yet, and in that case the binary search did the wrong thing. Whoops. Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
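The underlying invariant is that binary search only works over a monotone condition: once it flips from false to true it must stay true across the rest of the range. When the journal's live region wrapped around the end, journal_read_bucket()'s return value was no longer monotone, so the search converged on the wrong bucket. A generic sketch of the invariant (not bcache's code):

```c
#include <stddef.h>
#include <stdint.h>

/* Binary search is only valid when the tested condition is monotone
 * across the range: false, false, ..., true, true, true.  Given a
 * sorted array of sequence numbers, this finds the first index whose
 * value is >= target.  If the condition could flip back to false
 * partway through (as with a wrapped journal range), the result is
 * meaningless. */
static size_t first_seq_geq(const uint64_t *seqs, size_t n, uint64_t target)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (seqs[mid] >= target)
			hi = mid;	/* condition true: answer is at or before mid */
		else
			lo = mid + 1;	/* condition false: answer is after mid */
	}
	return lo;	/* == n if no element qualifies */
}
```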
-
Kent Overstreet authored
Stopping a cache set is supposed to make it stop attached backing devices, but somewhere along the way that code got lost. Fixing this mainly has the effect of fixing our reboot notifier. Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
Kent Overstreet authored
If we stopped a bcache device when we were already detaching (or something like that), bcache_device_unlink() would try to remove a symlink from sysfs that was already gone because the bcache dev kobject had already been removed from sysfs. So keep track of whether we've removed stuff from sysfs. Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
Kent Overstreet authored
Whoops - bcache's flush/FUA was mostly correct, but flushes get filtered out unless we say we support them... Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
-
Dan Carpenter authored
There is a missing NULL check after the kzalloc(). Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com>
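The pattern is simple enough to sketch in userland C, with calloc() standing in for kzalloc() and an illustrative struct rather than bcache's real one:

```c
#include <stdlib.h>

struct uuid_entry {
	char uuid[16];		/* illustrative payload */
};

/* kzalloc() (here calloc()) can return NULL on allocation failure, so
 * the result must be checked before use instead of dereferenced blindly. */
static struct uuid_entry *alloc_entry(void)
{
	struct uuid_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;	/* propagate the failure to the caller */
	return e;		/* zero-initialized, like kzalloc() */
}
```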
-
Kent Overstreet authored
In the far-too-complicated closure code - closures can have destructors, for probably dubious reasons; they get run after the closure is no longer waiting on anything but before dropping the parent ref, intended just for freeing whatever memory the closure is embedded in. Trouble is, when remaining goes to 0 and we've got nothing more to run - we also have to unlock the closure, setting remaining to -1. If there's a destructor, that unlock isn't doing anything - nobody could be trying to lock it if we're about to free it - but if the unlock _is_ needed... that check for a destructor was racy. Argh. Signed-off-by:
Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10
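A heavily simplified userland sketch of the safe ordering (the real closure code is far more involved; field names and semantics here are illustrative): the destructor pointer must be latched before the final state is published, because after the unlock - or after a destructor frees the memory - no field of the closure may be touched.

```c
#include <stdatomic.h>
#include <stddef.h>

struct closure {
	atomic_int remaining;	/* >0: refs outstanding; -1: unlocked */
	void (*destructor)(struct closure *);
};

static void closure_put(struct closure *cl)
{
	/* fetch_sub returns the old value; 1 means we dropped the last ref */
	if (atomic_fetch_sub(&cl->remaining, 1) == 1) {
		/* Read the destructor *before* publishing any final state. */
		void (*destructor)(struct closure *) = cl->destructor;

		if (destructor)
			destructor(cl);	/* may free cl; never touch it after */
		else
			atomic_store(&cl->remaining, -1);	/* unlock */
	}
}
```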
-
- Jul 01, 2013
Kent Overstreet authored
Some of bcache's utility code has made it into the rest of the kernel, so drop the bcache versions. Bcache used to have a workaround for allocating from a bio set under generic_make_request() (if you allocated more than once, the bios you already allocated would get stuck on current->bio_list when you submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT when allocating bios under generic_make_request() so that allocation could fail and it could retry from workqueue. But bio_alloc_bioset() has a workaround now, so we can drop this hack and the associated error handling. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
This code has rotted and it hasn't been used in ages anyways. Signed-off-by:
Kent Overstreet <kmo@daterainc.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kmo@daterainc.com>
-
Kent Overstreet authored
Journal writes need to be marked FUA, not just REQ_FLUSH. And btree node writes have... weird ordering requirements. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
- Jun 27, 2013
Gabriel de Perthuis authored
Signed-off-by:
Gabriel de Perthuis <g2p.code@gmail.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Gabriel de Perthuis authored
Signed-off-by:
Gabriel de Perthuis <g2p.code@gmail.com>
-
Kent Overstreet authored
Now that we're tracking dirty data per stripe, we can add two optimizations for raid5/6: * If a stripe is already dirty, force writes to that stripe to writeback mode - to help build up full stripes of dirty data * When flushing dirty data, preferentially write out full stripes first if there are any. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
To make background writeback aware of raid5/6 stripes, we first need to track the amount of dirty data within each stripe - we do this by breaking up the existing sectors_dirty into per-stripe atomic_ts. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
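The idea is one dirty-sector counter per stripe instead of a single device-wide total, with a sector range split across stripe boundaries as it is accounted. A userland sketch of that bookkeeping (structures and names are illustrative, not bcache's):

```c
#include <stdatomic.h>

/* One dirty-sector counter per raid stripe, rather than one global total. */
struct stripe_dirty {
	unsigned stripe_sectors;	/* sectors per stripe */
	atomic_uint *dirty;		/* one counter per stripe */
};

/* Account nr dirty sectors starting at sector, splitting the range
 * wherever it crosses a stripe boundary. */
static void mark_dirty(struct stripe_dirty *d, unsigned sector, unsigned nr)
{
	while (nr) {
		unsigned stripe = sector / d->stripe_sectors;
		unsigned in_stripe = d->stripe_sectors -
			sector % d->stripe_sectors;
		unsigned n = nr < in_stripe ? nr : in_stripe;

		atomic_fetch_add(&d->dirty[stripe], n);
		sector += n;
		nr -= n;
	}
}
```

With this in place, writeback can ask "is this stripe fully dirty?" per stripe, which is what the full-stripe optimizations above key off.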
-
Kent Overstreet authored
Previously, dirty_data wouldn't get initialized until the first garbage collection... which was a bit of a problem for background writeback (as the PD controller keys off of it) and also confusing for users. This is also prep work for making background writeback aware of raid5/6 stripes. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
The old lazy sorting code was kind of hacky - rewrite in a way that mathematically makes more sense; the idea is that the size of the sets of keys in a btree node should increase by a more or less fixed ratio from smallest to biggest. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Old gcc doesn't like the struct hack, and it is kind of ugly. So finish off the work to convert pr_debug() statements to tracepoints, and delete pkey()/pbtree(). Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
The tracepoints were reworked to be more sensible, and a null pointer deref in one of the tracepoints was fixed. Converted some of the pr_debug()s to tracepoints - this is partly a performance optimization; it used to be that with DEBUG or CONFIG_DYNAMIC_DEBUG unset, pr_debug() was an empty macro; but at some point it was changed to an empty inline function. Some of the pr_debug() statements had rather expensive function calls as part of the arguments, so this code was getting run unnecessarily even on non-debug kernels - in some fast paths, too. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
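The macro-vs-inline-function distinction is easy to demonstrate: an empty macro discards its arguments unevaluated, while an empty inline function still evaluates them at every call site. A self-contained sketch (names are made up for illustration):

```c
static int calls;

static int expensive(void)	/* stands in for a costly helper call */
{
	return ++calls;
}

/* An empty macro: the arguments are never evaluated at all... */
#define debug_macro(fmt, ...) do { } while (0)

/* ...but an empty inline function still evaluates every argument at
 * the call site - which is what made the converted pr_debug() silently
 * costly on non-debug kernels. */
static inline void debug_func(const char *fmt, int v)
{
	(void)fmt;
	(void)v;
}
```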
-
Kent Overstreet authored
The most significant change is that btree reads are now done synchronously, instead of asynchronously and doing the post read stuff from a workqueue. This was originally done because we can't block on IO under generic_make_request(). But - we already have a mechanism to punt cache lookups to workqueue if needed, so if we just use that we don't have to deal with the complexity of doing things asynchronously. The main benefit is this makes the locking situation saner; we can hold our write lock on the btree node until we're finished reading it, and we don't need that btree_node_read_done() flag anymore. Also, for writes, btree_write() was broken out into btree_node_write() and btree_leaf_dirty() - the old code with the boolean argument was dumb and confusing. The prio_blocked mechanism was improved a bit too; now the only counter is in struct btree_write, and we don't mess with transferring a count from struct btree anymore. This required changing garbage collection to block prios at the start and unblock when it finishes, which is cleaner than what it was doing anyways (the old code had mostly the same effect, but was doing it in a convoluted way). And the btree iter btree_node_read_done() uses was converted to a real mempool. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Using a workqueue when we just want a single thread is a bit silly. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Gabriel de Perthuis authored
Signed-off-by:
Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
An old version of gcc was complaining about using a const int as the size of a stack allocated array. Which should be fine - but using ARRAY_SIZE() is better, anyways. Also, refactor the code to use scnprintf(). Signed-off-by:
Kent Overstreet <koverstreet@google.com>
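A userland sketch of both changes: ARRAY_SIZE() derives the element count from the array itself, and the formatting loop accumulates the return value instead of re-deriving offsets. Here snprintf() stands in for the kernel's scnprintf() (which returns the count actually written); this sketch assumes the buffer is large enough, and the function and data are illustrative:

```c
#include <stdio.h>
#include <stddef.h>

/* Element count computed from the array itself - no separate constant
 * for old gcc to complain about. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

static int fmt_sizes(char *buf, size_t len)
{
	static const unsigned sizes[] = { 512, 1024, 2048 };
	int out = 0;

	for (size_t i = 0; i < ARRAY_SIZE(sizes); i++)
		out += snprintf(buf + out, len - out, "%u ", sizes[i]);
	return out;	/* total characters formatted */
}
```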
-
Kumar Amit Mehta authored
bio_alloc_bioset returns NULL on failure. This fix adds a missing check for potential NULL pointer dereferencing. Signed-off-by:
Kumar Amit Mehta <gmate.amit@gmail.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
- Jun 18, 2013
Phil Viana authored
The word 'arithmetic' was typed as 'arithmatic' Signed-off-by:
Phil Viana <phillip.l.viana@gmail.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz>
-
- May 15, 2013
Kent Overstreet authored
This code appears to have rotted... fix various bugs and do some refactoring. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Paul Bolle authored
The Kconfig entry for BCACHE selects CLOSURES. But there's no Kconfig symbol CLOSURES. That symbol was used in development versions of bcache, but was removed when the closures code was no longer provided as a kernel library. It can safely be dropped. Signed-off-by:
Paul Bolle <pebolle@tiscali.nl>
-
Emil Goode authored
The function pointer release in struct block_device_operations should point to functions declared as void. Sparse warnings:
drivers/md/bcache/super.c:656:27: warning: incorrect type in initializer (different base types)
drivers/md/bcache/super.c:656:27: expected void ( *release )( ... )
drivers/md/bcache/super.c:656:27: got int ( static [toplevel] *<noident> )( ... )
drivers/md/bcache/super.c:656:2: warning: initialization from incompatible pointer type [enabled by default]
drivers/md/bcache/super.c:656:2: warning: (near initialization for ‘bcache_ops.release’) [enabled by default]
Signed-off-by:
Emil Goode <emilgoode@gmail.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
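The mismatch is the shape of the function pointer, not its body. A minimal sketch with an illustrative ops struct (not the real block_device_operations):

```c
#include <stddef.h>

/* The release hook is declared void; assigning a function that returns
 * int makes the initializer an incompatible pointer type, which is
 * exactly what sparse flagged. */
struct device_ops {
	void (*release)(void *dev);
};

/* The fix: declare the hook void.  (It was previously "static int
 * my_release(...)", returning a value nobody could receive.) */
static void my_release(void *dev)
{
	(void)dev;	/* nothing to tear down in this sketch */
}

static const struct device_ops ops = {
	.release = my_release,	/* types now match */
};
```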
-
- May 01, 2013
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
The main fix is that bch_allocator_thread() wasn't waiting on garbage collection to finish (if invalidate_buckets had set ca->invalidate_needs_gc); we need that to make sure the allocator doesn't spin and potentially block gc from finishing. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
- Apr 24, 2013
Kent Overstreet authored
Sanity check to make sure we don't end up doing IO the device doesn't support. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
- Apr 22, 2013
Kent Overstreet authored
Stacked md devices reuse the bvm for the subordinate device, causing problems... Reported-by:
Michael Balser <michael.balser@profitbricks.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
- Apr 21, 2013
Kent Overstreet authored
bch_bio_max_sectors() was checking against BIO_MAX_PAGES as if the limit was for the total bytes in the bio, not the number of segments. Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Add a new superblock version, and consolidate related defines. Signed-off-by:
Gabriel de Perthuis <g2p.code+bcache@gmail.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
- Apr 08, 2013
Kent Overstreet authored
Reported-by:
<sasha.levin@oracle.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-
Kent Overstreet authored
Reported-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Kent Overstreet <koverstreet@google.com>
-