- Aug 22, 2006
Oleg Nesterov authored
An exiting task, or a process which hasn't done any I/O yet, has no io context, so elv_unregister() should check that it is not NULL.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
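A hedged sketch of the guarded walk this fix implies, using 2006-era elevator names (e->ops.trim, do_each_thread); an illustration of the check, not the exact committed diff:

```c
/* In elv_unregister(): drop per-task io contexts for this scheduler. */
struct task_struct *g, *p;

if (e->ops.trim) {
	read_lock(&tasklist_lock);
	do_each_thread(g, p) {
		task_lock(p);
		if (p->io_context)	/* NULL if the task never did I/O */
			e->ops.trim(p->io_context);
		task_unlock(p);
	} while_each_thread(g, p);
	read_unlock(&tasklist_lock);
}
```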
-
- Aug 21, 2006
Oleg Nesterov authored
Obviously, cfq_cic_link() shouldn't free a just-allocated cfq_io_context. The dead key comes from __cic, so drop that one instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
-
Oleg Nesterov authored
I know nothing about the io scheduler, but I suspect set_task_ioprio() is not safe. current_io_context() initializes the struct io_context and then sets ->io_context. set_task_ioprio(), running on another CPU, may observe those stores out of order, so ->set_ioprio(ioc) may use an io_context that was not properly initialized.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <axboe@suse.de>
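A minimal sketch of the ordering hazard, with hedged, illustrative field names: without barriers, another CPU can observe the published ->io_context pointer before the stores that initialized the structure.

```c
/* Writer side, current_io_context(): initialize, then publish. */
atomic_set(&ioc->refcount, 1);
ioc->set_ioprio = NULL;			/* ... and the other fields ... */
smp_wmb();				/* order the init before the publish */
current->io_context = ioc;

/* Reader side, set_task_ioprio() on another CPU. */
ioc = task->io_context;
smp_read_barrier_depends();		/* pairs with the smp_wmb() above */
if (ioc && ioc->set_ioprio)
	ioc->set_ioprio(ioc, ioprio);
```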
-
- Jul 25, 2006
Jens Axboe authored
The CIC_SEEKY() test really wants to use the minimum of either 2 msecs (not 2 jiffies) or the pending slice time, so code it like that.

Signed-off-by: Jens Axboe <axboe@suse.de>
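A hedged sketch of the clamp: with HZ=100 a raw value of 2 means 20 ms, so the conversion has to be explicit. Variable names (sl, cic, cfqd) follow CFQ conventions but the snippet is illustrative.

```c
unsigned long sl = cfqd->cfq_slice_idle;	/* pending slice time */

/* Seeky processes get at most 2 ms of idling, HZ-independent. */
if (sample_valid(cic->seek_samples) && CIC_SEEKY(cic))
	sl = min(sl, msecs_to_jiffies(2));

mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
```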
-
Milton Miller authored
It should be toggling the same bit on and off; fix it up.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
- Jul 15, 2006
Arjan van de Ven authored
The delete partition IOCTL takes bd_mutex for both the disk and the partition; these have an obvious hierarchical relationship, and this patch annotates that relationship for lockdep.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
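A hedged sketch of the annotation: the partition's bd_mutex nests under the whole-disk bd_mutex, so the inner lock is taken with mutex_lock_nested() and a distinct subclass. The subclass value 1 mirrors the usual convention; the surrounding code is illustrative.

```c
/* Whole disk first, then the partition: an ordered hierarchy. */
mutex_lock(&bdev->bd_mutex);
mutex_lock_nested(&bdevp->bd_mutex, 1);	/* tell lockdep this nests */

/* ... invalidate and delete the partition ... */

mutex_unlock(&bdevp->bd_mutex);
mutex_unlock(&bdev->bd_mutex);
```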
-
- Jul 06, 2006
Jens Axboe authored
Not three, as assumed. This causes the barrier bit to be needlessly set for some IO.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Nathan Scott authored
Provide the needed kernel support for distinguishing readahead from regular read requests when tracing block devices.

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
-
- Jul 03, 2006
Ingo Molnar authored
lockdep needs to have the waitqueue lock initialized for on-stack waitqueues implicitly initialized by DECLARE_COMPLETION(). Annotate on-stack completions accordingly. This has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
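A hedged illustration of the annotation; DECLARE_COMPLETION_ONSTACK() is the macro this work introduced, and it degrades to plain DECLARE_COMPLETION() on non-lockdep kernels. The surrounding function is illustrative.

```c
#include <linux/completion.h>

static void example_wait_for_io(void)
{
	/* was: DECLARE_COMPLETION(wait); -- lockdep needs a proper key
	 * for the waitqueue lock inside an on-stack completion */
	DECLARE_COMPLETION_ONSTACK(wait);

	/* ... submit work that eventually calls complete(&wait) ... */
	wait_for_completion(&wait);
}
```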
-
- Jun 30, 2006
Christoph Lameter authored
The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM.

We use a simple increment of per-cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare; we may only lose one increment or so, and there is no requirement (at least not in the kernel) that the VM event counters be accurate.

In the non-preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. That single instruction is atomic for i386 and x86_64, so even the rare race condition in an interrupt is avoided for both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows building Linux kernels without these counters. The implementation is through inline code that hopefully results in only a single increment instruction being emitted (i386, x86_64), or in the increment being hidden through instruction concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:
- RFC: http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
- RFC: http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
- local_t: http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
- V2: http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
- V3: http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
- V4: http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
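A hedged sketch of the counter scheme, modeled on the count_vm_event() interface this patchset describes; the enum contents and exact names here are assumptions.

```c
#include <linux/percpu.h>
#include <linux/preempt.h>

enum vm_event_item { PGFAULT, PGMAJFAULT, NR_VM_EVENT_ITEMS };

struct vm_event_state {
	unsigned long event[NR_VM_EVENT_ITEMS];
};

DEFINE_PER_CPU(struct vm_event_state, vm_event_states);

static inline void count_vm_event(enum vm_event_item item)
{
	preempt_disable();	/* avoids cross-CPU races, not irq races */
	__get_cpu_var(vm_event_states).event[item]++;
	preempt_enable();
}
```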
-
Jörn Engel authored
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
-
- Jun 28, 2006
Chandra Seetharaman authored
Make use of the newly defined hotplug version of the cpu_notifier functionality wherever appropriate.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
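A hedged illustration: with the hotplug-aware helper, the notifier is only registered when CONFIG_HOTPLUG_CPU is enabled and compiles away otherwise. The callback body is an assumption; the registration call is the interface the commit refers to.

```c
#include <linux/cpu.h>
#include <linux/notifier.h>

static int example_cpu_notify(struct notifier_block *self,
			      unsigned long action, void *hcpu)
{
	if (action == CPU_DEAD) {
		/* drain per-cpu state belonging to the dead CPU here */
	}
	return NOTIFY_OK;
}

static struct notifier_block example_cpu_notifier = {
	.notifier_call = example_cpu_notify,
};

static int __init example_init(void)
{
	/* no-op on kernels without CONFIG_HOTPLUG_CPU */
	register_hotcpu_notifier(&example_cpu_notifier);
	return 0;
}
```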
-
Chandra Seetharaman authored
This patch reverts the notifier_block changes made in 2.6.17.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- Jun 26, 2006
Andreas Mohr authored
- acquired (aquired)
- contiguous (contigious)
- successful (succesful, succesfull)
- surprise (suprise)
- whether (weather)
- some other misspellings

Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
-
- Jun 23, 2006
Andi Kleen authored
Do a safer check for when to enable DMA. Currently we enable ISA DMA for cases that do not need it, resulting in OOM conditions when ZONE_DMA runs out of space.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
They all duplicate macros to check for an empty root and/or node, and for clearing a node, so put those in rbtree.h.

Signed-off-by: Jens Axboe <axboe@suse.de>
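The consolidated helpers live in <linux/rbtree.h>; a hedged sketch of how an io scheduler uses them:

```c
#include <linux/rbtree.h>

static void example_erase(struct rb_root *root, struct rb_node *node)
{
	if (!RB_EMPTY_NODE(node)) {	/* still linked into a tree? */
		rb_erase(node, root);
		RB_CLEAR_NODE(node);	/* mark as unlinked */
	}

	if (RB_EMPTY_ROOT(root)) {
		/* no requests left to dispatch */
	}
}
```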
-
Jens Axboe authored
- Remember to set ->last_sector so that the cfq_choose_req() logic works correctly.
- Remove redundant call to cfq_choose_req().

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
This is a collection of patches that greatly improve CFQ performance in some circumstances.

- Change the idling logic to only kick in after a request is done and we are deciding what to do. Before, the idling included the request service time, so it was hard to adjust. Now it's true think/idle time.
- Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep it in control for sync and sequential (or close to) workloads.
- Expire queues immediately and move on to other busy queues, if we are not going to idle after the current one finishes.
- Don't rearm the idle timer if there are no busy queues; just leave the system idle.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
Patch originally from Vasily Tarasov <vtaras@sw.ru>.

If you set the io-priority of process 1 via the sys_ioprio_set system call from another process 2 (as ionice does), then cfq_init_prio_data() sets the priority of process 2 (current) on the queue of process 1 and clears the flag that designates a change of ioprio. So process 1 ends up working with the priority of process 2.

I propose not to call cfq_init_prio_data() on an io-priority change, but only to mark the queue as having a changed priority. Every time a new request comes in, the cfq scheduler checks this flag and automatically changes the priority of the queue to the new value.

Signed-off-by: Jens Axboe <axboe@suse.de>
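A hedged sketch of the deferred update: the ioprio change only marks the queue, and the new priority is applied in the request path, which runs in the queue owner's context. The flag name, helper names, and cfq_queue fields here are assumptions, not CFQ's exact identifiers.

```c
enum { EXAMPLE_PRIO_CHANGED };		/* illustrative flag bit */

/* sys_ioprio_set() path: don't touch prio data from another task. */
static void example_mark_prio_changed(struct cfq_queue *cfqq)
{
	set_bit(EXAMPLE_PRIO_CHANGED, &cfqq->flags);
}

/* Request path: apply the owner's own ioprio when the flag is set. */
static void example_add_request(struct cfq_queue *cfqq)
{
	if (test_and_clear_bit(EXAMPLE_PRIO_CHANGED, &cfqq->flags))
		cfq_init_prio_data(cfqq);	/* now runs as the owner */
	/* ... insert the request ... */
}
```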
-
Jens Axboe authored
Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it: part of the io will go out as async and the other part as sync, which causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this costs us lost merges and suboptimal scheduling behaviour.

Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path directly indicate that the writes are sync by using WRITE_SYNC instead.

Signed-off-by: Jens Axboe <axboe@suse.de>
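A hedged illustration: rather than flagging the whole task with PF_SYNCWRITE, the O_DIRECT submitter marks each write as sync. WRITE_SYNC is the flag named by the commit; the wrapper function is illustrative.

```c
#include <linux/bio.h>
#include <linux/fs.h>

static void example_submit_odirect_write(struct bio *bio)
{
	/* Per-request hint: io schedulers treat this write as sync. */
	submit_bio(WRITE_SYNC, bio);
}
```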
-
Jens Axboe authored
We cannot update them if the user changes nr_requests, so don't set them in the first place. The gains are pretty questionable as well; the batching loss has been shown to decrease throughput.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Dave Jones authored
We already drop the refcount in elevator_exit(), and as we're setting 'e' to NULL, we'll never take that branch anyway. Finally, as 'e' is a local variable that isn't referenced afterwards, setting it to NULL is pointless.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
-
Paolo 'Blaisorblade' Giarrusso authored
The queue lock can be taken from interrupts, so it must always be taken with irq-disabling primitives. Some primitives already verify this. blk_start_queue() is called under this lock, so interrupts must be disabled when calling it. Also document this requirement clearly in blk_init_queue(), where the queue spinlock is set.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
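A hedged sketch of the rule for drivers: restart the queue only with the queue lock held and interrupts disabled.

```c
#include <linux/blkdev.h>

static void example_restart_queue(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);  /* irqs must be off here */
	blk_start_queue(q);
	spin_unlock_irqrestore(q->queue_lock, flags);
}
```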
-
Akinobu Mita authored
Use hlist instead of list_head for the request hashtable in deadline-iosched and as-iosched. This also removes the flag needed to know whether a request is hashed or not.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Jens Axboe <axboe@suse.de>

 block/as-iosched.c       | 45 +++++++++++++++++++--------------------------
 block/deadline-iosched.c | 39 ++++++++++++++++-----------------------
 2 files changed, 35 insertions(+), 49 deletions(-)
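A hedged sketch of the conversion; with an hlist, hlist_unhashed() answers "is this request hashed?" directly, so the separate flag disappears. The struct and function names are illustrative.

```c
#include <linux/list.h>

struct example_rq {
	struct hlist_node hash;	/* was: struct list_head plus an on-hash flag */
};

static void example_hash_insert(struct hlist_head *bucket,
				struct example_rq *rq)
{
	hlist_add_head(&rq->hash, bucket);
}

static void example_hash_remove(struct example_rq *rq)
{
	if (!hlist_unhashed(&rq->hash))	/* replaces the old flag test */
		hlist_del_init(&rq->hash);
}
```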
-
Oleg Nesterov authored
list_splice_init(list, head) does unneeded work if it is known that list_empty(head) == 1. We can use list_replace_init() instead.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
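A hedged illustration of the substitution: draining a shared list onto a known-empty local head. list_replace_init() does the O(1) head swap and reinitializes the donor list.

```c
#include <linux/list.h>

static void example_drain(struct list_head *pending)
{
	LIST_HEAD(local);	/* known to be empty */

	/* was: list_splice_init(pending, &local); */
	list_replace_init(pending, &local);

	/* walk 'local' at leisure; 'pending' is empty and reusable */
}
```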
-
- Jun 21, 2006
Kay Sievers authored
Like the SUBSYSTEM= key we find in the environment of the uevent, this creates a generic "subsystem" link in sysfs for every device. Userspace usually doesn't care at all whether it's a "class" or a "bus" device; this provides a unified way to determine the subsystem of a device, regardless of the way the driver core has created it.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Linus Torvalds authored
The color is now in the low bits of the parent pointer, and initializing it to 0 happens as part of the whole memset above, so just remove the unnecessary RB_CLEAR_COLOR.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
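For reference, a hedged sketch of the node layout this relies on, following the 2006-era rbtree.h: the color lives in the low bits of the combined parent word, so zeroing the node via memset also yields parent == NULL and color == red (0).

```c
struct rb_node {
	unsigned long rb_parent_color;	/* parent pointer | color bit */
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

#define RB_RED		0
#define RB_BLACK	1
#define rb_parent(r)	((struct rb_node *)((r)->rb_parent_color & ~3))
#define rb_color(r)	((r)->rb_parent_color & 1)
```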
-
- Jun 14, 2006
Jens Axboe authored
We don't clear the seek stat values in cfq_alloc_io_context(), and if ->seek_mean is unlucky enough to be set to -36 by chance, the first invocation of cfq_update_io_seektime() will oops with a divide by zero in do_div(). Just memset the entire cic instead of filling individual values independently.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- Jun 08, 2006
Jens Axboe authored
There's a race between shutting down one io scheduler and firing up the next, in which new io could enter and cause the io scheduler to be invoked with bad or NULL data. To fix this, we need to hold the queue lock for a bit longer. Unfortunately we cannot do that, since the elevator init requires being run without the lock held. This isn't easily fixable without also changing the mempool API. So split the initialization into two parts: an alloc-init operation and an attach operation. Then we can preallocate the io scheduler and related structures, and run the attach inside the lock after we detach the old one.

This patch has survived 30 minutes of 1-second io scheduler switching with a very busy io load.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
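A hedged sketch of the two-phase switch; the helper names are modeled on the elevator code but are assumptions, not the committed interface.

```c
static int example_switch_elevator(struct request_queue *q,
				   struct elevator_type *new_type)
{
	/* Phase 1: allocate and init outside the lock; this can sleep
	 * (mempool and slab allocations). */
	elevator_t *new_e = example_elevator_alloc(q, new_type);

	if (!new_e)
		return -ENOMEM;

	/* Phase 2: swap under the queue lock, so no request ever sees
	 * a half-built or half-torn-down scheduler. */
	spin_lock_irq(q->queue_lock);
	example_elevator_detach(q);		/* old one out */
	example_elevator_attach(q, new_e);	/* preallocated one in */
	spin_unlock_irq(q->queue_lock);

	return 0;
}
```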
-
- Jun 01, 2006
Jens Axboe authored
Now that we select busy_rr for possible service, insert entries at the back of that list instead of at the front.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
There's a small window between when the timer fires and when we grab the queue lock, in which cfq_set_active_queue() could be rearming the timer for us. Seen in the wild on a 12-way ppc box. Fix this by just using mod_timer(), which does the right thing for us.

Signed-off-by: Jens Axboe <axboe@suse.de>
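A hedged sketch of the fix: mod_timer() safely handles a timer that another CPU is concurrently rearming, whereas setting ->expires and calling add_timer() on a pending timer is racy. Field names follow CFQ loosely.

```c
/* was (racy if cfq_set_active_queue() already rearmed the timer):
 *	cfqd->idle_slice_timer.expires = jiffies + sl;
 *	add_timer(&cfqd->idle_slice_timer);
 */
mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
```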
-
Jens Axboe authored
If the hardware is doing real queueing, decide that it's worthless to idle the hardware. It does reasonable simultaneous io in that case anyway, and the idling hurts some workloads.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
If we are anticipating a sync request from this process and, while waiting for it, see an async request come in, expire that slice and move on.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
Jens Axboe authored
With just one busy queue (like async writeout), we often overlooked that we could queue more io and decided we were idle instead. This causes us quite a bit of performance loss.

Signed-off-by: Jens Axboe <axboe@suse.de>
-
- May 31, 2006
Jens Axboe authored
- Drop the cic from the list when seen as dead.
- Fix up the locking; just use a simple spinlock.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- May 23, 2006
Jens Axboe authored
While executing a barrier sequence, the bar_rq which carries the actual write was accounted as normal IO on completion, while it wasn't on queueing. This caused gendisk->in_flight to be decremented by 1 after each barrier, thus messing up statistics. This patch makes bar_rq not accounted as normal IO; as the containing barrier request as a whole is accounted, part of it shouldn't be.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- May 12, 2006
Linus Torvalds authored
This reverts commit 56cf6504.

Both Erik Mouw and Andrew Vasquez independently pinpointed this commit as causing problems, where the slab cache for a driver is never released (most obviously causing problems when immediately re-loading that driver, resulting in a "kmem_cache_create: duplicate cache <xyz>" message, but it can also cause other trouble).

James Bottomley dug into it, and reports:

 "OK, here's the scoop. The problem patch adds a get of driverfs_dev in add_disk(), but doesn't put it again until disk_release() (which occurs on final put_disk() of the gendisk). However, in SCSI, the driverfs_dev is the sdev_gendev. That means there's a reference held on sdev_gendev until final disk put. Unfortunately, we use the driver model driver_remove to trigger del_gendisk (which removes the gendisk from visibility and decrements the refcount), so we've introduced an unbreakable deadlock in the reference counting with this.

 I suggest simply reversing this patch at the moment. If Russell and Jens can tell me what they're trying to do I'll see if there's another way to do it."

So hereby the patch gets reverted, waiting for a better fix.

Cc: Jens Axboe <axboe@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Erik Mouw <erik@harddisk-recovery.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- May 11, 2006
Jens Axboe authored
Don't recurse back into the driver even if the unplug threshold is met when the driver asks for a requeue. This is both silly from a logical point of view (requeues typically happen due to driver/hardware shortage) and also dangerous, since we could hit an endless request_fn -> requeue -> unplug -> request_fn loop and crash on stack overrun. Also limit blk_run_queue() to one level of recursion, similar to how blk_start_queue() works. This patch fixed a real problem with SLES10 and lpfc, and it could hit any SCSI lld that returns non-zero from its ->queuecommand() handler.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
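A hedged sketch of the one-level guard, modeled on how blk_start_queue() used QUEUE_FLAG_REENTER in this era; the function and its fallback path are illustrative.

```c
void example_run_queue(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	blk_remove_plug(q);

	if (!test_and_set_bit(QUEUE_FLAG_REENTER, &q->queue_flags)) {
		q->request_fn(q);	/* at most one level of recursion */
		clear_bit(QUEUE_FLAG_REENTER, &q->queue_flags);
	} else {
		/* already inside request_fn: defer instead of recursing */
		blk_plug_device(q);
		kblockd_schedule_work(&q->unplug_work);
	}

	spin_unlock_irqrestore(q->queue_lock, flags);
}
```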
-
- May 05, 2006
Russell King authored
The block layer keeps a reference (driverfs_dev) to the struct device associated with the block device, and uses it internally for generating uevents in block_uevent. Block device uevents include unmounting the partition, which can occur after the backing device has been removed. Unfortunately, this reference is not counted, which means that if the struct device is removed from the device tree, the block layer's reference will become stale.

Guard against this by holding a reference to the struct device in add_disk(), and only drop the reference when we're releasing the gendisk kobject - in other words, when we can be sure that no further uevents will be generated for this block device.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Jens Axboe <axboe@suse.de>
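A hedged sketch of the counted reference (this appears to be the patch that commit 56cf6504, reverted further up this log, corresponds to); names follow the 2006 genhd code loosely.

```c
void add_disk(struct gendisk *disk)
{
	if (disk->driverfs_dev)
		get_device(disk->driverfs_dev);	/* pin for future uevents */
	/* ... register partitions, sysfs entries, etc. ... */
}

static void disk_release(struct kobject *kobj)
{
	struct gendisk *disk = container_of(kobj, struct gendisk, kobj);

	if (disk->driverfs_dev)
		put_device(disk->driverfs_dev);	/* no more uevents possible */
	kfree(disk);
}
```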
-