- Sep 15, 2009
-
-
Jens Axboe authored
kernel/built-in.o:(.data+0x17b0): undefined reference to `blk_iopoll_enabled'

The extern declaration makes the compile work, but the actual symbol is missing when block/blk-iopoll.o isn't linked in.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Ming Lei authored
Placing dma-coherent.c in drivers/base is better than in kernel, since it contains code to do per-device coherent dma memory handling.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Xiao Guangrong authored
If we pass a large chunk of data to the perf_counter_open() syscall, the kernel will copy it into a small buffer, causing a kernel crash. This bug makes the kernel unsafe: a non-root local user can trigger it.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
LKML-Reference: <4AAF37D4.5010706@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
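The usual shape of such a fix is to bound the user-supplied size before copying; a minimal sketch, assuming a hypothetical helper copy_attr_bounded() (not the literal upstream diff):

    #include <linux/types.h>
    #include <linux/errno.h>
    #include <linux/string.h>
    #include <linux/uaccess.h>
    #include <linux/perf_counter.h>

    /* Sketch: refuse a user-supplied size larger than the kernel-side
     * structure, so copy_from_user() can never overrun the buffer. */
    static int copy_attr_bounded(struct perf_counter_attr *attr,
                                 const void __user *uattr, u32 usize)
    {
            if (usize > sizeof(*attr))
                    return -E2BIG;              /* larger than we can hold */

            memset(attr, 0, sizeof(*attr));     /* zero-fill any tail */
            if (copy_from_user(attr, uattr, usize))
                    return -EFAULT;

            return 0;
    }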
-
Anirban Sinha authored
console_print() is an old legacy interface mostly unused in the entire kernel tree. It's best to clean up its existing use and let developers use their own implementation of it as they see fit.

Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 14, 2009
-
-
David Howells authored
put_cred() will oops if given a NULL groups list, but that is now possible with the existence of cred_alloc_blank(), as used in keyctl_session_to_parent(). Added in:

commit ee18d64c
Author: David Howells <dhowells@redhat.com>
Date: Wed Sep 2 09:14:21 2009 +0100

    KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
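One plausible shape of the fix (purely an illustrative sketch, not necessarily the patch that went in) is to tolerate the NULL groups list that a blank cred carries:

    #include <linux/cred.h>

    /* Hypothetical sketch: drop the groups list only if one exists,
     * since cred_alloc_blank() leaves group_info == NULL. */
    static void release_cred_groups(struct cred *cred)
    {
            if (cred->group_info)
                    put_group_info(cred->group_info);
            cred->group_info = NULL;
    }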
-
Wu Fengguang authored
Fix the definition of BM_BITS_PER_BLOCK and kerneldoc description of create_bm_block_list(). [rjw: Added changelog.]

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
-
Gerald Schaefer authored
Use for_each_populated_zone() instead of for_each_zone() in hibernation code.

This fixes a bug on s390, where we allow both config options HIBERNATION and MEMORY_HOTPLUG, so that we also have a ZONE_MOVABLE here. We only allow hibernation if no memory hotplug operation was performed, so in fact both features can only be used exclusively, but this way we don't need two differently configured (distribution) kernels.

If we have an unpopulated ZONE_MOVABLE, we allow hibernation but run into a BUG_ON() in memory_bm_test/set/clear_bit() because hibernation code iterates through all zones, not only the populated zones, in several places. For example, swsusp_free() does for_each_zone() and then checks for pfn_valid(), which is true even if the zone is not populated, resulting in a BUG_ON() later because the pfn cannot be found in the memory bitmap.

Replacing all occurrences of for_each_zone() in hibernation code with for_each_populated_zone() fixes this issue. [rjw: Rebased on top of linux-next hibernation patches.]

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
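A sketch of the resulting loop shape (memory_bm_set_bit() and struct memory_bitmap are the snapshot-internal helpers named above; the walk itself is illustrative):

    #include <linux/mm.h>
    #include <linux/mmzone.h>

    /* Sketch: visit only populated zones, so pfn_valid() pages in an
     * empty ZONE_MOVABLE never reach the memory bitmap lookups. */
    static void mark_zone_pages(struct memory_bitmap *bm)
    {
            struct zone *zone;
            unsigned long pfn, end;

            for_each_populated_zone(zone) {     /* was: for_each_zone() */
                    end = zone->zone_start_pfn + zone->spanned_pages;
                    for (pfn = zone->zone_start_pfn; pfn < end; pfn++)
                            if (pfn_valid(pfn))
                                    memory_bm_set_bit(bm, pfn);
            }
    }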
-
Rafael J. Wysocki authored
We want to avoid attempting to free too much memory too hard during hibernation, so estimate the minimum size of the image to use as the lower limit for preallocating memory.

The approach here is based on the (experimental) observation that we can't free more page frames than the sum of:

* global_page_state(NR_SLAB_RECLAIMABLE)
* global_page_state(NR_ACTIVE_ANON)
* global_page_state(NR_INACTIVE_ANON)
* global_page_state(NR_ACTIVE_FILE)
* global_page_state(NR_INACTIVE_FILE)

minus

* global_page_state(NR_FILE_MAPPED)

Namely, if this number is subtracted from the number of saveable pages in the system, we get a good estimate of the minimum reasonable size of a hibernation image.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
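Expressed as code, the estimate reads roughly as follows (a sketch of the arithmetic above; the function name mirrors the description rather than quoting the final source):

    #include <linux/vmstat.h>

    /* Sketch: lower bound on the image size, in pages.  'saveable'
     * is the current number of saveable pages in the system. */
    static unsigned long minimum_image_size(unsigned long saveable)
    {
            unsigned long size;

            size  = global_page_state(NR_SLAB_RECLAIMABLE);
            size += global_page_state(NR_ACTIVE_ANON);
            size += global_page_state(NR_INACTIVE_ANON);
            size += global_page_state(NR_ACTIVE_FILE);
            size += global_page_state(NR_INACTIVE_FILE);
            size -= global_page_state(NR_FILE_MAPPED);

            /* We cannot free more than 'size' page frames, so at
             * least saveable - size pages must go into the image. */
            return saveable <= size ? 0 : saveable - size;
    }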
-
Rafael J. Wysocki authored
Since the hibernation code is now going to use allocations of memory to make enough room for the image, it can also use the page frames allocated at this stage as image page frames. The low-level hibernation code needs to be rearranged for this purpose, but it allows us to avoid freeing a great number of pages and allocating these same pages once again later, so it generally is worth doing. [rev. 2: Take highmem into account correctly.]

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
-
Rafael J. Wysocki authored
Rework swsusp_shrink_memory() so that it calls shrink_all_memory() just once to make some room for the image and then allocates memory to apply more pressure to the memory management subsystem, if necessary. Unfortunately, we don't seem to be able to drop shrink_all_memory() entirely just yet, because that would lead to huge performance regressions in some test cases.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
-
Thadeu Lima de Souza Cascardo authored
Although the same label name is used somewhere else in the file, this particular label was consistently typoed in all of its uses.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
-
- Sep 11, 2009
-
-
Jens Axboe authored
This borrows some code from NAPI and implements a polled completion mode for block devices. The idea is the same as NAPI - instead of doing the command completion when the irq occurs, schedule a dedicated softirq in the hopes that we will complete more IO when the iopoll handler is invoked. Devices have a budget of commands assigned, and will stay in polled mode as long as they continue to consume their budget from the iopoll softirq handler. If they do not, the device is set back to interrupt completion mode.

This patch holds the core bits for blk-iopoll, device driver support sold separately.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
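A hedged sketch of what driver support might look like, following the NAPI-style pattern described above (the my_* hooks are hypothetical driver internals; the blk_iopoll_* calls are the interface this patch adds):

    #include <linux/blk-iopoll.h>
    #include <linux/interrupt.h>

    static struct blk_iopoll my_iop;            /* per-device poll state */

    static int my_reap_completions(int budget); /* hypothetical driver hooks */
    static void my_enable_device_irq(void);
    static void my_disable_device_irq(void);

    /* Poll callback, invoked from the iopoll softirq with a budget. */
    static int my_iopoll(struct blk_iopoll *iop, int budget)
    {
            int done = my_reap_completions(budget);

            if (done < budget) {                /* budget not consumed:  */
                    blk_iopoll_complete(iop);   /* leave polled mode and */
                    my_enable_device_irq();     /* re-arm the interrupt  */
            }
            return done;
    }

    /* IRQ handler: mask the device and defer completions to polling. */
    static irqreturn_t my_irq_handler(int irq, void *data)
    {
            my_disable_device_irq();
            blk_iopoll_sched(&my_iop);
            return IRQ_HANDLED;
    }

At probe time the driver would pair this with blk_iopoll_init(&my_iop, weight, my_iopoll) and blk_iopoll_enable(&my_iop), where weight is the per-poll command budget.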
-
Jens Axboe authored
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- Sep 10, 2009
-
-
Ingo Molnar authored
This weird perf trace output:

cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

is caused by setting one component field of the delta to zero a bit too early. Move it to later.

(Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug, it's just a reporting bug in essence.)

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikos Chantziaras <realnc@arcor.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4AA93D34.8040500@arcor.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
Nikos Chantziaras and Jens Axboe reported that turning off NEW_FAIR_SLEEPERS improves desktop interactivity visibly.

Nikos described his experiences the following way:

" With this setting, I can do "nice -n 19 make -j20" and still have a very smooth desktop and watch a movie at the same time. Various other annoyances (like the "logout/shutdown/restart" dialog of KDE not appearing at all until the background fade-out effect has finished) are also gone. So this seems to be the single most important setting that vastly improves desktop behavior, at least here. "

Jens described it the following way, referring to a 10-second xmodmap scheduling delay he was trying to debug:

" Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then I get:

Performance counter stats for 'xmodmap .xmodmap-carl':

       9.009137  task-clock-msecs   #  0.447 CPUs
             18  context-switches   #  0.002 M/sec
              1  CPU-migrations     #  0.000 M/sec
            315  page-faults        #  0.035 M/sec

    0.020167093  seconds time elapsed

Woot! "

So disable it for now.

In perf trace output I can see weird delta timestamps:

cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

That nsec field is not supposed to be that large. More digging is needed - but let's turn it off while the real bug is found.

Reported-by: Nikos Chantziaras <realnc@arcor.de>
Tested-by: Nikos Chantziaras <realnc@arcor.de>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4AA93D34.8040500@arcor.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- Sep 09, 2009
-
-
Mike Galbraith authored
Remove the kthread/workqueue priority boost; it increases worst-case desktop latencies.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Mike Galbraith authored
Reduce the latency target from 20 msecs to 5 msecs. Why? Larger latencies increase spread, which is good for scaling, but bad for worst case latency. We still have the ilog(nr_cpus) rule to scale up on bigger server boxes.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Mike Galbraith authored
Set child_runs_first default to off. It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs get preempted by child tasks, reducing parallelism.

Note, this patch might make existing races in user applications more prominent than before - so breakages might be bisected to this commit. Child-runs-first is broken on SMP to begin with, and we already had it off briefly in v2.6.23 so most of the offenders ought to be fixed. Would be nice not to revert this commit but fix those apps finally ...

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
[ made the sysctl independent of CONFIG_SCHED_DEBUG, in case people want to work around broken apps. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- Sep 08, 2009
-
-
Mike Galbraith authored
A fork/exec load is usually "pass the baton", so the child should never be placed behind the parent. With START_DEBIT we make room for the new task, but with child_runs_first, that room comes out of the _parent's_ hide. There's nothing to say that the parent wasn't ahead of min_vruntime at fork() time, which means that the "baton carrier", who is essentially the parent in drag, can gain time and increase scheduling latencies for waiters.

With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first enabled, we essentially pass the sleeper fairness off to the child, which is fine, but if we don't base placement on the parent's updated vruntime, we can end up compounding latency woes if the child itself then does fork/exec. The debit incurred at fork doesn't hurt the parent who is then going to sleep and maybe exit, but the child who acquires the error harms all comers.

This improves latencies of make -j<n> kernel build workloads.

Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
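A sketch of the placement logic the text implies: refresh the parent's accounting before deriving the child's vruntime (the function name is illustrative; update_curr() and place_entity() are the CFS primitives involved):

    /* Sketch: place the child relative to the parent's *updated*
     * vruntime, so a stale "baton carrier" cannot gain time. */
    static void fork_place_child(struct cfs_rq *cfs_rq,
                                 struct sched_entity *parent,
                                 struct sched_entity *child)
    {
            update_curr(cfs_rq);                /* bring parent up to date */
            child->vruntime = parent->vruntime; /* pass the baton          */
            place_entity(cfs_rq, child, 1);     /* then apply START_DEBIT  */
    }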
-
- Sep 07, 2009
-
-
Peter Zijlstra authored
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Peter Zijlstra authored
select_task_rq_fair() incorrectly skips the wake_affine() logic; remove this. When prev_cpu == this_cpu, the code jumps straight to the wake_idle() logic, which doesn't give the wake_affine() logic the chance to pin the task to this cpu.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Jaswinder Singh Rajput authored
fix the following 'make includecheck' warning:

kernel/sysctl.c: linux/security.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
- Sep 04, 2009
-
-
Steven Rostedt authored
Since the ability to swap the cpu buffers adds a small overhead to the recording of a trace, we only want to add it when needed. Only the irqsoff and preemptoff tracers use this feature, and both are not recommended for production kernels. This patch disables its use when neither irqsoff nor preemptoff is configured.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
Because the irqsoff tracer can swap an internal CPU buffer, it is possible that a swap happens between the start of the write and before the committing bit is set (the committing bit will disable swapping). This patch adds a check for this and will fail the write if it detects it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
The irqsoff tracer will fail to swap the cpu buffer with the max buffer if it preempts a commit. Instead of ignoring this, this patch makes the tracer report it if the last max latency failed due to preempting a current commit.

The output of the latency tracer will look like this:

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.31-rc5
# --------------------------------------------------------------------
# latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
#    -----------------
#    | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: save_args
#  => ended at:   __do_softirq
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /
#                |||||     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||   \   |   /
    bash-4281    1d.s6  265us : update_max_tr_single: Failed to swap buffers due to commit in progress

Note the latency time and the functions that disabled the irqs or preemption will still be listed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
This patch adds trace_array_printk() to allow a tracer to use trace_printk() on its own trace array.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
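A minimal usage sketch (tr is assumed to be the tracer's private trace_array; the message is illustrative):

    /* Sketch: print into the tracer's own array, not the global one. */
    trace_array_printk(tr, _THIS_IP_, "failed to swap buffers: %d\n", err);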
-
Steven Rostedt authored
The latency tracers (irqsoff and wakeup) can swap trace buffers on the fly. If an event is happening and has reserved data on one of the buffers, and the latency tracer swaps the global buffer with the max buffer, the result is that the event may commit the data to the wrong buffer. This patch changes the trace recording API to hand back the buffer that was used for the reservation, so that the same buffer can be passed in to the commit.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
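Roughly, the reworked pairing looks like this sketch (exact helper names and signatures may differ from the final API):

    struct ring_buffer *buffer;
    struct ring_buffer_event *event;

    /* Sketch: 'buffer' is set to whichever ring buffer actually took
     * the reservation, so a concurrent swap cannot split the pair. */
    event = trace_current_buffer_lock_reserve(&buffer, type, len, flags, pc);
    if (!event)
            return;
    /* ... fill in the event payload ... */
    trace_current_buffer_unlock_commit(buffer, event, flags, pc);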
-
Steven Rostedt authored
Resetting the trace buffer without first disabling the buffer and waiting for any writers to complete can corrupt the ring buffer. This patch makes the external version of tracing_reset safe from corruption by disabling the ring buffer and calling synchronize_sched. This version can no longer be called from interrupt context, but all those callers have been removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
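The sequence just described, as a sketch (the wrapper name is illustrative; ring_buffer_record_disable()/ring_buffer_reset_cpu() are assumed to be the underlying primitives):

    #include <linux/ring_buffer.h>
    #include <linux/rcupdate.h>

    /* Sketch: disable writers, wait out in-flight ones, then reset. */
    static void tracing_reset_safe(struct ring_buffer *buffer, int cpu)
    {
            ring_buffer_record_disable(buffer);  /* no new writers        */
            synchronize_sched();                 /* drain current writers */
            ring_buffer_reset_cpu(buffer, cpu);  /* now safe to reset     */
            ring_buffer_record_enable(buffer);
    }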
-
Steven Rostedt authored
Currently the latency tracers reset the ring buffer. Unfortunately, if a commit is in process (due to a trace event), this can corrupt the ring buffer. When this happens, the ring buffer will detect the corruption and then permanently disable the ring buffer. The bug does not crash the system, but it does prevent further tracing after the bug is hit.

Instead of resetting the trace buffers, the timestamp of the start of the trace is used. The buffers will still contain the previous data, but the output will not count any data that is before the timestamp of the trace.

Note, this only affects the static trace output (trace) and not the runtime trace output (trace_pipe). The runtime trace output does not make sense for the latency tracers anyway.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Li Zefan authored
The predicates of an event and their filter structure are allocated when we create an event filter for the first time. These objects should be created only once, but each time we come with a new filter, we overwrite any pre-existing allocation. Thus, this patch checks whether the filter has already been allocated before going ahead.

Spotted-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <4A9CB1BA.3060402@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
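A sketch of the guard (names are illustrative of the event-filter code, not a quote of the patch):

    /* Sketch: allocate the predicates and filter structure only on
     * first use; later filters reuse the existing allocation. */
    static int init_event_filter_once(struct ftrace_event_call *call)
    {
            if (call->filter)           /* already allocated earlier */
                    return 0;

            return init_preds(call);    /* first time: allocate */
    }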
-
Steven Rostedt authored
The function tracing_reset is deprecated for outside use of trace.c. The new function to reset the buffers is tracing_reset_online_cpus. The reason for this is that resetting the buffers while the event trace points are active can corrupt the buffers, because they may be writing at the time of reset. The tracing_reset_online_cpus disables writes and waits for current writers to finish.

This patch replaces all users of tracing_reset except for the latency tracers. Those remaining users require more work and will be removed in the following patches.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
Resetting the ring buffers while traces are happening can corrupt the ring buffer and disable it (no kernel crash to worry about). The safest thing to do is disable the ring buffers, call synchronize_sched() to wait for all current writers to finish, and then reset the buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
When reading the trace from the trace file, updating the max latency may corrupt the output. This patch disables the tracing of the max latency while reading the trace file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
During development of the tracer, we would copy information from the live tracer to the max tracer with one memcpy. Since then we added a generic ring buffer and we handle the copies differently now. Unfortunately, we never copied the critical section information, and we lost the output:

# => started at: kmem_cache_alloc
# => ended at:   kmem_cache_alloc

This patch adds back the critical start and end copying, as well as removes the unused "trace_idx" and "overrun" fields of the trace_array_cpu structure.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
Currently the way RB_WARN_ON works is to disable either the current CPU buffer or all CPU buffers, depending on whether a ring_buffer or ring_buffer_per_cpu struct was passed into the macro. Most users of RB_WARN_ON pass in the CPU buffer, so only the one CPU buffer gets disabled but the rest are still active. This may confuse users even though a warning is sent to the console. This patch changes the macro to disable the entire buffer even if the CPU buffer is passed in.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
The latency tracers report the number of items in the trace buffer. This uses the ring buffer data to calculate it. Because discarded events are also counted, the numbers do not match the number of items that are printed. The ring buffer also adds a "padding" item to the end of each buffer page, which also gets counted as a discarded item.

This patch decrements the page's entries counter on a discard. This allows us to ignore discarded entries while reading the buffer. Decrementing the counter is still safe, since it can only happen while the committing flag is still set.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
The function ring_buffer_event_discard can be used on any item in the ring buffer, even after the item was committed. This function provides no safety nets and is very race prone. An item may be safely removed from the ring buffer before it is committed with ring_buffer_discard_commit. Since there are currently no users of this function, and because it is racy and error prone, this patch removes it altogether.

Note, removing this function also allows the counters to ignore all discarded events (patches will follow).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
When the ring buffer uses an iterator (static read mode, not on-the-fly reading) and it crosses a page boundary, it will skip the first entry on the next page. The reason is that the last entry of a page is usually padding if the page is not full. The padding will not be returned to the user.

The problem arises in ring_buffer_read because it also increments the iterator. Because both the read and the peek use the same rb_iter_peek, rb_iter_peek will return the padding but also increment to the next item. This is because ring_buffer_peek will not increment it itself. ring_buffer_read will increment it again and then call rb_iter_peek again to get the next item. But that will be the second item, not the first one on the page.

The reason this never showed up before is that the ftrace utility always calls ring_buffer_peek first and only uses ring_buffer_read to increment to the next item. The ring_buffer_peek will always keep the pointer to a valid item and not padding. This just hid the bug.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
The loops in the ring buffer that use cpu_relax are not dependent on other CPUs. They simply came across some padding in the ring buffer and are skipping over it. It is a normal loop and does not require a cpu_relax.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Steven Rostedt authored
If a commit is taking place on a CPU ring buffer, do not allow it to be swapped. Return -EBUSY when this is detected instead.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
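A sketch of the guard (the 'committing' bookkeeping and function name are illustrative, not a quote of the patch):

    /* Sketch: refuse to swap per-cpu buffers while either side has a
     * commit in flight; the caller reports the -EBUSY failure. */
    static int rb_swap_guarded(struct ring_buffer_per_cpu *a,
                               struct ring_buffer_per_cpu *b)
    {
            if (local_read(&a->committing) || local_read(&b->committing))
                    return -EBUSY;

            /* ... exchange the buffer pages of 'a' and 'b' here ... */
            return 0;
    }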
-