- Mar 29, 2012
-
-
Daniel Lezcano authored
In the case of a child pid namespace, rebooting the system does not really makes sense. When the pid namespace is used in conjunction with the other namespaces in order to create a linux container, the reboot syscall leads to some problems. A container can reboot the host. That can be fixed by dropping the sys_reboot capability but we are unable to correctly to poweroff/ halt/reboot a container and the container stays stuck at the shutdown time with the container's init process waiting indefinitively. After several attempts, no solution from userspace was found to reliabily handle the shutdown from a container. This patch propose to make the init process of the child pid namespace to exit with a signal status set to : SIGINT if the child pid namespace called "halt/poweroff" and SIGHUP if the child pid namespace called "reboot". When the reboot syscall is called and we are not in the initial pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART", and "RESTART2". Otherwise we return EINVAL. Returning EINVAL is also an easy way to check if this feature is supported by the kernel when invoking another 'reboot' option like CAD. By this way the parent process of the child pid namespace knows if it rebooted or not and can take the right decision. Test case: ========== #include <alloca.h> #include <stdio.h> #include <sched.h> #include <unistd.h> #include <signal.h> #include <sys/reboot.h> #include <sys/types.h> #include <sys/wait.h> #include <linux/reboot.h> static int do_reboot(void *arg) { int *cmd = arg; if (reboot(*cmd)) printf("failed to reboot(%d): %m\n", *cmd); } int test_reboot(int cmd, int sig) { long stack_size = 4096; void *stack = alloca(stack_size) + stack_size; int status; pid_t ret; ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd); if (ret < 0) { printf("failed to clone: %m\n"); return -1; } if (wait(&status) < 0) { printf("unexpected wait error: %m\n"); return -1; } if (!WIFSIGNALED(status)) { printf("child process exited but was not signaled\n"); return -1; } if (WTERMSIG(status) != sig) { printf("signal termination is not the one expected\n"); return -1; } return 0; } int main(int argc, char *argv[]) { int status; status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1); if (status >= 0) { printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n"); return 1; } printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n"); return 0; } [akpm@linux-foundation.org: tweak and add comments] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by:
Daniel Lezcano <daniel.lezcano@free.fr> Acked-by:
Serge Hallyn <serge.hallyn@canonical.com> Tested-by:
Serge Hallyn <serge.hallyn@canonical.com> Reviewed-by:
Oleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Akinobu Mita authored
Use bitmap_set() instead of using set_bit() for each bit. This conversion is valid because the bitmap is private in the function call and atomic bitops were unnecessary. This also includes minor change. - Use bitmap_copy() for shorter typing Signed-off-by:
Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Zhenzhong Duan authored
When using crashkernel=2M-256M, the kernel doesn't give any warning. This is misleading sometimes. Signed-off-by:
Zhenzhong Duan <zhenzhong.duan@oracle.com> Acked-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Will Deacon authored
nommu platforms don't have very interesting swapper_pg_dir pointers and usually just #define them to NULL, meaning that we can't include them in the vmcoreinfo on the kexec crash path. This patch only saves the swapper_pg_dir if we have an MMU. Signed-off-by:
Will Deacon <will.deacon@arm.com> Reviewed-by:
Simon Horman <horms@verge.net.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Gilad Ben-Yossef authored
Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and calculates the cpumask of cpus to IPI by calling a function supplied as a parameter in order to determine whether to IPI each specific cpu. The function works around allocation failure of cpumask variable in CONFIG_CPUMASK_OFFSTACK=y by itereating over cpus sending an IPI a time via smp_call_function_single(). The function is useful since it allows to seperate the specific code that decided in each case whether to IPI a specific cpu for a specific request from the common boilerplate code of handling creating the mask, handling failures etc. [akpm@linux-foundation.org: s/gfpflags/gfp_flags/] [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func'] [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment] Signed-off-by:
Gilad Ben-Yossef <gilad@benyossef.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by:
Michal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Reviewed-by:
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Gilad Ben-Yossef authored
We have lots of infrastructure in place to partition multi-core systems such that we have a group of CPUs that are dedicated to specific task: cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter. Still, kernel code will at times interrupt all CPUs in the system via IPIs for various needs. These IPIs are useful and cannot be avoided altogether, but in certain cases it is possible to interrupt only specific CPUs that have useful work to do and not the entire system. This patch set, inspired by discussions with Peter Zijlstra and Frederic Weisbecker when testing the nohz task patch set, is a first stab at trying to explore doing this by locating the places where such global IPI calls are being made and turning the global IPI into an IPI for a specific group of CPUs. The purpose of the patch set is to get feedback if this is the right way to go for dealing with this issue and indeed, if the issue is even worth dealing with at all. Based on the feedback from this patch set I plan to offer further patches that address similar issue in other code paths. This patch creates an on_each_cpu_mask() and on_each_cpu_cond() infrastructure API (the former derived from existing arch specific versions in Tile and Arm) and uses them to turn several global IPI invocation to per CPU group invocations. Core kernel: on_each_cpu_mask() calls a function on processors specified by cpumask, which may or may not include the local processor. You must not call this function with disabled interrupts or from a hardware interrupt handler or from a bottom half handler. arch/arm: Note that the generic version is a little different then the Arm one: 1. It has the mask as first parameter 2. It calls the function on the calling CPU with interrupts disabled, but this should be OK since the function is called on the other CPUs with interrupts disabled anyway. arch/tile: The API is the same as the tile private one, but the generic version also calls the function on the with interrupts disabled in UP case This is OK since the function is called on the other CPUs with interrupts disabled. Signed-off-by:
Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by:
Christoph Lameter <cl@linux.com> Acked-by:
Chris Metcalf <cmetcalf@tilera.com> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by:
Michal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by:
Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 28, 2012
-
-
David Howells authored
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
asm/system.h is a cause of circular dependency problems because it contains commonly used primitive stuff like barrier definitions and uncommonly used stuff like switch_to() that might require MMU definitions. asm/system.h has been disintegrated by this point on all arches into the following common segments: (1) asm/barrier.h Moved memory barrier definitions here. (2) asm/cmpxchg.h Moved xchg() and cmpxchg() here. #included in asm/atomic.h. (3) asm/bug.h Moved die() and similar here. (4) asm/exec.h Moved arch_align_stack() here. (5) asm/elf.h Moved AT_VECTOR_SIZE_ARCH here. (6) asm/switch_to.h Moved switch_to() here. Signed-off-by:
David Howells <dhowells@redhat.com>
-
David Howells authored
Disintegrate asm/system.h for Sparc. Signed-off-by:
David Howells <dhowells@redhat.com> cc: sparclinux@vger.kernel.org
-
- Mar 26, 2012
-
-
Sasha Levin authored
Module size was limited to 64MB, this was legacy limitation due to vmalloc() which was removed a while ago. Limiting module size to 64MB is both pointless and affects real world use cases. Cc: Tim Abbott <tim.abbott@oracle.com> Cc: stable@vger.kernel.org Signed-off-by:
Sasha Levin <sasha.levin@oracle.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
Steven Rostedt authored
With the preempt, tracepoint and everything, it's getting a bit chubby. For an Ubuntu-based config: Before: $ size -t `find * -name '*.ko'` | grep TOTAL 56199906 3870760 1606616 61677282 3ad1ee2 (TOTALS) $ size vmlinux text data bss dec hex filename 8509342 850368 3358720 12718430 c2115e vmlinux After: $ size -t `find * -name '*.ko'` | grep TOTAL 56183760 3867892 1606616 61658268 3acd49c (TOTALS) $ size vmlinux text data bss dec hex filename 8501842 849088 3358720 12709650 c1ef12 vmlinux Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> Acked-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (made all out-of-line)
-
Pawel Moll authored
This patch adds a set of macros that can be used to declare kernel parameters to be parsed _before_ initcalls at a chosen level are executed. We rename the now-unused "flags" field of struct kernel_param as the level. It's signed, for when we use this for early params as well, in future. Linker macro collating init calls had to be modified in order to add additional symbols between levels that are later used by the init code to split the calls into blocks. Signed-off-by:
Pawel Moll <pawel.moll@arm.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
Rusty Russell authored
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. This eliminates that code (though leaves the flags field in the struct, for impending use). Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
Dave Young authored
Sometimes we need to test a kernel of same version with code or config option changes. We already have sysctl to disable module load, but add a kernel parameter will be more convenient. Since modules_disabled is int, so here use bint type in core_param. TODO: make sysctl accept bool and change modules_disabled to bool Signed-off-by:
Dave Young <dyoung@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
- Mar 23, 2012
-
-
Oleg Nesterov authored
As Tetsuo Handa pointed out, request_module() can stress the system while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE. The task T uses "almost all" memory, then it does something which triggers request_module(). Say, it can simply call sys_socket(). This in turn needs more memory and leads to OOM. oom-killer correctly chooses T and kills it, but this can't help because it sleeps in TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the TIF_MEMDIE task T. Make __request_module() killable. The only necessary change is that call_modprobe() should kmalloc argv and module_name, they can't live in the stack if we use UMH_KILLABLE. This memory is freed via call_usermodehelper_freeinfo()->cleanup. Reported-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
No functional changes. Move the call_usermodehelper code from __request_module() into the new simple helper, call_modprobe(). Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Minor cleanup. ____call_usermodehelper() can simply return, no need to call do_exit() explicitely. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
No functional changes. It is not sane to use UMH_KILLABLE with enum umh_wait, but obviously we do not want another argument in call_usermodehelper_* helpers. Kill this enum, use the plain int. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC. The caller must ensure that subprocess_info->path/etc can not go away until call_usermodehelper_freeinfo(). call_usermodehelper_exec(UMH_KILLABLE) does wait_for_completion_killable. If it fails, it uses xchg(&sub_info->complete, NULL) to serialize with umh_complete() which does the same xhcg() to access sub_info->complete. If call_usermodehelper_exec wins, it can safely return. umh_complete() should get NULL and call call_usermodehelper_freeinfo(). Otherwise we know that umh_complete() was already called, in this case call_usermodehelper_exec() falls back to wait_for_completion() which should succeed "very soon". Note: UMH_NO_WAIT == -1 but it obviously should not be used with UMH_KILLABLE. We delay the neccessary cleanup to simplify the back porting. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Preparation. Add the new trivial helper, umh_complete(). Currently it simply does complete(sub_info->complete). Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic send_signal(). It is also more efficient if we need to kill a lot of tasks because it doesn't alloc sigqueue. While at it, add the __fatal_signal_pending(task) check as a minor optimization. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Cosmetic, rename the from_ancestor_ns argument in prepare_signal() paths. After the previous change it doesn't match the reality. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
force_sig_info() and friends have the special semantics for synchronous signals, this interface should not be used if the target is not current. And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE is not exactly right. However there are callers which have to use force_ exactly because it clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks, although this is almost always is wrong by various reasons. With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if the signal comes from the ancestor namespace. This makes the naming in prepare_signal() paths insane, fixed by the next cleanup. Note: this only affects SIGKILL/SIGSTOP, but this is enough for force_sig() abusers. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Denys Vlasenko authored
PTRACE_SEIZE code is tested and ready for production use, remove the code which requires special bit in data argument to make PTRACE_SEIZE work. Strace team prepares for a new release of strace, and we would like to ship the code which uses PTRACE_SEIZE, preferably after this change goes into released kernel. Signed-off-by:
Denys Vlasenko <vda.linux@googlemail.com> Acked-by:
Tejun Heo <tj@kernel.org> Acked-by:
Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Denys Vlasenko authored
This can be used to close a few corner cases in strace where we get unwanted racy behavior after attach, but before we have a chance to set options (the notorious post-execve SIGTRAP comes to mind), and removes the need to track "did we set opts for this task" state in strace internals. While we are at it: Make it possible to extend SEIZE in the future with more functionality by passing non-zero 'addr' parameter. To that end, error out if 'addr' is non-zero. PTRACE_ATTACH did not (and still does not) have such check, and users (strace) do pass garbage there... let's avoid repeating this mistake with SEIZE. Set all task->ptrace bits in one operation - before this change, we were adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This was probably ok (not a bug), but let's be on a safer side. Changes since v2: use (unsigned long) casts instead of (long) ones, move PTRACE_SEIZE_DEVEL-related code to separate lines of code. Signed-off-by:
Denys Vlasenko <vda.linux@googlemail.com> Acked-by:
Tejun Heo <tj@kernel.org> Cc: Pedro Alves <palves@redhat.com> Reviewed-by:
Oleg Nesterov <oleg@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Denys Vlasenko authored
Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes PT_option bits contiguous and therefore makes code in ptrace_setoptions() much simpler. Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event) instead of using explicit numeric constants, to ensure we don't mess up relationship between bit positions and event ids. PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with value of PT_EVENT_FLAG_SHIFT-1 is easier to use. PT_TRACE_MASK constant is nuked, the only its use is replaced by (PTRACE_O_MASK << PT_OPT_FLAG_SHIFT). Signed-off-by:
Denys Vlasenko <vda.linux@googlemail.com> Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Denys Vlasenko authored
On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those option bits which are known, and then fail with -EINVAL if there are some unknown bits in <opts>. This is inconsistent with typical error handling, which does not change any state if input is invalid. This patch changes PTRACE_SETOPTIONS behavior so that in this case, we return -EINVAL and don't change any bits in task->ptrace. It's very unlikely that there is userspace code in the wild which will be affected by this change: it should have the form ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT) where PTRACE_O_BOGUSOPT is a constant unknown to the kernel. But kernel headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only way userspace can use one if it defines one itself. I can't see why anyone would do such a thing deliberately. Signed-off-by:
Denys Vlasenko <vda.linux@googlemail.com> Acked-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
Oleg Nesterov <oleg@redhat.com> Cc: Pedro Alves <palves@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Revelation from Peter. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@tglx.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
It fixes some 80-col wordwrappings and adds some consistency. Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
If the system is loaded while hotplugging a CPU we might end up with a bogus hardlockup detection. This has been seen during LTP pounder test executed in parallel with hotplug test. The main problem is that enable_watchdog (called when CPU is brought up) registers perf event which periodically checks per-cpu counter (hrtimer_interrupts), updated from a hrtimer callback, but the hrtimer is fired from the kernel thread. This means that while we already do check for the hard lockup the kernel thread might be sitting on the runqueue with zillions of tasks so there is nobody to update the value we rely on and so we KABOOM. Let's fix this by boosting the watchdog thread priority before we wake it up rather than when it's already running. This still doesn't handle a case where we have the same amount of high prio FIFO tasks but that doesn't seem to be common. The current implementation doesn't handle that case anyway so this is not worse at least. Unfortunately, we cannot start perf counter from the watchdog thread because we could miss a real lock up and also we cannot start the hrtimer watchdog_enable because we there is no way (at least I don't know any) to start a hrtimer from a different CPU. [dzickus@redhat.com: fix compile issue with param] Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by:
Mandeep Singh Baines <msb@chromium.org> Signed-off-by:
Michal Hocko <mhocko@suse.cz> Signed-off-by:
Don Zickus <dzickus@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Denys Vlasenko authored
I just received another user's pleas for help when their init mysteriously died. I again explained that they need to check whether it died because of bad instruction, a segv, or something else. Which was an annoying detour into writing a trivial C program to spawn his init and print its exit code: http://lists.busybox.net/pipermail/busybox/2012-January/077172.html I hear you saying "just test it under /bin/sh". Well, the crashing init _was_ /bin/sh. Which prompted me to make kernel do this first step automatically. We can print exit code, which makes it possible to see that death was from e.g. SIGILL without writing test programs. [akpm@linux-foundation.org: add 0x to hex number output] Signed-off-by:
Denys Vlasenko <vda.linux@googlemail.com> Acked-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Lennart Poettering authored
Userspace service managers/supervisors need to track their started services. Many services daemonize by double-forking and get implicitly re-parented to PID 1. The service manager will no longer be able to receive the SIGCHLD signals for them, and is no longer in charge of reaping the children with wait(). All information about the children is lost at the moment PID 1 cleans up the re-parented processes. With this prctl, a service manager process can mark itself as a sort of 'sub-init', able to stay as the parent for all orphaned processes created by the started services. All SIGCHLD signals will be delivered to the service manager. Receiving SIGCHLD and doing wait() is in cases of a service-manager much preferred over any possible asynchronous notification about specific PIDs, because the service manager has full access to the child process data in /proc and the PID can not be re-used until the wait(), the service-manager itself is in charge of, has happened. As a side effect, the relevant parent PID information does not get lost by a double-fork, which results in a more elaborate process tree and 'ps' output: before: # ps afx 253 ? Ss 0:00 /bin/dbus-daemon --system --nofork 294 ? Sl 0:00 /usr/libexec/polkit-1/polkitd 328 ? S 0:00 /usr/sbin/modem-manager 608 ? Sl 0:00 /usr/libexec/colord 658 ? Sl 0:00 /usr/libexec/upowerd 819 ? Sl 0:00 /usr/libexec/imsettings-daemon 916 ? Sl 0:00 /usr/libexec/udisks-daemon 917 ? S 0:00 \_ udisks-daemon: not polling any devices after: # ps afx 294 ? Ss 0:00 /bin/dbus-daemon --system --nofork 426 ? Sl 0:00 \_ /usr/libexec/polkit-1/polkitd 449 ? S 0:00 \_ /usr/sbin/modem-manager 635 ? Sl 0:00 \_ /usr/libexec/colord 705 ? Sl 0:00 \_ /usr/libexec/upowerd 959 ? Sl 0:00 \_ /usr/libexec/udisks-daemon 960 ? S 0:00 | \_ udisks-daemon: not polling any devices 977 ? Sl 0:00 \_ /usr/libexec/packagekitd This prctl is orthogonal to PID namespaces. PID namespaces are isolated from each other, while a service management process usually requires the services to live in the same namespace, to be able to talk to each other. Users of this will be the systemd per-user instance, which provides init-like functionality for the user's login session and D-Bus, which activates bus services on-demand. Both need init-like capabilities to be able to properly keep track of the services they start. Many thanks to Oleg for several rounds of review and insights. [akpm@linux-foundation.org: fix comment layout and spelling] [akpm@linux-foundation.org: add lengthy code comment from Oleg] Reviewed-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Lennart Poettering <lennart@poettering.net> Signed-off-by:
Kay Sievers <kay.sievers@vrfy.org> Acked-by:
Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 22, 2012
-
-
Jason Wessel authored
On x86, if CONFIG_DEBUG_RODATA is set, one cannot set breakpoints via KDB. Apparently this is a well-known problem, as at least one distribution now ships with both KDB enabled and CONFIG_DEBUG_RODATA=y for security reasons. This patch adds an printk message to the breakpoint failure case, in order to provide suggestions about how to use the debugger. Reported-by:
Tim Bird <tim.bird@am.sony.com> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com> Acked-by:
Tim Bird <tim.bird@am.sony.com>
-
Tim Bird authored
This fixes a bug with setting a breakpoint during kdb initialization (from kdb_cmds). Any call to kdb_printf() before the initialization of the kgdboc serial console driver (which happens much later during bootup than kdb_init), results in kernel panic due to the use of dbg_io_ops before it is initialized. Signed-off-by:
Tim Bird <tim.bird@am.sony.com> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Jason Wessel authored
Sometimes it is desirable to stop the kernel debugger before allowing a system to reboot either with kdb or kgdb. This patch adds the ability to turn the reboot notifier on and off or enter the debugger and stop kernel execution before rebooting. It is possible to change the setting after booting the kernel with the following: echo 1 > /sys/module/debug_core/parameters/kgdbreboot It is also possible to change this setting using kdb / kgdb to manipulate the variable directly. Using KDB: mm kgdbreboot 1 Using gdb: set kgdbreboot=1 Reported-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Andrei Warkentin authored
This fixes the following problems: 1) Typematic-repeat of 'enter' gives warning message and leaks make/break if KDB exits. Repeats look something like 0x1c 0x1c .... 0x9c 2) Use of 'keypad enter' gives warning message and leaks the ENTER break/make code out if KDB exits. KP ENTER repeats look someting like 0xe0 0x1c 0xe0 0x1c ... 0xe0 0x9c. 3) Lag on the order of seconds between "break" and "make" when expecting the enter "break" code. Seen under virtualized environments such as VMware ESX. The existing special enter handler tries to glob the enter break code, but this fails if the other (KP) enter was used, or if there was a key repeat. It also fails if you mashed some keys along with enter, and you ended up with a non-enter make or non-enter break code coming after the enter make code. So first, we modify the handler to handle these cases. But performing these actions on every enter is annoying since now you can't hold ENTER down to scroll <more>d messages in KDB. Since this special behaviour is only necessary to handle the exiting KDB ('g' + ENTER) without leaking scancodes to the OS. This cleanup needs to get executed anytime the kdb_main loop exits. Tested on QEMU. Set a bp on atkbd.c to verify no scan code was leaked. Cc: Andrei Warkentin <andreiw@vmware.com> [jason.wessel@windriver.com: move cleanup calls to kdb_main.c] Signed-off-by:
Andrei Warkentin <andrey.warkentin@gmail.com> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Jason Wessel authored
The gdbstub and kdb should get detached if the system is rebooting. Calling gdbstub_exit() will set the proper debug core state and send a message to any debugger that is connected to correctly detach. An attached debugger will receive the exit code from include/linux/reboot.h based on SYS_HALT, SYS_REBOOT, etc... Reported-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Jan Kiszka authored
Not all kgdb I/O drivers implement a flush operation. Adjust gdbstub_exit accordingly. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Hugh Dickins authored
Remove lock and unlock around css_get_next()'s call to idr_get_next(). memcg iterators (only users of css_get_next) already did rcu_read_lock(), and its comment demands that; but add a WARN_ON_ONCE to make sure of it. Signed-off-by:
Hugh Dickins <hughd@google.com> Acked-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by:
Li Zefan <lizf@cn.fujitsu.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Commit c1e2ee2d ("memcg: replace ss->id_lock with a rwlock") has now been seen to cause the unfair behavior we should have expected from converting a spinlock to an rwlock: softlockup in cgroup_mkdir(), whose get_new_cssid() is waiting for the wlock, while there are 19 tasks using the rlock in css_get_next() to get on with their memcg workload (in an artificial test, admittedly). Yet lib/idr.c was made suitable for RCU way back: revert that commit, restoring ss->id_lock to a spinlock. Signed-off-by:
Hugh Dickins <hughd@google.com> Acked-by:
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by:
Li Zefan <lizf@cn.fujitsu.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-