  1. Nov 09, 2007
    • sched: clean up the wakeup preempt check · 77d9cc44
      Ingo Molnar authored
      
      
      clean up the wakeup preemption check. No code changed:
      
         text    data     bss     dec     hex filename
        44227    3326      36   47589    b9e5 sched.o.before
        44227    3326      36   47589    b9e5 sched.o.after
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: wakeup preemption fix · 8bc6767a
      Ingo Molnar authored
      
      
      wakeup preemption fix: do not make it dependent on p->prio.
      Preemption purely depends on ->vruntime.
      
      This improves preemption in mixed-nice-level workloads.
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
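The rule described in the commit above (preemption depends only on ->vruntime, not on p->prio) can be illustrated with a minimal userspace sketch. The struct, the function name and the `gran` parameter are hypothetical and are not taken from the kernel source; they only show a check that ignores priority entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a scheduling entity. */
struct entity {
    uint64_t vruntime;   /* virtual runtime in nanoseconds */
    int prio;            /* present, but deliberately ignored below */
};

/*
 * Return true if the woken entity should preempt the current one.
 * Only the vruntime difference is consulted; gran is an assumed
 * granularity threshold that damps over-eager preemption.
 */
static bool should_preempt(const struct entity *curr,
                           const struct entity *woken,
                           int64_t gran)
{
    /* Signed difference, so counter wraparound is tolerated. */
    int64_t delta = (int64_t)(curr->vruntime - woken->vruntime);

    return delta > gran;
}
```

With this shape, a low-priority task that has accumulated little virtual runtime can still preempt, which is the mixed-nice-level behaviour the commit message refers to.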
    • sched: remove PREEMPT_RESTRICT · 3e3e13f3
      Ingo Molnar authored
      
      
      remove PREEMPT_RESTRICT. (this is a separate commit so that any
      regression related to the removal itself is bisectable)
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: turn off PREEMPT_RESTRICT · 52d3da1a
      Ingo Molnar authored
      
      
      PREEMPT_RESTRICT was a method aimed at reducing the amount of
      wakeup-related preemption. It has a disadvantage, though: it can
      prevent legitimate wakeups if a task is 'unlucky' enough to be hit
      too early by a tick that clears peer_preempt.
      
      Now that the wakeup preemption has been cleaned up, we don't seem to
      have excessive preemptions anymore, so this feature can be turned off
      (and removed in the next patch).
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC · d6322faf
      Eric Dumazet authored
      
      
      1) hardcoded 1000000000 value is used five times in places where
         NSEC_PER_SEC might be more readable.
      
      2) A conversion from nsec to msec uses the hardcoded 1000000 value,
         which is a candidate for NSEC_PER_MSEC.
      
      no code changed:
      
          text    data     bss     dec     hex filename
         44359    3326      36   47721    ba69 sched.o.before
         44359    3326      36   47721    ba69 sched.o.after
      
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
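The readability point of the commit above can be shown in a small userspace sketch. The two macros are defined locally here so the example compiles on its own (the kernel supplies its own definitions), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Local definitions for this sketch only. */
#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Convert nanoseconds to whole milliseconds (truncating). */
static uint64_t nsec_to_msec(uint64_t nsec)
{
    return nsec / NSEC_PER_MSEC;   /* instead of a bare 1000000 */
}

/* Convert seconds to nanoseconds. */
static uint64_t sec_to_nsec(uint64_t sec)
{
    return sec * NSEC_PER_SEC;     /* instead of a bare 1000000000 */
}
```

Because the named constants expand to the same values as the literals they replace, the generated code is expected to be identical, which matches the unchanged size figures quoted in the commit.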
    • sched: reintroduce SMP tunings again · 19978ca6
      Ingo Molnar authored
      
      
      Yanmin Zhang reported an aim7 regression and bisected it down to:
      
       |  commit 38ad464d
       |  Author: Ingo Molnar <mingo@elte.hu>
       |  Date:   Mon Oct 15 17:00:02 2007 +0200
       |
       |     sched: uniform tunings
       |
       |     use the same defaults on both UP and SMP.
      
      fix this by reintroducing similar SMP tunings again. This resolves
      the regression.
      
      (also update the comments to match the ilog2(nr_cpus) tuning effect)
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
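The commit above mentions an ilog2(nr_cpus) tuning effect without spelling it out. The sketch below assumes the tunable is multiplied by a factor of 1 + ilog2(ncpus); that exact formula, and the names `ilog2_u` and `scale_tunable`, are assumptions made for illustration and are not quoted from the patch.

```c
/* Integer base-2 logarithm, rounded down; returns 0 for n <= 1. */
static unsigned int ilog2_u(unsigned int n)
{
    unsigned int log = 0;

    while (n >>= 1)
        log++;
    return log;
}

/*
 * Scale a base tunable (in nanoseconds) by an assumed factor of
 * 1 + ilog2(ncpus), so the value grows logarithmically with CPU count.
 */
static unsigned long long scale_tunable(unsigned long long base_ns,
                                        unsigned int ncpus)
{
    unsigned int factor = 1 + ilog2_u(ncpus);

    return base_ns * factor;
}
```

Under that assumption a uniprocessor keeps the base value (factor 1), while an 8-CPU system uses four times the base value.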
    • sched: restore deterministic CPU accounting on powerpc · fa13a5a1
      Paul Mackerras authored
      
      
      Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
      deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
      broken on powerpc, because we end up counting user time twice: once in
      timer_interrupt() and once in update_process_times().
      
      This fixes the problem by pulling the code in update_process_times
      that updates utime and stime into a separate function called
      account_process_tick.  If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
      there is a version of account_process_tick in kernel/timer.c that
      simply accounts a whole tick to either utime or stime as before.  If
      CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
      implement account_process_tick.
      
      This also lets us simplify the s390 code a bit; it means that the s390
      timer interrupt can now call update_process_times even when
      CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
      suitable account_process_tick().
      
      account_process_tick() now takes the task_struct * as an argument.
      Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
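The generic fallback described in the commit above (charge a whole tick to either utime or stime) might look roughly like the following userspace sketch. The struct and the `_sketch` suffix are hypothetical; the real function operates on task_struct and lives in kernel/timer.c according to the commit message.

```c
/* Hypothetical stand-in for the two task_struct counters involved. */
struct task_sketch {
    unsigned long utime;   /* ticks spent in user mode */
    unsigned long stime;   /* ticks spent in kernel mode */
};

/*
 * Charge one whole tick to the task: to utime if the tick interrupted
 * user mode, otherwise to stime. An architecture with precise
 * accounting would supply its own version instead of this one.
 */
static void account_process_tick_sketch(struct task_sketch *p, int user_tick)
{
    if (user_tick)
        p->utime += 1;
    else
        p->stime += 1;
}
```

Keeping this logic in a single overridable function is what lets the timer path call it exactly once per tick, avoiding the double counting the commit describes.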
    • sched: fix delay accounting regression · 9a41785c
      Balbir Singh authored
      
      
      Fix the delay accounting regression introduced by commit
      75d4ef16. rq no longer has sched_info data associated with it.
      The task_struct sched_info structure is used by delay accounting
      to report statistics back to user space.
      
      also remove direct use of sched_clock() (which is not a valid thing to
      do anymore) and use rq->clock instead.
      
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: reintroduce the sched_min_granularity tunable · b2be5e96
      Peter Zijlstra authored
      
      
      we lost the sched_min_granularity tunable to a clever optimization
      that uses the sched_latency/min_granularity ratio - but the ratio
      is quite unintuitive to users and can also crash the kernel if the
      ratio is set to 0. So reintroduce the min_granularity tunable,
      while keeping the ratio maintained internally.
      
      no functionality changed.
      
      [ mingo@elte.hu: some fixlets. ]
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
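One way to keep the sched_latency/min_granularity ratio internal while exposing min_granularity to users is sketched below. The function names, the round-up division and the guards are assumptions for illustration, not the patch's actual code; the point is only that a derived ratio can be kept away from 0.

```c
#include <stdint.h>

/*
 * Derive the internal ratio from the two user-visible tunables.
 * The guards are hypothetical: they keep the ratio at 1 or more so
 * that nothing downstream can divide by zero.
 */
static uint64_t nr_latency(uint64_t latency_ns, uint64_t min_gran_ns)
{
    uint64_t n;

    if (min_gran_ns == 0)
        return 1;
    n = (latency_ns + min_gran_ns - 1) / min_gran_ns;   /* round up */
    return n ? n : 1;
}

/*
 * Scheduling period: the latency target while few tasks are runnable,
 * stretched to min_gran per task once there are more tasks than the
 * ratio allows.
 */
static uint64_t sched_period(uint64_t latency_ns, uint64_t min_gran_ns,
                             uint64_t nr_running)
{
    if (nr_running > nr_latency(latency_ns, min_gran_ns))
        return min_gran_ns * nr_running;
    return latency_ns;
}
```

For example, with a 20 ms latency target and a 4 ms minimum granularity the derived ratio is 5, so 3 runnable tasks share a 20 ms period and 10 runnable tasks stretch it to 40 ms.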
    • sched: documentation: place_entity() comments · 2cb8600e
      Peter Zijlstra authored
      
      
      Add a few comments to place_entity(). No code changed.
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix vslice · 10b77724
      Peter Zijlstra authored
      
      
      vslice was missing a factor NICE_0_LOAD, as weight is in
      weight*NICE_0_LOAD units.
      
      the effect of this bug was larger initial slices and
      thus latency-noisier forks.
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
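The unit issue named in the commit above (weights are expressed in multiples of NICE_0_LOAD) can be made concrete with a small worked sketch. The function name and the exact formula are assumptions for illustration; NICE_0_LOAD is taken to be 1024, the weight of a nice-0 task.

```c
#include <stdint.h>

#define NICE_0_LOAD 1024ULL   /* weight of one nice-0 task */

/*
 * A weight-normalised slice: multiply the period by NICE_0_LOAD before
 * dividing by the run queue weight, so the weight units cancel and the
 * result stays in nanoseconds.
 */
static uint64_t vslice_sketch(uint64_t period_ns, uint64_t rq_weight)
{
    return period_ns * NICE_0_LOAD / rq_weight;
}
```

With a 20 ms period and two nice-0 tasks on the run queue (total weight 2048), this yields a 10 ms slice; leaving out the NICE_0_LOAD factor would leave the result off by a factor of 1024.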
  2. Nov 05, 2007
  3. Oct 31, 2007
  4. Oct 30, 2007
  5. Oct 29, 2007
  6. Oct 28, 2007
  7. Oct 25, 2007
    • cpuidle: unexport tick_nohz_get_sleep_length · 4d8b4e1e
      Adrian Bunk authored
      
      
      This patch removes the unused
      EXPORT_SYMBOL_GPL(tick_nohz_get_sleep_length),
      which we no longer use because we no longer build optional modules.
      
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • sched: fix unconditional irq lock · ab63a633
      Peter Zijlstra authored
      
      
      Lockdep noticed that this lock can also be taken from hardirq context, and can
      thus not unconditionally disable/enable irqs.
      
       WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
        [show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
        [show_trace+18/32] show_trace+0x12/0x20
        [dump_stack+22/32] dump_stack+0x16/0x20
        [trace_hardirqs_on+405/416] trace_hardirqs_on+0x195/0x1a0
        [_read_unlock_irq+34/48] _read_unlock_irq+0x22/0x30
        [sched_debug_show+2615/4224] sched_debug_show+0xa37/0x1080
        [show_state_filter+326/368] show_state_filter+0x146/0x170
        [sysrq_handle_showstate+10/16] sysrq_handle_showstate+0xa/0x10
        [__handle_sysrq+123/288] __handle_sysrq+0x7b/0x120
        [handle_sysrq+40/64] handle_sysrq+0x28/0x40
        [kbd_event+1045/1680] kbd_event+0x415/0x690
        [input_pass_event+206/208] input_pass_event+0xce/0xd0
        [input_handle_event+170/928] input_handle_event+0xaa/0x3a0
        [input_event+95/112] input_event+0x5f/0x70
        [atkbd_interrupt+434/1456] atkbd_interrupt+0x1b2/0x5b0
        [serio_interrupt+59/128] serio_interrupt+0x3b/0x80
        [i8042_interrupt+263/576] i8042_interrupt+0x107/0x240
        [handle_IRQ_event+40/96] handle_IRQ_event+0x28/0x60
        [handle_edge_irq+175/320] handle_edge_irq+0xaf/0x140
        [do_IRQ+64/128] do_IRQ+0x40/0x80
        [common_interrupt+46/52] common_interrupt+0x2e/0x34
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. Oct 24, 2007
    • sched: isolate SMP balancing code a bit more · 681f3e68
      Peter Williams authored
      
      
      At the moment, a lot of load balancing code that is irrelevant to non
      SMP systems gets included during non SMP builds.
      
      This patch addresses this issue and reduces the binary size on non
      SMP systems:
      
         text    data     bss     dec     hex filename
        10983      28    1192   12203    2fab sched.o.before
        10739      28    1192   11959    2eb7 sched.o.after
      
      Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: reduce balance-tasks overhead · e1d1484f
      Peter Williams authored
      
      
      At the moment, balance_tasks() provides low level functionality for both
      move_tasks() and move_one_task() (indirectly) via the load_balance()
      function (in the sched_class interface), which also provides dual
      functionality.  This dual functionality complicates the interfaces and
      internal mechanisms and adds to the run time overhead of operations that
      are called with two run queue locks held.
      
      This patch addresses this issue and reduces the overhead of these
      operations.
      
      Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: make cpu_shares_{show,store}() static · a0f846aa
      Adrian Bunk authored
      
      
      cpu_shares_{show,store}() can become static.
      
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: clean up some control group code · 2b01dfe3
      Paul Menage authored
      
      
      - replace "cont" with "cgrp" in a few places in the CFS cgroup code, 
      - use write_uint rather than write for cpu.shares write function
      
      Signed-off-by: Paul Menage <menage@google.com>
      Acked-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: document profile=sleep requiring CONFIG_SCHEDSTATS · b3da2a73
      Mel Gorman authored
      
      
      profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes
      the limitation in Documentation/kernel-parameters.txt and prints a
      warning at boot-time if profile=sleep is used without CONFIG_SCHEDSTATS.
      
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: use show_regs() to improve __schedule_bug() output · 838225b4
      Satyam Sharma authored
      
      
      A full register dump along with stack backtrace would make the
      "scheduling while atomic" message more helpful. Use show_regs() instead
      of dump_stack() for this. We already know we're atomic in here (that is
      why this function was called) so show_regs()'s atomicity expectations
      are guaranteed.
      
      Also, modify the output of the "BUG: scheduling while atomic:" header a
      bit to keep task->comm and task->pid together and preempt_count() after
      them.
      
      Signed-off-by: Satyam Sharma <satyam@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: clean up sched_domain_debug() · 4dcf6aff
      Ingo Molnar authored
      
      
      clean up sched_domain_debug().
      
      this also shrinks the code a bit:
      
         text    data     bss     dec     hex filename
        50474    4306     480   55260    d7dc sched.o.before
        50404    4306     480   55190    d796 sched.o.after
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix fastcall mismatch in completion APIs · b15136e9
      Ingo Molnar authored
      
      
      Jeff Dike noticed that wait_for_completion_interruptible()'s prototype
      had a mismatched fastcall.
      
      Fix this by removing the fastcall attributes from all the completion APIs.
      
      Found-by: Jeff Dike <jdike@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix sched_domain sysctl registration again · 7378547f
      Milton Miller authored
      
      
      commit 029190c5 (cpuset sched_load_balance flag) was not tested
      with SCHED_DEBUG enabled as committed: it dereferences NULL when
      used, and it reordered the sysctl registration so that it never
      shows any domains or their tunables.
      
      Fixes:
      
      1) restore arch_init_sched_domains ordering
      	we can't walk the domains before we build them
      
      	presently we register cpus with empty directories (no domain
      	directories or files).
      
      2) make unregister_sched_domain_sysctl do nothing when already unregistered
      	detach_destroy_domains is now called one set of cpus at a time
      	unregister_sysctl_table dereferences NULL if called with a NULL.

      	While the function would always dereference NULL if called
      	twice, in the previous code it was always called once and then
      	was followed by a register.  So only the hidden bug of the
      	sysctl_root_table not being allocated followed by an attempt to
      	free it would have shown the error.
      
      3) always call unregister and register in partition_sched_domains
      	The code is "smart" about unregistering only needed domains.
      	Since we aren't guaranteed any calls to unregister, always 
      	unregister.   Without calling register on the way out we
      	will not have a table or any sysctl tree.
      
      4) warn if register is called without unregistering
      	The previous table memory is lost, leaving pointers to the
      	later freed memory in sysctl and leaking the memory of the
      	tables.
      
      Before this patch, on a 2-core 4-thread box compiled for SMT and NUMA,
      the domains appear empty (there are actually 3 levels per cpu).  As
      soon as there are two domains, a null pointer is dereferenced (the
      "unreliable" entry in this case is stack garbage):
      
      bu19a:~# ls -R /proc/sys/kernel/sched_domain/
      /proc/sys/kernel/sched_domain/:
      cpu0  cpu1  cpu2  cpu3
      
      /proc/sys/kernel/sched_domain/cpu0:
      
      /proc/sys/kernel/sched_domain/cpu1:
      
      /proc/sys/kernel/sched_domain/cpu2:
      
      /proc/sys/kernel/sched_domain/cpu3:
      
      bu19a:~# mkdir /dev/cpuset
      bu19a:~# mount -tcpuset cpuset /dev/cpuset/
      bu19a:~# cd /dev/cpuset/
      bu19a:/dev/cpuset# echo 0 > sched_load_balance 
      bu19a:/dev/cpuset# mkdir one
      bu19a:/dev/cpuset# echo 1 > one/cpus               
      bu19a:/dev/cpuset# echo 0 > one/sched_load_balance 
      Unable to handle kernel paging request for data at address 0x00000018
      Faulting instruction address: 0xc00000000006b608
      NIP: c00000000006b608 LR: c00000000006b604 CTR: 0000000000000000
      REGS: c000000018d973f0 TRAP: 0300   Not tainted  (2.6.23-bml)
      MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28242442  XER: 00000000
      DAR: 0000000000000018, DSISR: 0000000040000000
      TASK = c00000001912e340[1987] 'bash' THREAD: c000000018d94000 CPU: 2
      ..
      NIP [c00000000006b608] .unregister_sysctl_table+0x38/0x110
      LR [c00000000006b604] .unregister_sysctl_table+0x34/0x110
      Call Trace:
      [c000000018d97670] [c000000007017270] 0xc000000007017270 (unreliable)
      [c000000018d97720] [c000000000058710] .detach_destroy_domains+0x30/0xb0
      [c000000018d977b0] [c00000000005cf1c] .partition_sched_domains+0x1bc/0x230
      [c000000018d97870] [c00000000009fdc4] .rebuild_sched_domains+0xb4/0x4c0
      [c000000018d97970] [c0000000000a02e8] .update_flag+0x118/0x170
      [c000000018d97a80] [c0000000000a1768] .cpuset_common_file_write+0x568/0x820
      [c000000018d97c00] [c00000000009d95c] .cgroup_file_write+0x7c/0x180
      [c000000018d97cf0] [c0000000000e76b8] .vfs_write+0xe8/0x1b0
      [c000000018d97d90] [c0000000000e810c] .sys_write+0x4c/0x90
      [c000000018d97e30] [c00000000000852c] syscall_exit+0x0/0x40
      
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
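Fixes 2 and 4 in the commit above (unregister does nothing when already unregistered; register warns when called without a prior unregister) follow a common pattern that can be sketched in userspace C. All names below are hypothetical, and refusing a double register outright is a choice made for this sketch; the commit only says that a warning is issued.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a registered sysctl table. */
struct table_sketch {
    int dummy;
};

static struct table_sketch *registered;   /* NULL when nothing is registered */
static int warnings;                      /* counts double-register attempts */

/* Register the table; warn (and refuse) if one is already registered. */
static int register_domains(void)
{
    if (registered) {
        warnings++;        /* the old table would otherwise be leaked */
        return -1;
    }
    registered = malloc(sizeof(*registered));
    return registered ? 0 : -1;
}

/* Unregister the table; safe to call when nothing is registered. */
static void unregister_domains(void)
{
    if (!registered)
        return;            /* already unregistered: nothing to do */
    free(registered);
    registered = NULL;
}
```

Clearing the pointer after freeing is what makes a second unregister harmless, so callers such as a per-cpu-set teardown loop can call it repeatedly without tracking state themselves.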