  1. Sep 11, 2009
    • block: add blk-iopoll, a NAPI-like approach for block devices · 5e605b64
      Jens Axboe authored

      This borrows some code from NAPI and implements a polled completion
      mode for block devices. The idea is the same as NAPI - instead of
      doing the command completion when the irq occurs, schedule a dedicated
      softirq in the hopes that we will complete more IO when the iopoll
      handler is invoked. Devices have a budget of commands assigned, and will
      stay in polled mode as long as they continue to consume their budget
      from the iopoll softirq handler. If they do not, the device is set back
      to interrupt completion mode.
      
      This patch holds the core bits for blk-iopoll, device driver support
      sold separately.
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
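
      A hedged sketch of the driver-side pattern this enables, using the
      blk-iopoll entry points added by this commit; the mydev struct and
      mydev_* helpers are made up for illustration, and a real driver must
      also guard against re-scheduling an already-scheduled instance:

        #include <linux/interrupt.h>
        #include <linux/blk-iopoll.h>

        #define MYDEV_IOPOLL_WEIGHT	32	/* completion budget per poll */

        struct mydev {
        	struct blk_iopoll iopoll;
        	/* ... hardware state ... */
        };

        /* illustrative device hooks, defined elsewhere */
        void mydev_disable_irq(struct mydev *dev);
        void mydev_enable_irq(struct mydev *dev);
        int mydev_reap_completions(struct mydev *dev, int budget);

        static irqreturn_t mydev_irq(int irq, void *data)
        {
        	struct mydev *dev = data;

        	/* switch to polled completion: mask the device interrupt
        	 * and arm the blk-iopoll softirq */
        	mydev_disable_irq(dev);
        	blk_iopoll_sched(&dev->iopoll);
        	return IRQ_HANDLED;
        }

        static int mydev_iopoll(struct blk_iopoll *iop, int budget)
        {
        	struct mydev *dev = container_of(iop, struct mydev, iopoll);
        	int done = mydev_reap_completions(dev, budget);

        	/* consuming the whole budget keeps us in polled mode;
        	 * finishing early drops back to interrupt completion */
        	if (done < budget) {
        		blk_iopoll_complete(iop);
        		mydev_enable_irq(dev);
        	}
        	return done;
        }

      At probe time the driver would pair this with something like
      blk_iopoll_init(&dev->iopoll, MYDEV_IOPOLL_WEIGHT, mydev_iopoll)
      followed by blk_iopoll_enable(&dev->iopoll).
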
    • writeback: add name to backing_dev_info · d993831f
      Jens Axboe authored

      This enables us to track who does what and print info. Its main use
      is catching dirty inodes on the default_backing_dev_info, so we can
      fix that up.
      
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
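
      A hedged sketch of what the new field enables; the bdi declaration
      and report_dirty_inode() helper are illustrative, not part of the
      patch:

        #include <linux/backing-dev.h>
        #include <linux/fs.h>

        static struct backing_dev_info example_bdi = {
        	.name		= "example",	/* identifies this bdi in output */
        	.capabilities	= BDI_CAP_MAP_COPY,
        };

        static void report_dirty_inode(struct inode *inode)
        {
        	/* with a name we can finally say *which* bdi a
        	 * dirty inode ended up on */
        	printk(KERN_DEBUG "dirty inode on bdi %s\n",
        	       inode->i_mapping->backing_dev_info->name);
        }
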
  2. Sep 10, 2009
    • sched: Fix sched::sched_stat_wait tracepoint field · e1f84508
      Ingo Molnar authored

      This weird perf trace output:
      
        cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]
      
      is caused by setting one component field of the delta to zero
      a bit too early. Move the zeroing to later.
      
      ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
        it's just a reporting bug in essence. )
      
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nikos Chantziaras <realnc@arcor.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <4AA93D34.8040500@arcor.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
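
      A hedged reconstruction of the bug class (schedstat plumbing
      omitted; names follow the fair-class code of this era but are
      approximate):

        static void update_stats_wait_end(struct cfs_rq *cfs_rq,
        				  struct sched_entity *se)
        {
        	/*
        	 * The bug: clearing se->wait_start *before* the tracepoint
        	 * made the delta below degenerate to now - 0, which is the
        	 * huge absolute [ns] value seen in the output above.
        	 */
        	trace_sched_stat_wait(task_of(se),
        			      rq_of(cfs_rq)->clock - se->wait_start);

        	se->wait_start = 0;	/* the fix: zero the component afterwards */
        }
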
    • sched: Disable NEW_FAIR_SLEEPERS for now · 3f2aa307
      Ingo Molnar authored

      Nikos Chantziaras and Jens Axboe reported that turning off
      NEW_FAIR_SLEEPERS improves desktop interactivity visibly.
      
      Nikos described his experiences the following way:
      
        " With this setting, I can do "nice -n 19 make -j20" and
          still have a very smooth desktop and watch a movie at
          the same time.  Various other annoyances (like the
          "logout/shutdown/restart" dialog of KDE not appearing
          at all until the background fade-out effect has finished)
          are also gone.  So this seems to be the single most
          important setting that vastly improves desktop behavior,
          at least here. "
      
      Jens described it the following way, referring to a 10-second
      xmodmap scheduling delay he was trying to debug:
      
        " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
          I get:
      
          Performance counter stats for 'xmodmap .xmodmap-carl':
      
               9.009137  task-clock-msecs         #      0.447 CPUs
                     18  context-switches         #      0.002 M/sec
                      1  CPU-migrations           #      0.000 M/sec
                    315  page-faults              #      0.035 M/sec
      
          0.020167093  seconds time elapsed
      
          Woot! "
      
      So disable it for now. In perf trace output I can see weird
      delta timestamps:
      
        cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]
      
      That nsec field is not supposed to be that large. More digging
      is needed - but let's turn it off while the real bug is found.
      
      Reported-by: Nikos Chantziaras <realnc@arcor.de>
      Tested-by: Nikos Chantziaras <realnc@arcor.de>
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Tested-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <4AA93D34.8040500@arcor.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
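
      The change itself is plausibly a one-line default flip in the
      scheduler feature table (layout of kernel/sched_features.h from
      memory; the feature can still be toggled at runtime through
      /sys/kernel/debug/sched_features):

        SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)	/* default was 1; off until the
        					 * real bug is found */
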
  3. Sep 08, 2009
    • sched: Ensure that a child can't gain time over its parent after fork() · b5d9d734
      Mike Galbraith authored

      A fork/exec load is usually "pass the baton", so the child
      should never be placed behind the parent.  With START_DEBIT we
      make room for the new task, but with child_runs_first, that
      room comes out of the _parent's_ hide. There's nothing to say
      that the parent wasn't ahead of min_vruntime at fork() time,
      which means that the "baton carrier", who is essentially the
      parent in drag, can gain time and increase scheduling latencies
      for waiters.
      
      With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
      enabled, we essentially pass the sleeper fairness off to the
      child, which is fine, but if we don't base placement on the
      parent's updated vruntime, we can end up compounding latency
      woes if the child itself then does fork/exec.  The debit
      incurred at fork doesn't hurt the parent who is then going to
      sleep and maybe exit, but the child who acquires the error
      harms all comers.
      
      This improves latencies of make -j<n> kernel build workloads.
      
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
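
      A hedged sketch of the placement rule described above; this is not
      the exact fair-class diff, and task_fork_fair_sketch() is an
      illustrative name:

        static void task_fork_fair_sketch(struct cfs_rq *cfs_rq,
        				  struct sched_entity *parent,
        				  struct sched_entity *child)
        {
        	update_curr(cfs_rq);	/* settle the parent's vruntime first */

        	child->vruntime = parent->vruntime;	/* inherit the updated value */
        	place_entity(cfs_rq, child, 1);		/* then apply START_DEBIT etc. */

        	/* child_runs_first may still put the child in front, but it
        	 * can no longer carry credit the parent earned before fork() */
        }
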
  4. Sep 04, 2009
    • ring-buffer: only enable ring_buffer_swap_cpu when needed · 85bac32c
      Steven Rostedt authored

      Since the ability to swap the cpu buffers adds a small overhead to
      the recording of a trace, we only want to add it when needed.
      
      Only the irqsoff and preemptoff tracers use this feature, and neither
      is recommended for production kernels. This patch disables its use
      when neither irqsoff nor preemptoff is configured.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
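
      A hedged sketch of the compile-time gating; the Kconfig symbol name
      follows this era of the tree, so treat it as illustrative. The
      tracers that need swapping select the symbol, and the swap path
      otherwise compiles away:

        #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
        int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
        			 struct ring_buffer *buffer_b, int cpu)
        {
        	/* ... the actual swap, only built when a user exists ... */
        	return 0;
        }
        #endif /* CONFIG_RING_BUFFER_ALLOW_SWAP */
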
    • ring-buffer: check for swapped buffers in start of committing · 62f0b3eb
      Steven Rostedt authored

      Because the irqsoff tracer can swap an internal CPU buffer, it is possible
      that a swap happens after the start of the write but before the committing
      bit is set (the committing bit disables swapping).
      
      This patch adds a check for this and will fail the write if it detects it.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
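
      A hedged sketch of the check, assuming the committing/commits
      counters of this era; the helper name is illustrative:

        static int rb_start_commit_checked(struct ring_buffer *buffer,
        				   struct ring_buffer_per_cpu *cpu_buffer)
        {
        	local_inc(&cpu_buffer->committing);
        	local_inc(&cpu_buffer->commits);

        	/* a swap could have hit between the start of the write and
        	 * raising the committing bit; re-check and fail the write */
        	if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
        		local_dec(&cpu_buffer->commits);
        		local_dec(&cpu_buffer->committing);
        		return 0;
        	}
        	return 1;
        }
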
    • tracing: report error in trace if we fail to swap latency buffer · e8165dbb
      Steven Rostedt authored

      The irqsoff tracer will fail to swap the cpu buffer with the max
      buffer if it preempts a commit. Instead of ignoring this, this patch
      makes the tracer report it when the last max-latency update failed
      because it preempted a commit in progress.
      
      The output of the latency tracer will look like this:
      
       # tracer: irqsoff
       #
       # irqsoff latency trace v1.1.5 on 2.6.31-rc5
       # --------------------------------------------------------------------
       # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
       #    -----------------
       #    | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
       #    -----------------
       #  => started at: save_args
       #  => ended at:   __do_softirq
       #
       #
       #                  _------=> CPU#
       #                 / _-----=> irqs-off
       #                | / _----=> need-resched
       #                || / _---=> hardirq/softirq
       #                ||| / _--=> preempt-depth
       #                |||| /
       #                |||||     delay
       #  cmd     pid   ||||| time  |   caller
       #     \   /      |||||   \   |   /
          bash-4281    1d.s6  265us : update_max_tr_single: Failed to swap buffers due to commit in progress
      
      Note that the latency time and the functions that disabled irqs or
      preemption are still listed.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
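
      A hedged sketch of the reporting path, assuming the -EBUSY
      convention from the companion ring-buffer patches; this is not the
      verbatim trace.c code:

        static void update_max_tr_single_sketch(struct trace_array *tr, int cpu)
        {
        	int ret = ring_buffer_swap_cpu(max_tr.buffer, tr->buffer, cpu);

        	if (ret == -EBUSY) {
        		/* leave a marker in the max buffer rather than
        		 * silently dropping the latency snapshot */
        		trace_array_printk(&max_tr, _THIS_IP_,
        			"Failed to swap buffers due to commit in progress\n");
        	}

        	WARN_ON_ONCE(ret && ret != -EAGAIN && ret != -EBUSY);
        }
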
    • tracing: add trace_array_printk for internal tracers to use · 659372d3
      Steven Rostedt authored

      This patch adds trace_array_printk to allow a tracer to use
      trace_printk on its own trace array.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
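
      The entry point plausibly mirrors trace_printk() but takes the
      target array explicitly; a hedged sketch of the declaration:

        int trace_array_printk(struct trace_array *tr, unsigned long ip,
        		       const char *fmt, ...);
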
    • tracing: pass around ring buffer instead of tracer · e77405ad
      Steven Rostedt authored

      The latency tracers (irqsoff and wakeup) can swap trace buffers
      on the fly. If an event is happening and has reserved data on one of
      the buffers, and the latency tracer swaps the global buffer with the
      max buffer, the result is that the event may commit the data to the
      wrong buffer.
      
      This patch changes the trace recording API to receive the buffer
      that was used to reserve the event. That same buffer can then be
      passed in to the commit.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
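
      A hedged sketch of the reworked calling convention; helper names
      approximate the trace.c API after this change:

        static void record_event_sketch(struct trace_array *tr, int type,
        				unsigned long len, unsigned long flags,
        				int pc)
        {
        	/* snapshot the buffer once; a latency swap may retarget
        	 * tr->buffer at any moment */
        	struct ring_buffer *buffer = tr->buffer;
        	struct ring_buffer_event *event;

        	event = trace_buffer_lock_reserve(buffer, type, len, flags, pc);
        	if (!event)
        		return;

        	/* ... fill in the entry ... */

        	/* commit to the buffer we reserved from, never tr->buffer */
        	trace_buffer_unlock_commit(buffer, event, flags, pc);
        }
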
    • tracing: make tracing_reset safe for external use · f633903a
      Steven Rostedt authored

      Resetting the trace buffer without first disabling the buffer and
      waiting for any writers to complete can corrupt the ring buffer.
      
      This patch makes the external version of tracing_reset safe from
      corruption by disabling the ring buffer and calling synchronize_sched.
      
      This version can no longer be called from interrupt context. But all those
      callers have been removed.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
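
      A hedged sketch of the safe sequence, using ring-buffer calls that
      exist in this era; not the verbatim trace.c code:

        static void tracing_reset_sketch(struct trace_array *tr, int cpu)
        {
        	struct ring_buffer *buffer = tr->buffer;

        	ring_buffer_record_disable(buffer);	/* stop new writers */
        	synchronize_sched();	/* sleeps: wait out current writers,
        				 * hence no interrupt context */
        	ring_buffer_reset_cpu(buffer, cpu);	/* now safe to reset */
        	ring_buffer_record_enable(buffer);
        }
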
    • tracing: use timestamp to determine start of latency traces · 2f26ebd5
      Steven Rostedt authored

      Currently the latency tracers reset the ring buffer. Unfortunately
      if a commit is in process (due to a trace event), this can corrupt
      the ring buffer. When this happens, the ring buffer will detect
      the corruption and then permanently disable the ring buffer.
      
      The bug does not crash the system, but it does prevent further tracing
      after the bug is hit.
      
      Instead of resetting the trace buffers, the timestamp of the start of
      the trace is used. The buffers will still contain the previous
      data, but the output will not count any data that is before the
      timestamp of the trace.
      
      Note, this only affects the static trace output (trace) and not the
      runtime trace output (trace_pipe). The runtime trace output does not
      make sense for the latency tracers anyway.
      
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
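
      A hedged sketch of the idea; field and helper names are
      approximate:

        /* at trace start, instead of resetting the buffer: */
        static void latency_trace_start_sketch(struct trace_array *tr, int cpu)
        {
        	tr->time_start = ftrace_now(cpu);
        }

        /* while iterating for the static "trace" output: */
        static int entry_in_this_trace(struct trace_array *tr, u64 ts)
        {
        	/* entries stamped before time_start are leftovers from a
        	 * previous run still sitting in the un-reset buffer */
        	return ts >= tr->time_start;
        }
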
    • tracing/filters: Defer pred allocation, fix memory leak · c58b4321
      Li Zefan authored

      The predicates of an event and their filter structure are allocated
      when we create an event filter for the first time.
      
      These objects need to be created only once, but each time a new filter
      comes in, we overwrite any pre-existing allocation and leak it.
      
      Thus, this patch checks if the filter has already been allocated
      before going ahead.
      
      Spotted-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      LKML-Reference: <4A9CB1BA.3060402@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
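
      A hedged sketch of the guard; the helper name is illustrative, and
      the real patch applies this in the event-filter setup path:

        static int init_preds_once(struct ftrace_event_call *call)
        {
        	if (call->filter)
        		return 0;	/* keep the existing allocation: no leak */

        	return init_preds(call);	/* first filter for this event */
        }
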
    • tracing: remove users of tracing_reset · 76f0d073
      Steven Rostedt authored

      The function tracing_reset is deprecated for outside use of trace.c.
      
      The new function to reset the buffers is tracing_reset_online_cpus.
      
      The reason for this is that resetting the buffers while the event
      trace points are active can corrupt the buffers, because they may
      be writing at the time of reset. The tracing_reset_online_cpus disables
      writes and waits for current writers to finish.
      
      This patch replaces all users of tracing_reset except for the latency
      tracers. Those require more work, and their calls will be removed in
      the following patches.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: disable buffers and synchronize_sched before resetting · 621968cd
      Steven Rostedt authored

      Resetting the ring buffers while traces are happening can corrupt
      the ring buffer and disable it (no kernel crash to worry about).
      
      The safest thing to do is disable the ring buffers, call synchronize_sched()
      to wait for all current writers to finish and then reset the buffer.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: disable update max tracer while reading trace · b8de7bd1
      Steven Rostedt authored

      When reading the tracer from the trace file, updating the max latency
      may corrupt the output. This patch disables the tracing of the max
      latency while reading the trace file.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: print out start and stop in latency traces · 8248ac05
      Steven Rostedt authored

      During development of the tracer, we would copy information from
      the live tracer to the max tracer with one memcpy. Since then we
      added a generic ring buffer and we handle the copies differently now.
      Unfortunately, we never copied the critical section information, and
      we lost the output:
      
       #  => started at: kmem_cache_alloc
       #  => ended at:   kmem_cache_alloc
      
      This patch adds back the critical start and end copying as well as
      removes the unused "trace_idx" and "overrun" fields of the
      trace_array_cpu structure.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: disable all cpu buffers when one finds a problem · 077c5407
      Steven Rostedt authored

      Currently the way RB_WARN_ON works, is to disable either the current
      CPU buffer or all CPU buffers, depending on whether a ring_buffer or
      ring_buffer_per_cpu struct was passed into the macro.
      
      Most users of the RB_WARN_ON pass in the CPU buffer, so only the one
      CPU buffer gets disabled but the rest are still active. This may
      confuse users even though a warning is sent to the console.
      
      This patch changes the macro to disable the entire buffer even if
      the CPU buffer is passed in.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
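
      A hedged reconstruction of the revised macro from memory of this
      era; the __same_type() test is what lets one macro accept either a
      ring_buffer or a ring_buffer_per_cpu pointer:

        #define RB_WARN_ON(b, cond)						\
        ({									\
        	int _____ret = unlikely(cond);					\
        	if (_____ret) {							\
        		if (__same_type(*(b), struct ring_buffer_per_cpu)) {	\
        			struct ring_buffer_per_cpu *__b = (void *)(b);	\
        			/* disable the whole buffer, not just this cpu */ \
        			atomic_inc(&__b->buffer->record_disabled);	\
        		} else							\
        			atomic_inc(&(b)->record_disabled);		\
        		WARN_ON(1);						\
        	}								\
        	_____ret;							\
        })
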
    • ring-buffer: do not count discarded events · a1863c21
      Steven Rostedt authored

      The latency tracers report the number of items in the trace buffer.
      This uses the ring buffer data to calculate this. Because discarded
      events are also counted, the numbers do not match the number of items
      that are printed. The ring buffer also adds a "padding" item to the
      end of each buffer page which also gets counted as a discarded item.
      
      This patch decrements the page's entry counter on a discard.
      This allows us to ignore discarded entries while reading the buffer.
      
      Decrementing the counter is still safe since it can only happen while
      the committing flag is still set.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: remove ring_buffer_event_discard · dc892f73
      Steven Rostedt authored

      The function ring_buffer_event_discard can be used on any item in the
      ring buffer, even after the item was committed. This function provides
      no safety nets and is very race prone.
      
      An item may be safely removed from the ring buffer before it is committed
      with the ring_buffer_discard_commit.
      
      Since there are currently no users of this function, and because it
      is racy and error prone, this patch removes it altogether.
      
      Note, removing this function also allows the counters to ignore
      all discarded events (patches will follow).
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: fix ring_buffer_read crossing pages · 7e9391cf
      Steven Rostedt authored

      When the ring buffer uses an iterator (static read mode, not on-the-fly
      reading) and it crosses a page boundary, it will skip the first
      entry on the next page. The reason is that the last entry of a page
      is usually padding if the page is not full. The padding will not be
      returned to the user.
      
      The problem arises in ring_buffer_read because it also increments the
      iterator. Because both the read and peek use the same rb_iter_peek,
      rb_iter_peek will return the padding but also increment to the next
      item. This is because ring_buffer_peek will not increment it
      itself.
      
      The ring_buffer_read will increment it again and then call rb_iter_peek
      again to get the next item. But that will be the second item, not the
      first one on the page.
      
      The reason this never showed up before is that the ftrace utility
      always calls ring_buffer_peek first and only uses ring_buffer_read
      to increment to the next item. The ring_buffer_peek will always keep
      the pointer to a valid item and not padding. This just hid the bug.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
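
      A hedged sketch of the corrected read; rb_iter_peek() and
      rb_advance_iter() follow the internal names mentioned above, and
      the handling of padding inside the peek is simplified:

        struct ring_buffer_event *
        ring_buffer_read_sketch(struct ring_buffer_iter *iter, u64 *ts)
        {
        	struct ring_buffer_event *event;

        	event = rb_iter_peek(iter, ts);	/* may already step past padding */
        	if (event)
        		rb_advance_iter(iter);	/* consume exactly this event,
        					 * never a second one */
        	return event;
        }
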
    • ring-buffer: remove unnecessary cpu_relax · 1b959e18
      Steven Rostedt authored

      The loops in the ring buffer that use cpu_relax are not dependent on
      other CPUs. They simply came across some padding in the ring buffer
      and are skipping over it. These are normal loops and do not require
      cpu_relax.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: do not swap buffers during a commit · 98277991
      Steven Rostedt authored

      If a commit is taking place on a CPU ring buffer, do not allow it to
      be swapped. Return -EBUSY when this is detected instead.
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
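
      A hedged sketch of the guard, assuming the committing counter from
      the companion patches; the helper name is illustrative:

        static int rb_swap_allowed(struct ring_buffer_per_cpu *cpu_buffer_a,
        			   struct ring_buffer_per_cpu *cpu_buffer_b)
        {
        	/* a raised committing count means a writer holds an
        	 * uncommitted reservation in that buffer */
        	if (local_read(&cpu_buffer_a->committing) ||
        	    local_read(&cpu_buffer_b->committing))
        		return -EBUSY;

        	return 0;
        }
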