  1. Oct 31, 2020
  2. Oct 28, 2020
  3. Oct 25, 2020
  4. Oct 22, 2020
  5. Oct 21, 2020
  6. Oct 20, 2020
  7. Oct 19, 2020
  8. Oct 18, 2020
  9. Oct 16, 2020
  10. Oct 15, 2020
  11. Oct 14, 2020
    • perf bench: Use condition variables in numa. · f9299385
      Ian Rogers authored
      
      
      The numa benchmark currently synchronizes its threads with unbalanced
      mutexes.
      
      This synchronization causes thread sanitizer to warn of locks being
      taken twice on a thread without an unlock, as well as unlocks with no
      corresponding locks.
      
      This change replaces the synchronization with more regular condition
      variables.
      
      While this fixes one class of thread sanitizer warnings, there still
      remain warnings of data races due to threads reading and writing shared
      memory without any atomics.
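
As a rough illustration of the pattern this commit moves towards, the sketch below gates a group of worker threads on a condition variable so that every lock is released by the thread that took it. It is a minimal sketch under stated assumptions, not the code from the patch: `struct start_gate`, `worker()`, `run_gate_demo()` and `NR_WORKERS` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_WORKERS 4

/* Hypothetical shared state; the names are illustrative, not taken from perf. */
struct start_gate {
	pthread_mutex_t lock;
	pthread_cond_t all_ready;	/* signalled once the last worker has checked in */
	pthread_cond_t go;		/* broadcast when every worker may start */
	int nr_ready;
	bool started;
};

static struct start_gate gate = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.all_ready = PTHREAD_COND_INITIALIZER,
	.go = PTHREAD_COND_INITIALIZER,
};

static int work_done[NR_WORKERS];

static void *worker(void *arg)
{
	int id = *(int *)arg;

	/* Every lock taken here is released by the same thread. */
	pthread_mutex_lock(&gate.lock);
	if (++gate.nr_ready == NR_WORKERS)
		pthread_cond_signal(&gate.all_ready);
	while (!gate.started)		/* the loop guards against spurious wakeups */
		pthread_cond_wait(&gate.go, &gate.lock);
	pthread_mutex_unlock(&gate.lock);

	work_done[id] = 1;		/* each worker writes only its own slot */
	return NULL;
}

/* Starts the workers, releases them together, and returns how many ran. */
int run_gate_demo(void)
{
	pthread_t threads[NR_WORKERS];
	int ids[NR_WORKERS];
	int i, err, total = 0;

	for (i = 0; i < NR_WORKERS; i++) {
		ids[i] = i;
		err = pthread_create(&threads[i], NULL, worker, &ids[i]);
		assert(err == 0);
	}

	/* Wait until all workers have checked in, then open the gate. */
	pthread_mutex_lock(&gate.lock);
	while (gate.nr_ready < NR_WORKERS)
		pthread_cond_wait(&gate.all_ready, &gate.lock);
	gate.started = true;
	pthread_cond_broadcast(&gate.go);
	pthread_mutex_unlock(&gate.lock);

	for (i = 0; i < NR_WORKERS; i++) {
		err = pthread_join(threads[i], NULL);
		assert(err == 0);
		total += work_done[i];	/* safe to read: the writer has been joined */
	}
	(void)err;
	return total;
}
```

Calling `run_gate_demo()` from a `main()` and building with `-pthread` should return `NR_WORKERS`. The sketch only covers the lock/unlock pairing; it does not address the remaining data races mentioned above, which would need atomics or further locking.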
      
      Committer testing:
      
        Basic run on a non-NUMA machine.
      
        # perf bench numa
      
                # List of available benchmarks for collection 'numa':
      
                   mem: Benchmark for NUMA workloads
                   all: Run all NUMA benchmarks
      
        # perf bench numa all
        # Running numa/mem benchmark...
      
         # Running main, "perf bench numa numa-mem"
         #
         # Running test on: Linux five 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
         #
      
         # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp  1 --no-data_rand_walk"
                 20.076 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.073 secs average thread-runtime
                  0.190 % difference between max/avg runtime
                241.828 GB data processed, per thread
                241.828 GB data processed, total
                  0.083 nsecs/byte/thread runtime
                 12.045 GB/sec/thread speed
                 12.045 GB/sec total speed
      
         # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp  1 --no-data_rand_walk --thp -1"
                 20.045 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.014 secs average thread-runtime
                  0.111 % difference between max/avg runtime
                234.304 GB data processed, per thread
                234.304 GB data processed, total
                  0.086 nsecs/byte/thread runtime
                 11.689 GB/sec/thread speed
                 11.689 GB/sec total speed
      
         # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp  1 --no-data_rand_walk"
      
        Test not applicable, system has only 1 nodes.
      
         # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp  1 --no-data_rand_walk"
                 20.138 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.121 secs average thread-runtime
                  0.342 % difference between max/avg runtime
                135.961 GB data processed, per thread
                271.922 GB data processed, total
                  0.148 nsecs/byte/thread runtime
                  6.752 GB/sec/thread speed
                 13.503 GB/sec total speed
      
         # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp  1 --no-data_rand_walk"
      
        Test not applicable, system has only 1 nodes.
      
         # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp  1 --no-data_rand_walk"
      
        Test not applicable, system has only 1 nodes.
      
         # Running  1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1"
                  0.747 secs latency to NUMA-converge
                  0.747 secs slowest (max) thread-runtime
                  0.000 secs fastest (min) thread-runtime
                  0.714 secs average thread-runtime
                 50.000 % difference between max/avg runtime
                  3.228 GB data processed, per thread
                  9.683 GB data processed, total
                  0.231 nsecs/byte/thread runtime
                  4.321 GB/sec/thread speed
                 12.964 GB/sec total speed
      
         # Running  1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp  1"
                  1.127 secs latency to NUMA-converge
                  1.127 secs slowest (max) thread-runtime
                  1.000 secs fastest (min) thread-runtime
                  1.089 secs average thread-runtime
                  5.624 % difference between max/avg runtime
                  3.765 GB data processed, per thread
                 15.062 GB data processed, total
                  0.299 nsecs/byte/thread runtime
                  3.342 GB/sec/thread speed
                 13.368 GB/sec total speed
      
         # Running  1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp  1"
                  1.003 secs latency to NUMA-converge
                  1.003 secs slowest (max) thread-runtime
                  0.000 secs fastest (min) thread-runtime
                  0.889 secs average thread-runtime
                 50.000 % difference between max/avg runtime
                  2.141 GB data processed, per thread
                 12.847 GB data processed, total
                  0.469 nsecs/byte/thread runtime
                  2.134 GB/sec/thread speed
                 12.805 GB/sec total speed
      
         # Running  2x3-convergence, "perf bench numa mem -p 2 -t 3 -P 1020 -s 100 -zZ0qcm --thp  1"
                  1.814 secs latency to NUMA-converge
                  1.814 secs slowest (max) thread-runtime
                  1.000 secs fastest (min) thread-runtime
                  1.716 secs average thread-runtime
                 22.440 % difference between max/avg runtime
                  3.747 GB data processed, per thread
                 22.483 GB data processed, total
                  0.484 nsecs/byte/thread runtime
                  2.065 GB/sec/thread speed
                 12.393 GB/sec total speed
      
         # Running  3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp  1"
                  2.065 secs latency to NUMA-converge
                  2.065 secs slowest (max) thread-runtime
                  1.000 secs fastest (min) thread-runtime
                  1.947 secs average thread-runtime
                 25.788 % difference between max/avg runtime
                  2.855 GB data processed, per thread
                 25.694 GB data processed, total
                  0.723 nsecs/byte/thread runtime
                  1.382 GB/sec/thread speed
                 12.442 GB/sec total speed
      
         # Running  4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp  1"
                  1.912 secs latency to NUMA-converge
                  1.912 secs slowest (max) thread-runtime
                  1.000 secs fastest (min) thread-runtime
                  1.775 secs average thread-runtime
                 23.852 % difference between max/avg runtime
                  1.479 GB data processed, per thread
                 23.668 GB data processed, total
                  1.293 nsecs/byte/thread runtime
                  0.774 GB/sec/thread speed
                 12.378 GB/sec total speed
      
         # Running  4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp  1 --thp -1"
                  1.783 secs latency to NUMA-converge
                  1.783 secs slowest (max) thread-runtime
                  1.000 secs fastest (min) thread-runtime
                  1.633 secs average thread-runtime
                 21.960 % difference between max/avg runtime
                  1.345 GB data processed, per thread
                 21.517 GB data processed, total
                  1.326 nsecs/byte/thread runtime
                  0.754 GB/sec/thread speed
                 12.067 GB/sec total speed
      
         # Running  4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp  1"
                  5.396 secs latency to NUMA-converge
                  5.396 secs slowest (max) thread-runtime
                  4.000 secs fastest (min) thread-runtime
                  4.928 secs average thread-runtime
                 12.937 % difference between max/avg runtime
                  2.721 GB data processed, per thread
                 65.306 GB data processed, total
                  1.983 nsecs/byte/thread runtime
                  0.504 GB/sec/thread speed
                 12.102 GB/sec total speed
      
         # Running  4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp  1"
                  3.121 secs latency to NUMA-converge
                  3.121 secs slowest (max) thread-runtime
                  2.000 secs fastest (min) thread-runtime
                  2.836 secs average thread-runtime
                 17.962 % difference between max/avg runtime
                  1.194 GB data processed, per thread
                 38.192 GB data processed, total
                  2.615 nsecs/byte/thread runtime
                  0.382 GB/sec/thread speed
                 12.236 GB/sec total speed
      
         # Running  8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp  1"
                  4.302 secs latency to NUMA-converge
                  4.302 secs slowest (max) thread-runtime
                  3.000 secs fastest (min) thread-runtime
                  4.045 secs average thread-runtime
                 15.133 % difference between max/avg runtime
                  1.631 GB data processed, per thread
                 52.178 GB data processed, total
                  2.638 nsecs/byte/thread runtime
                  0.379 GB/sec/thread speed
                 12.128 GB/sec total speed
      
         # Running  8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp  1 --thp -1"
                  4.418 secs latency to NUMA-converge
                  4.418 secs slowest (max) thread-runtime
                  3.000 secs fastest (min) thread-runtime
                  4.104 secs average thread-runtime
                 16.045 % difference between max/avg runtime
                  1.664 GB data processed, per thread
                 53.254 GB data processed, total
                  2.655 nsecs/byte/thread runtime
                  0.377 GB/sec/thread speed
                 12.055 GB/sec total speed
      
         # Running  3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp  1"
                  0.973 secs latency to NUMA-converge
                  0.973 secs slowest (max) thread-runtime
                  0.000 secs fastest (min) thread-runtime
                  0.955 secs average thread-runtime
                 50.000 % difference between max/avg runtime
                  4.124 GB data processed, per thread
                 12.372 GB data processed, total
                  0.236 nsecs/byte/thread runtime
                  4.238 GB/sec/thread speed
                 12.715 GB/sec total speed
      
         # Running  4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp  1"
                  0.820 secs latency to NUMA-converge
                  0.820 secs slowest (max) thread-runtime
                  0.000 secs fastest (min) thread-runtime
                  0.808 secs average thread-runtime
                 50.000 % difference between max/avg runtime
                  2.555 GB data processed, per thread
                 10.220 GB data processed, total
                  0.321 nsecs/byte/thread runtime
                  3.117 GB/sec/thread speed
                 12.468 GB/sec total speed
      
         # Running  8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp  1"
                  0.667 secs latency to NUMA-converge
                  0.667 secs slowest (max) thread-runtime
                  0.000 secs fastest (min) thread-runtime
                  0.607 secs average thread-runtime
                 50.000 % difference between max/avg runtime
                  1.009 GB data processed, per thread
                  8.069 GB data processed, total
                  0.661 nsecs/byte/thread runtime
                  1.512 GB/sec/thread speed
                 12.095 GB/sec total speed
      
         # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp  1"
                  1.546 secs latency to NUMA-converge
                  1.546 secs slowest (max) thread-runtime
                  1.000 secs fastest (min) thread-runtime
                  1.485 secs average thread-runtime
                 17.664 % difference between max/avg runtime
                  1.162 GB data processed, per thread
                 18.594 GB data processed, total
                  1.331 nsecs/byte/thread runtime
                  0.752 GB/sec/thread speed
                 12.025 GB/sec total speed
      
         # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp  1"
                  0.812 secs latency to NUMA-converge
                  0.812 secs slowest (max) thread-runtime
                  0.000 secs fastest (min) thread-runtime
                  0.739 secs average thread-runtime
                 50.000 % difference between max/avg runtime
                  0.309 GB data processed, per thread
                  9.874 GB data processed, total
                  2.630 nsecs/byte/thread runtime
                  0.380 GB/sec/thread speed
                 12.166 GB/sec total speed
      
         # Running  2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp  1"
                 20.044 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.020 secs average thread-runtime
                  0.109 % difference between max/avg runtime
                125.750 GB data processed, per thread
                251.501 GB data processed, total
                  0.159 nsecs/byte/thread runtime
                  6.274 GB/sec/thread speed
                 12.548 GB/sec total speed
      
         # Running  3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp  1"
                 20.148 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.090 secs average thread-runtime
                  0.367 % difference between max/avg runtime
                 85.267 GB data processed, per thread
                255.800 GB data processed, total
                  0.236 nsecs/byte/thread runtime
                  4.232 GB/sec/thread speed
                 12.696 GB/sec total speed
      
         # Running  4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp  1"
                 20.169 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.100 secs average thread-runtime
                  0.419 % difference between max/avg runtime
                 63.144 GB data processed, per thread
                252.576 GB data processed, total
                  0.319 nsecs/byte/thread runtime
                  3.131 GB/sec/thread speed
                 12.523 GB/sec total speed
      
         # Running  8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P  512 -s 20 -zZ0q --thp  1"
                 20.175 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.107 secs average thread-runtime
                  0.433 % difference between max/avg runtime
                 31.267 GB data processed, per thread
                250.133 GB data processed, total
                  0.645 nsecs/byte/thread runtime
                  1.550 GB/sec/thread speed
                 12.398 GB/sec total speed
      
         # Running  8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P  512 -s 20 -zZ0q --thp  1 --thp -1"
                 20.216 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.113 secs average thread-runtime
                  0.535 % difference between max/avg runtime
                 30.998 GB data processed, per thread
                247.981 GB data processed, total
                  0.652 nsecs/byte/thread runtime
                  1.533 GB/sec/thread speed
                 12.266 GB/sec total speed
      
         # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp  1"
                 20.234 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.174 secs average thread-runtime
                  0.577 % difference between max/avg runtime
                 15.377 GB data processed, per thread
                246.039 GB data processed, total
                  1.316 nsecs/byte/thread runtime
                  0.760 GB/sec/thread speed
                 12.160 GB/sec total speed
      
         # Running  1x4-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp  1"
                 20.040 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.028 secs average thread-runtime
                  0.099 % difference between max/avg runtime
                 66.832 GB data processed, per thread
                267.328 GB data processed, total
                  0.300 nsecs/byte/thread runtime
                  3.335 GB/sec/thread speed
                 13.340 GB/sec total speed
      
         # Running  1x8-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp  1"
                 20.064 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.034 secs average thread-runtime
                  0.160 % difference between max/avg runtime
                 32.911 GB data processed, per thread
                263.286 GB data processed, total
                  0.610 nsecs/byte/thread runtime
                  1.640 GB/sec/thread speed
                 13.122 GB/sec total speed
      
         # Running 1x16-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp  1"
                 20.092 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.052 secs average thread-runtime
                  0.230 % difference between max/avg runtime
                 16.131 GB data processed, per thread
                258.088 GB data processed, total
                  1.246 nsecs/byte/thread runtime
                  0.803 GB/sec/thread speed
                 12.845 GB/sec total speed
      
         # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp  1"
                 20.099 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.063 secs average thread-runtime
                  0.247 % difference between max/avg runtime
                  7.962 GB data processed, per thread
                254.773 GB data processed, total
                  2.525 nsecs/byte/thread runtime
                  0.396 GB/sec/thread speed
                 12.676 GB/sec total speed
      
         # Running  2x3-bw-process, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp  1"
                 20.150 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.120 secs average thread-runtime
                  0.372 % difference between max/avg runtime
                 44.827 GB data processed, per thread
                268.960 GB data processed, total
                  0.450 nsecs/byte/thread runtime
                  2.225 GB/sec/thread speed
                 13.348 GB/sec total speed
      
         # Running  4x4-bw-process, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp  1"
                 20.258 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.168 secs average thread-runtime
                  0.636 % difference between max/avg runtime
                 17.079 GB data processed, per thread
                273.263 GB data processed, total
                  1.186 nsecs/byte/thread runtime
                  0.843 GB/sec/thread speed
                 13.489 GB/sec total speed
      
         # Running  4x6-bw-process, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp  1"
                 20.559 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.382 secs average thread-runtime
                  1.359 % difference between max/avg runtime
                 10.758 GB data processed, per thread
                258.201 GB data processed, total
                  1.911 nsecs/byte/thread runtime
                  0.523 GB/sec/thread speed
                 12.559 GB/sec total speed
      
         # Running  4x8-bw-process, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp  1"
                 20.744 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.516 secs average thread-runtime
                  1.792 % difference between max/avg runtime
                  8.069 GB data processed, per thread
                258.201 GB data processed, total
                  2.571 nsecs/byte/thread runtime
                  0.389 GB/sec/thread speed
                 12.447 GB/sec total speed
      
         # Running  4x8-bw-process-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp  1 --thp -1"
                 20.855 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.561 secs average thread-runtime
                  2.050 % difference between max/avg runtime
                  8.069 GB data processed, per thread
                258.201 GB data processed, total
                  2.585 nsecs/byte/thread runtime
                  0.387 GB/sec/thread speed
                 12.381 GB/sec total speed
      
         # Running  3x3-bw-process, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp  1"
                 20.134 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.077 secs average thread-runtime
                  0.333 % difference between max/avg runtime
                 28.091 GB data processed, per thread
                252.822 GB data processed, total
                  0.717 nsecs/byte/thread runtime
                  1.395 GB/sec/thread speed
                 12.557 GB/sec total speed
      
         # Running  5x5-bw-process, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp  1"
                 20.588 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.375 secs average thread-runtime
                  1.427 % difference between max/avg runtime
                 10.177 GB data processed, per thread
                254.436 GB data processed, total
                  2.023 nsecs/byte/thread runtime
                  0.494 GB/sec/thread speed
                 12.359 GB/sec total speed
      
         # Running 2x16-bw-process, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp  1"
                 20.657 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.429 secs average thread-runtime
                  1.589 % difference between max/avg runtime
                  8.170 GB data processed, per thread
                261.429 GB data processed, total
                  2.528 nsecs/byte/thread runtime
                  0.395 GB/sec/thread speed
                 12.656 GB/sec total speed
      
         # Running 1x32-bw-process, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp  1"
                 22.981 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 21.996 secs average thread-runtime
                  6.486 % difference between max/avg runtime
                  8.863 GB data processed, per thread
                283.606 GB data processed, total
                  2.593 nsecs/byte/thread runtime
                  0.386 GB/sec/thread speed
                 12.341 GB/sec total speed
      
         # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp  1"
                 20.047 secs slowest (max) thread-runtime
                 19.000 secs fastest (min) thread-runtime
                 20.026 secs average thread-runtime
                  2.611 % difference between max/avg runtime
                  8.441 GB data processed, per thread
                270.111 GB data processed, total
                  2.375 nsecs/byte/thread runtime
                  0.421 GB/sec/thread speed
                 13.474 GB/sec total speed
      
         # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp  1 --thp -1"
                 20.088 secs slowest (max) thread-runtime
                 19.000 secs fastest (min) thread-runtime
                 20.025 secs average thread-runtime
                  2.709 % difference between max/avg runtime
                  8.411 GB data processed, per thread
                269.142 GB data processed, total
                  2.388 nsecs/byte/thread runtime
                  0.419 GB/sec/thread speed
                 13.398 GB/sec total speed
      
         # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp  1"
                 20.293 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.175 secs average thread-runtime
                  0.721 % difference between max/avg runtime
                  7.918 GB data processed, per thread
                253.374 GB data processed, total
                  2.563 nsecs/byte/thread runtime
                  0.390 GB/sec/thread speed
                 12.486 GB/sec total speed
      
         # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp  1 --thp -1"
                 20.411 secs slowest (max) thread-runtime
                 20.000 secs fastest (min) thread-runtime
                 20.226 secs average thread-runtime
                  1.006 % difference between max/avg runtime
                  7.931 GB data processed, per thread
                253.778 GB data processed, total
                  2.574 nsecs/byte/thread runtime
                  0.389 GB/sec/thread speed
                 12.434 GB/sec total speed
      
        #
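
The figures within each result block above appear to be related by simple arithmetic, which the sketch below recomputes for two of the runs. The formulas are inferred from the printed numbers and are an assumption, as is reading "GB" as 1e9 bytes; the function names are hypothetical and not part of perf.

```c
#include <assert.h>

/* Throughput in GB/sec: data processed divided by the slowest thread's runtime. */
static double gb_per_sec(double gb_processed, double max_runtime_secs)
{
	return gb_processed / max_runtime_secs;
}

/*
 * Cost per byte in nanoseconds. Assuming GB means 1e9 bytes, the 1e9 for
 * bytes and the 1e9 for nanoseconds cancel, leaving secs / GB.
 */
static double nsecs_per_byte(double gb_per_thread, double max_runtime_secs)
{
	return max_runtime_secs / gb_per_thread;
}

/* Assumed formula: half the max-min gap, as a percentage of the max runtime. */
static double runtime_spread_pct(double max_secs, double min_secs)
{
	return (max_secs - min_secs) / 2.0 / max_secs * 100.0;
}

/* Recomputes figures from the RAM-bw-local and 1x3-convergence runs. */
int check_against_log(void)
{
	double speed = gb_per_sec(241.828, 20.076);		/* log prints 12.045 */
	double cost = nsecs_per_byte(241.828, 20.076);		/* log prints 0.083 */
	double spread = runtime_spread_pct(0.747, 0.000);	/* log prints 50.000 */

	/* Inputs are rounded to three decimals, so compare with a tolerance. */
	assert(speed > 12.04 && speed < 12.05);
	assert(cost > 0.0825 && cost < 0.0835);
	assert(spread > 49.99 && spread < 50.01);
	return 1;
}
```

Because the log rounds every value to three decimals, recomputed results can differ from the printed ones in the last digit.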
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/r/20201012161611.366482-1-irogers@google.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf jevents: Fix event code for events referencing std arch events · caf7f968
      John Garry authored
      
      
      The event code for events referencing std arch events is incorrectly
      evaluated in json_events().
      
      The issue is that je.event is evaluated properly from try_fixup(), but
      later NULLified from the real_event() call, as "event" may be NULL.
      
      Fix by setting "event" to the same value as je.event in try_fixup().
      
      Also remove support for overwriting event code for events using std arch
      events, as it is not used.
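
The overwrite described above can be modelled in a few lines. This is a simplified sketch, not the jevents source: `struct json_event_sketch`, `real_event_sketch()`, the two `try_fixup_*()` variants, and the `"CPU_CYCLES"` / `"event=0x11"` strings are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for the parsed JSON event. */
struct json_event_sketch {
	const char *name;
	const char *event;	/* event code string */
};

/* Stand-in for real_event(): hands back whatever event it was given. */
static const char *real_event_sketch(const char *name, const char *event)
{
	(void)name;		/* no renaming is modelled in this sketch */
	return event;
}

/* Before the fix: only je->event is filled in from the std arch event. */
static void try_fixup_before(struct json_event_sketch *je, const char **event)
{
	(void)event;
	je->event = "event=0x11";
}

/* After the fix: the caller's local "event" is kept in step with je->event. */
static void try_fixup_after(struct json_event_sketch *je, const char **event)
{
	je->event = "event=0x11";
	*event = je->event;
}

static const char *resolve(void (*fixup)(struct json_event_sketch *, const char **))
{
	struct json_event_sketch je = { .name = "CPU_CYCLES", .event = NULL };
	const char *event = NULL;	/* the JSON entry carried no event code itself */

	fixup(&je, &event);
	/* This assignment replaces je.event with whatever "event" holds. */
	je.event = real_event_sketch(je.name, event);
	return je.event;
}

/* Shows the lost event code before the fix and the kept one after it. */
int check_event_fixup(void)
{
	assert(resolve(try_fixup_before) == NULL);
	assert(resolve(try_fixup_after) != NULL);
	return 1;
}
```

In the "before" variant the value set by the fixup is discarded by the later assignment, which matches the symptom the commit message describes.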
      
      Signed-off-by: John Garry <john.garry@huawei.com>
      Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/r/1602170368-11892-1-git-send-email-john.garry@huawei.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf diff: Support hot streams comparison · 2a09a84c
      Jin Yao authored
      
      
      This patch adds the "--stream" option to perf diff.
      
      "--stream": Enable hot streams comparison
      
      Now let's see an example.
      
      perf record -b ...      Generate perf.data.old with branch data
      perf record -b ...      Generate perf.data with branch data
      perf diff --stream
      
      [ Matched hot streams ]
      
      hot chain pair 1:
                  cycles: 1, hits: 27.77%                  cycles: 1, hits: 9.24%
              ---------------------------              --------------------------
                            main div.c:39                           main div.c:39
                            main div.c:44                           main div.c:44
      
      hot chain pair 2:
                 cycles: 34, hits: 20.06%                cycles: 27, hits: 16.98%
              ---------------------------              --------------------------
                __random_r random_r.c:360               __random_r random_r.c:360
                __random_r random_r.c:388               __random_r random_r.c:388
                __random_r random_r.c:388               __random_r random_r.c:388
                __random_r random_r.c:380               __random_r random_r.c:380
                __random_r random_r.c:357               __random_r random_r.c:357
                    __random random.c:293                   __random random.c:293
                    __random random.c:293                   __random random.c:293
                    __random random.c:291                   __random random.c:291
                    __random random.c:291                   __random random.c:291
                    __random random.c:291                   __random random.c:291
                    __random random.c:288                   __random random.c:288
                           rand rand.c:27                          rand rand.c:27
                           rand rand.c:26                          rand rand.c:26
                                 rand@plt                                rand@plt
                                 rand@plt                                rand@plt
                    compute_flag div.c:25                   compute_flag div.c:25
                    compute_flag div.c:22                   compute_flag div.c:22
                            main div.c:40                           main div.c:40
                            main div.c:40                           main div.c:40
                            main div.c:39                           main div.c:39
      
      hot chain pair 3:
                   cycles: 9, hits: 4.48%                  cycles: 6, hits: 4.51%
              ---------------------------              --------------------------
                __random_r random_r.c:360               __random_r random_r.c:360
                __random_r random_r.c:388               __random_r random_r.c:388
                __random_r random_r.c:388               __random_r random_r.c:388
                __random_r random_r.c:380               __random_r random_r.c:380
      
      [ Hot streams in old perf data only ]
      
      hot chain 1:
                  cycles: 18, hits: 6.75%
               --------------------------
                __random_r random_r.c:360
                __random_r random_r.c:388
                __random_r random_r.c:388
                __random_r random_r.c:380
                __random_r random_r.c:357
                    __random random.c:293
                    __random random.c:293
                    __random random.c:291
                    __random random.c:291
                    __random random.c:291
                    __random random.c:288
                           rand rand.c:27
                           rand rand.c:26
                                 rand@plt
                                 rand@plt
                    compute_flag div.c:25
                    compute_flag div.c:22
                            main div.c:40
      
      hot chain 2:
                  cycles: 29, hits: 2.78%
               --------------------------
                    compute_flag div.c:22
                            main div.c:40
                            main div.c:40
                            main div.c:39
      
      [ Hot streams in new perf data only ]
      
      hot chain 1:
                                                           cycles: 4, hits: 4.54%
                                                       --------------------------
                                                                    main div.c:42
                                                            compute_flag div.c:28
      
      hot chain 2:
                                                           cycles: 5, hits: 3.51%
                                                       --------------------------
                                                                    main div.c:39
                                                                    main div.c:44
                                                                    main div.c:42
                                                            compute_flag div.c:28
      
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf streams: Report hot streams · 5bbd6bad
      Jin Yao authored
      
      
      We show the streams separately. They are divided into different sections.
      
      1. "Matched hot streams"
      
      2. "Hot streams in old perf data only"
      
      3. "Hot streams in new perf data only".
      
      For each stream, we report the cycles and hot percent (hits%).
      
      For example,
      
           cycles: 2, hits: 4.08%
       --------------------------
                    main div.c:42
            compute_flag div.c:28
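
The three-section report described above can be sketched as follows. This is an illustration under stated assumptions rather than the perf implementation: streams are flattened to a single string and matched by exact string equality, hits% is taken as a stream's hits over a caller-supplied total, and `struct stream_sketch`, `hits_pct()`, `print_section()` and `report_streams()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened stream; perf itself keeps richer callchain data. */
struct stream_sketch {
	const char *chain;	/* e.g. "main div.c:42 > compute_flag div.c:28" */
	unsigned int cycles;
	unsigned long hits;
};

/* Assumed definition of hits%: this stream's hits over the total hits. */
static double hits_pct(unsigned long hits, unsigned long total_hits)
{
	return total_hits ? 100.0 * (double)hits / (double)total_hits : 0.0;
}

static int has_stream(const struct stream_sketch *set, size_t n, const char *chain)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(set[i].chain, chain) == 0)
			return 1;
	return 0;
}

/*
 * Prints the streams of "set" whose presence in "other" equals want_match
 * and returns how many were printed.
 */
static size_t print_section(const char *title,
			    const struct stream_sketch *set, size_t n,
			    unsigned long total_hits,
			    const struct stream_sketch *other, size_t n_other,
			    int want_match)
{
	size_t i, printed = 0;

	printf("[ %s ]\n", title);
	for (i = 0; i < n; i++) {
		if (has_stream(other, n_other, set[i].chain) != want_match)
			continue;
		printf("cycles: %u, hits: %.2f%%\n  %s\n", set[i].cycles,
		       hits_pct(set[i].hits, total_hits), set[i].chain);
		printed++;
	}
	return printed;
}

/* Emits the three sections and returns the number of matched streams. */
size_t report_streams(const struct stream_sketch *old, size_t n_old,
		      unsigned long old_total,
		      const struct stream_sketch *cur, size_t n_cur,
		      unsigned long cur_total)
{
	size_t matched;

	assert(old && cur);
	matched = print_section("Matched hot streams",
				old, n_old, old_total, cur, n_cur, 1);
	print_section("Hot streams in old perf data only",
		      old, n_old, old_total, cur, n_cur, 0);
	print_section("Hot streams in new perf data only",
		      cur, n_cur, cur_total, old, n_old, 0);
	return matched;
}
```

For brevity the matched section prints only the old side's figures; the real report shows old and new side by side.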
      
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20201009022845.13141-7-yao.jin@linux.intel.com
      
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>