    perf stat: Take cgroups into account for shadow stats
    commit a1bf2305, authored by Namhyung Kim
    
    
    As of now, perf stat does not consider cgroups when collecting shadow
    stats and metrics, so counter values from different cgroups are saved
    in the same slot.  This results in incorrect numbers when those
    cgroups run different workloads.
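
    To make the collision concrete, here is a minimal standalone sketch
    (not the actual perf code) of a lookup key that carries no cgroup
    information; the fixed key appears in a second sketch after the
    corrected output further below.

      #include <stdio.h>

      /* Simplified shadow-stat key: nothing identifies the cgroup, so
       * counters from cgroups A, B and C all map to the same slot. */
      struct key { int event; int cpu; };

      static int same_slot(struct key a, struct key b)
      {
          return a.event == b.event && a.cpu == b.cpu;
      }

      int main(void)
      {
          /* Same event on the same cpu, but from different cgroups. */
          struct key from_A = { 0, 0 }, from_B = { 0, 0 };

          printf("%d\n", same_slot(from_A, from_B)); /* 1: they collide */
          return 0;
      }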
    
    For example, consider the scenario below: cgroups A and C run the same
    workload, which burns a CPU, while cgroup B runs a light workload.
    
      $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1
    
       Performance counter stats for 'system wide':
    
         3,958,116,522      cycles                A
         6,722,650,929      instructions          A #    2.53  insn per cycle
             1,132,741      cycles                B
               571,743      instructions          B #    0.00  insn per cycle
         4,007,799,935      cycles                C
         6,793,181,523      instructions          C #    2.56  insn per cycle
    
           1.001050869 seconds time elapsed
    
    When I run 'perf stat' with a single such workload, it usually shows
    an IPC around 1.7.  We can verify that for cgroup A from its own
    counters: 6,722,650,929 / 3,958,116,522 = 1.698.
    
    But in this case, since cgroups are ignored, the cycle counts of all
    three cgroups are averaged into one value, and that lower value is
    used for every IPC calculation, yielding figures around 2.5 (see the
    short check after these figures):
    
      avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
      IPC (A)  :  6722650929 / 2655683066 = 2.531
      IPC (B)  :      571743 / 2655683066 = 0.0002
      IPC (C)  :  6793181523 / 2655683066 = 2.558
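
    The same figures can be reproduced with a few lines of C (a
    throwaway check, not part of the patch):

      #include <stdio.h>

      int main(void)
      {
          double cycles[3] = { 3958116522, 1132741, 4007799935 };
          double insns[3]  = { 6722650929, 571743, 6793181523 };
          const char *cg[3] = { "A", "B", "C" };
          double avg = (cycles[0] + cycles[1] + cycles[2]) / 3;

          printf("avg cycle: %.0f\n", avg);  /* 2655683066 */
          for (int i = 0; i < 3; i++)
              printf("IPC (%s)  : %.4f\n", cg[i], insns[i] / avg);
          return 0;
      }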
    
    We can simply compare the cgroup pointers in the evsel; the pointer
    is NULL when no cgroup is specified.  With this patch, I can see
    correct numbers like below (a sketch of the idea follows the output):
    
      $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1
    
      Performance counter stats for 'system wide':
    
         4,171,051,687      cycles                A
         7,219,793,922      instructions          A #    1.73  insn per cycle
             1,051,189      cycles                B
               583,102      instructions          B #    0.55  insn per cycle
         4,171,124,710      cycles                C
         7,192,944,580      instructions          C #    1.72  insn per cycle
    
           1.007909814 seconds time elapsed
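
    A minimal standalone sketch of the fix (the real change is in perf's
    shadow-stat code; the names here are illustrative, not verbatim): the
    cgroup pointer becomes part of the key, and since it is NULL when no
    cgroup is specified, the old single-slot behavior is preserved in
    that case.

      #include <stdio.h>

      struct cgroup { const char *name; };

      struct key {
          int event;
          int cpu;
          const struct cgroup *cgrp;  /* new: distinguishes cgroups */
      };

      static int same_slot(struct key a, struct key b)
      {
          return a.event == b.event && a.cpu == b.cpu && a.cgrp == b.cgrp;
      }

      int main(void)
      {
          struct cgroup A = { "A" }, B = { "B" };
          struct key ka = { 0, 0, &A }, kb = { 0, 0, &B };
          struct key na = { 0, 0, NULL }, nb = { 0, 0, NULL };

          printf("A vs B : %d\n", same_slot(ka, kb)); /* 0: separate slots */
          printf("no cgrp: %d\n", same_slot(na, nb)); /* 1: as before */
          return 0;
      }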
    
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Acked-by: Jiri Olsa <jolsa@redhat.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jin Yao <yao.jin@linux.intel.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
    
    
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>