summaryrefslogtreecommitdiffstats
path: root/tools/perf/util/evsel.c
Commit message (Collapse)AuthorAgeFilesLines
* perf tools: Use list_del_init() more thorouglyArnaldo Carvalho de Melo2019-07-091-1/+1
| | | | | | | | | | | | To allow for destructors to check if they're operating on a object still in a list, and to avoid going from use after free list entries into still valid, or even also other already removed from list entries. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo2019-07-091-1/+1
| | | | | | | | | | Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge remote-tracking branch 'tip/perf/core' into perf/urgentArnaldo Carvalho de Melo2019-07-081-6/+21
|\ | | | | | | | | | | To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * tools perf: Move from sane_ctype.h obtained from git to the Linux's originalArnaldo Carvalho de Melo2019-06-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We got the sane_ctype.h headers from git and kept using it so far, but since that code originally came from the kernel sources to the git sources, perhaps its better to just use the one in the kernel, so that we can leverage tools/perf/check_headers.sh to be notified when our copy gets out of sync, i.e. when fixes or goodies are added to the code we've copied. This will help with things like tools/lib/string.c where we want to have more things in common with the kernel, such as strim(), skip_spaces(), etc so as to go on removing the things that we have in tools/perf/util/ and instead using the code in the kernel, indirectly and removing things like EXPORT_SYMBOL(), etc, getting notified when fixes and improvements are made to the original code. Hopefully this also should help with reducing the difference of code hosted in tools/ to the one in the kernel proper. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7k9868l713wqtgo01xxygn12@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf string: Move 'dots' and 'graph_dotted_line' out of sane_ctype.hArnaldo Carvalho de Melo2019-06-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those are not in that file in the git repo, lets move it from there so that we get that sane ctype code fully isolated to allow getting it in sync either with the git sources or better with the kernel sources (include/linux/ctype.h + lib/ctype.h), that way we can use check_headers.h to get notified when changes are made in the original code so that we can cherry-pick. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ioh5sghn3943j0rxg6lb2dgs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf evsel: Make perf_evsel__name() accept a NULL argumentArnaldo Carvalho de Melo2019-06-171-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In which case it simply returns "unknown", like when it can't figure out the evsel->name value. This makes this code more robust and fixes a problem in 'perf trace' where a NULL evsel was being passed to a routine that only used the evsel for printing its name when a invalid syscall id was passed. Reported-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-f30ztaasku3z935cn3ak3h53@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * Merge tag 'perf-core-for-mingo-5.3-20190611' of ↵Ingo Molnar2019-06-171-4/+12
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Alexey Budankov: - Allow mixing --user-regs with --call-graph=dwarf, making sure that the minimal set of registers for DWARF unwinding is present in the set of user registers requested to be present in each sample, while warning the user that this may make callchains unreliable if more that the minimal set of registers is needed to unwind. yuzhoujian: - Add support to collect callchains from kernel or user space only, IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user} bits from the command line. perf trace: Arnaldo Carvalho de Melo: - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit} payloads, use instead the syscall numbers obtainer either by the arch specific syscalltbl generators or from audit-libs. - Allow 'perf trace' to ask for the number of bytes to collect for string arguments, for now ask for PATH_MAX, i.e. the whole pathnames, which ends up being just a way to speficy which syscall args are pathnames and thus should be read using bpf_probe_read_str(). - Skip unknown syscalls when expanding strace like syscall groups. This helps using the 'string' group of syscalls to work in arm64, where some of the syscalls present in x86_64 that deal with strings, for instance 'access', are deprecated and this should not be asked for tracing. Leo Yan: - Exit when failing to build eBPF program. perf config: Arnaldo Carvalho de Melo: - Bail out when a handler returns failure for a key-value pair. This helps with cases where processing a key-value pair is not just a matter of setting some tool specific knob, involving, for instance building a BPF program to then attach to the list of events 'perf trace' will use, e.g. augmented_raw_syscalls.c. perf.data: Kan Liang: - Read and store die ID information available in new Intel processors in CPUID.1F in the CPU topology written in the perf.data header. perf stat: Kan Liang: - Support per-die aggregation. Documentation: Arnaldo Carvalho de Melo: - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY, CLOCKID and DIR_FORMAT headers. Song Liu: - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF. Leo Yan: - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'. JVMTI: Jiri Olsa: - Address gcc string overflow warning for strncpy() core: - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd(). Intel PT: Adrian Hunter: - Add support for samples to contain IPC ratio, collecting cycles information from CYC packets, showing the IPC info periodically, because Intel PT does not update the cycle count on every branch or instruction, the incremental values will often be zero. When there are values, they will be the number of instructions and number of cycles since the last update, and thus represent the average IPC since the last IPC value. E.g.: # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001 rounding mmap pages size to 1024M (262144 pages) [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 2.208 MB perf.data ] # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid # <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)> 1 cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f jnz 0x7f5219ac2af0 IPC: 0.81 (36/44) 2 cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45 cmp $0x1f, %rbp 3 cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49 jbe 0x7f5219ac2b00 4 cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f test $0x8, %al 5 cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51 jnz 0x7f5219ac2b00 6 cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57 movq 0x13c58a(%rip), %rcx 7 cc1 63501.650479626: 7f5219ac27de _int_free+0x5e mov %rdi, %r12 8 cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61 movq %fs:(%rcx), %rax 9 cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65 test %rax, %rax 10 cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68 jz 0x7f5219ac2821 11 cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a leaq -0x11(%rbp), %rdi 12 cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e mov %rdi, %rsi 13 cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71 shr $0x4, %rsi 14 cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75 cmpq %rsi, 0x13caf4(%rip) 15 cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c jbe 0x7f5219ac2821 16 cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1 cmpq 0x13f138(%rip), %rbp 17 cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8 jnbe 0x7f5219ac28d8 18 cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158 testb $0x2, 0x8(%rbx) 19 cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c jnz 0x7f5219ac2ab0 IPC: 6.00 (18/3) <SNIP> - Allow using time ranges with Intel PT, i.e. these features, already present but not optimially usable with Intel PT, should be now: Select the second 10% time slice: $ perf script --time 10%/2 Select from 0% to 10% time slice: $ perf script --time 0%-10% Select the first and second 10% time slices: $ perf script --time 10%/1,10%/2 Select from 0% to 10% and 30% to 40% slices: $ perf script --time 0%-10%,30%-40% cs-etm (ARM): Mathieu Poirier: - Add support for CPU-wide trace scenarios. s390: Thomas Richter: - Fix missing kvm module load for s390. - Fix OOM error in TUI mode on s390 - Support s390 diag event display when doing analysis on !s390 architectures. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * perf evsel: Remove superfluous nthreads system_wide setup in alloc_fd()Jiri Olsa2019-06-101-3/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's already setup in the only caller of this method in perf_evsel__open(), right before calling perf_evsel__alloc_fd(), no need to do it again. Also it's better to have it out of the function before we move it to libperf. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-1k8lhyjxfk7o8v4g3r7eyjc9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * perf record: Add support to collect callchains from kernel or user space onlyyuzhoujian2019-06-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One can just record callchains in the kernel or user space with this new options. We can use it together with "--all-kernel" options. This two options is used just like print_stack(sys) or print_ustack(usr) for systemtap. Shown below is the usage of this new option combined with "--all-kernel" options: 1. Configure all used events to run in kernel space and just collect kernel callchains. $ perf record -a -g --all-kernel --kernel-callchains 2. Configure all used events to run in kernel space and just collect user callchains. $ perf record -a -g --all-kernel --user-callchains Committer notes: Improved documentation to state that asking for kernel callchains really is asking for excluding user callchains, and vice versa. Further mentioned that using both won't get both, but nothing, as both will be excluded. Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1559222962-22891-1-git-send-email-ufo19890607@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * perf record: Allow mixing --user-regs with --call-graph=dwarfAlexey Budankov2019-06-051-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When DWARF stacks were requested and at the same time that the user specifies a register set using the --user-regs option the full register context was being captured on samples: $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- stack_test2.g.O3 188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 23828/23828: 0x401236 period: 1363819 addr: 0x7ffedbdd51ac ... FP chain: nr:0 ... user regs: mask 0xff0fff ABI 64-bit .... AX 0x53b .... BX 0x7ffedbdd3cc0 .... CX 0xffffffff .... DX 0x33d3a .... SI 0x7f09b74c38d0 .... DI 0x0 .... BP 0x401260 .... SP 0x7ffedbdd3cc0 .... IP 0x401236 .... FLAGS 0x20a .... CS 0x33 .... SS 0x2b .... R8 0x7f09b74c3800 .... R9 0x7f09b74c2da0 .... R10 0xfffffffffffff3ce .... R11 0x246 .... R12 0x401070 .... R13 0x7ffedbdd5db0 .... R14 0x0 .... R15 0x0 ... ustack: size 1024, offset 0xe0 . data_src: 0x5080021 ... thread: stack_test2.g.O:23828 ...... dso: /root/abudanko/stacks/stack_test2.g.O3 I.e. the --user-regs=IP,SP,BP was being ignored, being overridden by the needs of --call-graph=dwarf. After applying the change in this patch the sample data contains the user specified register, but making sure that at least the minimal set of register needed for DWARF unwinding (DWARF_MINIMAL_REGS) is requested. The user is warned that DWARF unwinding may not work if extra registers end up being needed. -g call-graph dwarf,K full_regs --user-regs=user_regs user_regs -g call-graph dwarf,K --user-regs=user_regs user_regs + DWARF_MINIMAL_REGS $ perf record -g --call-graph dwarf,1024 --user-regs=BP -- ls WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced. arch COPYING Documentation include Kbuild lbuild MAINTAINERS modules.builtin Module.symvers perf.data.old scripts System.map virt block CREDITS drivers init Kconfig lib Makefile modules.builtin.modinfo net README security tools vmlinux certs crypto fs ipc kernel LICENSES mm modules.order perf.data samples sound usr vmlinux.o [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ] 188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 23839/23839: 0x401236 period: 1260507 addr: 0x7ffd3d85e96c ... FP chain: nr:0 ... user regs: mask 0x1c0 ABI 64-bit .... BP 0x401260 .... SP 0x7ffd3d85cc20 .... IP 0x401236 ... ustack: size 1024, offset 0x58 . data_src: 0x5080021 Committer notes: Detected build failures on arches where PERF_REGS_ is not available, such as debian:experimental-x-{mips,mips64,mipsel}, fedora 24 and 30 for ARC uClibc and glibc, reported to Alexey that provided a patch moving the DWARF_MINIMAL_REGS from evsel.c to util/perf_regs.h, where it is guarded by an HAVE_PERF_REGS_SUPPORT ifdef. Committer testing: # perf record --user-regs=bp,ax -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.955 MB perf.data (1773 samples) ] # perf script -F+uregs | grep AX: | head -5 perf 1719 [000] 181.272398: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272402: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272403: 8 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272405: 181 cycles: ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 perf 1719 [000] 181.272406: 4405 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00 # perf record --call-graph=dwarf --user-regs=bp,ax -a sleep 1 WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced. [ perf record: Woken up 55 times to write data ] [ perf record: Captured and wrote 24.184 MB perf.data (2841 samples) ] [root@quaco ~]# perf script --hide-call-graph -F+uregs | grep AX: | head -5 perf 1729 [000] 211.268006: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268014: 1 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268017: 5 cycles: ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268020: 48 cycles: ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db perf 1729 [000] 211.268024: 490 cycles: ffffffffba00e471 intel_bts_enable_local+0x21 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db # Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/e7fd37b1-af22-0d94-a0dc-5895e803bbfe@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | perf evsel: Do not rely on errno values for precise_ip fallbackJiri Olsa2019-07-061-8/+2Star
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Konstantin reported problem with default perf record command, which fails on some AMD servers, because of the default maximum precise config. The current fallback mechanism counts on getting ENOTSUP errno for precise_ip fails, but that's not the case on some AMD servers. We can fix this by removing the errno check completely, because the precise_ip fallback is separated. We can just try (if requested by evsel->precise_max) all possible precise_ip, and if one succeeds we win, if not, we continue with standard fallback. Reported-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin.monnet@netronome.com> Cc: Kim Phillips <kim.phillips@amd.com> Link: http://lkml.kernel.org/r/20190703080949.10356-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* / treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 251Thomas Gleixner2019-06-051-2/+1Star
|/ | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): released under the gpl v2 and only v2 not any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 12 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141332.526460839@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* perf tools: Add a 'percore' event qualifierJin Yao2019-05-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a 'percore' event qualifier, like cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for both hardware threads in a core. We can already do this with --per-core, but it's often useful to do this together with other metrics that are collected per hardware thread. So we need to support this per-core counting on a event level. This can be implemented in only the user tool, no kernel support needed. v4: --- 1. Add Arnaldo's patch which updates the documentation for this new qualifier. 2. Rebase to latest perf/core branch v3: --- Simplify the code according to Jiri's comments. Before: "return term->val.percore ? true : false;" Now: "return term->val.percore;" v2: --- Change the qualifier name from 'coresum' to 'percore' according to comments from Jiri and Andi. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1555077590-27664-2-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2019-05-061-1/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel changes were: - add support for Intel's "adaptive PEBS v4" - which embedds LBS data in PEBS records and can thus batch up and reduce the IRQ (NMI) rate significantly - reducing overhead and making call-graph profiling less intrusive. - add Intel CPU core and uncore support updates for Tremont, Icelake, - extend the x86 PMU constraints scheduler with 'constraint ranges' to better support Icelake hw constraints, - make x86 call-chain support work better with CONFIG_FRAME_POINTER=y - misc other changes Tooling changes: - updates to the main tools: 'perf record', 'perf trace', 'perf stat' - updated Intel and S/390 vendor events - libtraceevent updates - misc other updates and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER watchdog: Fix typo in comment perf/x86/intel: Add Tremont core PMU support perf/x86/intel/uncore: Add Intel Icelake uncore support perf/x86/msr: Add Icelake support perf/x86/intel/rapl: Add Icelake support perf/x86/intel/cstate: Add Icelake support perf/x86/intel: Add Icelake support perf/x86: Support constraint ranges perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them perf/x86/intel: Support adaptive PEBS v4 perf/x86/intel/ds: Extract code of event update in short period perf/x86/intel: Extract memory code PEBS parser for reuse perf/x86: Support outputting XMM registers perf/x86/intel: Force resched when TFA sysctl is modified perf/core: Add perf_pmu_resched() as global function perf/headers: Fix stale comment for struct perf_addr_filter perf/core: Make perf_swevent_init_cpu() static perf/x86: Add sanity checks to x86_schedule_events() perf/x86: Optimize x86_schedule_events() ...
| * perf evsel: Support printing evsel name for 'duration_time'Andi Kleen2019-04-011-1/+10
| | | | | | | | | | | | | | | | | | Implement printing the correct name for duration_time Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190326221823.11518-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user)Mao Han2019-04-161-6/+6
|/ | | | | | | | | | | | | | | | | | | | | On 32-bits platform with more than 32 registers, the 64 bits mask is truncate to the lower 32 bits and the return value of hweight_long will always smaller than 32. When kernel outputs more than 32 registers, but the user perf program only counts 32, there will be a data mismatch result to overflow check fail. Signed-off-by: Mao Han <han_mao@c-sky.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Fixes: 6a21c0b5c2ab ("perf tools: Add core support for sampling intr machine state regs") Fixes: d03f2170546d ("perf tools: Expand perf_event__synthesize_sample()") Fixes: 0f6a30150ca2 ("perf tools: Support user regs and stack in sample parsing") Link: http://lkml.kernel.org/r/29ad7947dc8fd1ff0abd2093a72cc27a2446be9f.1554883878.git.han_mao@c-sky.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Fix max perf_event_attr.precise_ip detectionJiri Olsa2019-03-281-13/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a discussion with Andi, move the perf_event_attr.precise_ip detection for maximum precise config (via :P modifier or for default cycles event) to perf_evsel__open(). The current detection in perf_event_attr__set_max_precise_ip() is tricky, because precise_ip config is specific for given event and it currently checks only hw cycles. We now check for valid precise_ip value right after failing sys_perf_event_open() for specific event, before any of the perf_event_attr fallback code gets executed. This way we get the proper config in perf_event_attr together with allowed precise_ip settings. We can see that code activity with -vv, like: $ perf record -vv ls ... ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 ... precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 ------------------------------------------------------------ sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -95 decreasing precise_ip by one (2) ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 ... precise_ip 2 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 ------------------------------------------------------------ sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8 = 4 ... Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/n/tip-dkvxxbeg7lu74155d4jhlmc9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Replace option --bpf-event with --no-bpf-eventSong Liu2019-03-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Currently, monitoring of BPF programs through bpf_event is off by default for 'perf record'. To turn it on, the user need to use option "--bpf-event". As BPF gets wider adoption in different subsystems, this option becomes inconvenient. This patch makes bpf_event on by default, and adds option "--no-bpf-event" to turn it off. Since option --bpf-event is not released yet, it is safe to remove it. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: kernel-team@fb.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@google.com> Link: http://lkml.kernel.org/r/20190312053051.2690567-2-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Free evsel->counts in perf_evsel__exit()Arnaldo Carvalho de Melo2019-03-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using gcc's ASan, Changbin reports: ================================================================= ==7494==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e5330a5e in zalloc util/util.h:23 #2 0x5625e5330a9b in perf_counts__new util/counts.c:10 #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #7 0x5625e51528e6 in run_test tests/builtin-test.c:358 #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #10 0x5625e515572f in cmd_test tests/builtin-test.c:722 #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e532560d in zalloc util/util.h:23 #2 0x5625e532566b in xyarray__new util/xyarray.c:10 #3 0x5625e5330aba in perf_counts__new util/counts.c:15 #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #8 0x5625e51528e6 in run_test tests/builtin-test.c:358 #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #11 0x5625e515572f in cmd_test tests/builtin-test.c:722 #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) His patch took care of evsel->prev_raw_counts, but the above backtraces are about evsel->counts, so fix that instead. Reported-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf stat: Improve scalingAndi Kleen2019-03-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The multiplexing scaling in perf stat mysteriously adds 0.5 to the value. This dates back to the original perf tool. Other scaling code doesn't use that strange convention. Remove the extra 0.5. Before: $ perf stat -e 'cycles,cycles,cycles,cycles,cycles,cycles' grep -rq foo Performance counter stats for 'grep -rq foo': 6,403,580 cycles (81.62%) 6,404,341 cycles (81.64%) 6,402,983 cycles (81.62%) 6,399,941 cycles (81.63%) 6,399,451 cycles (81.62%) 6,436,105 cycles (91.87%) 0.005843799 seconds time elapsed 0.002905000 seconds user 0.002902000 seconds sys After: $ perf stat -e 'cycles,cycles,cycles,cycles,cycles,cycles' grep -rq foo Performance counter stats for 'grep -rq foo': 6,422,704 cycles (81.68%) 6,401,842 cycles (81.68%) 6,398,432 cycles (81.68%) 6,397,098 cycles (81.68%) 6,396,074 cycles (81.67%) 6,434,980 cycles (91.62%) 0.005884437 seconds time elapsed 0.003580000 seconds user 0.002356000 seconds sys Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20190314225002.30108-10-andi@firstfloor.org Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf stat: Fix --no-scaleAndi Kleen2019-03-191-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -c option to enable multiplex scaling has been useless for quite some time because scaling is default. It's only useful as --no-scale to disable scaling. But the non scaling code path has bitrotted and doesn't print anything because perf output code relies on value run/ena information. Also even when we don't want to scale a value it's still useful to show its multiplex percentage. This patch: - Fixes help and documentation to show --no-scale instead of -c - Removes -c, only keeps the long option because -c doesn't support negatives. - Enables running/enabled even with --no-scale - And fixes some other problems in the no-scale output. Before: $ perf stat --no-scale -e cycles true Performance counter stats for 'true': <not counted> cycles 0.000984154 seconds time elapsed After: $ ./perf stat --no-scale -e cycles true Performance counter stats for 'true': 706,070 cycles 0.001219821 seconds time elapsed Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> LPU-Reference: 20190314225002.30108-9-andi@firstfloor.org Link: https://lkml.kernel.org/n/tip-xggjvwcdaj2aqy8ib3i4b1g6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Probe for precise_ip with simple attrJiri Olsa2019-03-061-8/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | Currently we probe for precise_ip with user specified perf_event_attr, which might fail because of unsupported kernel features, which would get disabled during the open time anyway. Switching the probe to take place on simple hw cycles, so the following record sets proper precise_ip: # perf record -e cycles:P ls # perf evlist -v cycles:P: size: 112, ... precise_ip: 3, ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de> Cc: Nageswara R Sastry <nasastry@in.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lkml.kernel.org/r/20190305152536.21035-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Force sample_type for slave eventsJiri Olsa2019-02-201-0/+8
| | | | | | | | | | | | | | | | Force sample_type setup for slave events in group leader sessions. We don't get sample for slave events, we make them when delivering group leader sample. Set the slave event to follow the master sample_type to ease up report. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190220122800.864-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Handle PERF_RECORD_BPF_EVENTSong Liu2019-01-211-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds basic handling of PERF_RECORD_BPF_EVENT. Tracking of PERF_RECORD_BPF_EVENT is OFF by default. Option --bpf-event is added to turn it on. Committer notes: Add dummy machine__process_bpf_event() variant that returns zero for systems without HAVE_LIBBPF_SUPPORT, such as Alpine Linux, unbreaking the build in such systems. Remove the needless include <machine.h> from bpf->event.h, provide just forward declarations for the structs and unions in the parameters, to reduce compilation time and needless rebuilds when machine.h gets changed. Committer testing: When running with: # perf record --bpf-event On an older kernel where PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL is not present, we fallback to removing those two bits from perf_event_attr, making the tool to continue to work on older kernels: perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 ------------------------------------------------------------ sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off bpf_event ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ksymbol 1 ------------------------------------------------------------ sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off ksymbol ------------------------------------------------------------ perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all 1 exclude_guest 1 mmap2 1 comm_exec 1 ------------------------------------------------------------ And then proceeds to work without those two features. As passing --bpf-event is an explicit action performed by the user, perhaps we should emit a warning telling that the kernel has no such feature, but this can be done on top of this patch. Now with a kernel that supports these events, start the 'record --bpf-event -a' and then run 'perf trace sleep 10000' that will use the BPF augmented_raw_syscalls.o prebuilt (for another kernel version even) and thus should generate PERF_RECORD_BPF_EVENT events: [root@quaco ~]# perf record -e dummy -a --bpf-event ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.713 MB perf.data ] [root@quaco ~]# bpftool prog 13: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 13,14 14: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 13,14 15: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 15,16 16: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:43-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 15,16 17: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:44-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 17,18 18: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:44-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 17,18 21: cgroup_skb tag 7be49e3934a125ba gpl loaded_at 2019-01-19T09:09:45-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 21,22 22: cgroup_skb tag 2a142ef67aaad174 gpl loaded_at 2019-01-19T09:09:45-0300 uid 0 xlated 296B jited 229B memlock 4096B map_ids 21,22 31: tracepoint name sys_enter tag 12504ba9402f952f gpl loaded_at 2019-01-19T09:19:56-0300 uid 0 xlated 512B jited 374B memlock 4096B map_ids 30,29,28 32: tracepoint name sys_exit tag c1bd85c092d6e4aa gpl loaded_at 2019-01-19T09:19:56-0300 uid 0 xlated 256B jited 191B memlock 4096B map_ids 30,29 # perf report -D | grep PERF_RECORD_BPF_EVENT | nl 1 0 55834574849 0x4fc8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13 2 0 60129542145 0x5118 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14 3 0 64424509441 0x5268 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15 4 0 68719476737 0x53b8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16 5 0 73014444033 0x5508 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17 6 0 77309411329 0x5658 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18 7 0 90194313217 0x57a8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21 8 0 94489280513 0x58f8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22 9 7 620922484360 0xb6390 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 29 10 7 620922486018 0xb6410 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 29 11 7 620922579199 0xb6490 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 30 12 7 620922580240 0xb6510 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 30 13 7 620922765207 0xb6598 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 31 14 7 620922874543 0xb6620 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 32 # There, the 31 and 32 tracepoint BPF programs put in place by 'perf trace'. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20190117161521.1341602-7-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Handle PERF_RECORD_KSYMBOLSong Liu2019-01-211-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch handles PERF_RECORD_KSYMBOL in perf record/report. Specifically, map and symbol are created for ksymbol register, and removed for ksymbol unregister. This patch also sets perf_event_attr.ksymbol properly. The flag is ON by default. Committer notes: Use proper inttypes.h for u64, fixing the build in some environments like in the android NDK r15c targetting ARM 32-bit. I.e. fixing this build error: util/event.c: In function 'perf_event__fprintf_ksymbol': util/event.c:1489:10: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'u64' [-Werror=format=] event->ksymbol_event.flags, event->ksymbol_event.name); ^ cc1: all warnings being treated as errors Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20190117161521.1341602-6-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix crash on synthesizing the unitJiri Olsa2018-11-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adam reported a record command crash for simple session like: $ perf record -e cpu-clock ls with following backtrace: Program received signal SIGSEGV, Segmentation fault. 3543 ev = event_update_event__new(size + 1, PERF_EVENT_UPDATE__UNIT, evsel->id[0]); (gdb) bt #0 perf_event__synthesize_event_update_unit #1 0x000000000051e469 in perf_event__synthesize_extra_attr #2 0x00000000004445cb in record__synthesize #3 0x0000000000444bc5 in __cmd_record ... We synthesize an update event that needs to touch the evsel id array, which is not defined at that time. Fix this by forcing the id allocation for events with their unit defined. Reflecting possible read_format ID bit in the attr tests. Reported-by: Yongxin Liu <yongxin.liu@outlook.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adam Lee <leeadamrobert@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201477 Fixes: bfd8f72c2778 ("perf record: Synthesize unit/scale/... in event update") Link: http://lkml.kernel.org/r/20181112130012.5424-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Do not zero sample_id_all for group membersJiri Olsa2018-11-061-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | Andi reported following malfunction: # perf record -e '{ref-cycles,cycles}:S' -a sleep 1 # perf script non matching sample_id_all That's because we disable sample_id_all bit for non-sampling group members. We can't do that, because it needs to be the same over the whole event list. This patch keeps it untouched again. Reported-by: Andi Kleen <andi@firstfloor.org> Tested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180923150420.27327-1-jolsa@kernel.org Fixes: e9add8bac6c6 ("perf evsel: Disable write_backward for leader sampling group events") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Mark a evsel as disabled when asking the kernel do disable itArnaldo Carvalho de Melo2018-10-221-6/+17
| | | | | | | | | | | | | | | | Because there may be more such events in the ring buffer that should be discarded when an app decides to stop considering them. At some point we'll do this with eBPF, this way we stop them at origin, before they are placed in the ring buffer. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-uzufuxws4hufigx07ue1dpv6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Introduce per event max_events propertyArnaldo Carvalho de Melo2018-10-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simply adds the field to 'struct perf_evsel' and allows setting it via the event parser, to test it lets trace trace: First look at where in a function that receives an evsel we can put a probe to read how evsel->max_events was setup: # perf probe -x ~/bin/perf -L trace__event_handler <trace__event_handler@/home/acme/git/perf/tools/perf/builtin-trace.c:0> 0 static int trace__event_handler(struct trace *trace, struct perf_evsel *evsel, union perf_event *event __maybe_unused, struct perf_sample *sample) 3 { 4 struct thread *thread = machine__findnew_thread(trace->host, sample->pid, sample->tid); 5 int callchain_ret = 0; 7 if (sample->callchain) { 8 callchain_ret = trace__resolve_callchain(trace, evsel, sample, &callchain_cursor); 9 if (callchain_ret == 0) { 10 if (callchain_cursor.nr < trace->min_stack) 11 goto out; 12 callchain_ret = 1; } } See what variables we can probe at line 7: # perf probe -x ~/bin/perf -V trace__event_handler:7 Available variables at trace__event_handler:7 @<trace__event_handler+89> int callchain_ret struct perf_evsel* evsel struct perf_sample* sample struct thread* thread struct trace* trace union perf_event* event Add a probe at that line asking for evsel->max_events to be collected and named as "max_events": # perf probe -x ~/bin/perf trace__event_handler:7 'max_events=evsel->max_events' Added new event: probe_perf:trace__event_handler (on trace__event_handler:7 in /home/acme/bin/perf with max_events=evsel->max_events) You can now use it in all perf tools, such as: perf record -e probe_perf:trace__event_handler -aR sleep 1 Now use 'perf trace', here aliased to just 'trace' and trace trace, i.e. the first 'trace' is tracing just that 'probe_perf:trace__event_handler' event, while the traced trace is tracing all scheduler tracepoints, will stop at two events (--max-events 2) and will just set evsel->max_events for all the sched tracepoints to 9, we will see the output of both traces intermixed: # trace -e *perf:*event_handler trace --max-events 2 -e sched:*/nr=9/ 0.000 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=000 0.009 :0/0 sched:sched_wakeup:comm=rcu_sched pid=10 prio=120 target_cpu=000 0.000 trace/23949 probe_perf:trace__event_handler:(48c34a) max_events=0x9 0.046 trace/23949 probe_perf:trace__event_handler:(48c34a) max_events=0x9 # Now, if the traced trace sends its output to /dev/null, we'll see just what the first level trace outputs: that evsel->max_events is indeed being set to 9: # trace -e *perf:*event_handler trace -o /dev/null --max-events 2 -e sched:*/nr=9/ 0.000 trace/23961 probe_perf:trace__event_handler:(48c34a) max_events=0x9 0.030 trace/23961 probe_perf:trace__event_handler:(48c34a) max_events=0x9 # Now that we can set evsel->max_events, we can go to the next step, honour that per-event property in 'perf trace'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-og00yasj276joem6e14l1eas@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge remote-tracking branch 'tip/perf/urgent' into perf/coreArnaldo Carvalho de Melo2018-10-181-0/+3
|\ | | | | | | | | | | To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf evsel: Store ids for events with their own cpus ↵Jiri Olsa2018-10-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_event__synthesize_event_update_cpus John reported crash when recording on an event under PMU with cpumask defined: root@localhost:~# ./perf_debug_ record -e armv8_pmuv3_0/br_mis_pred/ sleep 1 perf: Segmentation fault Obtained 9 stack frames. ./perf_debug_() [0x4c5ef8] [0xffff82ba267c] ./perf_debug_() [0x4bc5a8] ./perf_debug_() [0x419550] ./perf_debug_() [0x41a928] ./perf_debug_() [0x472f58] ./perf_debug_() [0x473210] ./perf_debug_() [0x4070f4] /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8294c8a0] Segmentation fault (core dumped) We synthesize an update event that needs to touch the evsel id array, which is not defined at that time. Fixing this by forcing the id allocation for events with their own cpus. Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@huawei.com Fixes: bfd8f72c2778 ("perf record: Synthesize unit/scale/... in event update") Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | tools lib traceevent, perf tools: Rename enum format_flags to enum ↵Tzvetomir Stoyanov (VMware)2018-09-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tep_format_flags In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. That prefix will be "tep_". This renames enum format_flags to enum tep_format_flags and adds prefix TEP_ to all of its members. Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: linux-trace-devel@vger.kernel.org Link: http://lkml.kernel.org/r/20180919185722.803127871@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | tools lib traceevent, perf tools: Rename struct format{_field} to struct ↵Tzvetomir Stoyanov (VMware)2018-09-191-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tep_format{_field} In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. That prefix will be "tep_". This renames struct format to struct tep_format and struct format_field to struct tep_format_field Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: linux-trace-devel@vger.kernel.org Link: http://lkml.kernel.org/r/20180919185722.661319373@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf evsel: Introduce perf_evsel__store_ids()Jiri Olsa2018-08-301-0/+29
|/ | | | | | | | | | | | | | Add perf_evsel__store_ids() from stat's store_counter_ids() code to the evsel class, so that it can be used globally. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180830063252.23729-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Fix potential null pointer dereference in perf_evsel__new_idx()Hisao Tanabe2018-08-301-2/+3
| | | | | | | | | | | | | | If evsel is NULL, we should return NULL to avoid a NULL pointer dereference a bit later in the code. Signed-off-by: Hisao Tanabe <xtanabe@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 03e0a7df3efd ("perf tools: Introduce bpf-output event") LPU-Reference: 20180824154556.23428-1-xtanabe@gmail.com Link: https://lkml.kernel.org/n/tip-e5plzjhx6595a5yjaf22jss3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools lib traceevent, perf tools: Rename pevent find APIsTzvetomir Stoyanov (VMware)2018-08-131-1/+1
| | | | | | | | | | | | | | | | | In order to make libtraceevent into a proper library, variables, data structures and functions require a unique prefix to prevent name space conflicts. That prefix will be "tep_" and not "pevent_". This changes APIs: pevent_find_any_field, pevent_find_common_field, pevent_find_event, pevent_find_field Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Cc: linux-trace-devel@vger.kernel.org Link: http://lkml.kernel.org/r/20180808180700.316995920@goodmis.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evlist: Fix error out while applying initial delay and LBRKan Liang2018-07-311-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'perf record' will error out if both --delay and LBR are applied. For example: # perf record -D 1000 -a -e cycles -j any -- sleep 2 Error: dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' # A dummy event is added implicitly for initial delay, which has the same configurations as real sampling events. The dummy event is a software event. If LBR is configured, perf must error out. The dummy event will only be used to track PERF_RECORD_MMAP while perf waits for the initial delay to enable the real events. The BRANCH_STACK bit can be safely cleared for the dummy event. After applying the patch: # perf record -D 1000 -a -e cycles -j any -- sleep 2 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.054 MB perf.data (828 samples) ] # Reported-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1531145722-16404-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf stat: Get rid of extra clock display functionJiri Olsa2018-07-241-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason to have separate function to display clock events. It's only purpose was to convert the nanosecond value into microseconds. We do that now in generic code, if the unit and scale values are properly set, which this patch do for clock events. The output differs in the unit field being displayed in its columns rather than having it added as a suffix of the event name. Plus the value is rounded into 2 decimal numbers as for any other event. Before: # perf stat -e cpu-clock,task-clock -C 0 sleep 3 Performance counter stats for 'CPU(s) 0': 3001.123137 cpu-clock (msec) # 1.000 CPUs utilized 3001.133250 task-clock (msec) # 1.000 CPUs utilized 3.001159813 seconds time elapsed Now: # perf stat -e cpu-clock,task-clock -C 0 sleep 3 Performance counter stats for 'CPU(s) 0': 3,001.05 msec cpu-clock # 1.000 CPUs utilized 3,001.05 msec task-clock # 1.000 CPUs utilized 3.001077794 seconds time elapsed There's a small difference in csv output, as we now output the unit field, which was empty before. It's in the proper spot, so there's no compatibility issue. Before: # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3 3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized 3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized # perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3 3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized 3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized Add perf_evsel__is_clock to replace nsec_counter. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Add has_callchain() helper to make code more compact/clearArnaldo Carvalho de Melo2018-06-051-2/+2
| | | | | | | | | | | | | | | | | Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN), so add an evsel__has_callchain(evsel) helper. This will actually get more uses as we check that instead of symbol_conf.use_callchain in places where that produces the same result but makes this decision to be more fine grained, per evsel. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: No need to unconditionally read the max_stack sysctlsArnaldo Carvalho de Melo2018-05-171-1/+1
| | | | | | | | | | | | | Let tools that need to have those variables with the sysctl current values use a function that will read them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9vowe@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Only fall back group read for leaderKan Liang2018-04-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perf doesn't support mixed events from different PMUs (except software event) in a group. The perf stat should output <not counted>/<not supported> for all events, but it doesn't. For example, perf stat -e '{cycles,uncore_imc_5/umask=0xF,event=0x4/,instructions}' <not counted> cycles <not supported> uncore_imc_5/umask=0xF,event=0x4/ 1,024,300 instructions If perf fails to open an event, it doesn't error out directly. It will disable some features and retry, until the event is opened or all features are disabled. The disabled features will not be re-enabled. The group read is one of these features. For the example as above, the IMC event and the leader event "cycles" are from different PMUs. Opening the IMC event must fail. The group read feature must be disabled for IMC event and the followed event "instructions". The "instructions" event has the same PMU as the leader "cycles". It can be opened successfully. Since the group read feature has been disabled, the "instructions" event will be read as a single event, which definitely has a value. The group read fallback is still useful for the case which kernel doesn't support group read. It is good enough to be handled only by the leader. For the fallback request from members, it must be caused by an error. The fallback only breaks the semantics of group. Limit the group read fallback only for the leader. Committer testing: On a broadwell t450s notebook: Before: # perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1 Performance counter stats for 'sleep 1': <not counted> cycles <not supported> unc_cbo_cache_lookup.read_i 818,206 instructions 1.003170887 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog After: # perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1 Performance counter stats for 'sleep 1': <not counted> cycles <not supported> unc_cbo_cache_lookup.read_i <not counted> instructions 1.001380511 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog # Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Agustin Vega-Frias <agustinv@codeaurora.org> Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: 82bf311e15d2 ("perf stat: Use group read for event groups") Link: http://lkml.kernel.org/r/1524594014-79243-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Disable write_backward for leader sampling group eventsJiri Olsa2018-04-231-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | .. and other related fields that do not need to be enabled for events that have sampling leader. It fixes the perf top usage Ingo reported broken: # perf top -e '{cycles,msr/aperf/}:S' The 'msr/aperf/' event is configured for write_back sampling, which is not allowed by the MSR PMU, so it fails to create the event. Adjusting related attr test. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180423090823.32309-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf stat: Keep the / modifier separator in fallbackJiri Olsa2018-04-231-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'perf stat' fallback for EACCES error sets the exclude_kernel perf_event_attr and tries perf_event_open() again with it. In addition, it also changes the name of the event to reflect that change by adding the 'u' modifier. But it does not take into account the '/' separator, so the event name can end up mangled, like: (note the '/:' characters) $ perf stat -e cpu/cpu-cycles/ kill ... 386,832 cpu/cpu-cycles/:u Adding the code to check on the '/' separator and set the following correct event name: $ perf stat -e cpu/cpu-cycles/ kill ... 388,548 cpu/cpu-cycles/u Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180423090823.32309-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Remove suggestion to enable APICAndi Kleen2018-04-181-2/+1Star
| | | | | | | | | | | | | | 'perf record' suggests to enable the APIC on errors. APIC is practically always used today and the problem is usually somewhere else. Just remove the outdated suggestion. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20180406203812.3087-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf record: Remove misleading error suggestionAndi Kleen2018-04-181-2/+1Star
| | | | | | | | | | | | | | | | | When perf record encounters an error setting up an event it suggests to enable CONFIG_PERF_EVENTS. This is misleading because: - Usually it is enabled (it is really hard to disable on x86) - The problem is usually somewhere else, e.g. the CPU is not supported or an invalid configuration has been used. Remove the misleading suggestion. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20180406203812.3087-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Fix the output for stdio events listJiri Olsa2018-03-081-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Changing the output header for reporting forced groups via --groups option on non grouped events, like: $ perf record -e 'cycles,instructions' $ perf report --stdio --group Before: # Samples: 24 of event 'anon group { cycles:u, instructions:u }' After: # Samples: 24 of events 'cycles:u, instructions:u' Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Fixes: ad52b8cb4886 ("perf report: Add support to display group output for non group events") Link: http://lkml.kernel.org/r/20180307155020.32613-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf pmu: Display pmu name when printing unmerged events in statAgustin Vega-Frias2018-03-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To simplify creation of events accross multiple instances of the same type of PMU stat supports two methods for creating multiple events from a single event specification: 1. A prefix or glob can be used in the PMU name. 2. Aliases, which are listed immediately after the Kernel PMU events by perf list, are used. When the --no-merge option is passed and these events are displayed individually the PMU name is lost and it's not possible to see which count corresponds to which pmu: $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null Performance counter stats for 'system wide': 67 l3cache/read-miss/ 67 l3cache/read-miss/ 63 l3cache/read-miss/ 60 l3cache/read-miss/ 0.001675706 seconds time elapsed $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null Performance counter stats for 'system wide': 12 l3cache_read_miss 17 l3cache_read_miss 10 l3cache_read_miss 8 l3cache_read_miss 0.001661305 seconds time elapsed This change adds the original pmu name to the event. For dynamic pmu events the pmu name is restored in the event name: $ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null Performance counter stats for 'system wide': 63 l3cache_0_3/read-miss/ 74 l3cache_0_1/read-miss/ 64 l3cache_0_2/read-miss/ 74 l3cache_0_0/read-miss/ 0.001675706 seconds time elapsed For alias events the name is added after the event name: $ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null Performance counter stats for 'system wide': 10 l3cache_read_miss [l3cache_0_3] 12 l3cache_read_miss [l3cache_0_1] 10 l3cache_read_miss [l3cache_0_2] 17 l3cache_read_miss [l3cache_0_0] 0.001661305 seconds time elapsed Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timur Tabi <timur@codeaurora.org> Cc: linux-arm-kernel@lists.infradead.org Change-Id: I8056b9eda74bda33e95065056167ad96e97cb1fb Link: http://lkml.kernel.org/r/1520345084-42646-3-git-send-email-agustinv@codeaurora.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf cgroup: Rename close_cgroup() to cgroup__put()Arnaldo Carvalho de Melo2018-03-071-1/+1
| | | | | | | | | | | | | | | | | | | It is not really closing the cgroup, but instead dropping a reference count and if it hits zero, then calling delete, which will, among other cleanup shores, close the cgroup fd. So it is really dropping a reference to that cgroup, and the method name for that is "put", so rename close_cgroup() to cgroup__put() to follow this naming convention. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-sccxpnd7bgwc1llgokt6fcey@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf stat: Ignore error thread when enabling system-wide --per-threadJin Yao2018-02-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec) # 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec) # 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u # 1.21 insn per cycle perf-3670 3,536 instructions:u # 0.09 insn per cycle vmstat-3458 288,937 branches:u # 46.814 M/sec perf-3670 936 branches:u # 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf evsel: Expose the perf_missing_features structArnaldo Carvalho de Melo2018-02-151-11/+1Star
| | | | | | | | | | | | | | | As tools may need to adjust to missing features, as 'perf top' will, in the next csets, to cope with a missing 'write_backward' feature. Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-jelngl9q1ooaizvkcput9tic@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>