summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | x86/asm: Rewrite sync_core() to use IRET-to-selfAndy Lutomirski2016-12-191-22/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aside from being excessively slow, CPUID is problematic: Linux runs on a handful of CPUs that don't have CPUID. Use IRET-to-self instead. IRET-to-self works everywhere, so it makes testing easy. For reference, On my laptop, IRET-to-self is ~110ns, CPUID(eax=1, ecx=0) is ~83ns on native and very very slow under KVM, and MOV-to-CR2 is ~42ns. While we're at it: sync_core() serves a very specific purpose. Document it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/5c79f0225f68bc8c40335612bf624511abb78941.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/microcode/intel: Replace sync_core() with native_cpuid()Andy Lutomirski2016-12-191-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Intel microcode driver is using sync_core() to mean "do CPUID with EAX=1". I want to rework sync_core(), but first the Intel microcode driver needs to stop depending on its current behavior. Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Juergen Gross <jgross@suse.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/535a025bb91fed1a019c5412b036337ad239e5bb.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"Andy Lutomirski2016-12-191-6/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit ed68d7e9b9cfb64f3045ffbcb108df03c09a0f98. The patch wasn't quite correct -- there are non-Intel (and hence non-486) CPUs that we support that don't have CPUID. Since we no longer require CPUID for sync_core(), just revert the patch. I think the relevant CPUs are Geode and Elan, but I'm not sure. In principle, we should try to do better at identifying CPUID-less CPUs in early boot, but that's more complicated. Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/82acde18a108b8e353180dd6febcc2876df33f24.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernelsAndy Lutomirski2016-12-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We support various non-Intel CPUs that don't have the CPUID instruction, so the M486 test was wrong. For now, fix it with a big hammer: handle missing CPUID on all 32-bit CPUs. Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/685bd083a7c036f7769510b6846315b17d6ba71f.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6Andy Lutomirski2016-12-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A typo (or mis-merge?) resulted in leaf 6 only being probed if cpuid_level >= 7. Fixes: 2ccd71f1b278 ("x86/cpufeature: Move some of the scattered feature bits to x86_capability") Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Link: http://lkml.kernel.org/r/6ea30c0e9daec21e488b54761881a6dfcf3e04d0.1481825597.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/tools: Fix gcc-7 warning in relocs.cMarkus Trippelsdorf2016-12-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc-7 warns: In file included from arch/x86/tools/relocs_64.c:17:0: arch/x86/tools/relocs.c: In function ‘process_64’: arch/x86/tools/relocs.c:953:2: warning: argument 1 null where non-null expected [-Wnonnull] qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/tools/relocs.h:6:0, from arch/x86/tools/relocs_64.c:1: /usr/include/stdlib.h:741:13: note: in a call to function ‘qsort’ declared here extern void qsort This happens because relocs16 is not used for ELF_BITS == 64, so there is no point in trying to sort it. Make the sort_relocs(&relocs16) call 32bit only. Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de> Link: http://lkml.kernel.org/r/20161215124513.GA289@x4 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/unwind: Dump stack data on warningsJosh Poimboeuf2016-12-192-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The unwinder warnings are good at finding unexpected unwinder issues, but they often don't give enough data to be able to fully diagnose them. Print a one-time stack dump when a warning is detected. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/15607370e3ddb1732b6a73d5c65937864df16ac8.1481904011.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/unwind: Adjust last frame check for aligned function stacksJosh Poimboeuf2016-12-191-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Somehow, CONFIG_PARAVIRT=n convinces gcc to change the x86_64_start_kernel() prologue from: 0000000000000129 <x86_64_start_kernel>: 129: 55 push %rbp 12a: 48 89 e5 mov %rsp,%rbp to: 0000000000000124 <x86_64_start_kernel>: 124: 4c 8d 54 24 08 lea 0x8(%rsp),%r10 129: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp 12d: 41 ff 72 f8 pushq -0x8(%r10) 131: 55 push %rbp 132: 48 89 e5 mov %rsp,%rbp This is an unusual pattern which aligns rsp (though in this case it's already aligned) and saves the start_cpu() return address again on the stack before storing the frame pointer. The unwinder assumes the last stack frame header is at a certain offset, but the above code breaks that assumption, resulting in the following warning: WARNING: kernel stack frame pointer at ffffffff82e03f40 in swapper:0 has bad value (null) Fix it by checking for the last task stack frame at the aligned offset in addition to the normal unaligned offset. Fixes: acb4608ad186 ("x86/unwind: Create stack frames for saved syscall registers") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/9d7b4eb8cf55a7d6002cb738f25c23e7429c99a0.1481904011.git.jpoimboe@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/init: Fix a couple of comment typosDmitry Torokhov2016-12-191-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-5-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/init: Remove i8042_detect() from platform opsDmitry Torokhov2016-12-194-17/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that i8042 uses flag in legacy platform data, i8042_detect() is no longer used and can be removed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Takashi Iwai <tiwai@suse.de> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-4-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | Input: i8042 - Trust firmware a bit more when probing on X86Dmitry Torokhov2016-12-191-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error message "Can't read CTR while initializing i8042" appears on Cherry Trail-based devices at each boot time: i8042: PNP: No PS/2 controller found. Probing ports directly. i8042: Can't read CTR while initializing i8042 i8042: probe of i8042 failed with error -5 This happens because we historically do not trust firmware on X86 and, while noting that PNP does not show keyboard or mouse devices, we still charge ahead and try to probe the controller. Let's relax this a bit and if results of PNP probe agree with the results of platform initialization/quirks conclude that there is, in fact, no i8042. While at it, let's avoid using x86_platform.i8042_detect() and instead abort execution early if platform indicates that it can not possibly have i8042 (x86_platform.legacy.i8042 equals X86_LEGACY_I8042_PLATFORM_ABSENT). Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-3-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/init: Add i8042 state to the platform dataDmitry Torokhov2016-12-193-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add i8042 state to the platform data to help i8042 driver make decision whether to probe for i8042 or not. We recognize 3 states: platform/subarch ca not possible have i8042 (as is the case with Inrel MID platform), firmware (such as ACPI) reports that i8042 is absent from the device, or i8042 may be present and the driver should probe for it. The intent is to allow i8042 driver abort initialization on x86 if PNP data (absence of both keyboard and mouse PNP devices) agrees with firmware data. It will also allow us to remove i8042_detect later. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Takashi Iwai <tiwai@suse.de> Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Cc: linux-input@vger.kernel.org Link: http://lkml.kernel.org/r/1481317061-31486-2-git-send-email-dmitry.torokhov@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/microcode/AMD: Use native_cpuid() in load_ucode_amd_bsp()Boris Ostrovsky2016-12-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_PARAVIRT is selected, cpuid() becomes a call. Since for 32-bit kernels load_ucode_amd_bsp() is executed before paging is enabled the call cannot be completed (as kernel virtual addresses are not reachable yet). Use native_cpuid() instead which is an asm wrapper for the CPUID instruction. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/1481906392-3847-1-git-send-email-boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/20161218164414.9649-5-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/microcode/AMD: Do not load when running on a hypervisorBorislav Petkov2016-12-191-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Doing so is completely void of sense for multiple reasons so prevent it. Set dis_ucode_ldr to true and thus disable the microcode loader by default to address xen pv guests which execute the AP path but not the BSP path. By having it turned off by default, the APs won't run into the loader either. Also, check CPUID(1).ECX[31] which hypervisors set. Well almost, not the xen pv one. That one gets the aforementioned "fix". Also, improve the detection method by caching the final decision whether to continue loading in dis_ucode_ldr and do it once on the BSP. The APs then simply test that value. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Juergen Gross <jgross@suse.com> Link: http://lkml.kernel.org/r/20161218164414.9649-4-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/microcode/AMD: Sanitize apply_microcode_early_amd()Borislav Petkov2016-12-191-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make it simply return bool to denote whether it found a container or not and return the pointer to the container and its size in the handed-in container pointer instead, as returning a struct was just silly. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/20161218164414.9649-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/microcode/AMD: Make find_proper_container() sane againBorislav Petkov2016-12-191-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixup signature and retvals, return the container struct through the passed in pointer, not as a function return value. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jürgen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/20161218164414.9649-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2016-12-241-6/+16
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "ARM/MOXA SoC clocksource driver fixes" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/moxart: Plug memory and mapping leaks
| * | | | clocksource/drivers/moxart: Plug memory and mapping leaksSudip Mukherjee2016-12-191-6/+16
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If of_iomap() or any other subsequent function fails moxart_timer_init() exits without freeing memory and unmapping the timer base. Add proper cleanup points. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: http://lkml.kernel.org/r/1482099996-1524-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-12-2463-750/+1104
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's two x86 PMU driver fixes and a uprobes fix, plus on the tooling side there's a number of fixes and some late updates" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf sched timehist: Fix invalid period calculation perf sched timehist: Remove hardcoded 'comm_width' check at print_summary perf sched timehist: Enlarge default 'comm_width' perf sched timehist: Honour 'comm_width' when aligning the headers perf/x86: Fix overlap counter scheduling bug perf/x86/pebs: Fix handling of PEBS buffer overflows samples/bpf: Move open_raw_sock to separate header samples/bpf: Remove perf_event_open() declaration samples/bpf: Be consistent with bpf_load_program bpf_insn parameter tools lib bpf: Add bpf_prog_{attach,detach} samples/bpf: Switch over to libbpf perf diff: Do not overwrite valid build id perf annotate: Don't throw error for zero length symbols perf bench futex: Fix lock-pi help string perf trace: Check if MAP_32BIT is defined (again) samples/bpf: Make perf_event_read() static uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation samples/bpf: Make samples more libbpf-centric tools lib bpf: Add flags to bpf_create_map() tools lib bpf: use __u32 from linux/types.h ...
| * \ \ \ Merge tag 'perf-urgent-for-mingo-20161222' of ↵Ingo Molnar2016-12-231-8/+6Star
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: Fixes for 'perf sched timehist': (Namhyung Kim) - Define a larger initial alignment value for the COMM column and make it be more consistently honoured, for instance in the header. - Fix invalid period calculation when using the --time option to select a time slice, when events outside that slice were being considered for the per cpu idle stats summary. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | perf sched timehist: Fix invalid period calculationNamhyung Kim2016-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When --time option is given with a value outside recorded time, the last sample time (tprev) was set to that value and run time calculation might be incorrect. This is a problem of the first samples for each cpus since it would skip the runtime update when tprev is 0. But with --time option it had non-zero (which is invalid) value so the calculation is also incorrect. For example, let's see the followging: $ perf sched timehist time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 3195.968367 [0003] <idle> 0.000 0.000 0.000 3195.968386 [0002] Timer[4306/4277] 0.000 0.000 0.018 3195.968397 [0002] Web Content[4277] 0.000 0.000 0.000 3195.968595 [0001] JS Helper[4302/4277] 0.000 0.000 0.000 3195.969217 [0000] <idle> 0.000 0.000 0.621 3195.969251 [0001] kworker/1:1H[291] 0.000 0.000 0.033 The sample starts at 3195.968367 but when I gave a time interval from 3194 to 3196 (in sec) it will calculate the whole 2 second as runtime. In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as idle time. Before: $ perf sched timehist --time 3194,3196 -s | tail Idle stats: CPU 0 idle for 1995.991 msec CPU 1 idle for 20.793 msec CPU 2 idle for 30.191 msec CPU 3 idle for 1999.852 msec Total number of unique tasks: 23 Total number of context switches: 128 Total run time (msec): 3724.940 After: $ perf sched timehist --time 3194,3196 -s | tail Idle stats: CPU 0 idle for 10.811 msec CPU 1 idle for 20.793 msec CPU 2 idle for 30.191 msec CPU 3 idle for 18.337 msec Total number of unique tasks: 23 Total number of context switches: 128 Total run time (msec): 18.139 Committer notes: Further testing: Before: Idle stats: CPU 0 idle for 229.785 msec CPU 1 idle for 937.944 msec CPU 2 idle for 188.931 msec CPU 3 idle for 986.185 msec After: # perf sched timehist --time 40602,40603 -s | tail Idle stats: CPU 0 idle for 229.785 msec CPU 1 idle for 175.407 msec CPU 2 idle for 188.931 msec CPU 3 idle for 223.657 msec Total number of unique tasks: 68 Total number of context switches: 814 Total run time (msec): 97.688 # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done CPU 0 idle for 229.721 msec (entries: 123) CPU 1 idle for 175.381 msec (entries: 65) CPU 2 idle for 188.903 msec (entries: 56) CPU 3 idle for 223.61 msec (entries: 102) Difference due to the idle stats being accounted at nanoseconds precision while the <idle> entries in 'perf sched timehist' are trucated at msec.usec. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest") Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf sched timehist: Remove hardcoded 'comm_width' check at print_summaryNamhyung Kim2016-12-221-3/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the default 'comm_width' value is 30, no need to check that at print_summary, Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf sched timehist: Enlarge default 'comm_width'Namhyung Kim2016-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current default value is 20 but it's easily changed to a bigger value as task has a long name and different tid and pid. And it makes the output not aligned. So change it to have a large value as summary shows. Committer notes: Before: # perf sched record ^C # perf sched timehist <SNIP> 40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020 40602.771512 [0003] <idle> 0.003 0.000 0.986 40602.771586 [0001] <idle> 0.020 0.000 1.049 40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020 40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116 40602.771776 [0000] <idle> 0.001 0.000 1.892 <SNIP> After: # perf sched timehist <SNIP> 40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020 40602.771512 [0003] <idle> 0.003 0.000 0.986 40602.771586 [0001] <idle> 0.020 0.000 1.049 40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020 40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116 <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf sched timehist: Honour 'comm_width' when aligning the headersNamhyung Kim2016-12-221-3/+4
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current default value is 20, but that may change in the future, so make places where we have 20 hardcoded use 'comm_width'. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org [ Split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf/x86: Fix overlap counter scheduling bugPeter Zijlstra2016-12-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri reported the overlap scheduling exceeding its max stack. Looking at the constraint that triggered this, it turns out the overlap marker isn't needed. The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight". Esp. that latter part is of interest here I think, our overlapping mask is 0x0e, that has 3 bits set and is the highest weight mask in on the PMU, therefore it will be placed last. Can we still create a scenario where we would need to rewind that? The scenario for AMD Fam15h is we're having masks like: 0x3F -- 111111 0x38 -- 111000 0x07 -- 000111 0x09 -- 001001 And we mark 0x09 as overlapping, because it is not a direct subset of 0x38 or 0x07 and has less weight than either of those. This means we'll first try and place the 0x09 event, then try and place 0x38/0x07 events. Now imagine we have: 3 * 0x07 + 0x09 and the initial pick for the 0x09 event is counter 0, then we'll fail to place all 0x07 events. So we'll pop back, try counter 4 for the 0x09 event, and then re-try all 0x07 events, which will now work. The masks on the PMU in question are: 0x01 - 0001 0x03 - 0011 0x0e - 1110 0x0c - 1100 But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 -> 0x1) are of heavier weight, it should all work out. Reported-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Liang Kan <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86/pebs: Fix handling of PEBS buffer overflowsStephane Eranian2016-12-221-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch solves a race condition between PEBS and the PMU handler. In case multiple PEBS events are sampled at the same time, it is possible to have GLOBAL_STATUS bit 62 set indicating PEBS buffer overflow and also seeing at most 3 PEBS counters having their bits set in the status register. This is a sign that there was at least one PEBS record pending at the time of the PMU interrupt. PEBS counters must only be processed via the drain_pebs() calls, and not via the regular sample processing loop coming after that the function, otherwise phony regular samples may be generated in the sampling buffer not marked with the EXACT tag. Another possibility is to have one PEBS event and at least one non-PEBS event whic hoverflows while PEBS has armed. In this case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow status bit for the PEBS counter will be on Skylake. To avoid this problem, we systematically ignore the PEBS-enabled counters from the GLOBAL_STATUS mask and we always process PEBS events via drain_pebs(). The problem manifested itself by having non-exact samples when sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would not have the EXACT flag set. Note that this problem is only present on Skylake processor. This fix is harmless on older processors. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | Merge tag 'perf-core-for-mingo-20161220' of ↵Ingo Molnar2016-12-2060-731/+1075
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes: New features: - Introduce 'perf sched timehist --idle', to analyse processes going to/from idle state (Namhyung Kim) Fixes: - Allow 'perf record -u user' to continue when facing races with threads going away after having scanned them via /proc (Jiri Olsa) - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa) - Support jumps with multiple arguments (Ravi Bangoria) - Fix jumps to before the function where they are located (Ravi Bangoria) - Fix lock-pi help string (Davidlohr Bueso) - Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa) - Do not overwrite valid build id in 'perf diff' (Kan Liang) - Don't throw error for zero length symbols, allowing the use of the TUI in PowerPC, where such symbols became more common recently (Ravi Bangoria) Infrastructure changes: - Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf duplication (Joe Stringer) - Move headers check into bash script (Jiri Olsa) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | samples/bpf: Move open_raw_sock to separate headerJoe Stringer2016-12-208-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function was declared in libbpf.c and was the only remaining function in this library, but has nothing to do with BPF. Shift it out into a new header, sock_example.h, and include it from the relevant samples. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | samples/bpf: Remove perf_event_open() declarationJoe Stringer2016-12-208-18/+15Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This declaration was made in samples/bpf/libbpf.c for convenience, but there's already one in tools/perf/perf-sys.h. Reuse that one. Committer notes: Testing it: $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/ make[1]: Entering directory '/home/build/v4.9.0-rc8+' CHK include/config/kernel.release GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /home/acme/git/linux as source for kernel CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL /home/acme/git/linux/scripts/checksyscalls.sh HOSTCC samples/bpf/test_verifier.o HOSTCC samples/bpf/libbpf.o HOSTCC samples/bpf/../../tools/lib/bpf/bpf.o HOSTCC samples/bpf/test_maps.o HOSTCC samples/bpf/sock_example.o HOSTCC samples/bpf/bpf_load.o <SNIP> HOSTLD samples/bpf/trace_event HOSTLD samples/bpf/sampleip HOSTLD samples/bpf/tc_l2_redirect make[1]: Leaving directory '/home/build/v4.9.0-rc8+' $ Also tested the offwaketime resulting from the rebuild, seems to work as before. Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | samples/bpf: Be consistent with bpf_load_program bpf_insn parameterArnaldo Carvalho de Melo2016-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only one of the examples declare the bpf_insn bpf proggie as a const: $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c samples/bpf/fds_example.c: static const struct bpf_insn insns[] = { samples/bpf/sock_example.c: struct bpf_insn prog[] = { samples/bpf/test_cgrp2_attach2.c: struct bpf_insn prog[] = { samples/bpf/test_cgrp2_attach.c: struct bpf_insn prog[] = { samples/bpf/test_cgrp2_sock.c: struct bpf_insn prog[] = { $ Which causes this warning: [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/ <SNIP> HOSTCC samples/bpf/fds_example.o /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create': /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] insns, insns_cnt, "GPL", 0, ^~~~~ In file included from /git/linux/samples/bpf/libbpf.h:5:0, from /git/linux/samples/bpf/bpf_load.h:4, from /git/linux/samples/bpf/fds_example.c:15: /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *' int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, ^~~~~~~~~~~~~~~~ HOSTCC samples/bpf/sockex1_user.o So just ditch that 'const' to reduce build noise, leaving changing the bpf_load_program() bpf_insn parameter to const to a later patch, if deemed adequate. Cc: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | tools lib bpf: Add bpf_prog_{attach,detach}Joe Stringer2016-12-204-24/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching eBPF programs to cgroups") added these functions to samples/libbpf, but during this merge all of the samples libbpf functionality is shifting to tools/lib/bpf. Shift these functions there. Committer notes: Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just like the other wrapper calls to sys_bpf with bpf_attr to make this build in older toolchais, such as the ones in CentOS 5 and 6. Signed-off-by: Joe Stringer <joe@ovn.org> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | samples/bpf: Switch over to libbpfJoe Stringer2016-12-2010-168/+52Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that libbpf under tools/lib/bpf/* is synced with the version from samples/bpf, we can get rid most of the libbpf library here. Committer notes: Built it in a docker fedora rawhide container and ran it in the f25 host, seems to work just like it did before this patch, i.e. the switch to tools/lib/bpf/ doesn't seem to have introduced problems and Joe said he tested it with all the entries in samples/bpf/ and other code he found: [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install <SNIP> [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/ [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/ make[1]: Entering directory '/tmp/build/linux' CHK include/config/kernel.release HOSTCC scripts/basic/fixdep GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /git/linux as source for kernel CHK include/generated/utsrelease.h HOSTCC scripts/basic/bin2c HOSTCC arch/x86/tools/relocs_32.o HOSTCC arch/x86/tools/relocs_64.o LD samples/bpf/built-in.o <SNIP> HOSTCC samples/bpf/fds_example.o HOSTCC samples/bpf/sockex1_user.o /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create': /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] insns, insns_cnt, "GPL", 0, ^~~~~ In file included from /git/linux/samples/bpf/libbpf.h:5:0, from /git/linux/samples/bpf/bpf_load.h:4, from /git/linux/samples/bpf/fds_example.c:15: /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *' int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns, ^~~~~~~~~~~~~~~~ HOSTCC samples/bpf/sockex2_user.o <SNIP> HOSTCC samples/bpf/xdp_tx_iptunnel_user.o clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ -Wno-compare-distinct-pointer-types \ -Wno-gnu-variable-sized-type-not-at-end \ -Wno-address-of-packed-member -Wno-tautological-compare \ -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o HOSTLD samples/bpf/tc_l2_redirect <SNIP> HOSTLD samples/bpf/lwt_len_hist HOSTLD samples/bpf/xdp_tx_iptunnel make[1]: Leaving directory '/tmp/build/linux' [root@f5065a7d6272 linux]# And then, in the host: [root@jouet bpf]# mount | grep "docker.*devicemapper\/" /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota) [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/ [root@jouet bpf]# file offwaketime offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped [root@jouet bpf]# readelf -SW offwaketime offwaketime offwaketime_kern.o offwaketime_user.o [root@jouet bpf]# readelf -SW offwaketime_kern.o There are 11 section headers, starting at offset 0x700: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1 [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4 [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8 [ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8 [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8 [ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8 [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4 [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1 [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4 [10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings) I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown) O (extra OS processing required) o (OS specific), p (processor specific) [root@jouet bpf]# ./offwaketime | head -3 qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4 firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1 swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61 [root@jouet bpf]# Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf diff: Do not overwrite valid build idKan Liang2016-12-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a perf diff regression issue which was introduced by commit 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files based on a build ID") The binary name could be same when perf diff different binaries. Build id is used to distinguish between them. However, the previous patch assumes the same binary name has same build id. So it overwrites the build id according to the binary name, regardless of whether the build id is set or not. Check the has_build_id in dso__load. If the build id is already set, use it. Before the fix: $ perf diff 1.perf.data 2.perf.data # Event 'cycles' # # Baseline Delta Shared Object Symbol # ........ ....... ................ ............................. # 99.83% -99.80% tchain_edit [.] f2 0.12% +99.81% tchain_edit [.] f3 0.02% -0.01% [ixgbe] [k] ixgbe_read_reg After the fix: $ perf diff 1.perf.data 2.perf.data # Event 'cycles' # # Baseline Delta Shared Object Symbol # ........ ....... ................ ............................. # 99.83% +0.10% tchain_edit [.] f3 0.12% -0.08% tchain_edit [.] f2 Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> CC: Dima Kogan <dima@secretsauce.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files based on a build ID") Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf annotate: Don't throw error for zero length symbolsRavi Bangoria2016-12-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'perf report --tui' exits with error when it finds a sample of zero length symbol (i.e. addr == sym->start == sym->end). Actually these are valid samples. Don't exit TUI and show report with such symbols. Reported-and-Tested-by: Anton Blanchard <anton@samba.org> Link: https://lkml.org/lkml/2016/10/8/189 Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Riyder <chris.ryder@arm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@kernel.org # v4.9+ Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf bench futex: Fix lock-pi help stringDavidlohr Bueso2016-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Obvious copy/paste typo from the requeue program. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dbueso@suse.de> Link: http://lkml.kernel.org/r/1481830584-30909-1-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf trace: Check if MAP_32BIT is defined (again)Jiri Olsa2016-12-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There might be systems where MAP_32BIT is not defined, like some some RHEL7 powerpc versions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Kyle McMartin <kyle@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 256763b01741 ("perf trace beauty mmap: Add more conditional defines") Link: http://lkml.kernel.org/r/1481831814-23683-1-git-send-email-jolsa@kernel.org [ Changed the Fixme cset to the one removing the conditional switch case for MAP_32BIT ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | samples/bpf: Make perf_event_read() staticArnaldo Carvalho de Melo2016-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ I noticed some warnings building samples/bpf/ on a Fedora Rawhide container, with clang/llvm 3.9 I noticed this: [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/ make[1]: Entering directory '/tmp/build/linux' CHK include/config/kernel.release GEN ./Makefile CHK include/generated/uapi/linux/version.h Using /git/linux as source for kernel <SNIP> HOSTCC samples/bpf/trace_output_user.o /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous prototype for 'perf_event_read' [-Wmissing-prototypes] void perf_event_read(print_fn fn) ^~~~~~~~~~~~~~~ HOSTLD samples/bpf/trace_output make[1]: Leaving directory '/tmp/build/linux' Shut up the compiler by making that function static. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@fb.com> Cc: Joe Stringer <joe@ovn.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | samples/bpf: Make samples more libbpf-centricJoe Stringer2016-12-1530-109/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch all of the sample code to use the function names from tools/lib/bpf so that they're consistent with that, and to declare their own log buffers. This allow the next commit to be purely devoted to getting rid of the duplicate library in samples/bpf. Committer notes: Testing it: On a fedora rawhide container, with clang/llvm 3.9, sharing the host linux kernel git tree: # make O=/tmp/build/linux/ headers_install # make O=/tmp/build/linux -C samples/bpf/ Since I forgot to make it privileged, just tested it outside the container, using what it generated: # uname -a Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/ # ls -la offwaketime -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime # file offwaketime offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped # readelf -SW offwaketime_kern.o | grep PROGBITS [ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4 [ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8 [ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8 [ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4 [ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1 [ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4 # ./offwaketime | head -5 swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106 CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2 Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5 firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13 JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2 # Signed-off-by: Joe Stringer <joe@ovn.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Wang Nan <wangnan0@huawei.com> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | tools lib bpf: Add flags to bpf_create_map()Joe Stringer2016-12-153-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduces map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new parameter in libbpf. By exposing it, users can access flags such as whether or not to preallocate the map. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: http://lkml.kernel.org/r/20161209024620.31660-4-joe@ovn.org [ Added clarifying comment made by Wang Nan ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | tools lib bpf: use __u32 from linux/types.hJoe Stringer2016-12-152-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes the following issue when building without access to 'u32' type: ./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’ Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: http://lkml.kernel.org/r/20161209024620.31660-3-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.hJoe Stringer2016-12-151-229/+364
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tools version of this header is out of date; update it to the latest version from the kernel headers. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org [ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPG_PROG_(AT|DE)TACH, etc ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf annotate: Fix jump target outside of function address rangeRavi Bangoria2016-12-153-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If jump target is outside of function range, perf is not handling it correctly. Especially when target address is lesser than function start address, target offset will be negative. But, target address declared to be unsigned, converts negative number into 2's complement. See below example. Here target of 'jumpq' instruction at 34cf8 is 34ac0 which is lesser than function start address(34cf0). 34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0 Objdump output: 0000000000034cf0 <__sigaction>: __GI___sigaction(): 34cf0: lea -0x20(%rdi),%eax 34cf3: cmp -bashx1,%eax 34cf6: jbe 34d00 <__sigaction+0x10> 34cf8: jmpq 34ac0 <__GI___libc_sigaction> 34cfd: nopl (%rax) 34d00: mov 0x386161(%rip),%rax # 3bae68 <_DYNAMIC+0x2e8> 34d07: movl -bashx16,%fs:(%rax) 34d0e: mov -bashxffffffff,%eax 34d13: retq perf annotate before applying patch: __GI___sigaction /usr/lib64/libc-2.22.so lea -0x20(%rdi),%eax cmp -bashx1,%eax v jbe 10 v jmpq fffffffffffffdd0 nop 10: mov _DYNAMIC+0x2e8,%rax movl -bashx16,%fs:(%rax) mov -bashxffffffff,%eax retq perf annotate after applying patch: __GI___sigaction /usr/lib64/libc-2.22.so lea -0x20(%rdi),%eax cmp -bashx1,%eax v jbe 10 ^ jmpq 34ac0 <__GI___libc_sigaction> nop 10: mov _DYNAMIC+0x2e8,%rax movl -bashx16,%fs:(%rax) mov -bashxffffffff,%eax retq Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf annotate: Support jump instruction with target as second operandRavi Bangoria2016-12-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Architectures like PowerPC have jump instructions that includes a target address as a second operand. For example, 'bne cr7,0xc0000000000f6154'. Add support for such instruction in perf annotate. objdump o/p: c0000000000f6140: ld r9,1032(r31) c0000000000f6144: cmpdi cr7,r9,0 c0000000000f6148: bne cr7,0xc0000000000f6154 c0000000000f614c: ld r9,2312(r30) c0000000000f6150: std r9,1032(r31) c0000000000f6154: ld r9,88(r31) Corresponding perf annotate o/p: Before patch: ld r9,1032(r31) cmpdi cr7,r9,0 v bne 3ffffffffff09f2c ld r9,2312(r30) std r9,1032(r31) 74: ld r9,88(r31) After patch: ld r9,1032(r31) cmpdi cr7,r9,0 v bne 74 ld r9,2312(r30) std r9,1032(r31) 74: ld r9,88(r31) Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1480953407-7605-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf record: Force ignore_missing_thread for uid optionJiri Olsa2016-12-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable perf_evsel::ignore_missing_thread for -u option to ignore complete failure if any of the user's processes die between its enumeration and time we open the event. Committer notes: While doing a 'make -j4 allmodconfig' we sometimes get into the race: Before: # perf record -u acme Error: The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:ppp). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured? # After: [root@jouet ~]# perf record -u acme WARNING: Ignored open failure for pid 9888 WARNING: Ignored open failure for pid 18059 [root@jouet ~]# Which is an improvement, with the races not preventing the remaining threads for the specified user from being monitored, but the message probably needs further clarification. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf evsel: Allow to ignore missing pidJiri Olsa2016-12-153-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding perf_evsel::ignore_missing_cpu_thread bool. When set true, it allows perf to ignore error of missing pid of perf event syscall. We remove missing thread id from the thread_map, so the rest of the processing like ioctl and mmap won't get disturbed with -1 fd. The reason for supporting this is to ease up monitoring group of pids, that 'disappear' before perf opens their event. This currently leads perf to report error and exit and makes perf record's -u option unusable under certain setup. With this change we will allow this race and ignore such failure with following warning: WARNING: Ignored open failure for pid 8605 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf thread_map: Add thread_map__remove functionJiri Olsa2016-12-155-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add thread_map__remove function to remove thread from thread map. Add automated test also. Committer notes: Testing it: # perf test "Remove thread map" 39: Remove thread map : Ok # perf test -v "Remove thread map" 39: Remove thread map : --- start --- test child forked, pid 4483 2 threads: 4482, 4483 1 thread: 4483 0 thread: test child finished with 0 ---- end ---- Remove thread map: Ok # Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481538943-21874-4-git-send-email-jolsa@kernel.org [ Added stdlib.h, to get the free() declaration ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf evsel: Use variable instead of repeating lengthy FD macroJiri Olsa2016-12-151-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's more readable and will ease up following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481538943-21874-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf mem: Fix --all-user/--all-kernel optionsJiri Olsa2016-12-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removing extra '--' prefix. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: ad16511b0e40 ("perf mem: Add -U/-K (--all-user/--all-kernel) options") Link: http://lkml.kernel.org/r/1481538943-21874-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf tools: Remove some needless __maybe_unusedArnaldo Carvalho de Melo2016-12-153-11/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I.e. those parameters/functions _are_ used, so ditch that misleading attribute. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-13cqtjh0yojg5gzvpq1zzpl0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf sched timehist: Show callchains for idle statNamhyung Kim2016-12-151-0/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When --idle-hist option is used with --summary, it now shows idle stats with callchains like below: Idle stats by callchain: CPU 0: 902.195 msec Idle time (msec) Count Callchains ---------------- ------- -------------------------------------------------- 370.589 69 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath 178.799 17 worker_thread <- kthread <- ret_from_fork 128.352 17 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork 125.111 19 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select 71.599 50 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 23.146 1 rcu_gp_kthread <- kthread <- ret_from_fork 4.510 1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64 0.085 1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll ... Committer notes: Extra testing: # uname -a Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 1) Run 'perf sched record -g' 2) Run 'perf sched timehist --idle --summary' <SNIP> Idle stats by callchain: CPU 0: 13456.840 msec Idle time (msec) Count Callchains ---------------- ----- -------------------------------------------------- 5386.637 3283 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 2750.238 2299 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- do_syscall_64 1275.672 1287 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- entry_SYSCALL_64_fastpath 936.322 452 worker_thread <- kthread <- ret_from_fork 741.311 385 rcu_nocb_kthread <- kthread <- ret_from_fork 729.385 248 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_ppoll 365.386 229 irq_thread <- kthread <- ret_from_fork 338.934 265 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath 219.488 201 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork 186.839 410 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64 142.541 59 kvm_vcpu_block <- kvm_arch_vcpu_ioctl_run <- kvm_vcpu_ioctl <- do_vfs_ioctl <- sys_ioctl 83.887 92 smpboot_thread_fn <- kthread <- ret_from_fork 62.722 96 do_exit <- do_group_exit <- 0x2a5594 <- entry_SYSCALL_64_fastpath 47.894 83 pipe_wait <- pipe_read <- __vfs_read <- vfs_read <- sys_read 46.554 61 rcu_gp_kthread <- kthread <- ret_from_fork 34.337 21 schedule_timeout <- intel_fbc_work_fn <- process_one_work <- worker_thread <- kthread 29.521 14 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select 20.274 10 schedule_timeout <- io_schedule_timeout <- bit_wait_io <- __wait_on_bit <- out_of_line_wait_on_bit 15.085 55 schedule_timeout <- unix_stream_read_generic <- unix_stream_recvmsg <- sock_recvmsg <- SYSC_recvfrom <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>