summaryrefslogtreecommitdiffstats
path: root/include/linux/perf_event.h
diff options
context:
space:
mode:
authorFrederic Weisbecker2010-02-03 09:09:33 +0100
committerFrederic Weisbecker2010-02-27 17:06:19 +0100
commitb67577dfb45580c498bfdb1bc76c00c3b2ad6310 (patch)
tree6d3c8a49706fea39d4279382618b84a9e8625951 /include/linux/perf_event.h
parentperf lock: Fix and add misc documentally things (diff)
downloadkernel-qcow2-linux-b67577dfb45580c498bfdb1bc76c00c3b2ad6310.tar.gz
kernel-qcow2-linux-b67577dfb45580c498bfdb1bc76c00c3b2ad6310.tar.xz
kernel-qcow2-linux-b67577dfb45580c498bfdb1bc76c00c3b2ad6310.zip
perf lock: Drop the buffers multiplexing dependency
We need to deal with time ordered events to build a correct state machine of lock events. This is why we multiplex the lock events buffers. But the ordering is done from the kernel, on the tracing fast path, leading to high contention between cpus. Without multiplexing, the events appears in a weak order. If we have four events, each split per cpu, perf record will read the events buffers in the following order: [ CPU0 ev0, CPU0 ev1, CPU0 ev3, CPU0 ev4, CPU1 ev0, CPU1 ev0....] To handle a post processing reordering, we could just read and sort the whole in memory, but it just doesn't scale with high amounts of events: lock events can fill huge amounts in few times. Basically we need to sort in memory and find a "grace period" point when we know that a given slice of previously sorted events can be committed for post-processing, so that we can unload the memory usage step by step and keep a scalable sorting list. There is no strong rules about how to define such "grace period". What does this patch is: We define a FLUSH_PERIOD value that defines a grace period in seconds. We want to have a slice of events covering 2 * FLUSH_PERIOD in our sorted list. If FLUSH_PERIOD is big enough, it ensures every events that occured in the first half of the timeslice have all been buffered and there are none remaining and there won't be further to put inside this first timeslice. Then once we reach the 2 * FLUSH_PERIOD timeslice, we flush the first half to be gentle with the memory (the second half can still get new events in the middle, so wait another period to flush it) FLUSH_PERIOD is defined to 5 seconds. Say the first event started on time t0. We can safely assume that at the time we are processing events of t0 + 10 seconds, ther won't be anymore events to read from perf.data that occured between t0 and t0 + 5 seconds. Hence we can safely flush the first half. To point out funky bugs, we have a guardian that checks a new event timestamp is not below the last event's timestamp flushed and that displays a warning in this case. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com>
Diffstat (limited to 'include/linux/perf_event.h')
0 files changed, 0 insertions, 0 deletions