bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
*	qsp: Simplify how qsp_report() prints	Markus Armbruster	2019-04-18	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	qsp_report() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_sync_profile() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-7-armbru@redhat.com>
*	qsp: support call site coalescing	Emilio G. Cota	2018-08-23	1	-1/+1
\| \| \| \| \|	Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	qsp: add qsp_reset	Emilio G. Cota	2018-08-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I first implemented this by deleting all entries in the global hash table. But doing that safely slows down profiling, since we'd need to introduce rcu_read_lock/unlock in the fast path. What's implemented here avoids messing with the thread-local data in the global hash table. It achieves this by taking a snapshot of the current state, so that subsequent reports present the delta wrt to the snapshot. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	qsp: add sort_by option to qsp_report	Emilio G. Cota	2018-08-23	1	-1/+7
\| \| \| \| \|	Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	qsp: QEMU's Synchronization Profiler	Emilio G. Cota	2018-08-23	1	-0/+22
	The goal of this module is to profile synchronization primitives (i.e. mutexes, recursive mutexes and condition variables) so that scalability issues can be quickly diagnosed. Sync primitives are profiled by QSP based on the vaddr of the object accessed as well as the call site (file:line_nr). That means the same object called from two different call sites will be tracked in separate entries, which might be reported together or separately (see subsequent commit on call site coalescing). Some perf numbers: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz Command: taskset -c 0 tests/atomic_add-bench -d 5 -m - Before: 54.80 Mops/s - After: 54.75 Mops/s That is, a negligible slowdown due to the now indirect call to qemu_mutex_lock. Note that using a branch instead of an indirect call introduces a more severe slowdown (53.65 Mops/s, i.e. 2% slowdown). Enabling the profiler (with -p, added in this series) is more interesting: - No profiling: 54.75 Mops/s - W/ profiling: 12.53 Mops/s That is, a 4.36X slowdown. We can break down this slowdown by removing the get_clock calls or the entry lookup: - No profiling: 54.75 Mops/s - W/o get_clock: 25.37 Mops/s - W/o entry lookup: 19.30 Mops/s - W/ profiling: 12.53 Mops/s Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>