bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
*	tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK	Emilio G. Cota	2017-10-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state	Lluís Vilanova	2017-07-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Every vCPU now uses a separate set of TBs for each set of dynamic tracing event state values. Each set of TBs can be used by any number of vCPUs to maximize TB reuse when vCPUs have the same tracing state. This feature is later used by tracetool to optimize tracing of guest code events. The maximum number of TB sets is defined as 2^E, where E is the number of events that have the 'vcpu' property (their state is stored in CPUState->trace_dstate). For this to work, a change on the dynamic tracing state of a vCPU will force it to flush its virtual TB cache (which is only indexed by address), and fall back to the physical TB cache (which now contains the vCPU's dynamic tracing state as part of the hashing function). Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
*	qht-bench: relax test_start/stop atomic accesses	Emilio G. Cota	2016-10-24	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	test_start/stop are used only as flags to loop on. Barriers are unnecessary, since no dependent data is transferred among threads apart from the flags themselves. This commit relaxes the three accesses to test_start/stop that were not yet relaxed. Signed-off-by: Emilio G. Cota <cota@braap.org>
*	clean-includes: run it once more	Paolo Bonzini	2016-06-16	1	-1/+0
\| \| \| \|	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	qht: add qht-bench, a performance benchmark	Emilio G. Cota	2016-06-12	1	-0/+488
	This serves as a performance benchmark as well as a stress test for QHT. We can tweak quite a number of things, including the number of resize threads and how frequently resizes are triggered. A performance comparison of QHT vs CLHT[1] and ck_hs[2] using this same benchmark program can be found here: http://imgur.com/a/0Bms4 The tests are run on a 64-core AMD Opteron 6376, pinning threads to cores favoring same-socket cores. For each run, qht-bench is invoked with: $ tests/qht-bench -d $duration -n $n -u $u -g $range , where $duration is in seconds, $n is the number of threads, $u is the update rate (0.0 to 100.0), and $range is the number of keys. Note that ck_hs's performance drops significantly as writes go up, since it requires an external lock (I used a ck_spinlock) around every write. Also, note that CLHT instead of using a seqlock, relies on an allocator that does not ever return the same address during the same read-critical section. This gives it a slight performance advantage over QHT on read-heavy workloads, since the seqlock writes aren't there. [1] CLHT: https://github.com/LPD-EPFL/CLHT https://infoscience.epfl.ch/record/207109/files/ascy_asplos15.pdf [2] ck_hs: http://concurrencykit.org/ http://backtrace.io/blog/blog/2015/03/13/workload-specialization/ A few of those plots are shown in text here, since that site might not be online forever. Throughput is on Mops/s on the Y axis. 200K keys, 0 % updates 450 ++--+------+------+-------+-------+-------+-------+------+-------+--++ \| + + + + + + + + +N+ \| 400 ++ ---+E+ ++ \| +++---- \| 350 ++ 9 ++------+------++ --+E+ -+H+ ++ \| \| +H+- \| -+N+---- ---- +++ \| 300 ++ 8 ++ +E+ ++ -----+E+ --+H+ ++ \| \| +++ \| -+N+-----+H+-- \| 250 ++ 7 ++------+------++ +++-----+E+---- ++ 200 ++ 1 -+E+-----+H+ ++ \| ---- qht +-E--+ \| 150 ++ -+E+ clht +-H--+ ++ \| ---- ck +-N--+ \| 100 ++ +E+ ++ \| ---- \| 50 ++ -+E+ ++ \| +E+E+ + + + + + + + + \| 0 ++--E------+------+-------+-------+-------+-------+------+-------+--++ 1 8 16 24 32 40 48 56 64 Number of threads 200K keys, 1 % updates 350 ++--+------+------+-------+-------+-------+-------+------+-------+--++ \| + + + + + + + + -+E+ \| 300 ++ -----+H+ ++ \| +E+-- \| \| 9 ++------+------++ +++---- \| 250 ++ \| +E+ -- \| -+E+ ++ \| 8 ++ -- ++ ---- \| 200 ++ \| +++- \| +++ ---+E+ ++ \| 7 ++------N------++ -+E+-- qht +-E--+ \| \| 1 +++---- clht +-H--+ \| 150 ++ -+E+ ck +-N--+ ++ \| ---- \| 100 ++ +E+ ++ \| ---- \| \| -+E+ \| 50 ++ +H+-+N+----+N+-----+N+------ ++ \| +E+E+ + + + +N+-----+N+-----+N+----+N+-----+N+ \| 0 ++--E------+------+-------+-------+-------+-------+------+-------+--++ 1 8 16 24 32 40 48 56 64 Number of threads 200K keys, 20 % updates 300 ++--+------+------+-------+-------+-------+-------+------+-------+--++ \| + + + + + + + + + \| \| -+H+ \| 250 ++ ---- ++ \| 9 ++------+------++ --+H+ ---+E+ \| \| 8 ++ +H+-- ++ -+H+----+E+-- \| 200 ++ \| +E+ --\| -----+E+-- +++ ++ \| 7 ++ + ---- ++ ---+H+---- +++ qht +-E--+ \| 150 ++ 6 ++------N------++ -+H+-----+E+ clht +-H--+ ++ \| 1 -----+E+-- ck +-N--+ \| \| -+H+---- \| 100 ++ -----+E+ ++ \| +E+-- \| \| ----+++ \| 50 ++ -+E+ ++ \| +E+ +++ \| \| +E+N+-+N+-----+ + + + + + + \| 0 ++--E------+------N-------N-------N-------N-------N------N-------N--++ 1 8 16 24 32 40 48 56 64 Number of threads 200K keys, 100 % updates qht +-E--+ clht +-H--+ 160 ++--+------+------+-------+-------+-------+-------+---ck-+-N-----+--++ \| + + + + + + + + ----H \| 140 ++ +H+-- -+E+ ++ \| +++---- ---- \| 120 ++ 8 ++------+------++ -+H+ +E+ ++ \| 7 ++ +H+---- ++ ---- +++---- \| 100 ++ \| +E+ \| +++ ---+H+ -+E+ ++ \| 6 ++ +++ ++ -+H+-- +++---- \| 80 ++ 5 ++------N----------+E+-----+E+ ++ \| 1 -+H+---- +++ \| \| -----+E+ \| 60 ++ +H+---- +++ ++ \| ----+E+ \| 40 ++ +H+---- ++ \| --+E+ \| 20 ++ +E+ ++ \| +EE+ + + + + + + + + \| 0 ++--+N-N---N------N-------N-------N-------N-------N------N-------N--++ 1 8 16 24 32 40 48 56 64 Number of threads Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-13-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>