bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
*	tcg/i386: Implement bitsel for avx512	Richard Henderson	2022-03-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The general ternary logic operation can implement BITSEL. Funnel the 4-operand operation into three variants of the 3-operand instruction, depending on input operand overlap. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Implement more logical operations for avx512	Richard Henderson	2022-03-04	1	-5/+5
\| \| \| \| \| \| \| \| \|	AVX512VL has a general ternary logic operation, VPTERNLOGQ, which can implement NOT, ORC, NAND, NOR, EQV. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Implement avx512 variable rotate	Richard Henderson	2022-03-04	1	-1/+1
\| \| \| \| \| \| \| \|	AVX512VL has VPROLVD and VPRORVQ. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Implement avx512 immediate rotate	Richard Henderson	2022-03-04	1	-1/+1
\| \| \| \| \| \| \|	AVX512VL has VPROLD and VPROLQ, layered onto the same opcode as PSHIFTD, but requires EVEX encoding and W1. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Detect AVX512	Richard Henderson	2022-03-04	1	-0/+4
\| \| \| \| \| \| \| \| \|	There are some operation sizes in some subsets of AVX512 that are missing from previous iterations of AVX. Detect them. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add opcodes for vector nand, nor, eqv	Richard Henderson	2022-03-04	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	We've had placeholders for these opcodes for a while, and should have support on ppc, s390x and avx512 hosts. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support raising sigbus for user-only	Richard Henderson	2022-02-08	1	-2/+0
\| \| \| \| \|	Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Remove TCG_TARGET_HAS_goto_ptr	Richard Henderson	2021-07-10	1	-1/+0
\| \| \| \| \| \| \| \|	Since 6eea04347eb6, all tcg backends support goto_ptr. Remove the conditional, making support mandatory. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Move MAX_CODE_GEN_BUFFER_SIZE to tcg-target.h	Richard Henderson	2021-06-11	1	-0/+2
\| \| \| \| \| \| \| \| \|	Remove the ifdef ladder and move each define into the appropriate header file. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Remove TCG_TARGET_SUPPORT_MIRROR	Richard Henderson	2021-01-07	1	-1/+0
\| \| \| \| \| \| \| \|	Now that all native tcg hosts support splitwx, remove the define. Replace the one use with a test for CONFIG_TCG_INTERPRETER. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support split-wx code generation	Richard Henderson	2021-01-07	1	-1/+1
\| \| \| \| \|	Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add --accel tcg,split-wx property	Richard Henderson	2021-01-07	1	-0/+1
\| \| \| \| \| \| \| \| \|	Plumb the value through to alloc_code_gen_buffer. This is not supported by any os or tcg backend, so for now enabling it will result in an error. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Adjust tb_target_set_jmp_target for split-wx	Richard Henderson	2021-01-07	1	-3/+3
\| \| \| \| \| \| \|	Pass both rx and rw addresses to tb_target_set_jmp_target. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Introduce INDEX_op_qemu_st8_i32	Richard Henderson	2021-01-07	1	-0/+3
\| \| \| \| \| \| \| \| \|	Enable this on i386 to restrict the set of input registers for an 8-bit store, as required by the architecture. This removes the last use of scratch registers for user-only mode. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Adjust TCG_TARGET_HAS_MEMORY_BSWAP	Richard Henderson	2021-01-07	1	-1/+2
\| \| \| \| \| \| \| \|	Always true when movbe is available, otherwise leave this to generic code. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	util: Extract flush_icache_range to cacheflush.c	Richard Henderson	2021-01-02	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This has been a tcg-specific function, but is also in use by hardware accelerators via physmem.c. This can cause link errors when tcg is disabled. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20201214140314.18544-3-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	tcg: Remove TCG_TARGET_HAS_cmp_vec	Richard Henderson	2020-10-08	1	-1/+0
\| \| \| \| \| \| \|	The cmp_vec opcode is mandatory; this symbol is unused. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	qemu/atomic.h: rename atomic_ to qatomic_	Stefan Hajnoczi	2020-09-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
*	tcg: Implement gvec support for rotate by scalar	Richard Henderson	2020-06-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	No host backend support yet, but the interfaces for rotls are in place. Only implement left-rotate for now, as the only known use of vector rotate by scalar is s390x, so any right-rotate would be unused and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Implement gvec support for rotate by vector	Richard Henderson	2020-06-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	No host backend support yet, but the interfaces for rotlv and rotrv are in place. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Drop the generic expansion from rot to shift; we can do better for each backend, and then this code becomes unused.
*	tcg: Implement gvec support for rotate by immediate	Richard Henderson	2020-06-02	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	No host backend support yet, but the interfaces for rotli are in place. Canonicalize immediate rotate to the left, based on a survey of architectures, but provide both left and right shift interfaces to the translators. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Search includes from the project root source directory	Philippe Mathieu-Daudé	2020-01-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently search both the root and the tcg/ directories for tcg files: $ git grep '#include "tcg/' \| wc -l 28 $ git grep '#include "tcg[^/]' \| wc -l 94 To simplify the preprocessor search path, unify by expliciting the tcg/ directory. Patch created mechanically by running: $ for x in \ tcg.h tcg-mo.h tcg-op.h tcg-opc.h \ tcg-op-gvec.h tcg-gvec-desc.h; do \ sed -i "s,#include \"$x\",#include \"tcg/$x\"," \ $(git grep -l "#include \"$x\""); \ done Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200101112303.20724-2-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support vector comparison select value	Richard Henderson	2019-05-22	1	-1/+1
\| \| \| \| \| \| \| \|	We already had backend support for this feature. Expand the new cmpsel opcode using vpblendb. The combination allows us to avoid an extra NOT for some comparison codes. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add support for vector compare select	Richard Henderson	2019-05-22	1	-0/+1
\| \| \| \| \| \| \| \| \|	Perform a per-element conditional move. This combination operation is easier to implement on some host vector units than plain cmp+bitsel. Omit the usual gvec interface, as this is intended to be used by target-specific gvec expansion call-backs. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add support for vector bitwise select	Richard Henderson	2019-05-22	1	-0/+1
\| \| \| \| \| \| \|	This operation performs d = (b & a) \| (c & ~a), and is present on a majority of host vector units. Include gvec expanders. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support vector absolute value	Richard Henderson	2019-05-14	1	-1/+1
\| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add support for vector absolute value	Richard Henderson	2019-05-14	1	-0/+1
\| \| \| \| \|	Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support vector scalar shift opcodes	Richard Henderson	2019-05-14	1	-1/+1
\| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support vector variable shift opcodes	Richard Henderson	2019-05-14	1	-1/+1
\| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Support INDEX_op_extract2_{i32,i64}	Richard Henderson	2019-04-24	1	-2/+2
\| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add INDEX_op_extract2_{i32,i64}	Richard Henderson	2019-04-24	1	-0/+2
\| \| \| \| \| \| \|	This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	cputlb: Remove static tlb sizing	Richard Henderson	2019-01-28	1	-1/+0
\| \| \| \| \| \| \| \|	Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB, remove the define and the old code. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: enable dynamic TLB sizing	Emilio G. Cota	2019-01-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the following experiments show, this series is a net perf gain, particularly for memory-heavy workloads. Experiments are run on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. 1. System boot + shudown, debian aarch64: - Before (v3.1.0): Performance counter stats for './die.sh v3.1.0' (10 runs): 9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 29,910,312,379 cycles # 3.316 GHz ( +- 0.14% ) 54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% ) 10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% ) 172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% ) 9.084039051 seconds time elapsed ( +- 0.23% ) - After: Performance counter stats for './die.sh tlb-dyn-v5' (10 runs): 8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 28,556,123,404 cycles # 3.311 GHz ( +- 0.13% ) 51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% ) 9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% ) 166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% ) 8.680540350 seconds time elapsed ( +- 0.24% ) That is, a 4.4% perf increase. 2. System boot + shutdown, ubuntu 18.04 x86_64: - Before (v3.1.0): 56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% ) 200,745,466,128 cycles # 3.578 GHz ( +- 5.24% ) 431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% ) 77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% ) 844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% ) 55.221556378 seconds time elapsed ( +- 5.01% ) - After: 56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% ) 202,217,930,479 cycles # 3.573 GHz ( +- 10.69% ) 439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% ) 80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% ) 776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% ) 55.549661409 seconds time elapsed ( +- 10.44% ) No improvement (within noise range). Note that for this workload, increasing the time window too much can lead to perf degradation, since it flushes the TLB very frequently. 3. x86_64 SPEC06int: x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 5.5 +------------------------------------------------------------------------+ \| +-+ \| 5 \|-+.................+-+...............................tlb-dyn-v5.......+-\| \| * * \| 4.5 \|-+..................................................................+-\| \| * * \| 4 \|-+..................................................................+-\| \| * * \| 3.5 \|-+..................................................................+-\| \| * * \| 3 \|-+......+-+........................................................+-\| \| * * * \| 2.5 \|-+.................................................+-+...........+-\| \| * * * * * \| 2 \|-+..............................................................+-\| \| * * * * * * +-+ \| 1.5 \|-+....................................................+-+.+-+.+-\| \| * * +-+ * +-+ +-+ +-+ +-+ * * * * * \| 1 \|++++-++++++++++++++++-+++-+++-++++-++++-++++++++++++++\| \| * * * * * * * * * * * * * * * * * * * * * * * * * * \| 0.5 +------------------------------------------------------------------------+ 400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean png: https://imgur.com/YRF90f7 That is, a 1.51x average speedup over the baseline, with a max speedup of 5.17x. Here's a different look at the SPEC06int results, using KVM as the baseline: x86_64-softmmu slowdown vs. KVM for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 25 +---------------------------------------------------------------------------+ \| +-+ +-+ \| \| * * +-+ v3.1.0 \| \| * * +-+ tlb-dyn-v5 \| \| * * * * +-+ \| 20 \|-+................................................+-+...............+-\| \| * * # # * * \| \| +-+ * * * # # * * \| \| * * * * * # # * * \| 15 \|-+..............................................#.#.......+-+......+-\| \| * * * * * # # * #\|# \| \| * * * * +-+ * # # * +-+ \| \| * * +-+ * * ++-+ +-+ * # # * # # +-+ \| \| * * +-+ * * * ## \| +-+ # # * # # +-+ \| 10 \|-+..........+-+...........##.......++-+..+-+.#.#.......#.#....+-\| \| * * +-+ * * * ## +-+ # # # #* # # +-+ * # # * * \| \| * * * # # * * +-+ * ## * +-+ # # # #* # # * * * # # +-+ \| \| * * # # * * * +-+ * ## * # # # # # #* # # * * * # # * ## \| 5 \|-+.......+-+.#.#.....#.#..##..#.#.#.#..#.#.#.#.....#.#..##.+-\| \| * # #* # # * +-+* # # * ## * # # # # # #* # # * * * # # * ## \| \| * # #* # # * # #* # # * ## * # # # # # #* # # * +-+* # # * ## \| \| ++-+ * # #* # # * # #* # # * ## * # # # # # #* # # * # #* # # * ## \| \|+++#+#++#+#+#+#++#+#+#+#++##++#+#+#+#++#+#+#+#++#+#+#+#+*+##+++\| 0 +---------------------------------------------------------------------------+ 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean png: https://imgur.com/YzAMNEV After this series, we bring down the average SPEC06int slowdown vs KVM from 11.47x to 7.58x. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: introduce dynamic TLB sizing	Emilio G. Cota	2019-01-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Disabled in all TCG backends for now. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Implement vector minmax arithmetic	Richard Henderson	2019-01-28	1	-1/+1
\| \| \| \| \| \| \|	The avx instruction set does not directly provide MO_64. We can still implement 64-bit with comparison and vpblendvb. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Implement vector saturating arithmetic	Richard Henderson	2019-01-28	1	-1/+1
\| \| \| \| \| \| \|	Only MO_8 and MO_16 are implemented, since that's all the instruction set provides. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add opcodes for vector minmax arithmetic	Richard Henderson	2019-01-28	1	-0/+1
\| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add opcodes for vector saturated arithmetic	Richard Henderson	2019-01-28	1	-0/+1
\| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP	Richard Henderson	2018-12-17	1	-0/+2
\| \| \| \| \| \| \| \|	For now, defined universally as true, since we previously required backends to implement swapped memory operations. Future patches may now remove that support where it is onerous. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests	Richard Henderson	2018-12-17	1	-2/+3
\| \| \| \| \| \| \| \|	This preserves the invariant that all TCG_TYPE_I32 values are zero-extended in the 64-bit host register. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Move TCG_REG_CALL_STACK from define to enum	Richard Henderson	2018-12-17	1	-1/+1
\| \| \| \| \| \|	Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Always use %ebp for TCG_AREG0	Richard Henderson	2018-12-17	1	-6/+2
\| \| \| \| \| \| \| \| \| \|	For x86_64, this can remove a REX prefix resulting in smaller code when manipulating globals of type i32, as we move them between backing store via cpu_env, aka TCG_AREG0. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Add vector operations	Richard Henderson	2018-02-08	1	-2/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintainence, I am only planning to support AVX1 and later cpus. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg/i386: Store out-of-range call targets in constant pool	Richard Henderson	2017-09-07	1	-0/+1
\| \| \| \| \| \| \|	Already it saves 2 bytes per call, but also the constant pool entry may well be shared across multiple calls. Signed-off-by: Richard Henderson <rth@twiddle.net>
*	tcg: Rearrange ldst label tracking	Richard Henderson	2017-09-07	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Dispense with TCGBackendData, as it has never been used for more than holding a single pointer. Use a define in the cpu/tcg-target.h to signal requirement for TCGLabelQemuLdst, so that we can drop the no-op tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
*	tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h	Richard Henderson	2017-09-07	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. This opens the possibility for TCG_TARGET_HAS_direct_jump to be a runtime decision -- based on host cpu capabilities, the size of code_gen_buffer, or a future debugging switch. Signed-off-by: Richard Henderson <rth@twiddle.net>
*	tcg/i386: implement goto_ptr	Emilio G. Cota	2017-06-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-6-git-send-email-cota@braap.org> [rth: Reuse goto_ptr epilogue for exit_tb 0.] Signed-off-by: Richard Henderson <rth@twiddle.net>
*	tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr	Emilio G. Cota	2017-06-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org> [rth: Squashed 4 related commits.] Signed-off-by: Richard Henderson <rth@twiddle.net>
*	tcg: enable MTTCG by default for ARM on x86 hosts	Alex Bennée	2017-02-24	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This enables the multi-threaded system emulation by default for ARMv7 and ARMv8 guests using the x86_64 TCG backend. This is because on the guest side: - The ARM translate.c/translate-64.c have been converted to - use MTTCG safe atomic primitives - emit the appropriate barrier ops - The ARM machine has been updated to - hold the BQL when modifying shared cross-vCPU state - defer powerctl changes to async safe work All the host backends support the barrier and atomic primitives but need to provide same-or-better support for normal load/store operations. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Acked-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Pranith Kumar <bobby.prani@gmail.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
*	tcg/i386: Handle ctpop opcode	Richard Henderson	2017-01-10	1	-2/+3
\| \| \| \|	Signed-off-by: Richard Henderson <rth@twiddle.net>