| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
Enable this on i386 to restrict the set of input registers
for an 8-bit store, as required by the architecture. This
removes the last use of scratch registers for user-only mode.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
| |
Always true when movbe is available, otherwise leave
this to generic code.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has been a tcg-specific function, but is also in use
by hardware accelerators via physmem.c. This can cause
link errors when tcg is disabled.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Joelle van Dyne <j@getutm.app>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20201214140314.18544-3-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
|
|
|
| |
The cmp_vec opcode is mandatory; this symbol is unused.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
| |
The previous change wrongly stated that 32-bit avx2 should have
used VPBROADCASTW. But that's a 16-bit broadcast and we want a
32-bit broadcast.
Fixes: 7b60ef3264e
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
| |
This wasn't actually used for anything, really. All variable
operands must accept registers, and which are indicated by the
set in TCGArgConstraint.regs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
| |
The union is unused; let "regs" appear in the main structure
without the "u.regs" wrapping.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:
$ CC=clang CXX=clang++ ./configure ... && make
../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)
Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.
This patch was generated using:
$ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
sort -u >/tmp/changed_identifiers
$ for identifier in $(</tmp/changed_identifiers); do
sed -i "s%\<$identifier\>%q$identifier%g" \
$(git grep -I -l "\<$identifier\>")
done
I manually fixed line-wrap issues and misaligned rST tables.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With Makefiles that have automatically generated dependencies, you
generated includes are set as dependencies of the Makefile, so that they
are built before everything else and they are available when first
building the .c files.
Alternatively you can use a fine-grained dependency, e.g.
target/arm/translate.o: target/arm/decode-neon-shared.inc.c
With Meson you have only one choice and it is a third option, namely
"build at the beginning of the corresponding target"; the way you
express it is to list the includes in the sources of that target.
The problem is that Meson decides if something is a source vs. a
generated include by looking at the extension: '.c', '.cc', '.m', '.C'
are sources, while everything else is considered an include---including
'.inc.c'.
Use '.c.inc' to avoid this, as it is consistent with our other convention
of using '.rst.inc' for included reStructuredText files. The editorconfig
file is adjusted.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
|
|
|
|
| |
For immediates, we must continue the special casing of 8-bit
elements. The other element sizes and shift types are trivially
implemented with shifts.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
| |
No host backend support yet, but the interfaces for rotls
are in place. Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Drop the generic expansion from rot to shift; we can do better
for each backend, and then this code becomes unused.
|
|
|
|
|
|
|
|
|
|
| |
No host backend support yet, but the interfaces for rotli
are in place. Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
When %gs cannot be used, we use register offset addressing.
This path is almost never used, so it was clearly not tested.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200406174803.8192-1-richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
|
|
|
|
|
|
|
|
| |
We were only constructing the 64-bit element, and not
replicating the 64-bit element across the rest of the vector.
Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A given RISU testcase for SVE can produce
tcg-op-vec.c:511: do_shifti: Assertion `i >= 0 && i < (8 << vece)' failed.
because expand_vec_sari gave a shift count of 32 to a MO_32
vector shift.
In 44f1441dbe1, we changed from direct expansion of vector opcodes
to re-use of the tcg expanders. So while the comment correctly notes
that the hw will handle such a shift count, we now have to take our
own sanity checks into account. Which is easy in this particular case.
Fixes: 44f1441dbe1
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All the *.inc.c files included by tcg/$TARGET/tcg-target.inc.c
are in tcg/, their parent directory. To simplify the preprocessor
search path, include the relative parent path: '..'.
Patch created mechanically by running:
$ for x in tcg-pool.inc.c tcg-ldst.inc.c; do \
sed -i "s,#include \"$x\",#include \"../$x\"," \
$(git grep -l "#include \"$x\""); \
done
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-3-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently search both the root and the tcg/ directories for tcg
files:
$ git grep '#include "tcg/' | wc -l
28
$ git grep '#include "tcg[^/]' | wc -l
94
To simplify the preprocessor search path, unify by expliciting the
tcg/ directory.
Patch created mechanically by running:
$ for x in \
tcg.h tcg-mo.h tcg-op.h tcg-opc.h \
tcg-op-gvec.h tcg-gvec-desc.h; do \
sed -i "s,#include \"$x\",#include \"tcg/$x\"," \
$(git grep -l "#include \"$x\""); \
done
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200101112303.20724-2-philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the copyright/license boilerplate for tcg/i386/tcg-target.opc.h.
This file has had only one commit, 770c2fc7bb70804a, by
a Linaro engineer.
The license is MIT, since that's what the rest of tcg/i386/ is.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20191025155848.17362-3-peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Preparation for collapsing the two byte swaps, adjust_endianness and
handle_bswap, along the I/O path.
Target dependant attributes are conditionalized upon NEED_CPU_H.
Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
We have for some time had code within the tcg backends to
handle large positive offsets from env. This move makes
sure that need not happen. Indeed, we are able to assert
at build time that simple offsets suffice for all hosts.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
| |
Move all softmmu tlb data into this structure. Arrange the
members so that we are able to place mask+table together and
at a smaller absolute offset from ENV.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This instruction raises #GP, aka SIGSEGV, if the effective address
is not aligned to 16-bytes.
We have assertions in tcg-op-gvec.c that the offset from ENV is
aligned, for vector types <= V128. But the offset itself does not
validate that the final pointer is aligned -- one must also remember
to use the QEMU_ALIGNED() attribute on the vector member within ENV.
PowerPC Altivec has vector load/store instructions that silently
discard the low 4 bits of the address, making alignment mistakes
difficult to discover. Aid that by making the most popular host
visibly signal the error.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
| |
Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
| |
This is now handled by code within tcg-op-vec.c.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
| |
We already had backend support for this feature. Expand the new
cmpsel opcode using vpblendb. The combination allows us to avoid
an extra NOT for some comparison codes.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
| |
Perform a per-element conditional move. This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
| |
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units. Include gvec expanders.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register. This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.
Also fix the 32-bit avx2, which should have used VPBROADCASTW.
Fixes: 1e262b49b533
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
| |
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
| |
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
| |
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
| |
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.
Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:
VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.
VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.
Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.
All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
| |
At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
| |
Currently stubbed out in all backends that support vectors.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
| |
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Added some commentary to the tcg_reg_alloc_* functions.
|
|
|
|
|
|
|
|
|
|
|
| |
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.
Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch merely changes the interface, aborting on all failures,
of which there are currently none.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
| |
This is part c of relocation overflow handling.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
| |
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
| |
This will let backends implement the double-word shift operation.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.
Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
| |
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As the following experiments show, this series is a net perf gain,
particularly for memory-heavy workloads. Experiments are run on an
Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.
1. System boot + shudown, debian aarch64:
- Before (v3.1.0):
Performance counter stats for './die.sh v3.1.0' (10 runs):
9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% )
29,910,312,379 cycles # 3.316 GHz ( +- 0.14% )
54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% )
10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% )
172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% )
9.084039051 seconds time elapsed ( +- 0.23% )
- After:
Performance counter stats for './die.sh tlb-dyn-v5' (10 runs):
8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% )
28,556,123,404 cycles # 3.311 GHz ( +- 0.13% )
51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% )
9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% )
166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% )
8.680540350 seconds time elapsed ( +- 0.24% )
That is, a 4.4% perf increase.
2. System boot + shutdown, ubuntu 18.04 x86_64:
- Before (v3.1.0):
56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% )
200,745,466,128 cycles # 3.578 GHz ( +- 5.24% )
431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% )
77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% )
844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% )
55.221556378 seconds time elapsed ( +- 5.01% )
- After:
56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% )
202,217,930,479 cycles # 3.573 GHz ( +- 10.69% )
439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% )
80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% )
776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% )
55.549661409 seconds time elapsed ( +- 10.44% )
No improvement (within noise range). Note that for this workload,
increasing the time window too much can lead to perf degradation,
since it flushes the TLB *very* frequently.
3. x86_64 SPEC06int:
x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set)
Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)
5.5 +------------------------------------------------------------------------+
| +-+ |
5 |-+.................+-+...............................tlb-dyn-v5.......+-|
| * * |
4.5 |-+.................*.*................................................+-|
| * * |
4 |-+.................*.*................................................+-|
| * * |
3.5 |-+.................*.*................................................+-|
| * * |
3 |-+......+-+*.......*.*................................................+-|
| * * * * |
2.5 |-+......*..*.......*.*.................................+-+*...........+-|
| * * * * * * |
2 |-+......*..*.......*.*.................................*..*...........+-|
| * * * * * * +-+ |
1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-|
| * * *+-+ * * +-+ *+-+ +-+ +-+ * * * * * * |
1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++|
| * * * * * * * * * * * * * * * * * * * * * * * * * * |
0.5 +------------------------------------------------------------------------+
400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean
png: https://imgur.com/YRF90f7
That is, a 1.51x average speedup over the baseline, with a max speedup
of 5.17x.
Here's a different look at the SPEC06int results, using KVM as the baseline:
x86_64-softmmu slowdown vs. KVM for SPEC06int (test set)
Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)
25 +---------------------------------------------------------------------------+
| +-+ +-+ |
| * * +-+ v3.1.0 |
| * * +-+ tlb-dyn-v5 |
| * * * * +-+ |
20 |-+.................*.*.............................*.+-+......*.*........+-|
| * * * # # * * |
| +-+ * * * # # * * |
| * * * * * # # * * |
15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-|
| * * * * * # # * #|# |
| * * * * +-+ * # # * +-+ |
| * * +-+ * * ++-+ +-+ * # # * # # +-+ |
| * * +-+ * * * ## *| +-+ * # # * # # +-+ |
10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-|
| * * * +-+ * * * ## +-+ *# # * # #* # # +-+ * # # * * |
| * * * # # * * +-+ * ## * +-+ *# # * # #* # # * * * # # *+-+ |
| * * * # # * * * +-+ * ## * # # *# # * # #* # # * * * # # * ## |
5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-|
| * # #* # # * +-+* # # * ## * # # *# # * # #* # # * * * # # * ## |
| * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ## |
| ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ## |
|+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++|
0 +---------------------------------------------------------------------------+
400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean
png: https://imgur.com/YzAMNEV
After this series, we bring down the average SPEC06int slowdown vs KVM
from 11.47x to 7.58x.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
|
|
|
| |
Disabled in all TCG backends for now.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
| |
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
|
| |
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
|
|
| |
This routine was becoming too large.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
|
|
| |
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|