summaryrefslogtreecommitdiffstats
path: root/tcg/tcg-op-gvec.c
Commit message (Collapse)AuthorAgeFilesLines
* tcg: Use tcg_constant_{i32,i64,vec} with gvec expandersRichard Henderson2021-01-131-77/+50Star
| | | | Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Use memset for large vector byte replicationRichard Henderson2021-01-041-0/+32
| | | | | | | | | | | | | | | | | | In f47db80cc07, we handled odd-sized tail clearing for the case of hosts that have vector operations, but did not handle the case of hosts that do not have vector ops. This was ok until e2e7168a214b, which changed the encoding of simd_desc such that the odd sizes are impossible. Add memset as a tcg helper, and use that for all out-of-line byte stores to vectors. This includes, but is not limited to, the tail clearing operation in question. Cc: qemu-stable@nongnu.org Buglink: https://bugs.launchpad.net/bugs/1907817 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Adjust simd_desc size encodingRichard Henderson2020-10-081-8/+27
| | | | | | | | | | | | | | With larger vector sizes, it turns out oprsz == maxsz, and we only need to represent mismatch for oprsz <= 32. We do, however, need to represent larger oprsz and do so without reducing SIMD_DATA_BITS. Reduce the size of the oprsz field and increase the maxsz field. Steal the oprsz value of 24 to indicate equality with maxsz. Tested-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Frank Chang <frank.chang@sifive.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Implement 256-bit dup for tcg_gen_gvec_dup_memRichard Henderson2020-09-031-3/+49
| | | | | | | | We already support duplication of 128-bit blocks. This extends that support to 256-bit blocks. This will be needed by SVE2. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Eliminate one store for in-place 128-bit dup_memRichard Henderson2020-09-031-2/+2
| | | | | | | Do not store back to the exact memory from which we just loaded. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Fix tcg gen for vectorized absolute valueStephen Long2020-09-031-2/+3
| | | | | | | | | | | | | | The fallback inline expansion for vectorized absolute value, when the host doesn't support such an insn was flawed. E.g. when a vector of bytes has all elements negative, mask will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1 to the low element instead of all elements becase -mask is 1 and not 0x0101_0101_0101_0101. Signed-off-by: Stephen Long <steplong@quicinc.com> Message-Id: <20200813161818.190-1-steplong@quicinc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Implement gvec support for rotate by scalarRichard Henderson2020-06-021-0/+22
| | | | | | | | | | No host backend support yet, but the interfaces for rotls are in place. Only implement left-rotate for now, as the only known use of vector rotate by scalar is s390x, so any right-rotate would be unused and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Implement gvec support for rotate by vectorRichard Henderson2020-06-021-0/+122
| | | | | | | | | | | No host backend support yet, but the interfaces for rotlv and rotrv are in place. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Drop the generic expansion from rot to shift; we can do better for each backend, and then this code becomes unused.
* tcg: Implement gvec support for rotate by immediateRichard Henderson2020-06-021-0/+68
| | | | | | | | | | No host backend support yet, but the interfaces for rotli are in place. Canonicalize immediate rotate to the left, based on a survey of architectures, but provide both left and right shift interfaces to the translators. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add load_dest parameter to GVecGen2Richard Henderson2020-05-061-13/+32
| | | | | | | | | We have this same parameter for GVecGen2i, GVecGen3, and GVecGen3i. This will make some SVE2 insns easier to parameterize. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Improve vector tail clearingRichard Henderson2020-05-061-20/+64
| | | | | | | | Better handling of non-power-of-2 tails as seen with Arm 8-byte vector operations. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Remove tcg_gen_gvec_dup{8,16,32,64}iRichard Henderson2020-05-061-28/+0Star
| | | | | | | | | These interfaces are now unused. Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Use tcg_gen_gvec_dup_imm in logical simplificationsRichard Henderson2020-05-061-4/+4
| | | | | | | | Replace the outgoing interface. Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add tcg_gen_gvec_dup_immRichard Henderson2020-05-061-0/+7
| | | | | | | | | | | | Add a version of tcg_gen_dup_* that takes both immediate and a vector element size operand. This will replace the set of tcg_gen_gvec_dup{8,16,32,64}i functions that encode the element size within the function name. Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add tcg_gen_gvec_5_ptrRichard Henderson2020-02-121-0/+32
| | | | | | | | | | Extend the vector generator infrastructure to handle 5 vector arguments. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Taylor Simpson <tsimpson@quicinc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Search includes from the project root source directoryPhilippe Mathieu-Daudé2020-01-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently search both the root and the tcg/ directories for tcg files: $ git grep '#include "tcg/' | wc -l 28 $ git grep '#include "tcg[^/]' | wc -l 94 To simplify the preprocessor search path, unify by expliciting the tcg/ directory. Patch created mechanically by running: $ for x in \ tcg.h tcg-mo.h tcg-op.h tcg-opc.h \ tcg-op-gvec.h tcg-gvec-desc.h; do \ sed -i "s,#include \"$x\",#include \"tcg/$x\"," \ $(git grep -l "#include \"$x\""); \ done Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200101112303.20724-2-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* Include qemu/main-loop.h lessMarkus Armbruster2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* Include qemu-common.h exactly where neededMarkus Armbruster2019-06-121-1/+0Star
| | | | | | | | | | | | | | | | No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
* tcg: Add support for vector bitwise selectRichard Henderson2019-05-221-0/+23
| | | | | | | This operation performs d = (b & a) | (c & ~a), and is present on a majority of host vector units. Include gvec expanders. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Fix missing checks and clears in tcg_gen_gvec_dup_memRichard Henderson2019-05-221-23/+25
| | | | | | | | | | | | | The paths through tcg_gen_dup_mem_vec and through MO_128 were missing the check_size_align. The path through MO_128 was also missing the expand_clr. This last was not visible because the only user is ARM SVE, which would set oprsz == maxsz, and not require the clear. Fix by adding the check_size_align and using do_dup directly instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add support for vector absolute valueRichard Henderson2019-05-141-0/+63
| | | | | Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add gvec expanders for vector shift by scalarRichard Henderson2019-05-141-0/+214
| | | | | | | | | Allow expansion either via shift by scalar or by replicating the scalar for shift by vector. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Use a private structure for do_gvec_shifts.
* tcg: Add gvec expanders for variable shiftRichard Henderson2019-05-141-0/+195
| | | | | | | | | The gvec expanders perform a modulo on the shift count. If the target requires alternate behaviour, then it cannot use the generic gvec expanders anyway, and will have to have its own custom code. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add INDEX_op_dupm_vecRichard Henderson2019-05-141-41/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the backend to expand dup from memory directly, instead of forcing the value into a temp first. This is especially important if integer/vector register moves do not exist. Note that officially tcg_out_dupm_vec is allowed to fail. If it did, we could fix this up relatively easily: VECE == 32/64: Load the value into a vector register, then dup. Both of these must work. VECE == 8/16: If the value happens to be at an offset such that an aligned load would place the desired value in the least significant end of the register, go ahead and load w/garbage in high bits. Load the value w/INDEX_op_ld{8,16}_i32. Attempt a move directly to vector reg, which may fail. Store the value into the backing store for OTS. Load the value into the vector reg w/TCG_TYPE_I32, which must work. Duplicate from the vector reg into itself, which must work. All of which is well and good, except that all supported hosts can support dupm for all vece, so all of the failure paths would be dead code and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Specify optional vector requirements with a listRichard Henderson2019-05-131-105/+144
| | | | | | | | | | | | | | | | | Replace the single opcode in .opc with a null-terminated array in .opt_opc. We still require that all opcodes be used with the same .vece. Validate the contents of this list with CONFIG_DEBUG_TCG. All tcg_gen_*_vec functions will check any list active during .fniv expansion. Swap the active list in and out as we expand other opcodes, or take control away from the front-end function. Convert all existing vector aware front ends. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Implement tcg_gen_gvec_3i()David Hildenbrand2019-05-131-0/+139
| | | | | | | | | | | Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed for now. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190416185301.25344-2-david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Fix LGPL version numberThomas Huth2019-01-301-1/+1
| | | | | | | | | | | | It's either "GNU *Library* General Public version 2" or "GNU Lesser General Public version *2.1*", but there was no "version 2.0" of the "Lesser" library. So assume that version 2.1 is meant here. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* tcg: Add opcodes for vector minmax arithmeticRichard Henderson2019-01-281-0/+108
| | | | Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add opcodes for vector saturated arithmeticRichard Henderson2019-01-281-20/+64
| | | | Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add write_aofs to GVecGen4Richard Henderson2019-01-281-8/+19
| | | | | | This allows writing 2 output, 3 input operations. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add gvec expanders for nand, nor, eqvRichard Henderson2019-01-281-0/+51
| | | | | Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add logical simplifications during gvec expandRichard Henderson2019-01-281-5/+30
| | | | | | | | We handle many of these during integer expansion, and the rest of them during integer optimization. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Restrict check_size_impl to multiples of the line sizeRichard Henderson2018-07-091-2/+5
| | | | | | | | | | | | | | Normally this is automatic in the size restrictions that are placed on vector sizes coming from the implementation. However, for the legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final 24 bytes of the vector register. Without this check, do_dup selects TCG_TYPE_V128 and clears only 16 bytes. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180705191929.30773-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* tcg: Add choose_vector_sizeRichard Henderson2018-03-151-180/+260
| | | | | | | | | | | This unifies 5 copies of checks for supported vector size, and in the process fixes a missing check in tcg_gen_gvec_2s. This lead to an assertion failure for 64-bit vector multiply, which is not available in the AVX instruction set. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add generic vector helpers with a scalar operandRichard Henderson2018-02-081-1/+360
| | | | | | | | | | | | Use dup to convert a non-constant scalar to a third vector. Add addition, multiplication, and logical operations with an immediate. Add addition, subtraction, multiplication, and logical operations with a non-constant scalar. Allow for the front-end to build operations in which the scalar operand comes first. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add generic helpers for saturating arithmeticRichard Henderson2018-02-081-0/+92
| | | | | | | | No vector ops as yet. SSE only has direct support for 8- and 16-bit saturation; handling 32- and 64-bit saturation is much more expensive. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add generic vector ops for multiplicationRichard Henderson2018-02-081-0/+29
| | | | | Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add generic vector ops for comparisonsRichard Henderson2018-02-081-0/+151
| | | | | Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add generic vector ops for constant shiftsRichard Henderson2018-02-081-0/+276
| | | | | | | | | Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add generic vector expandersRichard Henderson2018-02-081-0/+1309
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>