summaryrefslogtreecommitdiffstats
path: root/tcg
Commit message (Collapse)AuthorAgeFilesLines
* tcg/aarch64: Remove unused code in tcg_out_opQi Hu2022-10-251-17/+14Star
| | | | | | | | | | | | | AArch64 defines the TCG_TARGET_HAS_direct_jump. So the "else" block is useless in the case of "INDEX_op_goto_tb" in function "tcg_out_op". Add an assertion and delete these codes for clarity. Suggested-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Qi Hu <huqi@loongson.cn> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221017020826.990729-1-huqi@loongson.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/loongarch64: Add direct jump supportQi Hu2022-10-252-7/+50
| | | | | | | | | | | | | | Similar to the ARM64, LoongArch has PC-relative instructions such as PCADDU18I. These instructions can be used to support direct jump for LoongArch. Additionally, if instruction "B offset" can cover the target address(target is within ±128MB range), a single "B offset" plus a nop will be used by "tb_target_set_jump_target". Signed-off-by: Qi Hu <huqi@loongson.cn> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: WANG Xuerui <git@xen0n.name> Message-Id: <20221015092754.91971-1-huqi@loongson.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/ppc: Optimize 26-bit jumpsLeandro Lupori2022-10-041-31/+88
| | | | | | | | | | | | | | | | PowerPC64 processors handle direct branches better than indirect ones, resulting in less stalled cycles and branch misses. However, PPC's tb_target_set_jmp_target() was only using direct branches for 16-bit jumps, while PowerPC64's unconditional branch instructions are able to handle displacements of up to 26 bits. To take advantage of this, now jumps whose displacements fit in between 17 and 26 bits are also converted to direct branches. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br> [rth: Expanded some commentary.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* accel/tcg: Introduce tb_pc and log_pcRichard Henderson2022-10-041-4/+4
| | | | | | | | | | | The availability of tb->pc will shortly be conditional. Introduce accessor functions to minimize ifdefs. Pass around a known pc to places like tcg_gen_code, where the caller must already have the value. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* misc: fix commonly doubled up wordsDaniel P. Berrangé2022-08-011-1/+1
| | | | | | | Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220707163720.1421716-5-berrange@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
* tcg: Fix returned type in alloc_code_gen_buffer_splitwx_memfd()Shaobo Song2022-07-121-1/+1
| | | | | | | | | | | | | | | This fixes a bug in POSIX-compliant environments. Since we had allocated a buffer named 'tcg-jit' with read-write access protections we need a int type to combine these access flags and return it, whereas we had inexplicably return a bool type. It may cause an unnecessary protection change in tcg_region_init(). Cc: qemu-stable@nongnu.org Fixes: 7be9ebcf924c ("tcg: Return the map protection from alloc_code_gen_buffer") Signed-off-by: Shaobo Song <shnusongshaobo@gmail.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220624150216.3627-1-shnusongshaobo@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/tci: Remove CONFIG_DEBUG_TCG_INTERPRETERRichard Henderson2022-07-052-12/+0Star
| | | | | | | | | | | | | There is nothing in this environment variable that cannot be done better with -d flags. There is nothing special about TCI that warrants this hack. Moreover, it does not compile -- remove it. Reported-by: Song Gao <gaosong@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/ppc: implement rem[u]_i{32,64} with mod[su][wd]Matheus Kowalczuk Ferst2022-06-202-2/+24
| | | | | | | | Power ISA v3.0 introduced mod[su][wd] insns that can be used to implement rem[u]_i{32,64}. Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/aarch64: Fix illegal insn from out-of-range shliRichard Henderson2022-06-021-1/+1
| | | | | | | | | | The masking in tcg_out_shl was incorrect, producing an illegal instruction, rather than merely unspecified results for the out-of-range shift. Tested-by: Joel Stanley <joel@jms.id.au> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1051 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Fix encoding of OPC_VPSRAQ for INDEX_op_sars_vecRichard Henderson2022-06-021-1/+1
| | | | | | | | | We wanted the VPSRAQ variant with the scalar vector shift operand, not the variant with an immediate operand. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1022 Fixes: 47b331b2a8da ("tcg/i386: Implement avx512 scalar shift") Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/ppc: Optimize memory ordering generation with lwsyncNicholas Piggin2022-05-261-3/+6
| | | | | | | | | | lwsync orders more than just LD_LD, importantly it matches x86 and s390 default memory ordering. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220519135908.21282-4-npiggin@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* tcg/ppc: ST_ST memory ordering is not provided with eieioNicholas Piggin2022-05-261-2/+0Star
| | | | | | | | | | eieio does not provide ordering between stores to CI memory and stores to cacheable memory so it can't be used as a general ST_ST barrier. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-of-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220519135908.21282-3-npiggin@gmail.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: declare vmsumuh[ms] helper with call flagsMatheus Ferst2022-05-261-0/+1
| | | | | | | | | | | Move vmsumuhm and vmsumuhs to decodetree, declare vmsumuhm helper with TCG_CALL_NO_RWG, and drop the unused env argument. Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220517123929.284511-12-matheus.ferst@eldorado.org.br> [danielhb: added #undef VMSUMUHM to fix ppc64 build] Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* Normalize header guard symbol definitionMarkus Armbruster2022-05-111-1/+1
| | | | | | | | | | | We commonly define the header guard symbol without an explicit value. Normalize the exceptions. Done with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-4-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Implement tcg_gen_{h,w}swap_{i32,i64}Richard Henderson2022-05-041-0/+30
| | | | | | | | | | | | Swap half-words (16-bit) and words (32-bit) within a larger value. Mirrors functions of the same names within include/qemu/bitops.h. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Miller <dmiller423@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20220428094708.84835-5-david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
* compiler.h: replace QEMU_NORETURN with G_NORETURNMarc-André Lureau2022-04-211-1/+2
| | | | | | | | | | | | | G_NORETURN was introduced in glib 2.68, fallback to G_GNUC_NORETURN in glib-compat. Note that this attribute must be placed before the function declaration (bringing a bit of consistency in qemu codebase usage). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Warner Losh <imp@bsdimp.com> Message-Id: <20220420132624.2439741-20-marcandre.lureau@redhat.com>
* Merge tag 'pull-tcg-20220420' of https://gitlab.com/rth7680/qemu into stagingRichard Henderson2022-04-211-7/+27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleanup sysemu/tcg.h usage. Fix indirect lowering vs cond branches Remove ATOMIC_MMU_IDX Add tcg_constant_ptr # -----BEGIN PGP SIGNATURE----- # # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmJgW38dHHJpY2hhcmQu # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV8tpggApfg2CDI0bRMDBh0g # 04/xwNnzHuSa84/ocMOMUfD5pvBblUmeTH8fAwqcAPDM/EEZwWZl2V1bYzuIrbmR # 8zV+r1cOenDF5Tz8PWfy8XssinTVtTWh/TE0XNV9R/SbEM9eMsjHNu5osKVuLuq1 # rnHWZf8LuY7xGsy4GYqPN0dLE6HtQOfpj/eLGRAj9mZ7re0jKeWg3GdxYoiYDmks # NKmNHYcWD+SjjFvXlOafniQsHbBZmQc/qp7AShG/+VcYY9o1VfncWD6I2dV13RdB # N7++ZhGyQR4NOVo6CN1zLKhfuJqzH2q+qJ7vQ3xtXNAk53LGQ91zjoE+3KaJTrcy # dmnLUw== # =aKdS # -----END PGP SIGNATURE----- # gpg: Signature made Wed 20 Apr 2022 12:14:07 PM PDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate] * tag 'pull-tcg-20220420' of https://gitlab.com/rth7680/qemu: tcg: Add tcg_constant_ptr accel/tcg: Remove ATOMIC_MMU_IDX tcg: Fix indirect lowering vs TCG_OPF_COND_BRANCH Don't include sysemu/tcg.h if it is not necessary Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * tcg: Fix indirect lowering vs TCG_OPF_COND_BRANCHRichard Henderson2022-04-201-7/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With TCG_OPF_COND_BRANCH, we extended the lifetimes of globals across extended basic blocks. This means that the liveness computed in pass 1 does not kill globals in the same way as normal temps. Introduce TYPE_EBB to match this lifetime, so that we get correct register allocation for the temps that we introduce during the indirect lowering pass. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Fixes: b4cb76e6208 ("tcg: Do not kill globals at conditional branches") Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* | util/log: Remove qemu_log_flushRichard Henderson2022-04-201-1/+0Star
| | | | | | | | | | | | | | | | | | | | All uses flush output immediately before or after qemu_log_unlock. Instead of a separate call, move the flush into qemu_log_unlock. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220417183019.755276-20-richard.henderson@linaro.org>
* | tcg: Pass the locked filepointer to tcg_dump_opsRichard Henderson2022-04-201-57/+52Star
| | | | | | | | | | | | | | | | | | | | We have already looked up and locked the filepointer. Use fprintf instead of qemu_log directly for output in and around tcg_dump_ops. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220417183019.755276-12-richard.henderson@linaro.org>
* | *: Use fprintf between qemu_log_trylock/unlockRichard Henderson2022-04-201-34/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | Inside qemu_log, we perform qemu_log_trylock/unlock, which need not be done if we have already performed the lock beforehand. Always check the result of qemu_log_trylock -- only checking qemu_loglevel_mask races with the acquisition of the lock on the logfile. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220417183019.755276-10-richard.henderson@linaro.org>
* | util/log: Rename qemu_log_lock to qemu_log_trylockRichard Henderson2022-04-201-4/+4
|/ | | | | | | | | | | | | | This function can fail, which makes it more like ftrylockfile or pthread_mutex_trylock than flockfile or pthread_mutex_lock, so rename it. To closer match the other trylock functions, release rcu_read_lock along the failure path, so that qemu_log_unlock need not be called on failure. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220417183019.755276-8-richard.henderson@linaro.org>
* Remove qemu-common.h include from most unitsMarc-André Lureau2022-04-061-1/+0Star
| | | | | | Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau2022-04-061-4/+4
| | | | | | | | | | | | Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Replace config-time define HOST_WORDS_BIGENDIANMarc-André Lureau2022-04-067-20/+20
| | | | | | | | | | | | | | | | | | | Replace a config-time define with a compile time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure time define, but also prevents from bad usage, if the config header wasn't included before. This can help to make some code independent from qemu too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* tcg/aarch64: Use 'ull' suffix to force 64-bit constantRichard Henderson2022-03-311-2/+2
| | | | | | | | | Typo used only 'ul' suffix, which is still 32-bits for windows host. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/947 Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/arm: Don't emit UNPREDICTABLE LDRD with Rm == Rt or Rt+1Richard Henderson2022-03-141-2/+15
| | | | | | | | | | | | | | | | | | | | | | The LDRD (register) instruction is UNPREDICTABLE if the Rm register is the same as either Rt or Rt+1 (the two registers being loaded to). We weren't making sure we avoided this, with the result that on some host CPUs like the Cortex-A7 we would get a SIGILL because the CPU chooses to UNDEF for this particular UNPREDICTABLE case. Since we've already checked that datalo is aligned, we can simplify the test vs the Rm operand by aligning it before comparison. Check for the two orderings before falling back to two ldr instructions. We don't bother to do anything similar for tcg_out_ldrd_rwb(), because it is only used in tcg_out_tlb_read() with a fixed set of registers which don't overlap. There is no equivalent UNPREDICTABLE case for STRD. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/896 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/s390x: Fix tcg_out_dup_vec vs general registersRichard Henderson2022-03-141-0/+1
| | | | | | | | | | | | We copied the data from the general register input to the vector register output, but have not yet replicated it. We intended to fall through into the vector-vector case, but failed to redirect the input register. This is caught by an assertion failure in tcg_out_insn_VRIc, which diagnosed the incorrect register class. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/s390x: Fix INDEX_op_bitsel_vec vs VSELRichard Henderson2022-03-141-1/+1
| | | | | | | | | | The operands are output in the wrong order: the tcg selector argument is first, whereas the s390x selector argument is last. Tested-by: Thomas Huth <thuth@redhat.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/898 Fixes: 9bca986df88 ("tcg/s390x: Implement TCG_TARGET_HAS_bitsel_vec") Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/s390x: Fix tcg_out_dupi_vec vs VGMRichard Henderson2022-03-141-2/+2
| | | | | | | The immediate operands to VGM were in the wrong order, producing an inverse mask. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* osdep: Move memalign-related functions to their own headerPeter Maydell2022-03-071-0/+1
| | | | | | | | | | | Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
* tcg/i386: Implement bitsel for avx512Richard Henderson2022-03-042-2/+20
| | | | | | | | | | The general ternary logic operation can implement BITSEL. Funnel the 4-operand operation into three variants of the 3-operand instruction, depending on input operand overlap. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement more logical operations for avx512Richard Henderson2022-03-042-5/+39
| | | | | | | | | AVX512VL has a general ternary logic operation, VPTERNLOGQ, which can implement NOT, ORC, NAND, NOR, EQV. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 multiplyRichard Henderson2022-03-041-6/+6
| | | | | | | | AVX512DQ has VPMULLQ. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 min/max/absRichard Henderson2022-03-041-7/+11
| | | | | | | | AVX512VL has VPABSQ, VPMAXSQ, VPMAXUQ, VPMINSQ, VPMINUQ. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Expand scalar rotate with avx512 insnsRichard Henderson2022-03-041-20/+29
| | | | | | | | | Expand 32-bit and 64-bit scalar rotate with VPRO[LR]V; expand 16-bit scalar rotate with VPSHLDV. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Remove rotls_vec from tcg_target_op_defRichard Henderson2022-03-041-1/+0Star
| | | | | | | | | There is no such instruction on x86, so we should not be pretending it has arguments. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Expand vector word rotate as avx512vbmi2 shift-doubleRichard Henderson2022-03-041-1/+17
| | | | | | | | | While there are no specific 16-bit rotate instructions, there are double-word shifts, which can perform the same operation. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Support avx512vbmi2 vector shift-double instructionsRichard Henderson2022-03-043-0/+42
| | | | | | | | We will use VPSHLD, VPSHLDV and VPSHRDV for 16-bit rotates. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 variable rotateRichard Henderson2022-03-042-2/+25
| | | | | | | | AVX512VL has VPROLVD and VPRORVQ. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 immediate rotateRichard Henderson2022-03-042-3/+14
| | | | | | | AVX512VL has VPROLD and VPROLQ, layered onto the same opcode as PSHIFTD, but requires EVEX encoding and W1. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 immediate sari shiftRichard Henderson2022-03-041-9/+21
| | | | | | | | | AVX512 has VPSRAQ with immediate operand, in the same form as with AVX, but requires EVEX encoding and W1. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 scalar shiftRichard Henderson2022-03-041-2/+10
| | | | | | | | AVX512VL has VPSRAQ. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Implement avx512 variable shiftsRichard Henderson2022-03-041-8/+24
| | | | | | | | | AVX512VL has VPSRAVQ, and AVX512BW has VPSLLVW, VPSRAVW, VPSRLVW. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinvRichard Henderson2022-03-041-4/+4
| | | | | | | | | The condition for UMIN/UMAX availability is about to change; use the canonical version. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Add tcg_out_evex_opcRichard Henderson2022-03-041-1/+50
| | | | | | | | The evex encoding is added here, for use in a subsequent patch. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/i386: Detect AVX512Richard Henderson2022-03-042-2/+26
| | | | | | | | | There are some operation sizes in some subsets of AVX512 that are missing from previous iterations of AVX. Detect them. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/s390x: Implement vector NAND, NOR, EQVRichard Henderson2022-03-042-3/+20
| | | | | | | Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg/ppc: Implement vector NAND, NOR, EQVRichard Henderson2022-03-042-3/+18
| | | | | | | Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Add opcodes for vector nand, nor, eqvRichard Henderson2022-03-048-15/+45
| | | | | | | | | | We've had placeholders for these opcodes for a while, and should have support on ppc, s390x and avx512 hosts. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>