path: root/tcg
* Supply missing header guards (Markus Armbruster, 2019-06-12; 2 files, -0/+10)
  Signed-off-by: Markus Armbruster <armbru@redhat.com>
  Message-Id: <20190604181618.19980-5-armbru@redhat.com>

* Include qemu-common.h exactly where needed (Markus Armbruster, 2019-06-12; 6 files, -6/+0)
  No header includes qemu-common.h after this commit, as prescribed by
  qemu-common.h's file comment.

  Signed-off-by: Markus Armbruster <armbru@redhat.com>
  Message-Id: <20190523143508.25387-5-armbru@redhat.com>
  [Rebased with conflicts resolved automatically, except for
   include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
   block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
   target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
   target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
   target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
   target/unicore32/cpu.h target/xtensa/cpu.h;
   bsd-user/main.c and net/tap-bsd.c fixed up]

* tcg/arm: Remove mostly unreachable tlb special case (Richard Henderson, 2019-06-10; 1 file, -11/+12)
  There was nothing armv7 specific about the bic+cmp sequence; however,
  looking at the set of guests more closely shows that the 8-bit
  immediate operand for the bic can only be satisfied by one guest in
  tree: baseline m-profile -- 10-bit pages with aligned 4-byte memory
  ops.  Therefore it does not seem useful to keep this path.

  Acked-by: Alistair Francis <alistair.francis@wdc.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/arm: Use LDRD to load tlb mask+table (Richard Henderson, 2019-06-10; 1 file, -26/+40)
  This changes the code generation for the tlb from e.g.

      ldr    ip, [r6, #-0x10]
      ldr    r2, [r6, #-0xc]
      and    ip, ip, r4, lsr #8
      ldrd   r0, r1, [r2, ip]!
      ldr    r2, [r2, #0x18]

  to

      ldrd   r0, r1, [r6, #-0x10]
      and    r0, r0, r4, lsr #8
      ldrd   r2, r3, [r1, r0]!
      ldr    r1, [r1, #0x18]

  for armv7 hosts.  Rearranging the register allocation in order to
  avoid overlap between the two ldrd pairs causes the patch to be
  larger than it ordinarily would be.

  Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Use LDP to load tlb mask+table (Richard Henderson, 2019-06-10; 1 file, -7/+8)
  This changes the code generation for the tlb from e.g.

      ldur   x0, [x19, #0xffffffffffffffe0]
      ldur   x1, [x19, #0xffffffffffffffe8]
      and    x0, x0, x20, lsr #8
      add    x1, x1, x0
      ldr    x0, [x1]
      ldr    x1, [x1, #0x18]

  to

      ldp    x0, x1, [x19, #-0x20]
      and    x0, x0, x20, lsr #8
      add    x1, x1, x0
      ldr    x0, [x1]
      ldr    x1, [x1, #0x18]

  Acked-by: Alistair Francis <alistair.francis@wdc.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* cpu: Move the softmmu tlb to CPUNegativeOffsetState (Richard Henderson, 2019-06-10; 8 files, -159/+59)
  We have for some time had code within the tcg backends to handle
  large positive offsets from env.  This move makes sure that need
  not happen.  Indeed, we are able to assert at build time that
  simple offsets suffice for all hosts.

  Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
  Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Create struct CPUTLB (Richard Henderson, 2019-06-10; 8 files, -57/+19)
  Move all softmmu tlb data into this structure.  Arrange the members
  so that we are able to place mask+table together and at a smaller
  absolute offset from ENV.

  Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
  Acked-by: Alistair Francis <alistair.francis@wdc.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store (Richard Henderson, 2019-05-22; 1 file, -2/+22)
  This instruction raises #GP, aka SIGSEGV, if the effective address
  is not aligned to 16 bytes.

  We have assertions in tcg-op-gvec.c that the offset from ENV is
  aligned, for vector types <= V128.  But the offset itself does not
  validate that the final pointer is aligned -- one must also remember
  to use the QEMU_ALIGNED() attribute on the vector member within ENV.

  PowerPC Altivec has vector load/store instructions that silently
  discard the low 4 bits of the address, making alignment mistakes
  difficult to discover.  Aid that by making the most popular host
  visibly signal the error.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

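  A minimal sketch of the alignment point above (not from the commit;
  the struct and field names are hypothetical): the assertion in
  tcg-op-gvec.c only covers the offset from ENV, so the vector member
  itself must carry an explicit alignment attribute.

      #include <stdint.h>

      /* QEMU_ALIGNED(X) is QEMU's wrapper for the compiler's aligned
       * attribute (include/qemu/compiler.h); spelled out here so the
       * sketch stands alone. */
      #define QEMU_ALIGNED(X) __attribute__((aligned(X)))

      /* Hypothetical guest CPU state fragment. */
      typedef struct CPUFooState {
          uint32_t flags;
          /* Without the attribute, an aligned offset does not guarantee
           * that env->vreg itself is 16-byte aligned, and an aligned
           * 128-bit access such as MOVDQA would fault. */
          uint64_t vreg[2] QEMU_ALIGNED(16);
      } CPUFooState;
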
* tcg/aarch64: Allow immediates for vector ORR and BIC (Richard Henderson, 2019-05-22; 1 file, -7/+83)
  This allows immediates to be used for ORR and BIC, as well as the
  trivial inversions, ORC and AND.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Build vector immediates with two insns (Richard Henderson, 2019-05-22; 1 file, -0/+47)
  Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
  as opposed to dropping them to the constant pool.  This includes
  all 16-bit constants and a similar set of 32-bit constants.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Use MVNI in tcg_out_dupi_vec (Richard Henderson, 2019-05-22; 1 file, -0/+11)
  The complement of a subset of immediates can be computed with a
  single instruction.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Split up is_fimm (Richard Henderson, 2019-05-22; 1 file, -84/+119)
  There are several sub-classes of vector immediate, and only MOVI
  can use them all.  This will enable usage of MVNI and ORRI, which
  use progressively fewer sub-classes.

  This patch adds no new functionality, merely splits the function
  and moves part of the logic into tcg_out_dupi_vec.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Support vector bitwise select value (Richard Henderson, 2019-05-22; 2 files, -2/+24)
  The instruction set has 3 insns that perform the same operation,
  only varying in which operand must overlap the destination.  We can
  represent the operation without overlap and choose based on the
  operands seen.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Use umin/umax in expanding unsigned compare (Richard Henderson, 2019-05-22; 1 file, -19/+61)
  Using umin(a, b) == a as an expansion for TCG_COND_LEU is a better
  alternative to (a - INT_MIN) <= (b - INT_MIN).

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

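  A scalar C sketch (illustration only, not the i386 backend code) of why
  the two expansions named above agree for unsigned operands:

      #include <stdbool.h>
      #include <stdint.h>

      /* New expansion: a <= b (unsigned) iff umin(a, b) == a. */
      static bool leu_via_umin(uint32_t a, uint32_t b)
      {
          uint32_t m = a < b ? a : b;   /* per-element unsigned min (e.g. PMINUD) */
          return m == a;                /* followed by an equality compare */
      }

      /* Old expansion: bias both operands by INT_MIN (flip the sign bit)
       * so that a signed compare yields the unsigned ordering. */
      static bool leu_via_bias(uint32_t a, uint32_t b)
      {
          return (int32_t)(a ^ 0x80000000u) <= (int32_t)(b ^ 0x80000000u);
      }
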
* tcg/i386: Remove expansion for missing minmax (Richard Henderson, 2019-05-22; 1 file, -37/+0)
  This is now handled by code within tcg-op-vec.c.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Support vector comparison select value (Richard Henderson, 2019-05-22; 2 files, -5/+36)
  We already had backend support for this feature.  Expand the new
  cmpsel opcode using vpblendvb.  The combination allows us to avoid
  an extra NOT for some comparison codes.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative (Richard Henderson, 2019-05-22; 1 file, -1/+1)
  If INDEX_op_foo is always expanded by tcg_expand_vec_op, then there
  may be no reasonable set of constraints to return from
  tcg_target_op_def for that opcode.

  Let TCG_TARGET_HAS_foo be specified as -1 in that case.  Thus a
  boolean test for TCG_TARGET_HAS_foo is true, but we will not assert
  within process_op_defs when no constraints are specified.

  Compare this with tcg_can_emit_vec_op, which already uses this
  tri-state indication.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

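  A compact sketch of the tri-state convention described above, reusing the
  commit's placeholder name "foo" (hypothetical, not a real opcode):

      /* In a backend's tcg-target.h:
       *    0 -> opcode not supported at all
       *    1 -> backend emits the opcode directly
       *   -1 -> opcode accepted, but always rewritten by tcg_expand_vec_op() */
      #define TCG_TARGET_HAS_foo  -1

      /* Front-end test: true for both 1 and -1, so the opcode may be created,
       * while the constraint check in process_op_defs is skipped for -1. */
      if (TCG_TARGET_HAS_foo) {
          /* generate INDEX_op_foo */
      }
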
* tcg: Expand vector minmax using cmp+cmpsel (Richard Henderson, 2019-05-22; 1 file, -4/+16)
  Provide a generic fallback for the min/max operations.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Introduce do_op3_nofail for vector expansion (Richard Henderson, 2019-05-22; 1 file, -18/+27)
  This makes do_op3 match do_op2 in allowing for failure, and thus
  for fall-back expansions.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add support for vector compare select (Richard Henderson, 2019-05-22; 8 files, -0/+75)
  Perform a per-element conditional move.  This combination operation
  is easier to implement on some host vector units than plain
  cmp+bitsel.  Omit the usual gvec interface, as this is intended to
  be used by target-specific gvec expansion call-backs.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add support for vector bitwise select (Richard Henderson, 2019-05-22; 10 files, -0/+70)
  This operation performs d = (b & a) | (c & ~a), and is present on a
  majority of host vector units.  Include gvec expanders.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

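  A one-line C sketch of the operation as stated above: each selector bit
  of "a" picks the corresponding bit of "b" where set and of "c" where clear.

      #include <stdint.h>

      /* Per-bit select: for each bit position, a ? b : c. */
      static uint64_t bitsel(uint64_t a, uint64_t b, uint64_t c)
      {
          return (b & a) | (c & ~a);
      }
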
* tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem (Richard Henderson, 2019-05-22; 1 file, -23/+25)
  The paths through tcg_gen_dup_mem_vec and through MO_128 were
  missing the check_size_align.  The path through MO_128 was also
  missing the expand_clr.  This last was not visible because the only
  user is ARM SVE, which would set oprsz == maxsz, and not require
  the clear.

  Fix by adding the check_size_align and using do_dup directly
  instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts (Richard Henderson, 2019-05-22; 1 file, -3/+4)
  The VBROADCASTSD instruction only allows %ymm registers as
  destination.  Rather than forcing VEX.L and writing to the entire
  256-bit register, revert to using MOVDDUP with an %xmm register.
  This is sufficient for an avx1 host since we do not support
  TCG_TYPE_V256 for that case.

  Also fix the 32-bit avx2, which should have used VPBROADCASTW.

  Fixes: 1e262b49b533
  Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
  Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Do not advertise minmax for MO_64 (Richard Henderson, 2019-05-14; 1 file, -4/+4)
  The min/max instructions are not available for 64-bit elements.

  Fixes: 93f332a50371
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Support vector absolute value (Richard Henderson, 2019-05-14; 2 files, -1/+7)
  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Support vector absolute value (Richard Henderson, 2019-05-14; 2 files, -1/+16)
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add support for vector absolute value (Richard Henderson, 2019-05-14; 9 files, -0/+114)
  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add support for integer absolute value (Richard Henderson, 2019-05-14; 2 files, -0/+25)
  Remove a function of the same name from target/arm/.  Use a
  branchless implementation of abs gleaned from gcc.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Reviewed-by: David Hildenbrand <david@redhat.com>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

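  A C sketch of one common branchless abs of the kind referenced above
  (the exact sequence chosen by the commit is not reproduced here):

      #include <stdint.h>

      /* Like labs(), the result is undefined for INT64_MIN; the right
       * shift of a negative value is arithmetic on the compilers QEMU
       * supports. */
      static int64_t abs64(int64_t x)
      {
          int64_t m = x >> 63;    /* 0 when x >= 0, all-ones when x < 0 */
          return (x + m) ^ m;     /* x when m == 0, -x when m == -1 */
      }
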
* tcg/i386: Support vector scalar shift opcodes (Richard Henderson, 2019-05-14; 2 files, -1/+36)
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add gvec expanders for vector shift by scalar (Richard Henderson, 2019-05-14; 4 files, -0/+279)
  Allow expansion either via shift by scalar or by replicating the
  scalar for shift by vector.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  ---
  v3: Use a private structure for do_gvec_shifts.

* tcg/aarch64: Support vector variable shift opcodes (Richard Henderson, 2019-05-14; 3 files, -1/+45)
  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Support vector variable shift opcodes (Richard Henderson, 2019-05-14; 2 files, -1/+36)
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add gvec expanders for variable shift (Richard Henderson, 2019-05-14; 4 files, -0/+225)
  The gvec expanders perform a modulo on the shift count.  If the
  target requires alternate behaviour, then it cannot use the generic
  gvec expanders anyway, and will have to have its own custom code.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

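  A minimal sketch of the modulo behaviour described above (assumed
  semantics for illustration, not the expander code itself): the shift
  count is reduced modulo the element width, so an out-of-range count
  wraps rather than being undefined.

      #include <stdint.h>

      /* vece is log2 of the element size in bytes: 0 -> 8-bit elements,
       * 1 -> 16-bit, 2 -> 32-bit, 3 -> 64-bit. */
      static uint64_t shl_element(uint64_t value, unsigned shift, unsigned vece)
      {
          unsigned bits = 8u << vece;
          uint64_t mask = bits == 64 ? ~0ull : (1ull << bits) - 1;

          shift &= bits - 1;                 /* count taken modulo element width */
          return (value << shift) & mask;    /* keep the result inside the element */
      }
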
* tcg: Add INDEX_op_dupm_vec (Richard Henderson, 2019-05-14; 7 files, -41/+70)
  Allow the backend to expand dup from memory directly, instead of
  forcing the value into a temp first.  This is especially important
  if integer/vector register moves do not exist.

  Note that officially tcg_out_dupm_vec is allowed to fail.  If it
  did, we could fix this up relatively easily:

    VECE == 32/64:
      Load the value into a vector register, then dup.
      Both of these must work.

    VECE == 8/16:
      If the value happens to be at an offset such that an aligned load
      would place the desired value in the least significant end of the
      register, go ahead and load w/garbage in high bits.

      Load the value w/INDEX_op_ld{8,16}_i32.
      Attempt a move directly to vector reg, which may fail.
      Store the value into the backing store for OTS.
      Load the value into the vector reg w/TCG_TYPE_I32, which must work.
      Duplicate from the vector reg into itself, which must work.

  All of which is well and good, except that all supported hosts can
  support dupm for all vece, so all of the failure paths would be dead
  code and untestable.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/aarch64: Implement tcg_out_dupm_vec (Richard Henderson, 2019-05-14; 1 file, -2/+35)
  The LD1R instruction does all the work.  Note that the only useful
  addressing mode is a base register with no offset.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/i386: Implement tcg_out_dupm_vec (Richard Henderson, 2019-05-13; 1 file, -14/+43)
  At the same time, improve tcg_out_dupi_vec wrt broadcast from the
  constant pool.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Add tcg_out_dupm_vec to the backend interface (Richard Henderson, 2019-05-13; 3 files, -1/+31)
  Currently stubbed out in all backends that support vectors.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Manually expand INDEX_op_dup_vec (Richard Henderson, 2019-05-13; 3 files, -10/+118)
  This case is similar to INDEX_op_mov_* in that we need to do
  different things depending on the current location of the source.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  ---
  v3: Added some commentary to the tcg_reg_alloc_* functions.

* tcg: Promote tcg_out_{dup,dupi}_vec to backend interface (Richard Henderson, 2019-05-13; 3 files, -3/+26)
  The i386 backend already has these functions, and the aarch64
  backend could easily split out one.  Nothing is done with these
  functions yet, but this will aid register allocation of
  INDEX_op_dup_vec in a later patch.

  Adjust the aarch64 tcg_out_dupi_vec signature to match the new
  interface.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Support cross-class moves without instruction support (Richard Henderson, 2019-05-13; 1 file, -3/+28)
  PowerPC Altivec does not support direct moves between vector
  registers and general registers.  So when tcg_out_mov fails, we can
  use the backing memory for the temporary to perform the move.

  Acked-by: David Hildenbrand <david@redhat.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Return bool success from tcg_out_mov (Richard Henderson, 2019-05-13; 10 files, -16/+31)
  This patch merely changes the interface, aborting on all failures,
  of which there are currently none.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Reviewed-by: David Hildenbrand <david@redhat.com>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
  Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/arm: Use tcg_out_mov_reg in tcg_out_mov (Richard Henderson, 2019-05-13; 1 file, -1/+1)
  We have a function that takes an additional condition parameter
  over the standard backend interface.  It already takes care of
  eliding no-op moves.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Assert fixed_reg is read-only (Richard Henderson, 2019-05-13; 1 file, -47/+40)
  The only fixed_reg is cpu_env, and it should not be modified during
  any TB.  Therefore code that tries to special-case moves into a
  fixed_reg is dead.  Remove it.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Reviewed-by: David Hildenbrand <david@redhat.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Specify optional vector requirements with a list (Richard Henderson, 2019-05-13; 4 files, -117/+278)
  Replace the single opcode in .opc with a null-terminated array in
  .opt_opc.  We still require that all opcodes be used with the same
  .vece.

  Validate the contents of this list with CONFIG_DEBUG_TCG.  All
  tcg_gen_*_vec functions will check any list active during .fniv
  expansion.  Swap the active list in and out as we expand other
  opcodes, or take control away from the front-end function.

  Convert all existing vector aware front ends.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded (Richard Henderson, 2019-05-13; 1 file, -16/+33)
  PowerPC Altivec does not support add and subtract of 64-bit
  elements.  Prepare for that configuration by not assuming the
  operation is universally supported.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Reviewed-by: David Hildenbrand <david@redhat.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Do not recreate INDEX_op_neg_vec unless supported (Richard Henderson, 2019-05-13; 1 file, -2/+6)
  Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
  so that we check the type and vece for the actual operation.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Implement tcg_gen_gvec_3i() (David Hildenbrand, 2019-05-13; 2 files, -0/+163)
  Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
  without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
  for now.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Signed-off-by: David Hildenbrand <david@redhat.com>
  Message-Id: <20190416185301.25344-2-david@redhat.com>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/arm: Restrict constant pool displacement to 12 bits (Richard Henderson, 2019-04-25; 1 file, -36/+21)
  This will not necessarily restrict the size of the TB, since for v7
  the majority of constant pool usage is for calls from the
  out-of-line ldst code, which is already at the end of the TB.  But
  this does allow us to save one insn per reference on the off-chance.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg/ppc: Allow the constant pool to overflow at 32k (Richard Henderson, 2019-04-24; 1 file, -18/+10)
  There is no point in coding for a 2GB offset when the max TB size is
  already limited to 64k.  If we further restrict to 32k then we can
  eliminate the extra ADDIS instruction.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

* tcg: Restart TB generation after out-of-line ldst overflow (Richard Henderson, 2019-04-24; 9 files, -44/+75)
  This is part c of relocation overflow handling.

  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>