path: root/target/i386/ops_sse.h
* target/i386: implement FMA instructions (Paolo Bonzini, 2022-10-22; 1 file, -0/+27)
  The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implemented with only 6 helpers: two for scalar operations and four for packed operations. (Scalar versions do no merging; they only affect the bottom 32 or 64 bits of the output operand, so there are no separate XMM and YMM versions of the scalar helpers.)

  First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c. Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed in the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
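  A minimal sketch of the packed-helper shape this describes, assuming QEMU's softfloat API; the function name and operand layout are illustrative, not the actual helpers:

    #include "fpu/softfloat.h"

    /* One body serves VFMADD/VFMSUB/VFNMADD/VFNMSUB because the
     * softfloat muladd flags (e.g. float_muladd_negate_product,
     * float_muladd_negate_c) arrive as an argument instead of being
     * baked into the helper. */
    static void fma_pd_sketch(float64 *d, const float64 *a,
                              const float64 *b, const float64 *c,
                              int flags, int nelem, float_status *s)
    {
        for (int i = 0; i < nelem; i++) {
            /* FMADDSUB/FMSUBADD would additionally toggle
             * float_muladd_negate_c on alternating elements. */
            d[i] = float64_muladd(a[i], b[i], c[i], flags, s);
        }
    }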
* target/i386: implement F16C instructions (Paolo Bonzini, 2022-10-20; 1 file, -0/+29)
  F16C consists of only two instructions, which are nevertheless a bit peculiar.

  First, they access only the low half of a YMM or XMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not exactly, because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand.

  Second, VCVTPS2PH is somewhat weird because it *stores* the result of the instruction into memory rather than loading it.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: introduce function to set rounding mode from FPCW or MXCSR bits (Paolo Bonzini, 2022-10-20; 1 file, -56/+4)
  VROUND, FSTCW and STMXCSR all have to perform the same conversion from x86 rounding modes to softfloat constants. Since the ISA is consistent on the meaning of the two-bit rounding modes, extract the common code into a wrapper for set_float_rounding_mode.

  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
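  The shared conversion is essentially a four-entry table; a sketch assuming QEMU's softfloat API (the function name here is illustrative):

    #include "fpu/softfloat.h"

    /* The 2-bit rounding-control field means the same thing in FPCW
     * and MXCSR, so one mapping serves both. */
    static void set_x86_rounding_mode_sketch(unsigned rc, float_status *s)
    {
        static const FloatRoundMode map[4] = {
            float_round_nearest_even, /* 00: round to nearest even */
            float_round_down,         /* 01: round down            */
            float_round_up,           /* 10: round up              */
            float_round_to_zero,      /* 11: truncate              */
        };
        set_float_rounding_mode(map[rc & 3], s);
    }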
* target/i386: remove old SSE decoder (Paolo Bonzini, 2022-10-18; 1 file, -124/+0)
  With all SSE (and AVX!) instructions now implemented in disas_insn_new, it's possible to remove gen_sse, as well as the helpers for instructions that now use gvec.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: reimplement 0x0f 0x10-0x17, add AVX (Paolo Bonzini, 2022-10-18; 1 file, -0/+7)
  These are mostly moves, and yet are a total pain. The main issues are that:

  1) some instructions are selected by mod==11 (register operand) vs. mod==00/01/10 (memory operand)

  2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case

  3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too

  The manual also has various mistakes in the operands here; for example, the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS.

  VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused, however.

  For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: reimplement 0x0f 0x38, add AVX (Paolo Bonzini, 2022-10-18; 1 file, -1/+187)
  There are several special cases here:

  1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov.

  2) some instructions, such as variable-width shifts, select the vector element size via REX.W.

  3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands.

  4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order.

  X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have an M argument.

  These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: reimplement 0x0f 0x3a, add AVX (Paolo Bonzini, 2022-10-18; 1 file, -0/+95)
  The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rather than in the prefixes.

  These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes (Paolo Bonzini, 2022-10-18; 1 file, -4/+4)
  Three-byte opcodes from the 0F3Ah area all have an immediate byte which is usually unsigned. Clarify in the helper code that it is unsigned; the new decoder treats immediates as signed by default, and seeing an intN_t in the prototype might give the wrong impression that one can use decode->immediate directly.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Introduce 256-bit vector helpers (Paolo Bonzini, 2022-10-18; 1 file, -0/+5)
  The new implementation of SSE will cover AVX from the get-go, because all the work for the helper functions is already done. We just need to build them.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: implement additional AVX comparison operators (Paolo Bonzini, 2022-10-18; 1 file, -0/+38)
  The new implementation of SSE will cover AVX from the get-go, so include the 24 extra comparison operators that are only available with the VEX prefix.

  Based on a patch by Paul Brook <paul@nowt.org>.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: provide 3-operand versions of unary scalar helpers (Paolo Bonzini, 2022-10-18; 1 file, -8/+40)
  Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: support operand merging in binary scalar helpers (Paolo Bonzini, 2022-10-18; 1 file, -0/+16)
  Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the helpers to provide this functionality.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: extend helpers to support VEX.V 3- and 4-operand encodings (Paolo Bonzini, 2022-10-18; 1 file, -107/+66)
  Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook <paul@nowt.org>.

  Message-Id: <20220424220204.2493824-26-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: fix INSERTQ implementation (Paolo Bonzini, 2022-09-19; 1 file, -5/+5)
  INSERTQ is defined to not modify any bits in the lower 64 bits of the destination other than the ones being replaced with bits from the source operand. QEMU was instead using unshifted bits from the source for those bits.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
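  A sketch of the low-64-bit merge being described, in plain C; the function name is illustrative, and idx/len are assumed to be an in-range bit position and field width:

    #include <stdint.h>

    /* Insert the low `len` bits of src into dest at bit `idx`,
     * preserving every other bit of the low 64 bits. */
    static uint64_t insertq_low64_sketch(uint64_t dest, uint64_t src,
                                         unsigned idx, unsigned len)
    {
        uint64_t mask = (len >= 64) ? ~0ULL : ((1ULL << len) - 1);

        /* The fixed bug: the field taken from src must be shifted to
         * idx before merging; merging (src & mask) unshifted leaks
         * source bits into the part of dest that must be preserved. */
        return (dest & ~(mask << idx)) | ((src & mask) << idx);
    }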
* target/i386: correctly mask SSE4a bit indices in register operands (Paolo Bonzini, 2022-09-19; 1 file, -2/+2)
  The SSE4a instructions EXTRQ and INSERTQ have two bit-index operands, which can be immediates or taken from an XMM register. In both cases, the fields are 6 bits wide and the top two bits in the byte are ignored. translate.c is doing that correctly for the immediate case, but not for the XMM case, so fix it.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: AVX+AES helpers prep (Paul Brook, 2022-09-01; 1 file, -19/+22)
  Make the AES vector helpers AVX-ready.

  No functional changes to existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-22-paul@nowt.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: AVX pclmulqdq prep (Paul Brook, 2022-09-01; 1 file, -7/+22)
  Make the pclmulqdq helper AVX-ready.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-21-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Rewrite blendv helpers (Paul Brook, 2022-09-01; 1 file, -62/+24)
  Rewrite the blendv helpers so that they can easily be extended to support the AVX encodings, which make all 4 arguments explicit.

  No functional changes to the existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-20-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Misc AVX helper prep (Paul Brook, 2022-09-01; 1 file, -49/+94)
  Fix up various vector helpers that either trivially extend to 256 bits or don't have 256-bit variants.

  No functional changes to existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-19-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Destructive FP helpers for AVX (Paul Brook, 2022-09-01; 1 file, -58/+43)
  Prepare the horizontal arithmetic vector helpers for AVX. These currently use a dummy Reg-typed variable to store the result, then assign the whole register. This will cause 128-bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-18-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
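  The temporaries-then-assign pattern, sketched with an integer horizontal add for brevity; Reg here is a simplified stand-in for QEMU's ZMMReg, and the function name is illustrative:

    #include <stdint.h>

    typedef union {
        uint32_t L[8];   /* simplified stand-in for QEMU's ZMMReg */
    } Reg;

    /* Every input lane is read into a scalar temporary before any
     * output lane is written, and only the low 128 bits of d are
     * assigned, so the upper half of the register survives. */
    static void phaddd_xmm_sketch(Reg *d, const Reg *v, const Reg *s)
    {
        uint32_t t0 = v->L[0] + v->L[1];
        uint32_t t1 = v->L[2] + v->L[3];
        uint32_t t2 = s->L[0] + s->L[1];
        uint32_t t3 = s->L[2] + s->L[3];

        d->L[0] = t0;
        d->L[1] = t1;
        d->L[2] = t2;
        d->L[3] = t3;
    }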
* target/i386: Dot product AVX helper prep (Paul Brook, 2022-09-01; 1 file, -35/+45)
  Make the dpps and dppd helpers AVX-ready.

  I can't see any obvious reason why dppd shouldn't work on 256-bit ymm registers, but both AMD and Intel agree that it's xmm only.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-17-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: reimplement AVX comparison helpers (Paul Brook, 2022-09-01; 1 file, -47/+56)
  AVX includes an additional set of comparison predicates, some of which our softfloat implementation does not expose as separate functions. Rewrite the helpers in terms of floatN_compare for future extensibility.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Message-Id: <20220424220204.2493824-24-paul@nowt.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
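  With floatN_compare, a predicate becomes a test on the returned FloatRelation; a sketch assuming QEMU's softfloat API (the helper name is illustrative):

    #include <stdbool.h>
    #include "fpu/softfloat.h"

    /* "Not greater-or-equal" must be true for less *and* for
     * unordered; softfloat has no single function for that, but the
     * relation makes it a two-value check. */
    static bool cmp_nge_sketch(float32 a, float32 b, float_status *s)
    {
        FloatRelation r = float32_compare(a, b, s); /* signaling compare */
        return r == float_relation_less || r == float_relation_unordered;
    }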
* target/i386: Floating point arithmetic helper AVX prep (Paul Brook, 2022-09-01; 1 file, -41/+87)
  Prepare the "easy" floating point vector helpers for AVX.

  No functional changes to existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-16-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Destructive vector helpers for AVX (Paul Brook, 2022-09-01; 1 file, -301/+269)
  These helpers need to take special care to avoid overwriting source values before the whole result has been calculated. Currently they use a dummy Reg-typed variable to store the result, then assign the whole register. This will cause 128-bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-14-paul@nowt.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Misc integer AVX helper prep (Paul Brook, 2022-09-01; 1 file, -86/+82)
  More preparatory work for AVX support in various integer vector helpers.

  No functional changes to existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-13-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Rewrite simple integer vector helpers (Paul Brook, 2022-09-01; 1 file, -54/+27)
  Rewrite the "simple" vector integer helpers in preparation for AVX support. While the current code is able to use the same prototype for unary (a = F(b)) and binary (a = F(b, c)) operations, future changes will cause them to diverge.

  No functional changes to existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-12-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Rewrite vector shift helper (Paul Brook, 2022-09-01; 1 file, -131/+108)
  Rewrite the vector shift helpers in preparation for AVX support (3-operand form and 256-bit vectors). For now keep the existing two-operand interface.

  No functional changes to existing helpers.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-11-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: rewrite destructive 3DNow operations (Paolo Bonzini, 2022-09-01; 1 file, -16/+16)
  Remove use of the MOVE macro, since it will be purged from MMX/SSE as well.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Add size suffix to vector FP helpers (Paolo Bonzini, 2022-09-01; 1 file, -24/+24)
  For AVX we're going to need both 128-bit (xmm) and 256-bit (ymm) variants of floating point helpers. Add the register-type suffix to the existing *PS and *PD helpers (SS and SD variants are only valid on 128-bit vectors).

  No functional changes.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Message-Id: <20220424220204.2493824-15-paul@nowt.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: DPPS rounding fix (Paolo Bonzini, 2022-09-01; 1 file, -32/+35)
  The DPPS (Dot Product) instruction is defined to first sum pairs of intermediate results, then sum those values to get the final result, i.e. (A+B)+(C+D). We incrementally sum the results, i.e. ((A+B)+C)+D, which can result in incorrect rounding.

  For consistency, also change the variable names to the ones used in the Intel SDM and implement DPPD following the manual.

  Based on a patch by Paul Brook <paul@nowt.org>.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
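  The two summation orders can round differently because each float32_add rounds its result; a sketch of the pairwise order, assuming QEMU's softfloat API, with a..d standing for the four masked products:

    #include "fpu/softfloat.h"

    /* (A+B)+(C+D) as the SDM specifies, not ((A+B)+C)+D. */
    static float32 dpps_sum_sketch(float32 a, float32 b, float32 c,
                                   float32 d, float_status *s)
    {
        return float32_add(float32_add(a, b, s),
                           float32_add(c, d, s), s);
    }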
* target/i386: fix PHSUB* instructions with dest=src (Paolo Bonzini, 2022-09-01; 1 file, -20/+29)
  The computation must overwrite neither the destination nor the source before the last element has been computed.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* i386: pcmpestr 64-bit sign extension bug (Paul Brook, 2022-04-28; 1 file, -11/+9)
  The abs1 function in ops_sse.h only works correctly when the result fits in a signed int. This is fine most of the time because we're only dealing with byte-sized values. However, the pcmp_elen helper function uses abs1 to calculate the absolute value of a cpu register. This incorrectly truncates to 32 bits, and will give the wrong answer for the most negative value.

  Fix by open-coding the saturation check before taking the absolute value.

  Signed-off-by: Paul Brook <paul@nowt.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
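  A sketch of the saturate-before-abs idea in plain C; the function name is illustrative, and it assumes the pcmpestr length registers are clamped to the element count (8 for words, 16 for bytes):

    #include <stdint.h>

    /* Clamping to the limit first means -INT64_MIN (which is not
     * representable) is never evaluated, and no 32-bit truncation of
     * the register value can occur. */
    static unsigned pcmp_len_sketch(int64_t val, unsigned limit)
    {
        if (val > (int64_t)limit || val < -(int64_t)limit) {
            return limit;
        }
        return val < 0 ? -val : val;
    }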
* target/i386: do not access beyond the low 128 bits of SSE registers (Paolo Bonzini, 2022-04-13; 1 file, -28/+47)
  The i386 target consolidates all vector registers so that instead of XMMReg, YMMReg and ZMMReg structs there is a single ZMMReg that can fit all of SSE, AVX and AVX512.

  When TCG copies data from and to the SSE registers, it uses the full 64-byte width. This is not a correctness issue because TCG never lets guest code see beyond the first 128 bits of the ZMM registers; however, it causes uninitialized stack memory to make it to the CPU's migration stream. Fix it by only copying the low 16 bytes of the ZMMReg union into the destination register.

  Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* x86 tcg cpus: Fix Lesser GPL version number (Chetan Pant, 2020-11-15; 1 file, -1/+1)
  There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in the comment section.

  Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
  Message-Id: <20201023122801.19514-1-chetan4windows@gmail.com>
  Reviewed-by: Thomas Huth <thuth@redhat.com>
  Signed-off-by: Thomas Huth <thuth@redhat.com>
* target/i386: fix IEEE SSE floating-point exception raising (Joseph Myers, 2020-07-11; 1 file, -12/+16)
  The SSE instruction implementations all fail to raise the expected IEEE floating-point exceptions because they do nothing to convert the exception state from the softfloat machinery into the exception flags in MXCSR. Fix this by adding such conversions.

  Unlike for x87, emulated SSE floating-point operations might be optimized using hardware floating point on the host, and so a different approach is taken that is compatible with such optimizations. The required invariant is that all exceptions set in env->sse_status (other than "denormal operand", for which the SSE semantics are different from those in the softfloat code) are ones that are set in the MXCSR; the emulated MXCSR is updated lazily when code reads MXCSR, while when code sets MXCSR, the exceptions in env->sse_status are set accordingly.

  A few instructions do not raise all the exceptions that would be raised by the softfloat code, and those instructions are made to save and restore the softfloat exception state accordingly.

  Nothing is done about "denormal operand"; setting that (only for the case when input denormals are *not* flushed to zero, the opposite of the logic in the softfloat code for such an exception) will require custom code for relevant instructions, or else architecture-specific conditionals in the softfloat code for when to set such an exception, together with custom code for various SSE conversion and rounding instructions that do not set that exception.

  Nothing is done about trapping exceptions (for which there is minimal and largely broken support in QEMU's emulation in the x87 case and no support at all in the SSE case).

  Signed-off-by: Joseph Myers <joseph@codesourcery.com>
  Message-Id: <alpine.DEB.2.21.2006252358000.3832@digraph.polyomino.org.uk>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
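  A sketch of the lazy fold-in done when the guest reads MXCSR, assuming QEMU's softfloat flag API; the function name is illustrative, and the bit positions follow the architectural MXCSR layout (IE=0, ZE=2, OE=3, UE=4, PE=5):

    #include <stdint.h>
    #include "fpu/softfloat.h"

    /* Merge softfloat's accumulated exception flags into the MXCSR
     * status bits; "denormal operand" (DE, bit 1) is deliberately
     * not handled, per the commit message above. */
    static uint32_t mxcsr_merge_sketch(uint32_t mxcsr, float_status *s)
    {
        int f = get_float_exception_flags(s);

        if (f & float_flag_invalid)   mxcsr |= 1 << 0; /* IE */
        if (f & float_flag_divbyzero) mxcsr |= 1 << 2; /* ZE */
        if (f & float_flag_overflow)  mxcsr |= 1 << 3; /* OE */
        if (f & float_flag_underflow) mxcsr |= 1 << 4; /* UE */
        if (f & float_flag_inexact)   mxcsr |= 1 << 5; /* PE */
        return mxcsr;
    }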
* target/i386: correct fix for pcmpxstrx substring search (Joseph Myers, 2020-06-12; 1 file, -2/+2)
  This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm substring search, commit ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.

  That commit fixed a bug that showed up in four GCC tests with one libc implementation. The tests in question generate random inputs to the intrinsics and compare results to a C implementation, but they only test 1024 possible random inputs, and when the tests use the cases of those instructions that work with word rather than byte inputs, it's easy to have problematic cases that show up much less frequently than that. Thus, testing with a different libc implementation, and so a different random number generator, showed up a problem with the previous patch.

  When investigating the previous test failures, I found the description of these instructions in the Intel manuals (starting from computing a 16x16 or 8x8 set of comparison results) confusing and hard to match up with the more optimized implementation in QEMU, and referred to AMD manuals which described the instructions in a different way. Those AMD descriptions are very explicit that the whole of the string being searched for must be found in the other operand, not running off the end of that operand; they say "If the prototype and the SUT are equal in length, the two strings must be identical for the comparison to be TRUE.". However, that statement is incorrect.

  In my previous commit message, I noted:

    The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s.

  The description "some byte of d would overlap some byte outside of s" is accurate only when understood to refer to overlapping some byte *within the 16-byte operand* but at or after the zero terminator; it is valid to run over the end of s if the end of s is the end of the 16-byte operand. So the fix in the previous patch for the case of d being empty was correct, but the other part of that patch was not correct (as it never allowed partial matches even at the end of the 16-byte operand). Nor was the code before the previous patch correct for the case of d nonempty, as it would always have allowed partial matches at the end of s.

  Fix with a partial revert of my previous change, combined with inserting a check for the special case of s having maximum length to determine where it is necessary to check for matches.

  In the added test, test 1 is for the case of empty strings, which failed before my 2017 patch, test 2 is for the bug introduced by my 2017 patch and test 3 deals with the case where a match of an initial segment at the end of the string is not valid when the string ends before the end of the 16-byte operand (that is, the case that would be broken by a simple revert of the non-empty-string part of my 2017 patch).

  Signed-off-by: Joseph Myers <joseph@codesourcery.com>
  Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: fix phadd* with identical destination and source register (Janne Grunau, 2020-06-10; 1 file, -20/+33)
  Detected by asm test suite failures in dav1d (https://code.videolan.org/videolan/dav1d). Can be reproduced by `qemu-x86_64 -cpu core2duo ./tests/checkasm --test=mc_8bpc 1659890620`.

  Signed-off-by: Janne Grunau <j@jannau.net>
  Message-Id: <20200401225253.30745-1-j@jannau.net>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* softfloat: Name compare relation enum (Richard Henderson, 2020-05-19; 1 file, -4/+4)
  Give the previously unnamed enum a typedef name. Use it in the prototypes of compare functions. Use it to hold the results of the compare functions.

  Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
  Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
  Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* target/i386: Return 'indefinite integer value' for invalid SSE fp->int conversions (Peter Maydell, 2019-08-20; 1 file, -28/+60)

  The x86 architecture requires that all conversions from floating point to integer which raise the 'invalid' exception (infinities of both signs, NaN, and all values which don't fit in the destination integer) return what the x86 spec calls the "indefinite integer value", which is 0x8000_0000 for 32 bits or 0x8000_0000_0000_0000 for 64 bits. The softfloat functions return the more usual behaviour of positive overflows returning the maximum value that fits in the destination integer format and negative overflows returning the minimum value that fits.

  Wrap the softfloat functions in x86-specific versions which detect the 'invalid' condition and return the indefinite integer.

  Note that we don't use these wrappers for the 3DNow! pf2id and pf2iw instructions, which do return the minimum value that fits in an int32 if the input float is a large negative number.

  Fixes: https://bugs.launchpad.net/qemu/+bug/1815423
  Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  Message-Id: <20190805180332.10185-1-peter.maydell@linaro.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
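  A sketch of the wrapper pattern, assuming QEMU's softfloat flag API; the function name is illustrative:

    #include <stdint.h>
    #include "fpu/softfloat.h"

    static int32_t x86_float32_to_int32_sketch(float32 a, float_status *s)
    {
        int old = get_float_exception_flags(s);

        /* Clear flags so we can tell whether *this* conversion
         * raised 'invalid'. */
        set_float_exception_flags(0, s);
        int32_t r = float32_to_int32(a, s);
        if (get_float_exception_flags(s) & float_flag_invalid) {
            r = INT32_MIN;  /* 0x8000_0000, the indefinite integer value */
        }
        /* Re-accumulate exceptions raised before the conversion. */
        set_float_exception_flags(get_float_exception_flags(s) | old, s);
        return r;
    }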
* target/i386: fix phminposuw in-place operation (Joseph Myers, 2017-09-19; 1 file, -3/+3)
  The SSE4.1 phminposuw instruction finds the minimum 16-bit element in the source vector, putting the value of that element in the low 16 bits of the destination vector, the index of that element in the next three bits, and zeroing the rest of the destination. The helper for this operation fills the destination from high to low, meaning that when the source and destination are the same register, the minimum source element can be overwritten before it is copied to the destination. This patch fixes it to fill the destination from low to high instead, so the minimum source element is always copied first.

  This fixes one gcc test failure in my GCC 6-based testing (and so concludes the present sequence of patches, as I don't have any further gcc test failures left in that testing that I attribute to QEMU bugs).

  Signed-off-by: Joseph Myers <joseph@codesourcery.com>
  Message-Id: <alpine.DEB.2.20.1708111422580.11919@digraph.polyomino.org.uk>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
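  A plain-C sketch of the safe ordering (the function name is illustrative):

    #include <stdint.h>

    /* Find the minimum first, reading everything from s, then fill d
     * starting at the low element; with d == s, nothing needed is
     * clobbered before it is read. Strict < keeps the lowest index
     * on ties, as the instruction requires. */
    static void phminposuw_sketch(uint16_t d[8], const uint16_t s[8])
    {
        unsigned idx = 0;
        for (unsigned i = 1; i < 8; i++) {
            if (s[i] < s[idx]) {
                idx = i;
            }
        }
        uint16_t min = s[idx];

        d[0] = min;     /* low word: the minimum value  */
        d[1] = idx;     /* next three bits: its index   */
        for (unsigned i = 2; i < 8; i++) {
            d[i] = 0;   /* rest of the destination is zeroed */
        }
    }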
* target/i386: fix pcmpxstrx substring search (Joseph Myers, 2017-09-19; 1 file, -2/+6)
  One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm instructions does a substring search. The implementation of this case in the pcmpxstrx helper is incorrect. The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s. This patch fixes the implementation to correspond with the proper instruction semantics.

  This fixes four gcc test failures in my GCC 6-based testing.

  Signed-off-by: Joseph Myers <joseph@codesourcery.com>
  Message-Id: <alpine.DEB.2.20.1708102139310.8101@digraph.polyomino.org.uk>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: fix packusdw in-place operation (Joseph Myers, 2017-09-19; 1 file, -8/+11)
  The SSE4.1 packusdw instruction combines source and destination vectors of signed 32-bit integers into a single vector of unsigned 16-bit integers, with unsigned saturation. When the source and destination are the same register, this means each 32-bit element of that register is used twice as an input, to produce two of the 16-bit output elements, and so if the operation is carried out element-by-element in place, no matter what the order in which it is applied to the elements, the first element's operation will overwrite some future input. The helper for packssdw avoids this issue by computing the result in a local temporary and copying it to the destination at the end; this patch fixes the packusdw helper to do likewise.

  This fixes three gcc test failures in my GCC 6-based testing.

  Signed-off-by: Joseph Myers <joseph@codesourcery.com>
  Message-Id: <alpine.DEB.2.20.1708100023050.9262@digraph.polyomino.org.uk>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
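  A plain-C sketch of the local-temporary pattern (function names are illustrative; in the real helper, d and the inputs are lanes of the same Reg union):

    #include <stdint.h>
    #include <string.h>

    static uint16_t sat_u16(int32_t v)
    {
        return v < 0 ? 0 : v > 0xffff ? 0xffff : v;
    }

    /* Build the whole result in r, then copy: correct even when the
     * output overlaps the inputs (the dest==src case being fixed). */
    static void packusdw_sketch(uint16_t d[8], const int32_t v[4],
                                const int32_t s[4])
    {
        uint16_t r[8];

        for (int i = 0; i < 4; i++) {
            r[i]     = sat_u16(v[i]);
            r[i + 4] = sat_u16(s[i]);
        }
        memcpy(d, r, sizeof(r));
    }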
* target/i386: fix pmovsx/pmovzx in-place operations (Joseph Myers, 2017-09-19; 1 file, -7/+7)
  The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte or 4-byte inputs and sign-extend or zero-extend them to a wider vector output. The associated helpers for these instructions do the extension on each element in turn, starting with the lowest. If the input and output are the same register, this means that all the input elements after the first have been overwritten before they are read. This patch makes the helpers extend starting with the highest element, not the lowest, to avoid such overwriting.

  This fixes many GCC test failures (161 in the gcc testsuite in my GCC 6-based testing) when testing with a default CPU setting enabling those instructions.

  Signed-off-by: Joseph Myers <joseph@codesourcery.com>
  Message-Id: <alpine.DEB.2.20.1708082018390.23380@digraph.polyomino.org.uk>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
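  A plain-C sketch of why descending order is safe for the in-place case (the function name is illustrative):

    #include <stdint.h>

    /* With output overlapping input in the same register, the source
     * bytes sit under d[0]; writing d[3], d[2], d[1] first touches
     * only higher bytes, and by the time d[0] overwrites the source,
     * s[1..3] have already been extended. */
    static void pmovsxbd_sketch(int32_t d[4], const int8_t s[4])
    {
        for (int i = 3; i >= 0; i--) {
            d[i] = s[i];    /* implicit sign extension to 32 bits */
        }
    }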
* target-i386: Use ctpop helper (Richard Henderson, 2017-01-10; 1 file, -26/+0)
  Signed-off-by: Richard Henderson <rth@twiddle.net>
* Move target-* CPU file into a target/ folder (Thomas Huth, 2016-12-20; 1 file, -0/+2296)
  We've currently got 18 architectures in QEMU, and thus 18 target-xxx folders in the root folder of the QEMU source tree. More architectures (e.g. RISC-V, AVR) are likely to be included soon, too, so the main folder of the QEMU sources slowly gets quite overcrowded with the target-xxx folders.

  To disburden the main folder a little bit, let's move the target-xxx folders into a dedicated target/ folder, so that target-xxx/ simply becomes target/xxx/ instead.

  Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
  Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
  Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
  Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
  Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
  Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
  Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
  Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
  Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
  Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
  Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part]
  Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
  Signed-off-by: Thomas Huth <thuth@redhat.com>