summaryrefslogtreecommitdiffstats
path: root/target/i386/ops_sse.h
Commit message (Collapse)AuthorAgeFilesLines
* target/i386: fix phminposuw in-place operationJoseph Myers2017-09-191-3/+3
| | | | | | | | | | | | | | | | | | | | The SSE4.1 phminposuw instruction finds the minimum 16-bit element in the source vector, putting the value of that element in the low 16 bits of the destination vector, the index of that element in the next three bits and zeroing the rest of the destination. The helper for this operation fills the destination from high to low, meaning that when the source and destination are the same register, the minimum source element can be overwritten before it is copied to the destination. This patch fixes it to fill the destination from low to high instead, so the minimum source element is always copied first. This fixes one gcc test failure in my GCC 6-based testing (and so concludes the present sequence of patches, as I don't have any further gcc test failures left in that testing that I attribute to QEMU bugs). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708111422580.11919@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: fix pcmpxstrx substring searchJoseph Myers2017-09-191-2/+6
| | | | | | | | | | | | | | | | | | | | | | | One of the cases of the SSE4.2 pcmpestri / pcmpestrm / pcmpistri / pcmpistrm instructions does a substring search. The implementation of this case in the pcmpxstrx helper is incorrect. The operation in this case is a search for a string (argument d to the helper) in another string (argument s to the helper); if a copy of d at a particular position would run off the end of s, the resulting output bit should be 0 whether or not the strings match in the region where they overlap, but the QEMU implementation was wrongly comparing only up to the point where s ends and counting it as a match if an initial segment of d matched a terminal segment of s. Here, "run off the end of s" means that some byte of d would overlap some byte outside of s; thus, if d has zero length, it is considered to match everywhere, including after the end of s. This patch fixes the implementation to correspond with the proper instruction semantics. This fixes four gcc test failures in my GCC 6-based testing. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708102139310.8101@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: fix packusdw in-place operationJoseph Myers2017-09-191-8/+11
| | | | | | | | | | | | | | | | | | | | | The SSE4.1 packusdw instruction combines source and destination vectors of signed 32-bit integers into a single vector of unsigned 16-bit integers, with unsigned saturation. When the source and destination are the same register, this means each 32-bit element of that register is used twice as an input, to produce two of the 16-bit output elements, and so if the operation is carried out element-by-element in-place, no matter what the order in which it is applied to the elements, the first element's operation will overwrite some future input. The helper for packssdw avoids this issue by computing the result in a local temporary and copying it to the destination at the end; this patch fixes the packusdw helper to do likewise. This fixes three gcc test failures in my GCC 6-based testing. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708100023050.9262@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: fix pmovsx/pmovzx in-place operationsJoseph Myers2017-09-191-7/+7
| | | | | | | | | | | | | | | | | | The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte or 4-byte inputs and sign-extend or zero-extend them to a wider vector output. The associated helpers for these instructions do the extension on each element in turn, starting with the lowest. If the input and output are the same register, this means that all the input elements after the first have been overwritten before they are read. This patch makes the helpers extend starting with the highest element, not the lowest, to avoid such overwriting. This fixes many GCC test failures (161 in the gcc testsuite in my GCC 6-based testing) when testing with a default CPU setting enabling those instructions. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.20.1708082018390.23380@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target-i386: Use ctpop helperRichard Henderson2017-01-101-26/+0Star
| | | | Signed-off-by: Richard Henderson <rth@twiddle.net>
* Move target-* CPU file into a target/ folderThomas Huth2016-12-201-0/+2296
We've currently got 18 architectures in QEMU, and thus 18 target-xxx folders in the root folder of the QEMU source tree. More architectures (e.g. RISC-V, AVR) are likely to be included soon, too, so the main folder of the QEMU sources slowly gets quite overcrowded with the target-xxx folders. To disburden the main folder a little bit, let's move the target-xxx folders into a dedicated target/ folder, so that target-xxx/ simply becomes target/xxx/ instead. Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part] Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part] Acked-by: Michael Walle <michael@walle.cc> [lm32 part] Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part] Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part] Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part] Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part] Acked-by: Richard Henderson <rth@twiddle.net> [alpha part] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part] Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part] Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part] Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part] Signed-off-by: Thomas Huth <thuth@redhat.com>