path: root/target/ppc/translate
Commit log, most recent first. Each entry: subject (Author, Date, files changed, -deleted/+added)
* target/ppc: Fix typo in comments (BALATON Zoltan, 2020-02-20, 1 file, -3/+3)

"Deferred" was misspelled as "differed" in some comments; correct this typo.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20200214155748.0896B745953@zero.eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Fix for optimized vsl/vsr instructions (Stefan Brankovic, 2019-10-24, 1 file, -44/+40)

In the previous implementation, invocation of the TCG shift function could request a shift of a TCG variable by 64 bits when variable 'sh' is 0, which is not supported in TCG (values can be shifted by 0 to 63 bits). This patch fixes this by using two separate invocations of TCG shift functions, with a maximum shift amount of 32.

The name of variable 'shifted' is changed to 'carry' so the variable naming is similar to the old helper implementation. Variables 'avrA' and 'avrB' are replaced with variable 'avr'.

Fixes: 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
Reported-by: "Paul A. Clark" <pc@us.ibm.com>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Suggested-by: Aleksandar Markovic <aleksandar.markovic@rt-rk.com>
Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Message-Id: <1570196639-7025-2-git-send-email-stefan.brankovic@rt-rk.com>
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
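The shape of the fix, as a hedged sketch (avr, sh, carry and tmp are TCGv_i64 temporaries; sh holds the 3-bit shift count, so 32 - sh stays within TCG's supported range):

    /* Broken form: when sh == 0 this asks TCG to shift by 64 bits,
     * which is undefined:
     *     tcg_gen_subfi_i64(tmp, 64, sh);
     *     tcg_gen_shr_i64(carry, avr, tmp);
     * Fixed form: split into two shifts whose amounts never exceed 32. */
    tcg_gen_shri_i64(carry, avr, 32);    /* constant shift by 32 */
    tcg_gen_subfi_i64(tmp, 32, sh);      /* 32 - sh, in [25..32] for sh <= 7 */
    tcg_gen_shr_i64(carry, carry, tmp);  /* total shift is 64 - sh */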
* ppc: Add support for 'mffsce' instruction (Paul A. Clarke, 2019-10-04, 2 files, -0/+32)

ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR) instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl. This patch adds support for the 'mffsce' instruction.

'mffsce' is identical to 'mffs', except that it also clears the exception enable bits in the FPSCR.

On CPUs without support for 'mffsce' (below ISA 3.0), the instruction will execute identically to 'mffs'.

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1568817082-1384-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc: Add support for 'mffscrn', 'mffscrni' instructions (Paul A. Clarke, 2019-10-04, 2 files, -1/+72)

ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR) instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl. This patch adds support for the 'mffscrn' and 'mffscrni' instructions.

'mffscrn' and 'mffscrni' are similar to 'mffsl', except they do not return the status bits (FI, FR, FPRF) and they also set the rounding mode in the FPSCR.

On CPUs without support for 'mffscrn'/'mffscrni' (below ISA 3.0), the instructions will execute identically to 'mffs'.

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1568817081-1345-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Refactor emulation of vmrgew and vmrgow instructions (Stefan Brankovic, 2019-08-29, 1 file, -29/+37)

Since I found these two instructions implemented with TCG, I refactored them so they are consistent with other similar implementations that I introduced in this patch.

Also, a new dual macro GEN_VXFORM_TRANS_DUAL is added. This macro is used if one instruction is realized with direct translation, and the second one with a helper.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Message-Id: <1566898663-25858-4-git-send-email-stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
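A simplified sketch of the dispatch such a dual macro performs (the flag checks and the fallback are illustrative rather than the exact QEMU definition):

    /* Illustrative only: pick the direct TCG translation (trans_<name0>)
     * or the helper-based implementation (gen_<name1>) depending on which
     * instruction-set flags the context advertises. */
    #define GEN_VXFORM_TRANS_DUAL(name0, flg0, flg2_0, name1, flg1, flg2_1) \
    static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
    {                                                                        \
        if ((ctx->insns_flags & flg0) || (ctx->insns_flags2 & flg2_0)) {     \
            trans_##name0(ctx);          /* direct TCG translation */        \
        } else if ((ctx->insns_flags & flg1) ||                              \
                   (ctx->insns_flags2 & flg2_1)) {                           \
            gen_##name1(ctx);            /* helper-based variant */          \
        } else {                                                             \
            gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);              \
        }                                                                    \
    }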
* ppc: Fix xsmaddmdp and friends (Paul A. Clarke, 2019-08-29, 1 file, -1/+1)

A class of instructions of the form:
    op Target,A,B
which operate like:
    Target = Target * A + B
have a bit set which distinguishes them from instructions that operate as:
    Target = Target * B + A

This bit was not being checked properly (it was checked using the PPC_BIT macro), so all instructions in this class were operating incorrectly as the second form above. The bit was being checked as if it were part of a 64-bit instruction opcode, rather than a proper 32-bit opcode. Fix by using the macro (PPC_BIT32) which treats the opcode as a 32-bit quantity.

Fixes: c9f4e4d8b632 ("target/ppc: improve VSX_FMADD with new GEN_VSX_HELPER_VSX_MADD macro")
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1566401321-22419-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
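The distinction matters because PowerPC numbers bits from the most-significant end, so the assumed operand width decides where a given bit index lands. A sketch of the two macros as defined for PPC bit numbering (the tested bit index below is illustrative only):

    /* Big-endian bit numbering: bit 0 is the MSB, so the field width
     * determines which bit a given index selects. */
    #define PPC_BIT(bit)   (0x8000000000000000ULL >> (bit))  /* 64-bit field */
    #define PPC_BIT32(bit) (0x80000000 >> (bit))             /* 32-bit field */

    /* For a 32-bit opcode, the form bit must be tested with the 32-bit
     * variant (bit index 27 here is a hypothetical example): */
    bool m_form = (opcode & PPC_BIT32(27)) != 0;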
* ppc: Add support for 'mffsl' instruction (Paul A. Clarke, 2019-08-21, 2 files, -1/+25)

ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR) instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl. This patch adds support for 'mffsl'.

'mffsl' is identical to 'mffs', except it only returns mode, status, and enable bits from the FPSCR. On CPUs without support for 'mffsl' (below ISA 3.0), the 'mffsl' instruction will execute identically to 'mffs'.

Note: I renamed FPSCR_RN to FPSCR_RN0 so I could create an FPSCR_RN mask which is both bits of the FPSCR rounding mode, as defined in the ISA. I also fixed a typo in the definition of FPSCR_FR.

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>

v4:
- nit: added some braces to resolve a checkpatch complaint.
v3:
- Changed tcg_gen_and_i64 to tcg_gen_andi_i64, eliminating the need for a temporary, per review from Richard Henderson.
v2:
- I found that I copied too much of the 'mffs' implementation. The 'Rc' condition code bits are not needed for 'mffsl'. Removed.
- I now free the (renamed) 'tmask' temporary.
- I now bail early for older ISA to the original 'mffs' implementation.

Message-Id: <1565982203-11048-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
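A hedged sketch of the masking step this implies, using the translate.c conventions of this period (cpu_fpscr, set_fpr()); the mask name is hypothetical and stands for the union of the mode, status and enable fields:

    TCGv_i64 t0 = tcg_temp_new_i64();

    tcg_gen_extu_tl_i64(t0, cpu_fpscr);          /* read the full FPSCR */
    tcg_gen_andi_i64(t0, t0, FPSCR_MFFSL_MASK);  /* hypothetical mask: mode,
                                                    status and enable bits */
    set_fpr(rD(ctx->opcode), t0);                /* result into FRT */
    tcg_temp_free_i64(t0);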
* target/ppc: Optimize emulation of vclzw instruction (Stefan Brankovic, 2019-08-21, 1 file, -1/+27)

Optimize the Altivec instruction vclzw (Vector Count Leading Zeros Word). This instruction counts the number of leading zeros of each word element in the source register and places the result in the appropriate word element of the destination register.

Counting is performed in four iterations of a for loop (one for each word element of source register vB). Every iteration consists of loading the appropriate word element from the source register, counting leading zeros with tcg_gen_clzi_i32, and saving the result in the appropriate word element of the destination register.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-7-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
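A sketch of that expansion (avr_word_offset(), VB and VT are hypothetical placeholders for the real element-offset computation and the decoded register numbers):

    TCGv_i32 tmp = tcg_temp_new_i32();
    int i;

    for (i = 0; i < 4; i++) {
        tcg_gen_ld_i32(tmp, cpu_env, avr_word_offset(VB, i)); /* load word */
        tcg_gen_clzi_i32(tmp, tmp, 32);   /* count leading zeros; yields 32
                                             for a zero input */
        tcg_gen_st_i32(tmp, cpu_env, avr_word_offset(VT, i)); /* store word */
    }
    tcg_temp_free_i32(tmp);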
* target/ppc: Optimize emulation of vclzd instruction (Stefan Brankovic, 2019-08-21, 1 file, -1/+27)

Optimize the Altivec instruction vclzd (Vector Count Leading Zeros Doubleword). This instruction counts the number of leading zeros of each doubleword element in the source register and places the result in the appropriate doubleword element of the destination register.

This uses TCG's count-leading-zeros operation twice (once for each doubleword element of source register vB), placing the results in the appropriate doubleword elements of destination register vD.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-6-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Optimize emulation of vgbbd instruction (Stefan Brankovic, 2019-08-21, 1 file, -1/+76)

Optimize the Altivec instruction vgbbd (Vector Gather Bits by Bytes by Doubleword). All i-th bits (i in range 1 to 8) of each byte of a doubleword element in the source register are concatenated and placed into the i-th byte of the appropriate doubleword element in the destination register.

The following solution is applied to both doubleword elements of the source register in parallel, in order to reduce the number of instructions needed (that is why arrays are used): First, both doubleword elements of source register vB are placed in the appropriate elements of array avr. Bits are gathered in 2x8 iterations (2 for loops). In the first iteration, bit 1 of byte 1, bit 2 of byte 2, ... bit 8 of byte 8 are already in their final spots, so avr[i], i={0,1}, can be AND-ed with tcg_mask. For every following iteration, the avr[i] and tcg_mask variables have to be shifted right by 7 and 8 places, respectively, in order to get bit 1 of byte 2, bit 2 of byte 3, ... bit 7 of byte 8 into their final spots, so the shifted avr values (saved in tmp) can be AND-ed with the new value of tcg_mask, and so on. After the first 8 iterations (the first loop), all the first bits are in their final places, all second bits except the second bit of the eighth byte are in their places, and so on; only the eighth bit of the eighth byte is in its place. In the second loop we do all operations symmetrically, in order to get the other half of the bits into their final spots. The results for the first and second doubleword elements are saved in result[0] and result[1] respectively. In the end those results are saved in the appropriate doubleword elements of destination register vD.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-5-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
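A heavily simplified sketch of the first gather loop for one doubleword (the mask constant selects bit k of byte k; the structure is illustrative of the technique rather than a verbatim copy of the QEMU code):

    TCGv_i64 avr = tcg_temp_new_i64();
    TCGv_i64 tmp = tcg_temp_new_i64();
    TCGv_i64 result = tcg_temp_new_i64();
    TCGv_i64 tcg_mask = tcg_const_i64(0x8040201008040201ULL);
    int i;

    get_avr64(avr, VB, true);                /* high doubleword of vB */
    tcg_gen_and_i64(result, avr, tcg_mask);  /* diagonal already in place */
    for (i = 1; i < 8; i++) {
        tcg_gen_shri_i64(avr, avr, 7);       /* bring the next diagonal of
                                                source bits into position */
        tcg_gen_shri_i64(tcg_mask, tcg_mask, 8);
        tcg_gen_and_i64(tmp, avr, tcg_mask);
        tcg_gen_or_i64(result, result, tmp); /* accumulate gathered bits */
    }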
* target/ppc: Optimize emulation of vsl and vsr instructions (Stefan Brankovic, 2019-08-21, 1 file, -2/+99)

Optimization of the Altivec instructions vsl and vsr (Vector Shift Left/Right). These perform a shift operation (left and right respectively) on the 128-bit value of register vA by the value specified in bits 125-127 of register vB. The lowest 3 bits in each byte element of register vB must be identical, or the result is undefined.

For the vsl instruction, the first step is that bits 125-127 of register vB are saved in variable sh. Then, the highest sh bits of the lower doubleword element of register vA are saved in variable shifted, in order not to lose those bits when the shift operation is performed on the lower doubleword element of register vA, which is the next step. After shifting the lower doubleword element, the shift operation is performed on the higher doubleword element of vA, with replacement of the lowest sh bits (that are now 0) with the bits saved in shifted.

For the vsr instruction, firstly, bits 125-127 of register vB are saved in variable sh. Then, the lowest sh bits of the higher doubleword element of register vA are saved in variable shifted, in order not to lose those bits when the shift operation is performed on the higher doubleword element of register vA, which is the next step. After shifting the higher doubleword element, the shift operation is performed on the lower doubleword element of vA, with replacement of the highest sh bits (that are now 0) with the bits saved in shifted.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-3-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
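A hedged sketch of the vsl path described above; VA, VB and VT stand for the decoded register numbers, and the carry uses the split-shift form from the 2019-10-24 fix earlier in this log (temporary frees elided):

    TCGv_i64 avr = tcg_temp_new_i64();
    TCGv_i64 sh = tcg_temp_new_i64();
    TCGv_i64 carry = tcg_temp_new_i64();
    TCGv_i64 tmp = tcg_temp_new_i64();

    get_avr64(avr, VB, false);
    tcg_gen_andi_i64(sh, avr, 0x07);      /* shift count from vB[125:127] */

    get_avr64(avr, VA, false);            /* lower doubleword of vA */
    tcg_gen_shri_i64(carry, avr, 32);     /* carry = avr >> (64 - sh), done */
    tcg_gen_subfi_i64(tmp, 32, sh);       /* as two shifts of at most 32   */
    tcg_gen_shr_i64(carry, carry, tmp);
    tcg_gen_shl_i64(avr, avr, sh);
    set_avr64(VT, avr, false);            /* shifted lower doubleword */

    get_avr64(avr, VA, true);             /* higher doubleword of vA */
    tcg_gen_shl_i64(avr, avr, sh);
    tcg_gen_or_i64(avr, avr, carry);      /* pull in bits from below */
    set_avr64(VT, avr, true);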
* target/ppc: Optimize emulation of lvsl and lvsr instructions (Stefan Brankovic, 2019-08-21, 1 file, -32/+89)

Add a simple macro that calls the TCG implementation of the appropriate instruction if Altivec support is active.

Optimization of the Altivec instruction lvsl (Load Vector for Shift Left). Place bytes sh:sh+15 of the value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F in the destination register. sh is calculated by adding the 2 source registers and taking bits 60-63 of the result. First, bits [28-31] of EA are placed in variable sh. After that, the bytes are created in the following way: bytes sh:(sh+7) of X (from the description) by multiplying sh by 0x0101010101010101, followed by addition of the result to 0x0001020304050607; the value obtained is placed in the higher doubleword element of vD. Bytes (sh+8):(sh+15) by adding the result of the previous multiplication to 0x08090a0b0c0d0e0f; the value obtained is placed in the lower doubleword element of vD.

Optimization of the Altivec instruction lvsr (Load Vector for Shift Right). Place bytes 16-sh:31-sh of the value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F in the destination register. sh is calculated by adding the 2 source registers and taking bits 60-63 of the result. First, bits [28-31] of EA are placed in variable sh. After that, the bytes are created in the following way: bytes sh:(sh+7) of X (from the description) by multiplying sh by 0x0101010101010101, followed by subtraction of the result from 0x1011121314151617; the value obtained is placed in the higher doubleword element of vD. Bytes (sh+8):(sh+15) by subtracting the result of the previous multiplication from 0x18191a1b1c1d1e1f; the value obtained is placed in the lower doubleword element of vD.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-2-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
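A hedged sketch of the lvsl case (EA stands for a TCGv_i64 already holding the sum of the two source registers; VT is the decoded target register). Because sh is at most 15, no byte lane can carry into its neighbour:

    TCGv_i64 sh = tcg_temp_new_i64();
    TCGv_i64 hi = tcg_temp_new_i64();
    TCGv_i64 lo = tcg_temp_new_i64();

    tcg_gen_andi_i64(sh, EA, 0x0F);                  /* sh = low nibble of EA */
    tcg_gen_muli_i64(sh, sh, 0x0101010101010101ULL); /* copy sh to every byte */
    tcg_gen_addi_i64(hi, sh, 0x0001020304050607ULL); /* bytes sh .. sh+7   */
    tcg_gen_addi_i64(lo, sh, 0x08090a0b0c0d0e0fULL); /* bytes sh+8 .. sh+15 */
    set_avr64(VT, hi, true);                         /* higher doubleword */
    set_avr64(VT, lo, false);                        /* lower doubleword */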
* target/ppc: improve VSX_FMADD with new GEN_VSX_HELPER_VSX_MADD macro (Mark Cave-Ayland, 2019-07-02, 2 files, -66/+85)

Introduce a new GEN_VSX_HELPER_VSX_MADD macro for the generator function which enables the source and destination registers to be decoded at translation time. This enables the determination of a-form or m-form to be made at translation time, so that a single helper function can now be used for both variants.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-16-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: decode target register in VSX_EXTRACT_INSERT at translation time (Mark Cave-Ayland, 2019-07-02, 1 file, -5/+5)

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-15-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: decode target register in VSX_VECTOR_LOAD_STORE_LENGTH at translation time (Mark Cave-Ayland, 2019-07-02, 1 file, -23/+24)

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-14-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce GEN_VSX_HELPER_R2_AB macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -3/+21)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_R2_AB macro which performs the decode based upon rA and rB at translation time.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
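This and the following GEN_VSX_HELPER_* commits all share the same shape: decode the register fields once at translation time, pass pointers into env, and let one helper body serve every register combination. An illustrative sketch (gen_vsr_ptr() and the +32 mapping of quad-precision operands onto VSRs 32-63 follow the conventions of this series, but the exact QEMU definition may differ):

    #define GEN_VSX_HELPER_R2_AB(name, op1, op2, inval, type)             \
    static void gen_##name(DisasContext *ctx)                             \
    {                                                                      \
        TCGv_ptr ra, rb;                                                   \
        if (unlikely(!ctx->vsx_enabled)) {                                 \
            gen_exception(ctx, POWERPC_EXCP_VSXU);                         \
            return;                                                        \
        }                                                                  \
        ra = gen_vsr_ptr(rA(ctx->opcode) + 32); /* decoded at translation  \
                                                   time, not in the helper */ \
        rb = gen_vsr_ptr(rB(ctx->opcode) + 32);                            \
        gen_helper_##name(cpu_env, ra, rb);                                \
        tcg_temp_free_ptr(ra);                                             \
        tcg_temp_free_ptr(rb);                                             \
    }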
* target/ppc: introduce GEN_VSX_HELPER_R2 macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -10/+28)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_R2 macro which performs the decode based upon rD and rB at translation time.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-12-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce GEN_VSX_HELPER_R3 macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -8/+28)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_R3 macro which performs the decode based upon rD, rA and rB at translation time.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce GEN_VSX_HELPER_X1 macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -4/+20)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_X1 macro which performs the decode based upon xB at translation time.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce GEN_VSX_HELPER_X2_AB macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -6/+24)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_X2_AB macro which performs the decode based upon xA and xB at translation time.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce GEN_VSX_HELPER_X2 macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -60/+75)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_X2 macro which performs the decode based upon xT and xB at translation time. With the previous change to the xscvqpdp generator and helper functions, the opcode parameter is no longer required in the common case and can be removed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce separate generator and helper for xscvqpdp (Mark Cave-Ayland, 2019-07-02, 1 file, -1/+17)

Rather than perform the VSR register decoding within the helper itself, introduce a new generator and helper function which perform the decode based upon xT and xB at translation time.

The xscvqpdp helper is the only 2-parameter xT/xB implementation that requires the opcode to be passed as an additional parameter, so handling this separately allows us to optimise the conversion in the next commit.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce GEN_VSX_HELPER_X3 macro to fpu_helper.c (Mark Cave-Ayland, 2019-07-02, 1 file, -60/+77)

Rather than perform the VSR register decoding within the helper itself, introduce a new GEN_VSX_HELPER_X3 macro which performs the decode based upon xT, xA and xB at translation time. With the previous changes to the VSX_CMP generator and helper macros, the opcode parameter is no longer required in the common case and can be removed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce separate VSX_CMP macro for xvcmp* instructions (Mark Cave-Ayland, 2019-07-02, 1 file, -8/+41)

Rather than perform the VSR register decoding within the helper itself, introduce a new VSX_CMP macro which performs the decode based upon xT, xA and xB at translation time.

Subsequent commits will make the same changes for other instructions; however, the xvcmp* instructions are different in that they return a set of flags to be optionally written back to the crf[6] register. Move this logic from the helper function to the generator function, along with the float_status update.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190616123751.781-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Use tcg_gen_gvec_bitsel (Richard Henderson, 2019-06-12, 1 file, -22/+2)

Replace the target-specific implementation of XXSEL.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190603164927.8336-1-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
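XXSEL computes vD = (vB & vC) | (vA & ~vC), which is exactly a bitwise select, so the whole instruction collapses onto one generic gvec expander. A sketch of the resulting generator body (vsr_full_offset() gives the byte offset of a VSR within CPUPPCState):

    tcg_gen_gvec_bitsel(MO_64,
                        vsr_full_offset(xT(ctx->opcode)), /* destination    */
                        vsr_full_offset(xC(ctx->opcode)), /* selector bits  */
                        vsr_full_offset(xB(ctx->opcode)), /* taken where 1  */
                        vsr_full_offset(xA(ctx->opcode)), /* taken where 0  */
                        16, 16);                          /* 16-byte vector */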
* target/ppc: Fix lxvw4x, lxvh8x and lxvb16x (Anton Blanchard, 2019-06-12, 1 file, -6/+7)

During the conversion these instructions were incorrectly treated as stores. We need to use set_cpu_vsr* and not get_cpu_vsr*.

Fixes: 8b3b2d75c7c0 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20190524065345.25591-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Use vector variable shifts for VSL, VSR, VSRA (Richard Henderson, 2019-05-29, 1 file, -12/+12)

The gvec expanders take care of masking the shift amount against the element width.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190518191430.21686-2-richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
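For example, a word-sized VMX shift can be emitted directly with the variable-shift expander (sketch; VT, VA and VB are the decoded register numbers). The expander masks each per-element shift amount by the element width minus one, which is exactly the modulo behaviour the PowerPC ISA specifies:

    tcg_gen_gvec_shlv(MO_32,
                      avr_full_offset(VT),  /* destination vector  */
                      avr_full_offset(VA),  /* data to shift       */
                      avr_full_offset(VB),  /* per-element amounts */
                      16, 16);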
* target/ppc: Fix xvabs[sd]p, xvnabs[sd]p, xvneg[sd]p, xvcpsgn[sd]p (Anton Blanchard, 2019-05-29, 1 file, -2/+2)

We were using set_cpu_vsr*() when we should have used get_cpu_vsr*().

Fixes: 8b3b2d75c7c0 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190509104912.6b754dff@kryten>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Optimise VSX_LOAD_SCALAR_DS and VSX_VECTOR_LOAD_STORE (Anton Blanchard, 2019-05-29, 1 file, -10/+58)

A few small optimisations:

In VSX_LOAD_SCALAR_DS() we don't need to read the VSR via get_cpu_vsrh().

Split VSX_VECTOR_LOAD_STORE() into two functions: loads only need to write the VSRs (set_cpu_vsr*()) and stores only need to read the VSRs (get_cpu_vsr*()).

Thanks to Mark Cave-Ayland for the suggestions.

Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190509103545.4a7fa71a@kryten>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Fix xxspltib (Anton Blanchard, 2019-05-29, 1 file, -4/+4)

xxspltib raises a VMX or a VSX exception depending on the register set it is operating on. We had a check, but it was backwards.

Fixes: f113283525a4 ("target-ppc: add xxspltib instruction")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190509061713.69490488@kryten>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Fix xxbrq, xxbrw (Anton Blanchard, 2019-05-29, 1 file, -2/+2)

Fix a typo in xxbrq and xxbrw where we put both results into the lower doubleword.

Fixes: 8b3b2d75c7c0 ("introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190507004811.29968-3-anton@ozlabs.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Fix xvxsigdp (Anton Blanchard, 2019-05-29, 1 file, -1/+1)

Fix a typo in xvxsigdp where we put both results into the lower doubleword.

Fixes: dd977e4f45cb ("target/ppc: Optimize x[sv]xsigdp using deposit_i64()")
Signed-off-by: Anton Blanchard <anton@ozlabs.org>
Message-Id: <20190507004811.29968-1-anton@ozlabs.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Use tcg_gen_abs_i32 (Philippe Mathieu-Daudé, 2019-05-14, 1 file, -13/+1)

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20190423102145.14812-2-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Specify optional vector requirements with a list (Richard Henderson, 2019-05-13, 1 file, -1/+6)

Replace the single opcode in .opc with a null-terminated array in .opt_opc. We still require that all opcodes be used with the same .vece.

Validate the contents of this list with CONFIG_DEBUG_TCG. All tcg_gen_*_vec functions will check any list active during .fniv expansion. Swap the active list in and out as we expand other opcodes, or take control away from the front-end function.

Convert all existing vector aware front ends.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
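A sketch of the new shape (the expander and helper names here are hypothetical placeholders):

    /* The expander now advertises the optional vector opcodes it may emit
     * as a zero-terminated list instead of a single .opc field. */
    static const TCGOpcode vecop_list[] = { INDEX_op_shlv_vec, 0 };

    static const GVecGen3 g = {
        .fniv = gen_shlv_vec,      /* vector expansion (hypothetical fn)  */
        .fno  = gen_helper_shlv,   /* out-of-line fallback (hypothetical) */
        .opt_opc = vecop_list,     /* was: .opc = INDEX_op_shlv_vec       */
        .vece = MO_32,
    };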
* target/ppc: Style fixes for translate/spe-impl.inc.c (David Gibson, 2019-04-26, 1 file, -5/+9)

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
* target/ppc: Style fixes for translate/vmx-impl.inc.c (David Gibson, 2019-04-26, 1 file, -11/+15)

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
* target/ppc: Style fixes for translate/vsx-impl.inc.c (David Gibson, 2019-04-26, 1 file, -7/+8)

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
* target/ppc: Style fixes for translate/fp-impl.inc.c (David Gibson, 2019-04-26, 1 file, -20/+32)

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
* target/ppc: Fix QEMU crash with stxsdx (Greg Kurz, 2019-03-29, 1 file, -1/+1)

I've been hitting several QEMU crashes while running a fedora29 ppc64le guest under TCG. Each time, this would occur several minutes after the guest reached login:

    Fedora 29 (Twenty Nine)
    Kernel 4.20.6-200.fc29.ppc64le on an ppc64le (hvc0)

    Web console: https://localhost:9090/

    localhost login:
    tcg/tcg.c:3211: tcg fatal error

This happens because a bug crept up in the gen_stxsdx() helper when it was converted to use VSR register accessors by commit 8b3b2d75c7c04 "target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers for VSR register access".

The code creates a temporary, passes it directly to gen_qemu_st64_i64() and then to set_cpu_vsrh()... which looks like this was mistakenly coded as a load instead of a store. Reverse the logic: read the VSR to the temporary first and then store it to memory.

Fixes: 8b3b2d75c7c0481544e277dad226223245e058eb
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155371035249.2038502.12364252604337688538.stgit@bahia.lan>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
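The corrected ordering, as a hedged sketch (xS() and gen_qemu_st64_i64() follow the names used in the commit message):

    /* Read the VSR into a temporary first, then store that temporary to
     * memory; the buggy code did the reverse, loading from memory and
     * overwriting the VSR. */
    TCGv_i64 t0 = tcg_temp_new_i64();
    get_cpu_vsrh(t0, xS(ctx->opcode));   /* VSR -> temp */
    gen_qemu_st64_i64(ctx, t0, EA);      /* temp -> memory */
    tcg_temp_free_i64(t0);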
* target/ppc: Optimize x[sv]xsigdp using deposit_i64() (Philippe Mathieu-Daudé, 2019-03-12, 1 file, -8/+4)

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20190309214255.9952-3-f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Optimize xviexpdp() using deposit_i64() (Philippe Mathieu-Daudé, 2019-03-12, 1 file, -11/+3)

The t0 tcg_temp register is now unused, remove it.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20190309214255.9952-2-f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
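The core of the optimisation, as a hedged sketch (xth, xah and xbh are i64 temporaries holding the high doublewords of xT, xA and xB):

    /* Build the result doubleword in one operation: keep the sign and
     * fraction from xA, and splice the 11-bit exponent from the low bits
     * of xB into bit positions 62:52. */
    tcg_gen_deposit_i64(xth, xah, xbh, 52, 11);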
* target/ppc: introduce vsr64_offset() to simplify get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() (Mark Cave-Ayland, 2019-03-12, 1 file, -30/+4)

Now that all VSX registers are stored in host endian order, there is no need to go via different accessors depending upon the register number. Instead we introduce vsr64_offset() and use it directly from within get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}().

This also allows us to rewrite avr64_offset() and fpr_offset() in terms of the new vsr64_offset() function to more clearly express the relationship between the VSX, FPR and VMX registers, and also remove vsrl_offset() which is no longer required.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307180520.13868-8-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
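A sketch of the shape this takes (assuming the VsrD() host-endian doubleword accessor from cpu.h):

    static inline long vsr64_offset(int i, bool high)
    {
        /* VsrD() hides the host-endian element ordering, so one formula
         * now serves every VSX register. */
        return offsetof(CPUPPCState, vsr[i].VsrD(high ? 0 : 1));
    }

    static inline void get_cpu_vsrh(TCGv_i64 dst, int n)
    {
        tcg_gen_ld_i64(dst, cpu_env, vsr64_offset(n, true));
    }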
* target/ppc: improve avr64_offset() and use it to simplify get_avr64()/set_avr64() (Mark Cave-Ayland, 2019-03-12, 1 file, -5/+0)

By using the VsrD macro in avr64_offset(), the same offset calculation can be used regardless of host endianness. This allows get_avr64() and set_avr64() to be simplified accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307180520.13868-6-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: introduce avr_full_offset() function (Mark Cave-Ayland, 2019-03-12, 2 files, -16/+11)

All TCG vector operations require pointers to the base address of the vector rather than separate access to the top and bottom 64 bits. Convert the VMX TCG instructions to use a new avr_full_offset() function instead of avr64_offset(), which can then itself be written as a simple wrapper onto vsr_full_offset().

This same function can also be reused in cpu_avr_ptr() to avoid having more than one copy of the offset calculation logic.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190307180520.13868-5-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
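Because the VMX registers occupy VSRs 32-63, the wrapper can be a one-liner (sketch; the field layout is assumed from the ppc_vsr_t union):

    static inline long vsr_full_offset(int i)
    {
        return offsetof(CPUPPCState, vsr[i].u64[0]);
    }

    static inline long avr_full_offset(int i)
    {
        return vsr_full_offset(i + 32);   /* AVRs are VSRs 32-63 */
    }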
* target/ppc: introduce single vsrl_offset() function (Mark Cave-Ayland, 2019-03-12, 1 file, -6/+6)

Instead of having multiple copies of the offset calculation logic, move it to a single vsrl_offset() function.

This commit also renames the existing get_vsr()/set_vsr() functions to get_vsrl()/set_vsrl(), which better describes their purpose.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190307180520.13868-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: convert vmin* and vmax* to vector operations (Richard Henderson, 2019-02-18, 1 file, -16/+16)

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190215100058.20015-18-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: convert vadd*s and vsub*s to vector operations (Richard Henderson, 2019-02-18, 1 file, -12/+45)

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190215100058.20015-17-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Add helper_mfvscr (Richard Henderson, 2019-02-18, 1 file, -1/+1)

This is required before changing the representation of the register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190215100058.20015-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Pass integer to helper_mtvscr (Richard Henderson, 2019-02-18, 1 file, -4/+13)

We can re-use this helper elsewhere if we're not passing in an entire vector register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190215100058.20015-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: convert xxsel to vector operations (Richard Henderson, 2019-02-18, 1 file, -28/+27)

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20190215100058.20015-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>