bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
*	softfloat: Define convert operations for bfloat16	LIU Zhiwei	2020-08-28	1	-0/+223
\| \| \| \| \| \| \| \|	Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200813071421.2509-3-zhiwei_liu@c-sky.com> [rth: Use FloatRoundMode for conversion functions.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Define operations for bfloat16	LIU Zhiwei	2020-08-28	1	-0/+168
\| \| \| \| \| \| \| \| \| \| \| \|	This patch implements operations for bfloat16 except conversion and some misc operations. We also add FloatFmt and pack/unpack interfaces for bfloat16. As they are both static fields, we can't make a sperate patch for them. Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200813071421.2509-2-zhiwei_liu@c-sky.com> [rth: Use FloatRelation for comparison operations.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Add fp16 and uint8/int8 conversion functions	Frank Chang	2020-08-28	1	-0/+34
\| \| \| \| \| \| \|	Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Frank Chang <frank.chang@sifive.com> Message-Id: <1596102747-20226-4-git-send-email-chihmin.chao@sifive.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: pass float_status pointer to pickNaN	Max Filippov	2020-08-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Pass float_status structure pointer to the pickNaN so that machine-specific settings are available to NaN selection code. Add use_first_nan property to float_status and use it in Xtensa-specific pickNaN. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: "Alex Bennée" <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
*	meson: rename included C source files to .c.inc	Paolo Bonzini	2020-08-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With Makefiles that have automatically generated dependencies, you generated includes are set as dependencies of the Makefile, so that they are built before everything else and they are available when first building the .c files. Alternatively you can use a fine-grained dependency, e.g. target/arm/translate.o: target/arm/decode-neon-shared.inc.c With Meson you have only one choice and it is a third option, namely "build at the beginning of the corresponding target"; the way you express it is to list the includes in the sources of that target. The problem is that Meson decides if something is a source vs. a generated include by looking at the extension: '.c', '.cc', '.m', '.C' are sources, while everything else is considered an include---including '.inc.c'. Use '.c.inc' to avoid this, as it is consistent with our other convention of using '.rst.inc' for included reStructuredText files. The editorconfig file is adjusted. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softfloat: return low bits of quotient from floatx80_modrem	Joseph Myers	2020-06-26	1	-5/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both x87 and m68k need the low parts of the quotient for their remainder operations. Arrange for floatx80_modrem to track those bits and return them via a pointer. The architectures using float32_rem and float64_rem do not appear to need this information, so the *_rem interface is left unchanged and the information returned only from floatx80_modrem. The logic used to determine the low 7 bits of the quotient for m68k (target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it looks at the result of converting the remainder to integer, the quotient having been discarded by that point); this patch does not change that, but the m68k maintainers may wish to do so. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <alpine.DEB.2.21.2006081656500.23637@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softfloat: do not set denominator high bit for floatx80 remainder	Joseph Myers	2020-06-26	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	The floatx80 remainder implementation unnecessarily sets the high bit of bSig explicitly. By that point in the function, arguments that are invalid, zero, infinity or NaN have already been handled and subnormals have been through normalizeFloatx80Subnormal, so the high bit will already be set. Remove the unnecessary code. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <alpine.DEB.2.21.2006081656220.23637@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softfloat: do not return pseudo-denormal from floatx80 remainder	Joseph Myers	2020-06-26	1	-3/+19
\| \| \| \| \| \| \| \| \| \| \| \| \|	The floatx80 remainder implementation sometimes returns the numerator unchanged when the denominator is sufficiently larger than the numerator. But if the value to be returned unchanged is a pseudo-denormal, that is incorrect. Fix it to normalize the numerator in that case. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <alpine.DEB.2.21.2006081655520.23637@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softfloat: fix floatx80 remainder pseudo-denormal check for zero	Joseph Myers	2020-06-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The floatx80 remainder implementation ignores the high bit of the significand when checking whether an operand (numerator) with zero exponent is zero. This means it mishandles a pseudo-denormal representation of 0x1p-16382L by treating it as zero. Fix this by checking the whole significand instead. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <alpine.DEB.2.21.2006081655180.23637@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softfloat: merge floatx80_mod and floatx80_rem	Joseph Myers	2020-06-26	1	-11/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The m68k-specific softfloat code includes a function floatx80_mod that is extremely similar to floatx80_rem, but computing the remainder based on truncating the quotient toward zero rather than rounding it to nearest integer. This is also useful for emulating the x87 fprem and fprem1 instructions. Change the floatx80_rem implementation into floatx80_modrem that can perform either operation, with both floatx80_rem and floatx80_mod as thin wrappers available for all targets. There does not appear to be any use for the _mod operation for other floating-point formats in QEMU (the only other architectures using _rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc, for instructions that have been removed in the latest version of the architecture), so no change is made to the code for other formats. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <alpine.DEB.2.21.2006081654280.23637@digraph.polyomino.org.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	fpu/softfloat: Silence 'bitwise negation of boolean expression' warning	Philippe Mathieu-Daudé	2020-06-18	1	-9/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When building with clang version 10.0.0-4ubuntu1, we get: CC lm32-softmmu/fpu/softfloat.o fpu/softfloat.c:3365:13: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation] absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven ); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fpu/softfloat.c:3423:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation] absZ0 &= ~ ( ( (uint64_t) ( absZ1<<1 ) == 0 ) & roundNearestEven ); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... fpu/softfloat.c:4273:18: error: bitwise negation of a boolean expression; did you mean logical negation? [-Werror,-Wbool-operation] zSig1 &= ~ ( ( zSig2 + zSig2 == 0 ) & roundNearestEven ); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix by rewriting the fishy bitwise AND of two bools as an int. Suggested-by: Eric Blake <eblake@redhat.com> Buglink: https://bugs.launchpad.net/bugs/1881004 Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20200617201309.1640952-2-richard.henderson@linaro.org Message-Id: <20200528155420.9802-1-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	softfloat: Inline floatx80 compare specializations	Richard Henderson	2020-05-19	1	-257/+0
\| \| \| \| \| \| \| \| \|	Replace the floatx80 compare specializations with inline functions that call the standard floatx80_compare{,_quiet} functions. Use bool as the return type. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Inline float128 compare specializations	Richard Henderson	2020-05-19	1	-238/+0
\| \| \| \| \| \| \| \| \|	Replace the float128 compare specializations with inline functions that call the standard float128_compare{,_quiet} functions. Use bool as the return type. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Inline float64 compare specializations	Richard Henderson	2020-05-19	1	-220/+0
\| \| \| \| \| \| \| \| \|	Replace the float64 compare specializations with inline functions that call the standard float64_compare{,_quiet} functions. Use bool as the return type. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Inline float32 compare specializations	Richard Henderson	2020-05-19	1	-216/+0
\| \| \| \| \| \| \| \| \|	Replace the float32 compare specializations with inline functions that call the standard float32_compare{,_quiet} functions. Use bool as the return type. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Name compare relation enum	Richard Henderson	2020-05-19	1	-18/+22
\| \| \| \| \| \| \| \| \| \|	Give the previously unnamed enum a typedef name. Use it in the prototypes of compare functions. Use it to hold the results of the compare functions. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Name rounding mode enum	Richard Henderson	2020-05-19	1	-23/+34
\| \| \| \| \| \| \| \| \| \| \| \|	Give the previously unnamed enum a typedef name. Use the packed attribute so that we do not affect the layout of the float_status struct. Use it in the prototypes of relevant functions. Adjust switch statements as necessary to avoid compiler warnings. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Change tininess_before_rounding to bool	Richard Henderson	2020-05-19	1	-34/+20
\| \| \| \| \| \| \| \|	Slightly tidies the usage within softfloat.c and the representation in float_status. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Replace flag with bool	Richard Henderson	2020-05-19	1	-96/+94
\| \| \| \| \| \| \| \|	We have had this on the to-do list for quite some time. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Use post test for floatN_mul	Richard Henderson	2020-05-19	1	-51/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	The existing f{32,64}_addsub_post test, which checks for zero inputs, is identical to f{32,64}_mul_fast_test. Which means we can eliminate the fast_test/fast_op hooks in favor of reusing the same post hook. This means we have one fewer test along the fast path for multiply. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: fix floatx80 pseudo-denormal round to integer	Joseph Myers	2020-05-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The softfloat function floatx80_round_to_int incorrectly handles the case of a pseudo-denormal where only the high bit of the significand is set, ignoring that bit (treating the number as an exact zero) rather than treating the number as an alternative representation of +/- 2^-16382 (which may round to +/- 1 depending on the rounding mode) as hardware does. Fix this check (simplifying the code in the process). Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2005042339420.22972@digraph.polyomino.org.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: fix floatx80 pseudo-denormal comparisons	Joseph Myers	2020-05-15	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \|	The softfloat floatx80 comparisons fail to allow for pseudo-denormals, which should compare equal to corresponding values with biased exponent 1 rather than 0. Add an adjustment for that case when comparing numbers with the same sign. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2005042338470.22972@digraph.polyomino.org.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: fix floatx80 pseudo-denormal addition / subtraction	Joseph Myers	2020-05-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The softfloat function addFloatx80Sigs, used for addition of values with the same sign and subtraction of values with opposite sign, fails to handle the case where the two values both have biased exponent zero and there is a carry resulting from adding the significands, which can occur if one or both values are pseudo-denormals (biased exponent zero, explicit integer bit 1). Add a check for that case, so making the results match those seen on x86 hardware for pseudo-denormals. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2005042337570.22972@digraph.polyomino.org.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: silence sNaN for conversions to/from floatx80	Joseph Myers	2020-05-15	1	-6/+18
\| \| \| \| \| \| \| \| \| \| \|	Conversions between IEEE floating-point formats should convert signaling NaNs to quiet NaNs. Most of those in QEMU's softfloat code do so, but those for floatx80 fail to. Fix those conversions to silence signaling NaNs as well. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Message-Id: <alpine.DEB.2.21.2005042336170.22972@digraph.polyomino.org.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Fix BAD_SHIFT from normalizeFloatx80Subnormal	Richard Henderson	2020-04-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	All other calls to normalize*Subnormal detect zero input before the call -- this is the only outlier. This case can happen with +0.0 + +0.0 = +0.0 or -0.0 + -0.0 = -0.0, so return a zero of the correct sign. Reported-by: Coverity (CID 1421991) Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20200327232042.10008-1-richard.henderson@linaro.org> Message-Id: <20200403191150.863-8-alex.bennee@linaro.org>
*	softfp: Added hardfloat conversion from float32 to float64	Matus Kysel	2019-10-30	1	-1/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reintroduce float32_to_float64 that was removed here: https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00455.html - nbench test it not actually calling this function at all - SPECS 2006 significat number of tests impoved their runtime, just few of them showed small slowdown Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Matus Kysel <mkysel@tachyum.com> Message-Id: <20191017142133.59439-1-mkysel@tachyum.com> [rth: Add comment about impossible inexact exceptions.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	fpu: rename softfloat-specialize.h -> .inc.c	Alex Bennée	2019-08-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This is not a normal header and should only be included in the main softfloat.c file to bring in the various target specific specialisations. Indeed as it contains non-inlined C functions it is not even a legal header. Rename it to match our included C convention. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
*	fpu: replace LIT64 with UINT64_C macros	Alex Bennée	2019-08-19	1	-59/+59
\| \| \| \| \| \| \| \| \|	In our quest to eliminate the home rolled LIT64 macro we fixup usage inside the softfloat code. While we are at it we remove some of the extraneous spaces to closer fit the house style. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
*	fpu: use min/max values from stdint.h for integral overflow	Alex Bennée	2019-08-19	1	-17/+15
\| \| \| \| \| \| \| \| \|	Remove some more use of LIT64 while making the meaning more clear. We also avoid the need of casts as the results by definition fit into the return type. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
*	fpu: convert float[16/32/64]_squash_denormal to new modern style	Alex Bennée	2019-08-19	1	-63/+49
\| \| \| \| \| \| \| \| \| \|	This also allows us to remove the extractFloat16exp/frac helpers. We avoid using the floatXX_pack_raw functions as they are slight overkill for masking out all but the top bit of the number. The generated code is almost exactly the same as makes no difference to the pre-conversion code. Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: fix float32/64 fused multiply-add	Kito Cheng	2019-03-25	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before falling back to softfloat FMA, we do not restore the original values of inputs A and C. Fix it. This bug was caught by running gcc's testsuite on RISC-V qemu. Note that this change gives a small perf increase for fp-bench: Host: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz Command: perf stat -r 3 taskset -c 0 ./fp-bench -o mulAdd -p $prec - $prec = single: - before: 101.71 MFlops 102.18 MFlops 100.96 MFlops - after: 103.63 MFlops 103.05 MFlops 102.96 MFlops - $prec = double: - before: 173.10 MFlops 173.93 MFlops 172.11 MFlops - after: 178.49 MFlops 178.88 MFlops 178.66 MFlops Signed-off-by: Kito Cheng <kito.cheng@gmail.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190322204320.17777-1-cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	softfloat: Support float_round_to_odd more places	Richard Henderson	2019-02-26	1	-5/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously this was only supported for roundAndPackFloat64. New support in round_canonical, round_to_int, float128_round_to_int, roundAndPackFloat32, roundAndPackInt32, roundAndPackInt64, roundAndPackUint64. This does not include any of the floatx80 routines, as we do not have users for that rounding mode there. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190215170225.15537-1-richard.henderson@linaro.org> Tested-by: David Hildenbrand <david@redhat.com> [AJB: add missing break] Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	softfloat: Implement float128_to_uint32	David Hildenbrand	2019-02-26	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \|	Handling it just like float128_to_uint32_round_to_zero, that hopefully is free of bugs :) Documentation basically copied from float128_to_uint64 Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	softfloat: enforce softfloat if the host's FMA is broken	Emilio G. Cota	2019-01-22	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \|	The added branch to the FMA ops is marked as unlikely and therefore its impact on performance (measured with fp-bench) is within noise range when measured on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: implement float32/64 comparison	Emilio G. Cota	2018-12-17	1	-14/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: cmp-single: 110.98 MFlops cmp-double: 107.12 MFlops - after: cmp-single: 506.28 MFlops cmp-double: 524.77 MFlops Note that flattening both eq and eq_signaling versions would give us extra performance (695v506, 615v524 Mflops for single/double, respectively) but this would emit two essentially identical functions for each eq/signaling pair, which is a waste. Aggregate performance improvement for the last few patches: [ all charts in png: https://imgur.com/a/4yV8p ] 1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz qemu-aarch64 NBench score; higher is better Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz 16 +-+-----------+-------------+----===-------+---===-------+-----------+-+ 14 +-+..........................@@@&&.=.......@@@&&.=...................+-+ 12 +-+..........................@.@.&.=.......@.@.&.=.....+befor=== +-+ 10 +-+..........................@.@.&.=.......@.@.&.=.....+ad@@&& = +-+ 8 +-+.......................$$$%.@.&.=.......@.@.&.=.....+ @@u& = +-+ 6 +-+............@@@&&=+*##.$%.@.&.=##$$%+@.&.=..###$$%%@i& = +-+ 4 +-+.......###$%%.@.&=...#.$%.@.&.=..#.$%.@.&.=+.#+$ +@m& = +-+ 2 +-+.....*.#$.%.@.&=...#.$%.@.&.=..#.$%.@.&.=..#+$+sqr& = +-+ 0 +-+-----##$%%@@&&=-##$$%@@&&==##$$%@@&&==-##$$%+cmp==-----+-+ FOURIER NEURAL NELU DECOMPOSITION gmean qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905 Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz error bars: 95% confidence interval 4.5 +-+---+-----+----+-----+-----+-&---+-----+----+-----+-----+-----+----+-----+-----+-----+-----+----+-----+---+-+ 4 +-+..........................+@@+...........................................................................+-+ 3.5 +-+..............%%@&.........@@..............%%@&............................................+++dsub +-+ 2.5 +-+....&&+.......%%@&.......+%%@..+%%&+..@@&+.%%@&....................................+%%&+.+%@&++%%@& +-+ 2 +-+..+%%&..+%@&+.%%@&...+++..%%@...%%&.+$$@&..%%@&..%%@&.......+%%&+.%%@&+......+%%@&.+%%&++$$@&++d%@& %%@&+-+ 1.5 +-+#$%&#$@&#%@&$%@#$%@#$%&#$@&$%@&#$%@#$%@#$%&#%@&$%@&#$%@#$%&#$@&+f%@&$%@&+-+ 0.5 +-+#$%&#$@&#%@&$%@#$%@#$%&#$@&$%@&#$%@#$%@#$%&#%@&$%@&#$%@#$%&#$@&+sqr@&$%@&+-+ 0 +-+#$%&#$@&#%@&$%@#$%@#$%&#$@&$%@&#$%@#$%@#$%&#%@&$%@&#$%@#$%&#$@&+cmp&$%@&+-+ 410.bw416.gam433.434.z435.436.cac437.lesli444.447.de450.so453454.ca459.GemsF465.tont470.lb4482.sphinxgeomean 2. Host: ARM Aarch64 A57 @ 2.4GHz qemu-aarch64 NBench score; higher is better Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz 5 +-+-----------+-------------+-------------+-------------+-----------+-+ 4.5 +-+........................................@@@&==...................+-+ 3 4 +-+..........................@@@&==........@.@&.=.....+before +-+ 3 +-+..........................@.@&.=........@.@&.=.....+ad@@@&== +-+ 2.5 +-+.....................##$$%%.@&.=........@.@&.=.....+ @m@& = +-+ 2 +-+............@@@&==.#.$.%.@&.=.#$$%%.@&.=.#$$%%d@& = +-+ 1.5 +-+.....*#$$%%.@&.=..#.$.%.@&.=..#.$.%.@&.=..#+$ +f@& = +-+ 0.5 +-+......#.$.%.@&.=..#.$.%.@&.=..#.$.%.@&.=..#+$+sqr& = +-+ 0 +-+-----#$$%%@@&==-#$$%%@@&==-#$$%%@@&==-*#$$%+cmp==-----+-+ FOURIER NEURAL NLU DECOMPOSITION gmean Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: implement float32/64 square root	Emilio G. Cota	2018-12-17	1	-2/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: sqrt-single: 42.30 MFlops sqrt-double: 22.97 MFlops - after: sqrt-single: 311.42 MFlops sqrt-double: 311.08 MFlops Here USE_FP makes a huge difference for f64's, with throughput going from ~200 MFlops to ~300 MFlops. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: implement float32/64 fused multiply-add	Emilio G. Cota	2018-12-17	1	-4/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: fma-single: 74.73 MFlops fma-double: 74.54 MFlops - after: fma-single: 203.37 MFlops fma-double: 169.37 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: fma-single: 23.24 MFlops fma-double: 23.70 MFlops - after: fma-single: 66.14 MFlops fma-double: 63.10 MFlops 3. IBM POWER8E @ 2.1 GHz - before: fma-single: 37.26 MFlops fma-double: 37.29 MFlops - after: fma-single: 48.90 MFlops fma-double: 59.51 MFlops Here having 3FP64 set to 1 pays off for x86_64: [1] 170.15 vs [0] 153.12 MFlops Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: implement float32/64 division	Emilio G. Cota	2018-12-17	1	-2/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: div-single: 34.84 MFlops div-double: 34.04 MFlops - after: div-single: 275.23 MFlops div-double: 216.38 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: div-single: 9.33 MFlops div-double: 9.30 MFlops - after: div-single: 51.55 MFlops div-double: 15.09 MFlops 3. IBM POWER8E @ 2.1 GHz - before: div-single: 25.65 MFlops div-double: 24.91 MFlops - after: div-single: 96.83 MFlops div-double: 31.01 MFlops Here setting 2FP64_USE_FP to 1 pays off for x86_64: [1] 215.97 vs [0] 62.15 MFlops Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: implement float32/64 multiplication	Emilio G. Cota	2018-12-17	1	-2/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance results for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: mul-single: 126.91 MFlops mul-double: 118.28 MFlops - after: mul-single: 258.02 MFlops mul-double: 197.96 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: mul-single: 37.42 MFlops mul-double: 38.77 MFlops - after: mul-single: 73.41 MFlops mul-double: 76.93 MFlops 3. IBM POWER8E @ 2.1 GHz - before: mul-single: 58.40 MFlops mul-double: 59.33 MFlops - after: mul-single: 60.25 MFlops mul-double: 94.79 MFlops Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	hardfloat: implement float32/64 addition and subtraction	Emilio G. Cota	2018-12-17	1	-19/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance results (single and double precision) for fp-bench: 1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: add-single: 135.07 MFlops add-double: 131.60 MFlops sub-single: 130.04 MFlops sub-double: 133.01 MFlops - after: add-single: 443.04 MFlops add-double: 301.95 MFlops sub-single: 411.36 MFlops sub-double: 293.15 MFlops 2. ARM Aarch64 A57 @ 2.4GHz - before: add-single: 44.79 MFlops add-double: 49.20 MFlops sub-single: 44.55 MFlops sub-double: 49.06 MFlops - after: add-single: 93.28 MFlops add-double: 88.27 MFlops sub-single: 91.47 MFlops sub-double: 88.27 MFlops 3. IBM POWER8E @ 2.1 GHz - before: add-single: 72.59 MFlops add-double: 72.27 MFlops sub-single: 75.33 MFlops sub-double: 70.54 MFlops - after: add-single: 112.95 MFlops add-double: 201.11 MFlops sub-single: 116.80 MFlops sub-double: 188.72 MFlops Note that the IBM and ARM machines benefit from having HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance can suffer significantly: - IBM Power8: add-single: [1] 54.94 vs [0] 116.37 MFlops add-double: [1] 58.92 vs [0] 201.44 MFlops - Aarch64 A57: add-single: [1] 80.72 vs [0] 93.24 MFlops add-double: [1] 82.10 vs [0] 88.18 MFlops On the Intel machine, having 2F64 set to 1 pays off, but it doesn't for 2F32: - Intel i7-6700K: add-single: [1] 285.79 vs [0] 426.70 MFlops add-double: [1] 302.15 vs [0] 278.82 MFlops Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	fpu: introduce hardfloat	Emilio G. Cota	2018-12-17	1	-0/+319
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The appended paves the way for leveraging the host FPU for a subset of guest FP operations. For most guest workloads (e.g. FP flags aren't ever cleared, inexact occurs often and rounding is set to the default [to nearest]) this will yield sizable performance speedups. The approach followed here avoids checking the FP exception flags register. See the added comment for details. This assumes that QEMU is running on an IEEE754-compliant FPU and that the rounding is set to the default (to nearest). The implementation-dependent specifics of the FPU should not matter; things like tininess detection and snan representation are still dealt with in soft-fp. However, this approach will break on most hosts if we compile QEMU with flags that break IEEE compatibility. There is no way to detect all of these flags at compilation time, but at least we check for -ffast-math (which defines __FAST_MATH__) and disable hardfloat (plus emit a #warning) when it is set. This patch just adds common code. Some operations will be migrated to hardfloat in subsequent patches to ease bisection. Note: some architectures (at least PPC, there might be others) clear the status flags passed to softfloat before most FP operations. This precludes the use of hardfloat, so to avoid introducing a performance regression for those targets, we add a flag to disable hardfloat. In the long run though it would be good to fix the targets so that at least the inexact flag passed to softfloat is indeed sticky. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	softfloat: rename canonicalize to sf_canonicalize	Emilio G. Cota	2018-12-17	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	glibc >= 2.25 defines canonicalize in commit eaf5ad0 (Add canonicalize, canonicalizef, canonicalizel., 2016-10-26). Given that we'll be including <math.h> soon, prepare for this by prefixing our canonicalize() with sf_ to avoid clashing with the libc's canonicalize(). Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
*	qemu/compiler: Wrap __attribute__((flatten)) in a macro	Thomas Huth	2018-10-17	1	-24/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Older versions of Clang (before 3.5) and GCC (before 4.1) do not support the "__attribute__((flatten))" yet. We don't care about such old versions of GCC anymore, but since Clang 3.4 is still used in EPEL for RHEL7 / CentOS 7, we should not use this attribute directly but with a wrapper macro instead. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
*	softfloat: Fix division	Richard Henderson	2018-10-05	1	-8/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The __udiv_qrnnd primitive that we nicked from gmp requires its inputs to be normalized. We were not doing that. Because the inputs are nearly normalized already, finishing that is trivial. Replace div128to64 with a "proper" udiv_qrnnd, so that this remains a reusable primitive. Fixes: cf07323d494 Fixes: https://bugs.launchpad.net/qemu/+bug/1793119 Tested-by: Emilio G. Cota <cota@braap.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Replace countLeadingZeros32/64 with clz32/64	Thomas Huth	2018-10-05	1	-13/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our minimum required compiler for compiling QEMU is GCC 4.1 these days, so we can drop the support for compilers which do not provide the __builtin_clz*() functions yet. Since the countLeadingZeros32/64 are then identical to the clz32/64 functions, and we do not have to sync the softloat 2 codebase with upstream anymore (softloat 3 is a complete rewrite) we can simply replace the functions with our QEMU versions. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1538118095-7003-1-git-send-email-thuth@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: remove float64_trunc_to_int	Emilio G. Cota	2018-10-05	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \|	It has not had users since f83311e476 ("target-m68k: use floatx80 internally", 2017-06-21). Note that no other bit-width has floatX_trunc_to_int. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	softfloat: Add scaling float-to-int routines	Richard Henderson	2018-08-24	1	-73/+316
\| \| \| \| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180814002653.12828-3-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	softfloat: Add scaling int-to-float routines	Richard Henderson	2018-08-24	1	-48/+136
\| \| \| \| \| \| \|	Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180814002653.12828-2-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	softfloat: Fix missing inexact for floating-point add	Richard Henderson	2018-08-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	For 0x1.0000000000003p+0 + 0x1.ffffffep+14 = 0x1.0001fffp+15 we dropped the sticky bit and so failed to raise inexact. Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Message-id: 20180810193129.1556-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	fpu/softfloat: Define floatN_silence_nan in terms of parts_silence_nan	Richard Henderson	2018-05-18	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Isolate the target-specific choice to 3 functions instead of 6. The code in floatx80_default_nan tried to be over-general. There are only two targets that support this format: x86 and m68k. Thus there is no point in inventing a mechanism for snan_bit_is_one. Move routines that no longer have ifdefs out of softfloat-specialize.h. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>