summaryrefslogtreecommitdiffstats
path: root/tcg/optimize.c
Commit message (Collapse)AuthorAgeFilesLines
...
* tcg: Put opcodes in a linked listRichard Henderson2015-02-131-170/+116Star
| | | | | | | | The previous setup required ops and args to be completely sequential, and was error prone when it came to both iteration and optimization. Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Don't special case TCG_OPF_CALL_CLOBBERRichard Henderson2014-06-181-5/+4Star
| | | | | | | With the "old" ldst ops we didn't know the real width of the result of the load, but with the "new" ldst ops we do. Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Remove TCG_TARGET_HAS_new_ldstRichard Henderson2014-06-041-5/+0Star
| | | | | | | Since all backends have been converted, remove the compatibility code. Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Remember garbage high bits for 32-bit opsRichard Henderson2014-05-281-7/+26
| | | | | | | | For a 64-bit host, the high bits of a register after a 32-bit operation are undefined. Adjust the temps mask for all 32-bit ops to reflect that. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Move updating of gen_opc_buf into tcg_opt_gen_mov*Richard Henderson2014-05-281-61/+56Star
| | | | | | No functional change, just reduce a bit of redundancy. Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Optimize brcond2 and setcond2 ne/eqRichard Henderson2014-05-281-0/+94
| | | | | | | If either the high or low pair can be resolved, we can simplify to either a constant or to a 32-bit comparison. Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Make call address a constant parameterRichard Henderson2014-05-121-42/+33Star
| | | | | | | | | | Avoid allocating a tcg temporary to hold the constant address, and instead place it directly into the op_call arguments. At the same time, convert to the newly introduced tcg_out_call backend function, rather than invoking tcg_out_op for the call. Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Add INDEX_op_trunc_shr_i32Richard Henderson2014-04-281-0/+16
| | | | | | Let the backend do something special for truncation. Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Fix out of range shift in deposit optimizationsRichard Henderson2014-04-191-6/+4Star
| | | | | | | | By inspection, for a deposit(x, y, 0, 64), we'd have a shift of (1<<64) and everything else falls apart. But we can reuse the existing deposit logic to get this right. Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Mask shift quantities while foldingRichard Henderson2014-04-191-15/+20
| | | | | | | | The TCG result would be undefined, but we can at least produce one plausible result and avoid triggering the wrath of analysis tools. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Add more identity simplificationsRichard Henderson2014-02-171-15/+24
| | | | | | | | Recognize 0 operand to andc, and -1 operands to and, orc, eqv. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Optmize ANDC X,Y,Y to MOV X,0Richard Henderson2014-02-171-0/+1
| | | | | | | | Like we already do for SUB and XOR. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Simply some logical ops to NOTRichard Henderson2014-02-171-0/+57
| | | | | | | | | | Given, of course, an appropriate constant. These could be generated from the "canonical" operation for inversion on the guest, or via other optimizations. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: Handle known-zeros masks for ANDCRichard Henderson2014-02-171-0/+11
| | | | | | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: add known-zero bits compute for load opsAurelien Jarno2014-02-171-1/+25
| | | | | | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: improve known-zero bits for 32-bit opsAurelien Jarno2014-02-171-0/+6
| | | | | | | | | | The shl_i32 op might set some bits of the unused 32 high bits of the mask. Fix that by clearing the unused 32 high bits for all 32-bit ops except load/store which operate on tl values. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: fix known-zero bits optimizationAurelien Jarno2014-02-171-1/+7
| | | | | | | | | | | | Known-zero bits optimization is a great idea that helps to generate more optimized code. However the current implementation only works in very few cases as the computed mask is not saved. Fix this to make it really working. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: fix known-zero bits for right shift opsAurelien Jarno2014-02-171-5/+14
| | | | | | | | | | | 32-bit versions of sar and shr ops should not propagate known-zero bits from the unused 32 high bits. For sar it could even lead to wrong code being generated. Cc: qemu-stable@nongnu.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* misc: Use new rotate functionsStefan Weil2013-09-251-8/+4Star
| | | | Signed-off-by: Stefan Weil <sw@weilnetz.de>
* tcg: Constant fold div, remRichard Henderson2013-09-021-0/+23
| | | | | Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg: Add muluh and mulsh opcodesRichard Henderson2013-09-021-0/+20
| | | | | | | | Use them in places where mulu2 and muls2 are used. Optimize mulx2 with dead low part to mulxh. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
* tcg/optimize: fix setcond2 optimizationAurelien Jarno2013-05-091-0/+1
| | | | | | | | | | When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg-optimize: Fold sub r,0,x to neg r,xRichard Henderson2013-03-231-1/+33
| | | | | | Cc: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* tcg: Add signed multiword multiplication operationsRichard Henderson2013-02-231-0/+1
| | | | | Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* tcg: Add 64-bit multiword arithmetic operationsRichard Henderson2013-02-231-2/+2
| | | | | | Matching the 32-bit multiword arithmetic that we already have. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* optimize: optimize using nonzero bitsPaolo Bonzini2013-01-191-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds two optimizations using the non-zero bit mask. In some cases involving shifts or ANDs the value can become zero, and can thus be optimized to a move of zero. Second, useless zero-extension or an AND with constant can be detected that would only zero bits that are already zero. The main advantage of this optimization is that it turns zero-extensions into moves, thus enabling much better copy propagation (around 1% code reduction). Here is for example a "test $0xff0000,%ecx + je" before optimization: mov_i64 tmp0,rcx movi_i64 tmp1,$0xff0000 discard cc_src and_i64 cc_dst,tmp0,tmp1 movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 and after (without patch on the left, with on the right): movi_i64 tmp1,$0xff0000 movi_i64 tmp1,$0xff0000 discard cc_src discard cc_src and_i64 cc_dst,rcx,tmp1 and_i64 cc_dst,rcx,tmp1 movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit (after optimization, without patch on the left, with on the right): discard cc_src discard cc_src mov_i64 cc_dst,rax mov_i64 cc_dst,rax movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,ne,$0x0 brcond_i64 rax,tmp12,ne,$0x0 "test $0x1, %dl + je": movi_i64 tmp1,$0x1 movi_i64 tmp1,$0x1 discard cc_src discard cc_src and_i64 cc_dst,rdx,tmp1 and_i64 cc_dst,rdx,tmp1 movi_i32 cc_op,$0x1a movi_i32 cc_op,$0x1a ext8u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 In some cases TCG even outsmarts GCC. :) Here the input code has "and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer, thanks to copy propagation, does the following: movi_i64 tmp12,$0x2 movi_i64 tmp12,$0x2 and_i64 rax,rax,tmp12 and_i64 rax,rax,tmp12 mov_i64 cc_dst,rax mov_i64 cc_dst,rax ext32s_i64 tmp0,rax -> nop mov_i64 rbx,tmp0 -> mov_i64 rbx,cc_dst and_i64 cc_dst,rbx,rbx -> nop Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* optimize: track nonzero bits of registersPaolo Bonzini2013-01-191-22/+110
| | | | | | | | | | | | | | | | Add a "mask" field to the tcg_temp_info struct. A bit that is zero in "mask" will always be zero in the corresponding temporary. Zero bits in the mask can be produced from moves of immediates, zero-extensions, ANDs with constants, shifts; they can then be be propagated by logical operations, shifts, sign-extensions, negations, deposit operations, and conditional moves. Other operations will just reset the mask to all-ones, i.e. unknown. [rth: s/target_ulong/tcg_target_ulong/] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* optimize: only write to state when clearing optimizer dataPaolo Bonzini2013-01-191-5/+14
| | | | | | | | | | | | | | The next patch will add to the TCG optimizer a field that should be non-zero in the default case. Thus, replace the memset of the temps array with a loop. Only the state field has to be up-to-date, because others are not used except if the state is TCG_TEMP_COPY or TCG_TEMP_CONST. [rth: Extracted the loop to a function.] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* TCG: Use gen_opc_buf from context instead of global variable.Evgeny Voevodin2012-11-171-31/+31
| | | | | | Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
* tcg: rework TCG helper flagsAurelien Jarno2012-10-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be confusing and doesn't provide enough granularity for some helpers (FP helpers for example). This patch changes them into the following helpers flags: - TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals, either directly or via an exception. They will not be saved to their canonical location before calling the helper. - TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals. They will only be saved to their canonical locations before calling helpers, but they won't be reloaded afterwise. - TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if the return value is not used. It provides convenience flags, to avoid helper definitions longer than 80 characters. It also provides compatibility flags, and updates the documentation. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Optimize mulu2Richard Henderson2012-10-171-0/+26
| | | | | | | | | Like add2, do operand ordering, constant folding, and dead operand elimination. The latter happens about 15% of all mulu2 during an x86_64 bios boot. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Constant fold add2 and sub2Richard Henderson2012-10-171-0/+35
| | | | | Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Do constant folding on double-word comparisonsRichard Henderson2012-10-171-21/+72
| | | | | Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Split out subroutines from do_constant_folding_condRichard Henderson2012-10-171-71/+81
| | | | | | | We can re-use these for implementing double-word folding. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Optimize double-word comparisons against zeroRichard Henderson2012-10-171-0/+39
| | | | | Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Use common code when failing to optimizeRichard Henderson2012-10-171-59/+32Star
| | | | | | | This saves a whole lot of repetitive code sequences. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Swap commutative double-word comparisonsRichard Henderson2012-10-171-0/+26
| | | | | Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Canonicalize add2 operand orderingRichard Henderson2012-10-171-0/+5
| | | | | Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Split out swap_commutative as a subroutineRichard Henderson2012-10-171-32/+24Star
| | | | | | | | | | | | | | | | | | | | | Reduces code duplication and prefers movcond d, c1, c2, const, s to movcond d, c1, c2, s, const It also prefers add r, r, c over add r, c, r when both inputs are known constants. This doesn't matter for true add, as we will fully constant fold that. But it matters for a follow-on patch using this routine for add2 which may not be fully foldable. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Add TCG_COND_NEVER, TCG_COND_ALWAYSRichard Henderson2012-10-061-0/+6
| | | | | | | | | | There are several cases that can be handled easier inside both translators and code generators if we have out-of-band values for conditions. It's easy enough to handle ALWAYS and NEVER in the natural way inside the tcg middle-end. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: add constant folding for depositAurelien Jarno2012-09-221-0/+20
| | | | | Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: prefer the "op a, a, b" form for commutative opsAurelien Jarno2012-09-221-1/+4
| | | | | | | | | | | The "op a, a, b" form is better handled on non-RISC host than the "op a, b, a" form, so swap the arguments to this form when possible, and when b is not a constant. This reduces the number of generated instructions by a tiny bit. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: further optimize brcond/movcond/setcondAurelien Jarno2012-09-221-51/+76
| | | | | | | | | When both argument of brcond/movcond/setcond are the same or when one of the two values is a constant equal to zero, it's possible to do further optimizations. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: optimize "op r, a, a => movi r, 0"Aurelien Jarno2012-09-221-0/+16
| | | | | | | | | Now that it's possible to detect copies, we can optimize the case the "op r, a, a => movi r, 0". This helps in the computation of overflow flags when one of the two args is 0. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: optimize "op r, a, a => mov r, a"Aurelien Jarno2012-09-221-1/+1
| | | | | | | | Now that we can easily detect all copies, we can optimize the "op r, a, a => mov r, a" case a bit more. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: do copy propagation for all operationsAurelien Jarno2012-09-221-2/+9
| | | | | | | | | | It is possible to due copy propagation for all operations, even the one that have side effects or clobber arguments (it only concerns input arguments). That said, the call operation should be handled differently due to the variable number of arguments. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: rework copy progagationAurelien Jarno2012-09-221-75/+92
| | | | | | | | | | | | | | | | | | | | | | | | | The copy propagation pass tries to keep track what is a copy of what and what has copy of what, and in addition it keep a circular list of of all the copies. Unfortunately this doesn't fully work: a mov from a temp which has a state "COPY" changed it into a state "HAS_COPY". Later when this temp is used again, it is considered has not having copy and thus no propagation is done. This patch fixes that by removing the hiearchy between copies, and thus only keeping a "COPY" state both meaning "is a copy" and "has a copy". The decision of which copy to use is deferred to the actual temp replacement. At this stage there is not one best choice to do, but only better choices than others. For doing the best choice the operation would have to be parsed in reversed to know if a temp is going to be used later or not. That what is done by the liveness analysis. At this stage it is known that globals will be always live, that local temps will be dead at the end of the translation block, and that the temps will be dead at the end of the basic block. This means that this stage should try to replace temps by local temps or globals and local temps by globals. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: check types in copy propagationAurelien Jarno2012-09-221-10/+8Star
| | | | | | | | | | | | | | | | The copy propagation doesn't check the types of the temps during copy propagation. However TCG is using the mov_i32 for the i64 to i32 conversion and thus the two are not equivalent. With this patch tcg_opt_gen_mov() doesn't consider two temps of different type as copies anymore. So far it seems the optimization was not aggressive enough to trigger this bug, but it will be triggered later in this series once the copy propagation is improved. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg/optimize: remove TCG_TEMP_ANYAurelien Jarno2012-09-221-6/+5Star
| | | | | | | | TCG_TEMP_ANY has no different meaning than TCG_TEMP_UNDEF, so use the later instead. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* tcg: Optimize two-address commutative operationsRichard Henderson2012-09-211-1/+14
| | | | | | | | While swapping constants to the second operand, swap sources matching destinations to the first operand. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>