| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Win32 doesn't have a cpuid.h, and MacOSX may have one but without
the __cpuid() function we use, which means that commit 9d2eec20
broke the build for those platforms. Fix this by tightening up
our configure cpuid.h check to test that the functions we need
are present, and adding some missing #ifdef guards in
tcg/i386/tcg-target.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These three-operand shift instructions do not require the shift count
to be placed into ECX. This reduces the number of mov insns required,
with the mere addition of a new register constraint.
Don't attempt to get rid of the matching constraint, as that's impossible
to manipulate with just a new constraint. In addition, constant shifts
still need the matching constraint.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
Note that the optimizer cannot simplify ANDC X,Y,C to AND X,Y,~C
so we must handle constants in the implementation of andc.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
Prepare for emitting BMI insns which require VEX encoding.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
| |
These are not needed by users of tcg-target.h. No need to recompile
when we adjust them.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
| |
TCG_TARGET_HAS_movcond_i32 is always defined to 1 in tcg-target.h, so
remove the corresponding #ifdef #endif sequence, left from a previous
refactoring.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The movbe instruction has been added on some Intel Atom CPUs and on
recent Intel Haswell CPUs. It allows to load/store a value and at the
same time bswap it.
This patch detects the avaibility of this instruction and when available
use it in the qemu load/store routines in replacement of load/store +
bswap. Note that for 16-bit unsigned loads, movbe + movzw is basically the
same as movzw + bswap, so the patch doesn't touch this case.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[RTH: Reduced the number of conditionals using "movop".]
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
| |
Add support for three-byte opcodes, starting with the 0x0f 0x38 prefix.
Use P_EXT38 as the new constant, and shift all other constants so that
P_EXT and P_EXT38 have neighbouring values.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[RTH: Changed the name from P_EXT2 to P_EXT38.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
P_REXW is defined has a constant at the beginning of i386/tcg-target.c,
but the corresponding bit is later used in a harcoded way, which defeat
the purpose of a constant.
Fix that by using a conditional expression operator instead of a shift.
On x86 this actually makes the code slightly smaller as GCC does in
practice (opc >> 8) & 8 instead of (opc & 0x800) >> 8 so the constants
are smaller to load.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
| |
The comments apply to 8-bit stores, not 8-byte stores.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
|
|
|
|
|
| |
No support for helpers with non-default endianness yet,
but good enough to test the opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once we form a combined qemu_st_i32 opcode, we won't be able to
have separate constraints based on size. This one is fairly easy
to work around, since eax is available as a scratch register.
When storing variable data, this tends to merely exchange one mov
for another. E.g.
-: mov %esi,%ecx
...
-: mov %cl,(%edx)
+: mov %esi,%eax
+: mov %al,(%edx)
Where we do have a regression is when storing constant data, in which
we may load the constant into edi, when only ecx/ebx ought to be used.
The proper way to recover this regression is to allow constants as
arguments to qemu_st_i32, so that we never load the constant data into
a register at all, must less the wrong register. TBD.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Pass two TCGReg to tcg_out_tlb_load, rather than idx+args.
Move ldst_optimization routines just below tcg_out_tlb_load to avoid
the need for forward declarations.
Use TCGReg enum in preference to int where apprpriate.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
| |
Step one in the transition, with constants passed down from tcg_out_op.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
| |
Step two in the transition, adding the new ldst opcodes. Keep the old
opcodes around until all backends support the new opcodes.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
| |
Move TCGLabelQemuLdst and related stuff out of tcg.h.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
For 8 and 16-bit unsigned loads, rely on the zero-extension
from the helper and use a smaller 32-bit move insn.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
|
|
| |
The _cmmu helpers can be moved to exec-all.h. The helpers that are
used from TCG will shortly need access to tcg_target_long so move
their declarations into tcg.h.
This requires minor include adjustments to all TCG backends.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
| |
Since we now perform it inside the helper, no need to do it here.
This also lets us perform a tail-call from the store slow path to
the helper.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
There are several hosts for which it would be useful to use the
available 64-bit registers in a 32-bit pointer environment.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
|
| |
Discontinue the jump-around-jump-to-jump scheme, trading it for a single
immediate move instruction. The two extra jumps always consume 7 bytes,
whereas the immediate move is either 5 or 7 bytes depending on where the
code_gen_buffer gets located.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
| |
Use existing stack space for arguments; don't push/pop.
Use less ifdefs and more C ifs.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
| |
Use a 7 byte lea before the ultimate 10 byte movq.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
| |
No point in splitting the write into 32-bit pieces.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
| |
We can check the condition at compile time, rather than run time.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
These will necessarily be the same layout for all hosts. This limits
the amount of boilerplate required to implement jit debug for a host.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
|
|
|
| |
We've got a compile-time check for the condition in exec/cpu-defs.h.
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
|
|
|
| |
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
|
| |
Matching the 32-bit multiword arithmetic that we already have.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
|
| |
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
|
|
|
| |
Existing compile-time detection is spotty at best. Convert
it all to runtime detection instead.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
| |
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
| |
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
cases at the end of a block after generating the other IRs.
Currently, this optimization supports only i386 and x86_64 hosts.
Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
When we allocate a reserved_va for the guest, the kernel will likely
choose an address well above 4G. At which point we must use a pair
of movabsq+addq to form the host address. If we have OS support,
set up a segment register to point to guest_base instead.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On x86_64, remove the constraint on the third argument register which
is not needed:
- For loads the helper arguments are env, addr, mem_idx. The addr
value should not be in the two first argument registers as they are
used in tcg_out_tlb_load().
- For stores the helper arguments are env, addr, data, mem_idx.
The addr and data values should not be in the two first argument
registers as they are used in tcg_out_tlb_load(). The data value
should also not be in the two first argument registers, but could
be in the third argument register in which case it would be already
loaded at the right location.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get
an optimal code for the load/store functions.
First swap the two registers used in tcg_out_tlb_load() so that the
address end-up in the second register instead of the first one. Adjust
tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call
tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct
registers. Then replace the register shifting by direct load of the
arguments.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
|
|
|
|
|
|
|
|
| |
GUEST_BASE support is now supported by all TCG backends, and is
now mandatory. Drop the now-pointless TCG_TARGET_HAS_GUEST_BASE
define (set by every backend) and the error if it is unset.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
|
|
|
|
|
|
|
|
|
|
| |
There are several cases that can be handled easier inside both
translators and code generators if we have out-of-band values
for conditions. It's easy enough to handle ALWAYS and NEVER in
the natural way inside the tcg middle-end.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TCG jmp operation doesn't really make sense in the QEMU context, it
is unused, it is not implemented by some targets, and it is wrongly
implemented by some others.
This patch simply removes it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Stefan Weil<sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|