tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors

mmu access looks something like:

<check tlb>
if miss goto slow_path
<fast path>
done:
...

; end of the TB
slow_path:
 <pre process>
 mr r3, r27         ; move areg0 to r3
                    ; (r3 holds the first argument for all the PPC32 ABIs)
 <call mmu_helper>
 b $+8
 .long done
 <post process>
 b done

On ppc32 <call mmu_helper> is:

(SysV and Darwin)

mmu_helper is most likely not within direct branching distance from
the call site, necessitating

a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes
b. moving GPR to CTR/LR                          ; 4 bytes
c. (finally) branching to CTR/LR                 ; 4 bytes

r3 setting              - 4 bytes
call                    - 16 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr        - 4 bytes
         Total overhead - 28 bytes

(PowerOpen (AIX))

a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes
b. loading 32 bit function pointer into GPR2            ; 4 bytes
c. moving GPR2 to CTR/LR                                ; 4 bytes
d. loading 32 bit small area pointer into R2            ; 4 bytes
e. (finally) branching to CTR/LR                        ; 4 bytes

r3 setting              - 4 bytes
call                    - 24 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr        - 4 bytes
         Total overhead - 36 bytes

Following is done to trim the code size of slow path sections:

In tcg_target_qemu_prologue trampolines are emitted that look like this:

trampoline:
mfspr r3, LR
addi  r3, 4
mtspr LR, r3      ; fixup LR to point over embedded retaddr
mr    r3, r27
<jump mmu_helper> ; tail call of sorts

And slow path becomes:

slow_path:
 <pre process>
 <call trampoline>
 .long done
 <post process>
 b done

call                    - 4 bytes (trampoline is within code gen buffer
                                   and most likely accessible via
                                   direct branch)
embedded retaddr        - 4 bytes
         Total overhead - 8 bytes

In the end the icache pressure is decreased by 20/28 bytes at the cost
of an extra jump to trampoline and adjusting LR (to skip over embedded
retaddr) once inside.

Signed-off-by: malc <av1474@comtv.ru>
author: malc 2012-11-05 18:47:04 +0100
committer: malc 2012-11-06 01:37:57 +0100
commit: c878da3b27ceeed953c9f9a1eb002d59e9dcb4c6 (patch)
tree: c790c9ce7d20408df89067c52ed81fba5278695c /exec-all.h
parent: target-mips: use ULL for 64 bit constants (diff)
download: qemu-c878da3b27ceeed953c9f9a1eb002d59e9dcb4c6.tar.gz
qemu-c878da3b27ceeed953c9f9a1eb002d59e9dcb4c6.tar.xz
qemu-c878da3b27ceeed953c9f9a1eb002d59e9dcb4c6.zip
1 files changed, 1 insertions, 1 deletions
diff --git a/exec-all.h b/exec-all.h
index 94ed613e37..6b3272ab9e 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -337,7 +337,7 @@ extern uintptr_t tci_tb_ptr;
                                     *(int32_t *)((void *)GETRA() + 3) - 1))
 # elif defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
-#  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() + 4)) - 1))
+#  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
 # else
 #  error "CONFIG_QEMU_LDST_OPTIMIZATION needs GETPC_LDST() implementation!"
 # endif
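
The one-line change from `GETRA() + 4` to `GETRA() - 4` follows from where the embedded retaddr sits relative to the helper's return address in each slow-path layout. The following is a minimal host-side sketch of that reasoning, not QEMU code: the byte buffers stand in for emitted PPC32 code, and every `sketch_*` name and the `SKETCH_DONE_ADDR` value are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value of the "done" label; any 32-bit value works here. */
#define SKETCH_DONE_ADDR 0x00401000

enum { SKETCH_WORD = 4 }; /* each PPC32 instruction and .long is 4 bytes */

/* Read/write a 32-bit word in a byte buffer without alignment assumptions. */
static int32_t sketch_read_word(const uint8_t *p)
{
    int32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static void sketch_write_word(uint8_t *p, int32_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Old layout: <call> | b $+8 | .long done | <post process>
 * The helper returns to "b $+8", so the retaddr is one word AFTER ra. */
static int32_t sketch_getpc_old(const uint8_t *ra)
{
    return sketch_read_word(ra + SKETCH_WORD) - 1;
}

/* New layout: <call trampoline> | .long done | <post process>
 * The trampoline adds 4 to LR, so the helper returns to <post process>
 * and the retaddr is one word BEFORE ra. */
static int32_t sketch_getpc_new(const uint8_t *ra)
{
    return sketch_read_word(ra - SKETCH_WORD) - 1;
}

/* Build the old slow-path layout and look up the embedded retaddr. */
static int32_t sketch_demo_old(void)
{
    uint8_t code[4 * SKETCH_WORD] = {0};
    const uint8_t *ra = code + 1 * SKETCH_WORD;  /* LR after <call> */

    sketch_write_word(code + 2 * SKETCH_WORD, SKETCH_DONE_ADDR);
    assert(ra + 2 * SKETCH_WORD <= code + sizeof(code));
    return sketch_getpc_old(ra);
}

/* Build the new slow-path layout and look up the embedded retaddr. */
static int32_t sketch_demo_new(void)
{
    uint8_t code[3 * SKETCH_WORD] = {0};
    const uint8_t *lr = code + 1 * SKETCH_WORD;  /* LR after <call trampoline> */
    const uint8_t *ra = lr + SKETCH_WORD;        /* LR after the trampoline fixup */

    sketch_write_word(code + 1 * SKETCH_WORD, SKETCH_DONE_ADDR);
    assert(ra - SKETCH_WORD >= code);
    return sketch_getpc_new(ra);
}
```

Calling `sketch_demo_old()` and `sketch_demo_new()` from a small `main` should yield the same value in both cases, `SKETCH_DONE_ADDR - 1`, mirroring the `- 1` in `GETPC_LDST()`. Only the sign of the offset differs between the two layouts.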