bwlp/qemu.git - Experimental fork of QEMU with video encoding patches

	Commit message (Collapse)	Author	Age	Files	Lines
*	accel/tcg: Precompute curr_cflags into cpu->tcg_cflags	Richard Henderson	2021-03-06	1	-6/+1
\| \| \| \| \| \| \| \| \|	The primary motivation is to remove a dozen insns along the fast-path in tb_lookup. As a byproduct, this allows us to completely remove parallel_cpus. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	include/exec: lightly re-arrange TranslationBlock	Alex Bennée	2021-03-06	1	-3/+8
\| \| \| \| \| \| \| \| \|	Lets make sure all the flags we compare when looking up blocks are together in the same place. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210224165811.11567-5-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: drop the use of CF_HASH_MASK and rename params	Alex Bennée	2021-03-06	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \|	We don't really deal in cf_mask most of the time. The one time it's relevant is when we want to remove an invalidated TB from the QHT lookup. Everywhere else we should be looking up things without CF_INVALID set. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210224165811.11567-4-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: move CF_CLUSTER calculation to curr_cflags	Alex Bennée	2021-03-06	1	-3/+5
\| \| \| \| \| \| \| \| \| \|	There is nothing special about this compile flag that doesn't mean we can't just compute it with curr_cflags() which we should be using when building a new set. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20210224165811.11567-3-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: allow plugin instrumentation to be disable via cflags	Alex Bennée	2021-02-18	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When icount is enabled and we recompile an MMIO access we end up double counting the instruction execution. To avoid this we introduce the CF_MEMI cflag which only allows memory instrumentation for the next TB (which won't yet have been counted). As this is part of the hashed compile flags we will only execute the generated TB while coming out of a cpu_io_recompile. While we are at it delete the old TODO. We might as well keep the translation handy as it's likely you will repeatedly hit it on each MMIO access. Reported-by: Aaron Lindsay <aaron@os.amperecomputing.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Aaron Lindsay <aaron@os.amperecomputing.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210213130325.14781-21-alex.bennee@linaro.org>
*	accel/tcg: remove CF_NOCACHE and special cases	Alex Bennée	2021-02-18	1	-3/+0
\| \| \| \| \| \| \| \| \| \|	Now we no longer generate CF_NOCACHE blocks we can remove a bunch of the special case handling for them. While we are at it we can remove the unused tb->orig_tb field and save a few bytes on the TB structure. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210213130325.14781-20-alex.bennee@linaro.org>
*	exec: Use cpu_untagged_addr in g2h; split out g2h_untagged	Richard Henderson	2021-02-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use g2h_untagged in contexts that have no cpu, e.g. the binary loaders that operate before the primary cpu is created. As a colollary, target_mmap and friends must use untagged addresses, since they are used by the loaders. Use g2h_untagged on values returned from target_mmap, as the kernel never applies a tag itself. Use g2h_untagged on all pc values. The only current user of tags, aarch64, removes tags from code addresses upon branch, so "pc" is always untagged. Use g2h with the cpu context on hand wherever possible. Use g2h_untagged in lock_user, which will be updated soon. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210212184902.1251044-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	tcg/tci: Make tci_tb_ptr thread-local	Richard Henderson	2021-02-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Each thread must have its own pc, even under TCI. Remove the GETPC ifdef, because GETPC is always available for helpers, and thus is always required. Move the assignment under INDEX_op_call, because the value is only visible when we make a call to a helper function. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20210204014509.882821-6-richard.henderson@linaro.org>
*	accel/tcg: Restrict cpu_io_recompile() from other accelerators	Philippe Mathieu-Daudé	2021-01-23	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	As cpu_io_recompile() is only called within TCG accelerator in cputlb.c, declare it locally. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210117164813.4101761-6-f4bug@amsat.org> [rth: Adjust vs changed tb_flush_jmp_cache patch.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: Restrict tb_gen_code() from other accelerators	Philippe Mathieu-Daudé	2021-01-23	1	-5/+0
\| \| \| \| \| \| \| \| \|	tb_gen_code() is only called within TCG accelerator, declare it locally. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210117164813.4101761-4-f4bug@amsat.org> [rth: Adjust vs changed tb_flush_jmp_cache patch.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: Move tb_flush_jmp_cache() to cputlb.c	Richard Henderson	2021-01-23	1	-3/+0
\| \| \| \| \| \| \| \|	Move and make the function static, as the only users are here in cputlb.c. Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: Make cpu_gen_init() static	Philippe Mathieu-Daudé	2021-01-23	1	-2/+0
\| \| \| \| \| \| \| \| \| \|	cpu_gen_init() is TCG specific, only used in tcg/translate-all.c. No need to export it to other accelerators, declare it statically. Reviewed-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210117164813.4101761-2-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Introduce tcg_splitwx_to_{rx,rw}	Richard Henderson	2021-01-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Add two helper functions, using a global variable to hold the displacement. The displacement is currently always 0, so no change in behaviour. Begin using the functions in tcg common code only. Reviewed-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	vl: extract softmmu/globals.c	Paolo Bonzini	2020-12-15	1	-3/+0
\| \| \| \| \|	Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	overall/alpha tcg cpus\|hppa: Fix Lesser GPL version number	Chetan Pant	2020-11-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023123353.19796-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
*	accel/tcg: Add tlb_flush_page_bits_by_mmuidx*	Richard Henderson	2020-10-20	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On ARM, the Top Byte Ignore feature means that only 56 bits of the address are significant in the virtual address. We are required to give the entire 64-bit address to FAR_ELx on fault, which means that we do not "clean" the top byte early in TCG. This new interface allows us to flush all 256 possible aliases for a given page, currently missed by tlb_flush_page*. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20201016210754.818257-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	cpu-timers, icount: new modules	Claudio Fontana	2020-10-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	refactoring of cpus.c continues with cpu timer state extraction. cpu-timers: responsible for the softmmu cpu timers state, including cpu clocks and ticks. icount: counts the TCG instructions executed. As such it is specific to the TCG accelerator. Therefore, it is built only under CONFIG_TCG. One complication is due to qtest, which uses an icount field to warp time as part of qtest (qtest_clock_warp). In order to solve this problem, provide a separate counter for qtest. This requires fixing assumptions scattered in the code that qtest_enabled() implies icount_enabled(), checking each specific case. Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [remove redundant initialization with qemu_spice_init] Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [fix lingering calls to icount_get] Signed-off-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	qemu/atomic.h: rename atomic_ to qatomic_	Stefan Hajnoczi	2020-09-23	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
*	cputlb: destroy CPUTLB with tlb_destroy	Emilio G. Cota	2020-06-16	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	I was after adding qemu_spin_destroy calls, but while at it I noticed that we are leaking some memory. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Robert Foley <robert.foley@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20200609200738.445-5-robert.foley@linaro.org> Message-Id: <20200612190237.30436-8-alex.bennee@linaro.org>
*	accel/tcg: Add probe_access_flags	Richard Henderson	2020-05-11	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \|	This new interface will allow targets to probe for a page and then handle watchpoints themselves. This will be most useful for vector predicated memory operations, where one page lookup can be used for many operations, and one test can avoid many watchpoint checks. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20200508154359.7494-6-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	accel/tcg: Add block comment for probe_access	Richard Henderson	2020-05-11	1	-0/+17
\| \| \| \| \| \| \|	Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20200508154359.7494-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	tcg: cputlb: Add probe_read	Beata Michalska	2019-12-16	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Add probe_read alongside the write probing equivalent. Signed-off-by: Beata Michalska <beata.michalska@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191121000843.24844-2-beata.michalska@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	include/exec: wrap cpu_ldst.h in CONFIG_TCG	Alex Bennée	2019-10-28	1	-0/+2
\| \| \| \| \| \| \|	This gets around a build problem with --disable-tcg. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
*	cputlb: introduce get_page_addr_code_hostp	Emilio G. Cota	2019-10-28	1	-0/+38
\| \| \| \| \| \| \| \| \|	This will be used by plugins to get the host address of instructions. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
*	cputlb: document get_page_addr_code	Emilio G. Cota	2019-10-28	1	-3/+21
\| \| \| \| \| \| \|	Suggested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
*	s390x/tcg: MVCL: Exit to main loop if requested	David Hildenbrand	2019-10-10	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MVCL is interruptible and we should check for interrupts and process them after writing back the variables to the registers. Let's check for any exit requests and exit to the main loop. Introduce a new helper function for that: cpu_loop_exit_requested(). When booting Fedora 30, I can see a handful of these exits and it seems to work reliable. Also, Richard explained why this works correctly even when MVCL is called via EXECUTE: (1) TB with EXECUTE runs, at address Ae - env->psw_addr stored with Ae. - helper_ex() runs, memory address Am computed from D2a(X2a,B2a) or from psw.addr+RI2. - env->ex_value stored with memory value modified by R1a (2) TB of executee runs, - env->ex_value stored with 0. - helper_mvcl() runs, using and updating R1b, R1b+1, R2b, R2b+1. (3a) helper_mvcl() completes, - TB of executee continues, psw.addr += ilen. - Next instruction is the one following EXECUTE. (3b) helper_mvcl() exits to main loop, - cpu_loop_exit_restore() unwinds psw.addr = Ae. - Next instruction is the EXECUTE itself... - goto 1. As the PoP mentiones that an interruptible instruction called via EXECUTE should avoid modifying storage/registers that are used by EXECUTE itself, it is fine to retrigger EXECUTE. Cc: Alex Bennée <alex.bennee@linaro.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com>
*	cputlb: Partially inline memory_region_section_get_iotlb	Richard Henderson	2019-09-25	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is only one caller, tlb_set_page_with_attrs. We cannot inline the entire function because the AddressSpaceDispatch structure is private to exec.c, and cannot easily be moved to include/exec/memory-internal.h. Compute is_ram and is_romd once within tlb_set_page_with_attrs. Fold the number of tests against these predicates. Compute cpu_physical_memory_is_clean outside of the tlb lock region. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Factor out probe_write() logic into probe_access()	David Hildenbrand	2019-09-03	1	-2/+8
\| \| \| \| \| \| \| \| \|	Let's also allow to probe other access types. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190830100959.26615-3-david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Make probe_write() return a pointer to the host page	David Hildenbrand	2019-09-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	... similar to tlb_vaddr_to_host(); however, allow access to the host page except when TLB_NOTDIRTY or TLB_MMIO is set. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190830100959.26615-2-david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Factor out CONFIG_USER_ONLY probe_write() from s390x code	David Hildenbrand	2019-09-03	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Factor it out into common code. Similar to the !CONFIG_USER_ONLY variant, let's not allow to cross page boundaries. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190826075112.25637-4-david@redhat.com> [rth: Move cpu & cc variables inside if block.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	include: Make headers more self-contained	Markus Armbruster	2019-08-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Back in 2016, we discussed[1] rules for headers, and these were generally liked: 1. Have a carefully curated header that's included everywhere first. We got that already thanks to Peter: osdep.h. 2. Headers should normally include everything they need beyond osdep.h. If exceptions are needed for some reason, they must be documented in the header. If all that's needed from a header is typedefs, put those into qemu/typedefs.h instead of including the header. 3. Cyclic inclusion is forbidden. This patch gets include/ closer to obeying 2. It's actually extracted from my "[RFC] Baby steps towards saner headers" series[2], which demonstrates a possible path towards checking 2 automatically. It passes the RFC test there. [1] Message-ID: <87h9g8j57d.fsf@blackfin.pond.sub.org> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html [2] Message-Id: <20190711122827.18970-1-armbru@redhat.com> https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg02715.html Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-2-armbru@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
*	Include qemu-common.h exactly where needed	Markus Armbruster	2019-06-12	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
*	tcg: Use CPUClass::tlb_fill in cputlb.c	Richard Henderson	2019-05-10	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \|	We can now use the CPUClass hook instead of a named function. Create a static tlb_fill function to avoid other changes within cputlb.c. This also isolates the asserts within. Remove the named tlb_fill function from all of the targets. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: Hoist max_insns computation to tb_gen_code	Richard Henderson	2019-04-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	In order to handle TB's that translate to too much code, we need to place the control of the length of the translation in the hands of the code gen master loop. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	exec-all: document that tlb_fill can trigger a TLB resize	Emilio G. Cota	2019-02-11	1	-0/+5
\| \| \| \| \| \|	Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190209162745.12668-2-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: Add cluster number to TCG TB hash	Peter Maydell	2019-01-29	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Include the cluster number in the hash we use to look up TBs. This is important because a TB that is valid for one cluster at a given physical address and set of CPU flags is not necessarily valid for another: the two clusters may have different views of physical memory, or may have different CPU features (eg FPU present or absent). We put the cluster number in the high 8 bits of the TB cflags. This gives us up to 256 clusters, which should be enough for anybody. If we ever need more, or need more bits in cflags for other purposes, we could make tb_hash_func() take more data (and expand qemu_xxhash7() to qemu_xxhash8()). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Message-id: 20190121152218.9592-4-peter.maydell@linaro.org
*	exec: introduce tlb_init	Emilio G. Cota	2018-10-19	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	Paves the way for the addition of a per-TLB lock. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	accel/tcg: Check whether TLB entry is RAM consistently with how we set it up	Peter Maydell	2018-08-14	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We set up TLB entries in tlb_set_page_with_attrs(), where we have some logic for determining whether the TLB entry is considered to be RAM-backed, and thus has a valid addend field. When we look at the TLB entry in get_page_addr_code(), we use different logic for determining whether to treat the page as RAM-backed and use the addend field. This is confusing, and in fact buggy, because the code in tlb_set_page_with_attrs() correctly decides that rom_device memory regions not in romd mode are not RAM-backed, but the code in get_page_addr_code() thinks they are RAM-backed. This typically results in "Bad ram pointer" assertion if the guest tries to execute from such a memory region. Fix this by making get_page_addr_code() just look at the TLB_MMIO bit in the code_address field of the TLB, which tlb_set_page_with_attrs() sets if and only if the addend field is not valid for code execution. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180713150945.12348-1-peter.maydell@linaro.org
*	tcg: simplify !CONFIG_TCG handling of tb_invalidate_*	Paolo Bonzini	2018-07-02	1	-5/+3
\| \| \| \| \| \| \| \| \|	There is no need for a stub, since tb_invalidate_phys_addr can be excised altogether when TCG is disabled. This is a bit cleaner since it avoids using code that is clearly specific to user-mode emulation (it calls mmap_lock/unlock) for the !CONFIG_TCG case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	tcg: Fix --disable-tcg build breakage	Philippe Mathieu-Daudé	2018-07-02	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the --disable-tcg breakage introduced by 8bca9a03ec60d: $ configure --disable-tcg [...] $ make -C i386-softmmu exec.o make: Entering directory 'i386-softmmu' CC exec.o In file included from source/qemu/exec.c:62:0: source/qemu/include/exec/ram_addr.h:96:6: error: conflicting types for ‘tb_invalidate_phys_range’ void tb_invalidate_phys_range(ram_addr_t start, ram_addr_t end); ^~~~~~~~~~~~~~~~~~~~~~~~ In file included from source/qemu/exec.c:24:0: source/qemu/include/exec/exec-all.h:309:6: note: previous declaration of ‘tb_invalidate_phys_range’ was here void tb_invalidate_phys_range(target_ulong start, target_ulong end); ^~~~~~~~~~~~~~~~~~~~~~~~ source/qemu/exec.c:1043:6: error: conflicting types for ‘tb_invalidate_phys_addr’ void tb_invalidate_phys_addr(AddressSpace as, hwaddr addr, MemTxAttrs attrs) ^~~~~~~~~~~~~~~~~~~~~~~ In file included from source/qemu/exec.c:24:0: source/qemu/include/exec/exec-all.h:308:6: note: previous declaration of ‘tb_invalidate_phys_addr’ was here void tb_invalidate_phys_addr(target_ulong addr); ^~~~~~~~~~~~~~~~~~~~~~~ make: ** [source/qemu/rules.mak:69: exec.o] Error 1 make: Leaving directory 'i386-softmmu' Tested to build x86_64-softmmu and i386-softmmu targets. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180629200710.27626-1-f4bug@amsat.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
*	move public invalidate APIs out of translate-all.{c,h}, clean up	Paolo Bonzini	2018-06-28	1	-4/+4
\| \| \| \| \| \| \| \| \|	Place them in exec.c, exec-all.h and ram_addr.h. This removes knowledge of translate-all.h (which is an internal header) from several files outside accel/tcg and removes knowledge of AddressSpace from translate-all.c (as it only operates on ram_addr_t). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	tcg: remove tb_lock	Emilio G. Cota	2018-06-15	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use mmap_lock in user-mode to protect TCG state and the page descriptors. In !user-mode, each vCPU has its own TCG state, so no locks needed. Per-page locks are used to protect the page descriptors. Per-TB locks are used in both modes to protect TB jumps. Some notes: - tb_lock is removed from notdirty_mem_write by passing a locked page_collection to tb_invalidate_phys_page_fast. - tcg_tb_lookup/remove/insert/etc have their own internal lock(s), so there is no need to further serialize access to them. - do_tb_flush is run in a safe async context, meaning no other vCPU threads are running. Therefore acquiring mmap_lock there is just to please tools such as thread sanitizer. - Not visible in the diff, but tb_invalidate_phys_page already has an assert_memory_lock. - cpu_io_recompile is !user-only, so no mmap_lock there. - Added mmap_unlock()'s before all siglongjmp's that could be called in user-mode while mmap_lock is held. + Added an assert for !have_mmap_lock() after returning from the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic. Performance numbers before/after: Host: AMD Opteron(tm) Processor 6376 ubuntu 17.04 ppc64 bootup+shutdown time 700 +-+--+----+------+------------+-----------+--------------+-+ \| + + + + + B \| \| before *B* ** * \| \|tb lock removal ###D### * \| 600 +-+ * +-+ \| ** # \| \| B #D \| \| *** * ## \| 500 +-+ *** ### +-+ \| * *** ### \| \| B # ## \| \| ** * #D# \| 400 +-+ ## +-+ \| ### \| \| ## \| \| # ## \| 300 +-+ * B* #D# +-+ \| B *** ### \| \| * ** #### \| \| * *** ### \| 200 +-+ B B #D# +-+ \| #B * ## # \| \| #* ## \| \| + D##D# + + + + \| 100 +-+--+----+------+------------+-----------+------------+--+-+ 1 8 16 Guest CPUs 48 64 png: https://imgur.com/HwmBHXe debian jessie aarch64 bootup+shutdown time 90 +-+--+-----+-----+------------+------------+------------+--+-+ \| + + + + + + \| \| before *B* B \| 80 +tb lock removal ###D### D +-+ \| ### \| \| ## \| 70 +-+ # +-+ \| ## \| \| # \| 60 +-+ B ## +-+ \| * ## \| \| * #D \| 50 +-+ * ## +-+ \| * ### \| \| B* ### \| 40 +-+ ** # ## +-+ \| #D# \| \| B* ### \| 30 +-+ B*B #### +-+ \| B * * # ### \| \| B ###D# \| 20 +-+ D ##D## +-+ \| D# \| \| + + + + + + \| 10 +-+--+-----+-----+------------+------------+------------+--+-+ 1 8 16 Guest CPUs 48 64 png: https://imgur.com/iGpGFtv The gains are high for 4-8 CPUs. Beyond that point, however, unrelated lock contention significantly hurts scalability. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	translate-all: protect TB jumps with a per-destination-TB lock	Emilio G. Cota	2018-06-15	1	-13/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This applies to both user-mode and !user-mode emulation. Instead of relying on a global lock, protect the list of incoming jumps with tb->jmp_lock. This lock also protects tb->cflags, so update all tb->cflags readers outside tb->jmp_lock to use atomic reads via tb_cflags(). In order to find the destination TB (and therefore its jmp_lock) from the origin TB, we introduce tb->jmp_dest[]. I considered not using a linked list of jumps, which simplifies code and makes the struct smaller. However, it unnecessarily increases memory usage, which results in a performance decrease. See for instance these numbers booting+shutting down debian-arm: Time (s) Rel. err (%) Abs. err (s) Rel. slowdown (%) ------------------------------------------------------------------------------ before 20.88 0.74 0.154512 0. after 20.81 0.38 0.079078 -0.33524904 GTree 21.02 0.28 0.058856 0.67049808 GHashTable + xxhash 21.63 1.08 0.233604 3.5919540 Using a hash table or a binary tree to keep track of the jumps doesn't really pay off, not only due to the increased memory usage, but also because most TBs have only 0 or 1 jumps to them. The maximum number of jumps when booting debian-arm that I measured is 35, but as we can see in the histogram below a TB with that many incoming jumps is extremely rare; the average TB has 0.80 incoming jumps. n_jumps: 379208; avg jumps/tb: 0.801099 dist: [0.0,1.0)\|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁\|[34.0,35.0] Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	translate-all: introduce assert_no_pages_locked	Emilio G. Cota	2018-06-15	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	The appended adds assertions to make sure we do not longjmp with page locks held. Note that user-mode has nothing to check, since page_locks are !user-mode only. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	translate-all: use per-page locking in !user-mode	Emilio G. Cota	2018-06-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Groundwork for supporting parallel TCG generation. Instead of using a global lock (tb_lock) to protect changes to pages, use fine-grained, per-page locks in !user-mode. User-mode stays with mmap_lock. Sometimes changes need to happen atomically on more than one page (e.g. when a TB that spans across two pages is added/invalidated, or when a range of pages is invalidated). We therefore introduce struct page_collection, which helps us keep track of a set of pages that have been locked in the appropriate locking order (i.e. by ascending page index). This commit first introduces the structs and the function helpers, to then convert the calling code to use per-page locking. Note that tb_lock is not removed yet. While at it, rename tb_alloc_page to tb_page_add, which pairs with tb_page_remove. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB	Emilio G. Cota	2018-06-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit does several things, but to avoid churn I merged them all into the same commit. To wit: - Use uintptr_t instead of TranslationBlock * for the list of TBs in a page. Just like we did in (c37e6d7e "tcg: Use uintptr_t type for jmp_list_{next\|first} fields of TB"), the rationale is the same: these are tagged pointers, not pointers. So use a more appropriate type. - Only check the least significant bit of the tagged pointers. Masking with 3/~3 is unnecessary and confusing. - Introduce the TB_FOR_EACH_TAGGED macro, and use it to define PAGE_FOR_EACH_TB, which improves readability. Note that TB_FOR_EACH_TAGGED will gain another user in a subsequent patch. - Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there is a bug and we attempt to remove a TB that is not in the list, instead of segfaulting (since the list is NULL-terminated) we will reach g_assert_not_reached(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	tcg: track TBs with per-region BST's	Emilio G. Cota	2018-06-15	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
*	exec.c: Handle IOMMUs in address_space_translate_for_iotlb()	Peter Maydell	2018-06-15	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we don't support board configurations that put an IOMMU in the path of the CPU's memory transactions, and instead just assert() if the memory region fonud in address_space_translate_for_iotlb() is an IOMMUMemoryRegion. Remove this limitation by having the function handle IOMMUs. This is mostly straightforward, but we must make sure we have a notifier registered for every IOMMU that a transaction has passed through, so that we can flush the TLB appropriately when any of the IOMMUs change their mappings. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-5-peter.maydell@linaro.org
*	cputlb: Pass cpu_transaction_failed() the correct physaddr	Peter Maydell	2018-06-15	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The API for cpu_transaction_failed() says that it takes the physical address for the failed transaction. However we were actually passing it the offset within the target MemoryRegion. We don't currently have any target CPU implementations of this hook that require the physical address; fix this bug so we don't get confused if we ever do add one. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180611125633.32755-3-peter.maydell@linaro.org
*	Make tb_invalidate_phys_addr() take a MemTxAttrs argument	Peter Maydell	2018-05-31	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to tb_invalidate_phys_addr(). Its callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180521140402.23318-3-peter.maydell@linaro.org