summaryrefslogtreecommitdiffstats
path: root/accel/kvm
Commit message (Collapse)AuthorAgeFilesLines
* Revert "accel: kvm: Add aligment assert for kvm_log_clear_one_slot"Paolo Bonzini2021-03-161-7/+0Star
| | | | | | | | | This reverts commit 3920552846e881bafa9f9aad0bb1a6eef874d7fb. Thomas Huth reported a failure with CentOS 6 guests: ../../devel/qemu/accel/kvm/kvm-all.c:690: kvm_log_clear_one_slot: Assertion `QEMU_IS_ALIGNED(start | size, psize)' failed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel: kvm: Fix kvm_type invocationAndrew Jones2021-03-121-0/+2
| | | | | | | | | | | | | | | | Prior to commit f2ce39b4f067 a MachineClass kvm_type method only needed to be registered to ensure it would be executed. With commit f2ce39b4f067 a kvm-type machine property must also be specified. hw/arm/virt relies on the kvm_type method to pass its selected IPA limit to KVM, but this is not exposed as a machine property. Restore the previous functionality of invoking kvm_type when it's present. Fixes: f2ce39b4f067 ("vl: make qemu_get_machine_opts static") Signed-off-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 20210310135218.255205-2-drjones@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* accel: kvm: Add aligment assert for kvm_log_clear_one_slotKeqian Zhu2021-03-061-0/+7
| | | | | | | | | | | | | | | The parameters start and size are transfered from QEMU memory emulation layer. It can promise that they are TARGET_PAGE_SIZE aligned. However, KVM needs they are qemu_real_page_size aligned. Though no caller breaks this aligned requirement currently, we'd better add an explicit assert to avoid future breaking. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201217014941.22872-3-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel: kvm: Fix memory waste under mismatch page sizeKeqian Zhu2021-03-061-1/+5
| | | | | | | | | | | | | | | | | When handle dirty log, we face qemu_real_host_page_size and TARGET_PAGE_SIZE. The first one is the granule of KVM dirty bitmap, and the second one is the granule of QEMU dirty bitmap. As qemu_real_host_page_size >= TARGET_PAGE_SIZE (kvm_init() enforced it), misuse TARGET_PAGE_SIZE to init kvmslot dirty_bmap may waste memory. For example, when qemu_real_host_page_size is 64K and TARGET_PAGE_SIZE is 4K, it wastes 93.75% (15/16) memory. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20201217014941.22872-2-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* sev/i386: Don't allow a system reset under an SEV-ES guestTom Lendacky2021-02-161-0/+5
| | | | | | | | | | | | | | | | | | | | An SEV-ES guest does not allow register state to be altered once it has been measured. When an SEV-ES guest issues a reboot command, Qemu will reset the vCPU state and resume the guest. This will cause failures under SEV-ES. Prevent that from occuring by introducing an arch-specific callback that returns a boolean indicating whether vCPUs are resettable. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: David Hildenbrand <david@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Message-Id: <1ac39c441b9a3e970e9556e1cc29d0a0814de6fd.1611682609.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* sev/i386: Allow AP booting under SEV-ESPaolo Bonzini2021-02-161-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | When SEV-ES is enabled, it is not possible modify the guests register state after it has been initially created, encrypted and measured. Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the hypervisor cannot emulate this because it cannot update the AP register state. For the very first boot by an AP, the reset vector CS segment value and the EIP value must be programmed before the register has been encrypted and measured. Search the guest firmware for the guest for a specific GUID that tells Qemu the value of the reset vector to use. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <22db2bfb4d6551aed661a9ae95b4fdbef613ca21.1611682609.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel/kvm/kvm-all: Fix wrong return code handling in dirty log codeThomas Huth2021-02-081-9/+12
| | | | | | | | | | | | | | | | | | | The kvm_vm_ioctl() wrapper already returns -errno if the ioctl itself returned -1, so the callers of kvm_vm_ioctl() should not check for -1 but for a value < 0 instead. This problem has been fixed once already in commit b533f658a98325d0e4 but that commit missed that the ENOENT error code is not fatal for this ioctl, so the commit has been reverted in commit 50212d6346f33d6e since the problem occurred close to a pending release at that point in time. The plan was to fix it properly after the release, but it seems like this has been forgotten. So let's do it now finally instead. Resolves: https://bugs.launchpad.net/qemu/+bug/1294227 Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210129084354.42928-1-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* confidential guest support: Move SEV initialization into arch specific codeDavid Gibson2021-02-082-16/+2Star
| | | | | | | | | | | While we've abstracted some (potential) differences between mechanisms for securing guest memory, the initialization is still specific to SEV. Given that, move it into x86's kvm_arch_init() code, rather than the generic kvm_init() code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>
* sev: Add Error ** to sev_kvm_init()David Gibson2021-02-082-2/+4
| | | | | | | | | This allows failures to be reported richly and idiomatically. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
* confidential guest support: Rework the "memory-encryption" propertyDavid Gibson2021-02-082-4/+6
| | | | | | | | | | | | | | | | | | | Currently the "memory-encryption" property is only looked at once we get to kvm_init(). Although protection of guest memory from the hypervisor isn't something that could really ever work with TCG, it's not conceptually tied to the KVM accelerator. In addition, the way the string property is resolved to an object is almost identical to how a QOM link property is handled. So, create a new "confidential-guest-support" link property which sets this QOM interface link directly in the machine. For compatibility we keep the "memory-encryption" property, but now implemented in terms of the new property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
* sev: Remove false abstraction of flash encryptionDavid Gibson2021-02-082-36/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When AMD's SEV memory encryption is in use, flash memory banks (which are initialed by pc_system_flash_map()) need to be encrypted with the guest's key, so that the guest can read them. That's abstracted via the kvm_memcrypt_encrypt_data() callback in the KVM state.. except, that it doesn't really abstract much at all. For starters, the only call site is in code specific to the 'pc' family of machine types, so it's obviously specific to those and to x86 to begin with. But it makes a bunch of further assumptions that need not be true about an arbitrary confidential guest system based on memory encryption, let alone one based on other mechanisms: * it assumes that the flash memory is defined to be encrypted with the guest key, rather than being shared with hypervisor * it assumes that that hypervisor has some mechanism to encrypt data into the guest, even though it can't decrypt it out, since that's the whole point * the interface assumes that this encrypt can be done in place, which implies that the hypervisor can write into a confidential guests's memory, even if what it writes isn't meaningful So really, this "abstraction" is actually pretty specific to the way SEV works. So, this patch removes it and instead has the PC flash initialization code call into a SEV specific callback. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
* accel: replace struct CpusAccel with AccelOpsClassClaudio Fontana2021-02-054-11/+23
| | | | | | | | | | | | | | | This will allow us to centralize the registration of the cpus.c module accelerator operations (in accel/accel-softmmu.c), and trigger it automatically using object hierarchy lookup from the new accel_init_interfaces() initialization step, depending just on which accelerators are available in the code. Rename all tcg-cpus.c, kvm-cpus.c, etc to tcg-accel-ops.c, kvm-accel-ops.c, etc, matching the object type names. Signed-off-by: Claudio Fontana <cfontana@suse.de> Message-Id: <20210204163931.7358-18-cfontana@suse.de> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* accel/kvm: avoid using predefined PAGE_SIZEJiaxun Yang2021-01-201-0/+3
| | | | | | | | | | | | | | | | As per POSIX specification of limits.h [1], OS libc may define PAGE_SIZE in limits.h. PAGE_SIZE is used in included kernel uapi headers. To prevent collosion of definition, we discard PAGE_SIZE from defined by libc and take QEMU's variable. [1]: https://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <20210118063808.12471-8-jiaxun.yang@flygoat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
* kvm: Take into account the unaligned section size when preparing bitmapZenghui Yu2020-12-151-2/+5
| | | | | | | | | | | | | | | | | | | | | | The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the start and the size of the given range of pages. We have been careful to handle the unaligned cases when performing CLEAR on one slot. But it seems that we forget to take the unaligned *size* case into account when preparing bitmap for the interface, and we may end up clearing dirty status for pages outside of [start, start + size). If the size is unaligned, let's go through the slow path to manipulate a temp bitmap for the interface so that we won't bother with those unaligned bits at the end of bitmap. I don't think this can happen in practice since the upper layer would provide us with the alignment guarantee. I'm not sure if kvm-all could rely on it. And this patch is mainly intended to address correctness of the specific algorithm used inside kvm_log_clear_one_slot(). Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Message-Id: <20201208114013.875-1-yuzenghui@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* vl: make qemu_get_machine_opts staticPaolo Bonzini2020-12-151-7/+4Star
| | | | | | | Machine options can be retrieved as properties of the machine object. Encourage that by removing the "easy" accessor to machine options. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel/kvm: add PIO ioeventfds only in case kvm_eventfds_allowed is trueElena Afanasova2020-11-031-2/+4
| | | | | | | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Elena Afanasova <eafanasova@gmail.com> Message-Id: <20201017210102.26036-1-eafanasova@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* kvm: kvm_init_vcpu take Error pointerDr. David Alan Gilbert2020-10-054-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the error handling in kvm_init_vcpu so we can see what went wrong more easily. Make it take an Error ** and fill it out with what failed, including the cpu id, so you can tell if it only fails at a given ID. Replace the remaining DPRINTF by a trace. This turns a: kvm_init_vcpu failed: Invalid argument into: kvm_init_vcpu: kvm_get_vcpu failed (256): Invalid argument and with the trace you then get to see: 19049@1595520414.310107:kvm_init_vcpu index: 169 id: 212 19050@1595520414.310635:kvm_init_vcpu index: 170 id: 256 qemu-system-x86_64: kvm_init_vcpu: kvm_get_vcpu failed (256): Invalid argument which makes stuff a lot more obvious. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200723160915.129069-1-dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: remove kvm specific functions from global includesClaudio Fontana2020-10-051-0/+7
| | | | | | Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cpus: extract out kvm-specific code to accel/kvmClaudio Fontana2020-10-054-2/+122
| | | | | | | | | | | | register a "CpusAccel" interface for KVM as well. Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> [added const] Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qemu/atomic.h: rename atomic_ to qatomic_Stefan Hajnoczi2020-09-231-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
* util: rename qemu_open() to qemu_open_old()Daniel P. Berrangé2020-09-161-1/+1
| | | | | | | | | | | We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
* meson: accelMarc-André Lureau2020-08-212-2/+5
| | | | | Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* trace: switch position of headers to what Meson requiresPaolo Bonzini2020-08-211-0/+1
| | | | | | | | | | | | | | | | | Meson doesn't enjoy the same flexibility we have with Make in choosing the include path. In particular the tracing headers are using $(build_root)/$(<D). In order to keep the include directives unchanged, the simplest solution is to generate headers with patterns like "trace/trace-audio.h" and place forwarding headers in the source tree such that for example "audio/trace.h" includes "trace/trace-audio.h". This patch is too ugly to be applied to the Makefiles now. It's only a way to separate the changes to the tracing header files from the Meson rewrite of the tracing logic. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* error: Eliminate error_propagate() with Coccinelle, part 1Markus Armbruster2020-07-101-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) | - !fun(args, &err, args2) + !fun(args, errp, args2) | - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var | !var | var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; | return var; ) } @depends on rule1 || rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>
* qapi: Use returned bool to check for failure, manual partMarkus Armbruster2020-07-101-27/+23Star
| | | | | | | | | | | | The previous commit used Coccinelle to convert from checking the Error object to checking the return value. Convert a few more manually. Also tweak control flow in places to conform to the conventional "if error bail out" pattern. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-20-armbru@redhat.com>
* qapi: Use returned bool to check for failure, Coccinelle partMarkus Armbruster2020-07-101-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*"; expression list args; typedef Error; Error *err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>
* accel/kvm: Convert to ram_block_discard_disable()David Hildenbrand2020-07-021-2/+2
| | | | | | | | | | | | Discarding memory does not work as expected. At the time this is called, we cannot have anyone active that relies on discards to work properly. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200626072248.78761-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* kvm: support to get/set dirty log initial-all-set capabilityJay Zhou2020-06-261-7/+14
| | | | | | | | | | | | Since the new capability KVM_DIRTY_LOG_INITIALLY_SET of KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 has been introduced in the kernel, tweak the userspace side to detect and enable this capability. Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20200304025554.2159-1-jianjay.zhou@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: Kick resamplefd for split kernel irqchipPeter Xu2020-06-102-2/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is majorly only for X86 because that's the only one that supports split irqchip for now. When the irqchip is split, we face a dilemma that KVM irqfd will be enabled, however the slow irqchip is still running in the userspace. It means that the resamplefd in the kernel irqfds won't take any effect and it will miss to ack INTx interrupts on EOIs. One example is split irqchip with VFIO INTx, which will break if we use the VFIO INTx fast path. This patch can potentially supports the VFIO fast path again for INTx, that the IRQ delivery will still use the fast path, while we don't need to trap MMIOs in QEMU for the device to emulate the EIOs (see the callers of vfio_eoi() hook). However the EOI of the INTx will still need to be done from the userspace by caching all the resamplefds in QEMU and kick properly for IOAPIC EOI broadcast. This is tricky because in this case the userspace ioapic irr & remote-irr will be bypassed. However such a change will greatly boost performance for assigned devices using INTx irqs (TCP_RR boosts 46% after this patch applied). When the userspace is responsible for the resamplefd kickup, don't register it on the kvm_irqfd anymore, because on newer kernels (after commit 654f1f13ea56, 5.2+) the KVM_IRQFD will fail if with both split irqchip and resamplefd. This will make sure that the fast path will work for all supported kernels. https://patchwork.kernel.org/patch/10738541/#22609933 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200318145204.74483-5-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: Pass EventNotifier into kvm_irqchip_assign_irqfdPeter Xu2020-06-101-6/+10
| | | | | | | | | | | | So that kvm_irqchip_assign_irqfd() can have access to the EventNotifiers, especially the resample event. It is needed in follow up patch to cache and kick resamplefds from QEMU. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200318145204.74483-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qom: Drop parameter @errp of object_property_add() & friendsMarkus Armbruster2020-05-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]
* qom: Drop object_property_set_description() parameter @errpMarkus Armbruster2020-05-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | object_property_set_description() and object_class_property_set_description() fail only when property @name is not found. There are 85 calls of object_property_set_description() and object_class_property_set_description(). None of them can fail: * 84 immediately follow the creation of the property. * The one in spapr_rng_instance_init() refers to a property created in spapr_rng_class_init(), from spapr_rng_properties[]. Every one of them still gets to decide what to pass for @errp. 51 calls pass &error_abort, 32 calls pass NULL, one receives the error and propagates it to &error_abort, and one propagates it to &error_fatal. I'm actually surprised none of them violates the Error API. What are we gaining by letting callers handle the "property not found" error? Use when the property is not known to exist is simpler: you don't have to guard the call with a check. We haven't found such a use in 5+ years. Until we do, let's make life a bit simpler and drop the @errp parameter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-8-armbru@redhat.com> [One semantic rebase conflict resolved]
* KVM: Move hwpoison page related functions into kvm-all.cDongjiu Geng2020-05-141-0/+36
| | | | | | | | | | | | | | | | | kvm_hwpoison_page_add() and kvm_unpoison_all() will both be used by X86 and ARM platforms, so moving them into "accel/kvm/kvm-all.c" to avoid duplicate code. For architectures that don't use the poison-list functionality the reset handler will harmlessly do nothing, so let's register the kvm_unpoison_all() function in the generic kvm_init() function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Acked-by: Xiang Zheng <zhengxiang9@huawei.com> Message-id: 20200512030609.19593-8-gengdongjiu@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Merge branch 'exec_rw_const_v4' of https://github.com/philmd/qemu into HEADPaolo Bonzini2020-02-251-3/+3
|\
| * Avoid address_space_rw() with a constant is_write argumentPeter Maydell2020-02-201-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The address_space_rw() function allows either reads or writes depending on the is_write argument passed to it; this is useful when the direction of the access is determined programmatically (as for instance when handling the KVM_EXIT_MMIO exit reason). Under the hood it just calls either address_space_write() or address_space_read_full(). We also use it a lot with a constant is_write argument, though, which has two issues: * when reading "address_space_rw(..., 1)" this is less immediately clear to the reader as being a write than "address_space_write(...)" * calling address_space_rw() bypasses the optimization in address_space_read() that fast-paths reads of a fixed length This commit was produced with the included Coccinelle script scripts/coccinelle/exec_rw_const.cocci. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200218112457.22712-1-peter.maydell@linaro.org> [PMD: Update macvm_set_cr0() reported by Laurent Vivier] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* | accel/kvm: Check ioctl(KVM_SET_USER_MEMORY_REGION) return valuePhilippe Mathieu-Daudé2020-02-251-1/+11
|/ | | | | | | | | | | | | | | | kvm_vm_ioctl() can fail, check its return value, and log an error when it failed. This fixes Coverity CID 1412229: Unchecked return value (CHECKED_RETURN) check_return: Calling kvm_vm_ioctl without checking return value Reported-by: Coverity (CID 1412229) Fixes: 235e8982ad3 ("support using KVM_MEM_READONLY flag for regions") Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20200221163336.2362-1-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel: Replace current_machine->accelerator by current_accel() wrapperPhilippe Mathieu-Daudé2020-01-241-2/+2
| | | | | | | | | | | We actually want to access the accelerator, not the machine, so use the current_accel() wrapper instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-10-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel/kvm: Make "kernel_irqchip" default onXiaoyao Li2020-01-071-6/+13
| | | | | | | | | | | | | | | | Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") moves kernel_irqchip property from "-machine" to "-accel kvm", but it forgets to set the default value of kernel_irqchip_allowed and kernel_irqchip_split. Also cleaning up the three useless members (kernel_irqchip_allowed, kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20191228104326.21732-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: convert "-machine kernel_irqchip" to an accelerator propertyPaolo Bonzini2019-12-171-5/+54
| | | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: introduce kvm_kernel_irqchip_* functionsPaolo Bonzini2019-12-171-4/+19
| | | | | | | | The KVMState struct is opaque, so provide accessors for the fields that will be moved from current_machine to the accelerator. For now they just forward to the machine object, but this will change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: convert "-machine kvm_shadow_mem" to an accelerator propertyPaolo Bonzini2019-12-171-0/+43
| | | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: Reallocate dirty_bmap when we change a slotDr. David Alan Gilbert2019-12-171-15/+29
| | | | | | | | | | | | | | | | | | kvm_set_phys_mem can be called to reallocate a slot by something the guest does (e.g. writing to PAM and other chipset registers). This can happen in the middle of a migration, and if we're unlucky it can now happen between the split 'sync' and 'clear'; the clear asserts if there's no bmap to clear. Recreate the bmap whenever we change the slot, keeping the clear path happy. Typically this is triggered by the guest rebooting during a migrate. Corresponds to: https://bugzilla.redhat.com/show_bug.cgi?id=1772774 https://bugzilla.redhat.com/show_bug.cgi?id=1771032 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
* kvm: Introduce KVM irqchip change notifierDavid Gibson2019-11-261-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Awareness of an in kernel irqchip is usually local to the machine and its top-level interrupt controller. However, in a few cases other things need to know about it. In particular vfio devices need this in order to accelerate interrupt delivery. If interrupt routing is changed, such devices may need to readjust their connection to the KVM irqchip. pci_bus_fire_intx_routing_notifier() exists to do just this. However, for the pseries machine type we have a situation where the routing remains constant but the top-level irq chip itself is changed. This occurs because of PAPR feature negotiation which allows the guest to decide between the older XICS and newer XIVE irq chip models (both of which are paravirtualized). To allow devices like vfio to adjust to this change, introduce a new notifier for the purpose kvm_irqchip_change_notify(). Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>
* core: replace getpagesize() with qemu_real_host_page_sizeWei Yang2019-10-261-3/+3
| | | | | | | | | | | | | | | | | | | | | There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel/kvm: ensure ret always setAlex Bennée2019-10-031-3/+3
| | | | | | | | | Some of the cross compilers rightly complain there are cases where ret may not be set. 0 seems to be the reasonable default unless particular slot explicitly returns -1. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: split too big memory section on several memslotsIgor Mammedov2019-09-301-44/+80
| | | | | | | | | | | | | | | | Max memslot size supported by kvm on s390 is 8Tb, move logic of splitting RAM in chunks upto 8T to KVM code. This way it will hide KVM specific restrictions in KVM code and won't affect board level design decisions. Which would allow us to avoid misusing memory_region_allocate_system_memory() API and eventually use a single hostmem backend for guest RAM. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20190924144751.24149-4-imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* kvm: clear dirty bitmaps from all overlapping memslotsPaolo Bonzini2019-09-301-14/+22
| | | | | | | | | | | | | | Currently MemoryRegionSection has 1:1 mapping to KVMSlot. However next patch will allow splitting MemoryRegionSection into several KVMSlot-s, make sure that kvm_physical_log_slot_clear() is able to handle such 1:N mapping. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190924144751.24149-3-imammedo@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* kvm: extract kvm_log_clear_one_slotPaolo Bonzini2019-09-301-46/+57
| | | | | | | | | | | | | We may need to clear the dirty bitmap for more than one KVM memslot. First do some code movement with no semantic change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190924144751.24149-2-imammedo@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [fixup line break]
* sysemu: Split sysemu/runstate.h off sysemu/sysemu.hMarkus Armbruster2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]
* Include sysemu/sysemu.h a lot lessMarkus Armbruster2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | In my "build everything" tree, changing sysemu/sysemu.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/qdev-core.h includes sysemu/sysemu.h since recent commit e965ffa70a "qdev: add qdev_add_vm_change_state_handler()". This is a bad idea: hw/qdev-core.h is widely included. Move the declaration of qdev_add_vm_change_state_handler() to sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h. Touching sysemu/sysemu.h now recompiles some 1800 objects. qemu/uuid.h also drops from 5400 to 1800. A few more headers show smaller improvement: qemu/notify.h drops from 5600 to 5200, qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from 5500 to 5000. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190812052359.30071-28-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>