summaryrefslogtreecommitdiffstats
path: root/hw/core
Commit message (Collapse)AuthorAgeFilesLines
* core/qdev: fix memleak in qdev_get_gpio_out_connector()Pan Nengyuan2020-03-091-1/+1
| | | | | | | | | | Fix a memory leak in qdev_get_gpio_out_connector(). Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200307030756.5913-1-pannengyuan@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* multifd: Add zstd compression multifd supportJuan Quintela2020-02-281-1/+1
| | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Add zlib compression multifd supportJuan Quintela2020-02-281-1/+1
| | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Add multifd-compression parameterJuan Quintela2020-02-281-0/+13
| | | | | | | | | | | | This will store the compression method to use. We start with none. Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Rename multifd-method to multifd-compression
* Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of ↵Paolo Bonzini2020-02-253-78/+79
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://github.com/patchew-project/qemu into HEAD This series removes ad hoc RAM allocation API (memory_region_allocate_system_memory) and consolidates it around hostmem backend. It allows to * resolve conflicts between global -mem-prealloc and hostmem's "policy" option, fixing premature allocation before binding policy is applied * simplify complicated memory allocation routines which had to deal with 2 ways to allocate RAM. * reuse hostmem backends of a choice for main RAM without adding extra CLI options to duplicate hostmem features. A recent case was -mem-shared, to enable vhost-user on targets that don't support hostmem backends [1] (ex: s390) * move RAM allocation from individual boards into generic machine code and provide them with prepared MemoryRegion. * clean up deprecated NUMA features which were tied to the old API (see patches) - "numa: remove deprecated -mem-path fallback to anonymous RAM" - (POSTPONED, waiting on libvirt side) "forbid '-numa node,mem' for 5.0 and newer machine types" - (POSTPONED) "numa: remove deprecated implicit RAM distribution between nodes" Introduce a new machine.memory-backend property and wrapper code that aliases global -mem-path and -mem-alloc into automatically created hostmem backend properties (provided memory-backend was not set explicitly given by user). A bulk of trivial patches then follow to incrementally convert individual boards to using machine.memory-backend provided MemoryRegion. Board conversion typically involves: * providing MachineClass::default_ram_size and MachineClass::default_ram_id so generic code could create default backend if user didn't explicitly provide memory-backend or -m options * dropping memory_region_allocate_system_memory() call * using convenience MachineState::ram MemoryRegion, which points to MemoryRegion allocated by ram-memdev On top of that for some boards: * missing ram_size checks are added (typically it were boards with fixed ram size) * ram_size fixups are replaced by checks and hard errors, forcing user to provide correct "-m" values instead of ignoring it and continuing running. After all boards are converted, the old API is removed and memory allocation routines are cleaned up.
| * remove no longer used memory_region_allocate_system_memory()Igor Mammedov2020-02-191-34/+0Star
| | | | | | | | | | | | | | | | | | | | | | all boards were switched to using memdev backend for main RAM, so we can drop no longer used memory_region_allocate_system_memory() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-73-imammedo@redhat.com>
| * null-machine: use memdev for RAMIgor Mammedov2020-02-191-5/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-40-imammedo@redhat.com>
| * initialize MachineState::ram in NUMA caseIgor Mammedov2020-02-191-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of NUMA there are 2 cases to consider: 1. '-numa node,memdev', the only one that will be available for 5.0 and newer machine types. In this case reuse current behavior, with only difference memdevs are put into MachineState::ram container + a temporary glue to keep memory_region_allocate_system_memory() working until all boards converted. 2. fake NUMA ("-numa node mem" and default RAM splitting) the later has been deprecated and will be removed but the former is going to stay available for compat reasons for 5.0 and older machine types it takes allocate_system_memory_nonnuma() path, like non-NUMA case and falls under conversion to memdev. So extend non-NUMA MachineState::ram initialization introduced in previous patch to take care of fake NUMA case. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200219160953.13771-6-imammedo@redhat.com>
| * machine: introduce convenience MachineState::ramIgor Mammedov2020-02-192-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the new field will be used by boards to get access to main RAM memory region and will help to save boiler plate in boards which often introduce a field or variable just for this purpose. Memory region will be equivalent to what currently used memory_region_allocate_system_memory() is returning apart from that it will come from hostmem backend. Followup patches will incrementally switch boards to using RAM from MachineState::ram. Patch takes care of non-NUMA case and follow up patch will initialize MachineState::ram for NUMA case. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-5-imammedo@redhat.com>
| * machine: introduce memory-backend propertyIgor Mammedov2020-02-191-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Property will contain link to memory backend that will be used for backing initial RAM. Follow up commit will alias -mem-path and -mem-prealloc CLI options into memory backend options to make memory handling consistent (using only hostmem backend family for guest RAM allocation). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-3-imammedo@redhat.com>
| * numa: remove deprecated -mem-path fallback to anonymous RAMIgor Mammedov2020-02-191-17/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | it has been deprecated since 4.0 by commit cb79224b7 (deprecate -mem-path fallback to anonymous RAM) Deprecation period ran out and it's time to remove it so it won't get in a way of switching to using hostmem backend for RAM. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-2-imammedo@redhat.com>
* | virtio: increase virtqueue size for virtio-scsi and virtio-blkDenis Plotnikov2020-02-221-0/+2
|/ | | | | | | | | | | | | | | | | | | | | | The goal is to reduce the amount of requests issued by a guest on 1M reads/writes. This rises the performance up to 4% on that kind of disk access pattern. The maximum chunk size to be used for the guest disk accessing is limited with seg_max parameter, which represents the max amount of pices in the scatter-geather list in one guest disk request. Since seg_max is virqueue_size dependent, increasing the virtqueue size increases seg_max, which, in turn, increases the maximum size of data to be read/write from a guest disk. More details in the original problem statment: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Message-id: 20200214074648.958-1-dplotnikov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qxl: introduce hardware revision 5Gerd Hoffmann2020-02-131-0/+2
| | | | | | | | | | | | | | The only difference to hardware revision 4 is that the device doesn't switch to VGA mode in case someone happens to touch a VGA register, which should make things more robust in configurations with multiple vga devices. Swtiching back to VGA mode happens on reset, either full machine reset or qxl device reset (QXL_IO_RESET ioport command). Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20200206074358.4274-1-kraxel@redhat.com
* hw/core: Allow setting 'virtio-blk-device.scsi' property on OSX hostPhilippe Mathieu-Daudé2020-02-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | Commit ed65fd1a2750 ("virtio-blk: switch off scsi-passthrough by default") changed the default value of the 'scsi' property of virtio-blk, which is only available on Linux hosts. It also added an unconditional compat entry for 2.4 or earlier machines. Trying to set this property on a pre-2.5 machine on OSX, we get: Unexpected error in object_property_find() at qom/object.c:1201: qemu-system-x86_64: -device virtio-blk-pci,id=scsi0,drive=drive0: can't apply global virtio-blk-device.scsi=true: Property '.scsi' not found Fix this error by marking the property optional. Fixes: ed65fd1a27 ("virtio-blk: switch off scsi-passthrough by default") Suggested-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20200207001404.1739-1-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* hw/*/Makefile.objs: Move many .o files to common-objsThomas Huth2020-02-041-1/+1
| | | | | | | | | | | | | We have many files that apparently do not depend on the target CPU configuration, i.e. which can be put into common-obj-y instead of obj-y. This way, the code can be shared for example between qemu-system-arm and qemu-system-aarch64, or the various big and little endian variants like qemu-system-sh4 and qemu-system-sh4eb, so that we do not have to compile the code multiple times anymore. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200130133841.10779-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
* hw/core: deprecate old reset functions and introduce new onesDamien Hedde2020-01-303-0/+15
| | | | | | | | | | | | | | | | | | | | | Deprecate device_legacy_reset(), qdev_reset_all() and qbus_reset_all() to be replaced by new functions device_cold_reset() and bus_cold_reset() which uses resettable API. Also introduce resettable_cold_reset_fn() which may be used as a replacement for qdev_reset_all_fn and qbus_reset_all_fn(). Following patches will be needed to look at legacy reset call sites and switch to resettable api. The legacy functions will be removed when unused. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-9-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core/qdev: update hotplug reset regarding resettableDamien Hedde2020-01-301-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | This commit make use of the resettable API to reset the device being hotplugged when it is realized. Also it ensures it is put in a reset state coherent with the parent it is plugged into. Note that there is a difference in the reset. Instead of resetting only the hotplugged device, we reset also its subtree (switch to resettable API). This is not expected to be a problem because sub-buses are just realized too. If a hotplugged device has any sub-buses it is logical to reset them too at this point. The recently added should_be_hidden and PCI's partially_hotplugged mechanisms do not interfere with realize operation: + In the should_be_hidden use case, device creation is delayed. + The partially_hotplugged mechanism prevents a device to be unplugged and unrealized from qdev POV and unrealized. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-8-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core/qdev: handle parent bus change regarding resettableDamien Hedde2020-01-301-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In qdev_set_parent_bus(), when changing the parent bus of a realized device, if the source and destination buses are not in the same reset state, some adaptations are required. This patch adds needed call to resettable_change_parent() to make sure a device reset state stays coherent with its parent bus. The addition is a no-op if: 1. the device being parented is not realized. 2. the device is realized, but both buses are not under reset. Case 2 means that as long as qdev_set_parent_bus() is called during the machine realization procedure (which is before the machine reset so nothing is in reset), it is a no op. There are 52 call sites of qdev_set_parent_bus(). All but one fall into the no-op case: + 29 trivial calls related to virtio (in hw/{s390x,display,virtio}/ {vhost,virtio}-xxx.c) to set a vdev(or vgpu) composing device parent bus just before realizing the same vdev(vgpu). + hw/core/qdev.c: when creating a device in qdev_try_create() + hw/core/sysbus.c: when initializing a device in the sysbus + hw/i386/amd_iommu.c: before realizing AMDVIState/pci + hw/isa/piix4.c: before realizing PIIX4State/rtc + hw/misc/auxbus.c: when creating an AUXBus + hw/misc/auxbus.c: when creating an AUXBus child + hw/misc/macio/macio.c: when initializing a MACIOState child + hw/misc/macio/macio.c: before realizing NewWorldMacIOState/pmu + hw/misc/macio/macio.c: before realizing NewWorldMacIOState/cuda + hw/net/virtio-net.c: Used for migration when using the failover mechanism to migration a vfio-pci/net. It is a no-op because at this point the device is already on the bus. + hw/pci-host/designware.c: before realizing DesignwarePCIEHost/root + hw/pci-host/gpex.c: before realizing GPEXHost/root + hw/pci-host/prep.c: when initialiazing PREPPCIState/pci_dev + hw/pci-host/q35.c: before realizing Q35PCIHost/mch + hw/pci-host/versatile.c: when initializing PCIVPBState/pci_dev + hw/pci-host/xilinx-pcie.c: before realizing XilinxPCIEHost/root + hw/s390x/event-facility.c: when creating SCLPEventFacility/ TYPE_SCLP_QUIESCE + hw/s390x/event-facility.c: ditto with SCLPEventFacility/ TYPE_SCLP_CPU_HOTPLUG + hw/s390x/sclp.c: Not trivial because it is called on a SLCPDevice just after realizing it. Ok because at this point the destination bus (sysbus) is not in reset; the realize step is before the machine reset. + hw/sd/core.c: Not OK. Used in sdbus_reparent_card(). See below. + hw/ssi/ssi.c: Used to put spi slave on spi bus and connect the cs line in ssi_auto_connect_slave(). Ok because this function is only used in realize step in hw/ssi/aspeed_smc.ci, hw/ssi/imx_spi.c, hw/ssi/mss-spi.c, hw/ssi/xilinx_spi.c and hw/ssi/xilinx_spips.c. + hw/xen/xen-legacy-backend.c: when creating a XenLegacyDevice device + qdev-monitor.c: in device hotplug creation procedure before realize Note that this commit alone will have no effect, right now there is no use of resettable API to reset anything. So a bus will never be tagged as in-reset by this same API. The one place where side-effect will occurs is in hw/sd/core.c in sdbus_reparent_card(). This function is only used in the raspi machines, including during the sysbus reset procedure. This case will be carrefully handled when doing the multiple phase reset transition. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-7-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core/resettable: add support for changing parentDamien Hedde2020-01-302-2/+61
| | | | | | | | | | | | | | | | | | | | | | | | Add a function resettable_change_parent() to do the required plumbing when changing the parent a of Resettable object. We need to make sure that the reset state of the object remains coherent with the reset state of the new parent. We make the 2 following hypothesis: + when an object is put in a parent under reset, the object goes in reset. + when an object is removed from a parent under reset, the object leaves reset. The added function avoids any glitch if both old and new parent are already in reset. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-6-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core: add Resettable support to BusClass and DeviceClassDamien Hedde2020-01-302-0/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds support of Resettable interface to buses and devices: + ResettableState structure is added in the Bus/Device state + Resettable methods are implemented. + device/bus_is_in_reset function defined This commit allows to transition the objects to the new multi-phase interface without changing the reset behavior at all. Object single reset method can be split into the 3 different phases but the 3 phases are still executed in a row for a given object. From the qdev/qbus reset api point of view, nothing is changed. qdev_reset_all() and qbus_reset_all() are not modified as well as device_legacy_reset(). Transition of an object must be done from parent class to child class. Care has been taken to allow the transition of a parent class without requiring the child classes to be transitioned at the same time. Note that SysBus and SysBusDevice class do not need any transition because they do not override the legacy reset method. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-5-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core: create Resettable QOM interfaceDamien Hedde2020-01-303-0/+256
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit defines an interface allowing multi-phase reset. This aims to solve a problem of the actual single-phase reset (built in DeviceClass and BusClass): reset behavior is dependent on the order in which reset handlers are called. In particular doing external side-effect (like setting an qemu_irq) is problematic because receiving object may not be reset yet. The Resettable interface divides the reset in 3 well defined phases. To reset an object tree, all 1st phases are executed then all 2nd then all 3rd. See the comments in include/hw/resettable.h for a more complete description. The interface defines 3 phases to let the future possibility of holding an object into reset for some time. The qdev/qbus reset in DeviceClass and BusClass will be modified in following commits to use this interface. A mechanism is provided to allow executing a transitional reset handler in place of the 2nd phase which is executed in children-then-parent order inside a tree. This will allow to transition devices and buses smoothly while keeping the exact current qdev/qbus reset behavior for now. Documentation will be added in a following commit. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-4-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core/qdev: add trace events to help with resettable transitionDamien Hedde2020-01-302-3/+35
| | | | | | | | | | | | | Adds trace events to reset procedure and when updating the parent bus of a device. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-3-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* add device_legacy_reset function to prepare for reset api changeDamien Hedde2020-01-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Provide a temporary device_legacy_reset function doing what device_reset does to prepare for the transition with Resettable API. All occurrence of device_reset in the code tree are also replaced by device_legacy_reset. The new resettable API has different prototype and semantics (resetting child buses as well as the specified device). Subsequent commits will make the changeover for each call site individually; once that is complete device_legacy_reset() will be removed. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-2-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/core/or-irq: Fix incorrect assert forbidding num-lines == MAX_OR_LINESPeter Maydell2020-01-301-1/+1
| | | | | | | | | | | | | | | | | | | The num-lines property of the TYPE_OR_GATE device sets the number of input lines it has. An assert() in or_irq_realize() restricts this to the maximum supported by the implementation. However we got the condition in the assert wrong: it should be using <=, because num-lines == MAX_OR_LINES is permitted, and means that all entries from 0 to MAX_OR_LINES-1 in the s->levels[] array are used. We didn't notice this previously because no user has so far needed that many input lines. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Message-id: 20200120142235.10432-1-peter.maydell@linaro.org
* hw/core/loader: Let load_elf() populate a field with CPU-specific flagsAleksandar Markovic2020-01-292-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While loading the executable, some platforms (like AVR) need to detect CPU type that executable is built for - and, with this patch, this is enabled by reading the field 'e_flags' of the ELF header of the executable in question. The change expands functionality of the following functions: - load_elf() - load_elf_as() - load_elf_ram() - load_elf_ram_sym() The argument added to these functions is called 'pflags' and is of type 'uint32_t*' (that matches 'pointer to 'elf_word'', 'elf_word' being the type of the field 'e_flags', in both 32-bit and 64-bit variants of ELF header). Callers are allowed to pass NULL as that argument, and in such case no lookup to the field 'e_flags' will happen, and no information will be returned, of course. CC: Richard Henderson <rth@twiddle.net> CC: Peter Maydell <peter.maydell@linaro.org> CC: Edgar E. Iglesias <edgar.iglesias@gmail.com> CC: Michael Walle <michael@walle.cc> CC: Thomas Huth <huth@tuxfamily.org> CC: Laurent Vivier <laurent@vivier.eu> CC: Philippe Mathieu-Daudé <f4bug@amsat.org> CC: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com> CC: Aurelien Jarno <aurelien@aurel32.net> CC: Jia Liu <proljc@gmail.com> CC: David Gibson <david@gibson.dropbear.id.au> CC: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> CC: BALATON Zoltan <balaton@eik.bme.hu> CC: Christian Borntraeger <borntraeger@de.ibm.com> CC: Thomas Huth <thuth@redhat.com> CC: Artyom Tarasenko <atar4qemu@gmail.com> CC: Fabien Chouteau <chouteau@adacore.com> CC: KONRAD Frederic <frederic.konrad@adacore.com> CC: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com> Signed-off-by: Michael Rolnik <mrolnik@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> Message-Id: <1580079311-20447-24-git-send-email-aleksandar.markovic@rt-rk.com>
* qdev: register properties as class propertiesMarc-André Lureau2020-01-242-58/+63
| | | | | | | | | | | | | | | | | | | Use class properties facilities to add properties to the class during device_class_set_props(). qdev_property_add_static() must be adapted as PropertyInfo now operates with classes (and not instances), so we must set_default_value() on the ObjectProperty, before calling its init() method on the object instance. Also, PropertyInfo.create() is now exclusively used for class properties. Fortunately, qdev_property_add_static() is only used in target/arm/cpu.c so far, which doesn't use "link" properties (that require create()). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-22-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qdev: move instance properties to class propertiesMarc-André Lureau2020-01-241-11/+13
| | | | | | Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-21-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qdev: rename DeviceClass.propsPaolo Bonzini2020-01-242-5/+5
| | | | | | Ensure that conflicts in the future will cause a syntax error. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qdev: set properties with device_class_set_props()Marc-André Lureau2020-01-246-5/+10
| | | | | | | | | | | | | | | | | | | | | The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter. spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir . @@ typedef DeviceClass; DeviceClass *d; expression val; @@ - d->props = val + device_class_set_props(d, val) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qdev: move helper function to monitor/miscMarc-André Lureau2020-01-241-26/+0Star
| | | | | | | | | Move the one-user function to the place it is being used. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200110153039.1379601-5-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qdev: remove extraneous errorMarc-André Lureau2020-01-241-11/+4Star
| | | | | | | | | All callers use error_abort, and even the function itself calls with error_abort. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-4-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qdev: remove duplicated qdev_property_add_static() docMarc-André Lureau2020-01-241-10/+0Star
| | | | | | | | The function is already documented in the header. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-3-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw/core/Makefile: Group generic objects versus system-mode objectsPhilippe Mathieu-Daudé2020-01-241-14/+14
| | | | | | | | | | | | To ease review/modifications of this Makefile, group generic objects first, then system-mode specific ones, and finally peripherals (which are only used in system-mode). No logical changes introduced here. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200118140619.26333-7-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw/core: Restrict reset handlers API to system-modePhilippe Mathieu-Daudé2020-01-241-1/+2
| | | | | | | | | | | The user-mode code does not use this API, restrict it to the system-mode. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20200118140619.26333-6-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cpu: Introduce cpu_class_set_parent_reset()Greg Kurz2020-01-241-0/+8
| | | | | | | | | | | | | | Similarly to what we already do with qdev, use a helper to overload the reset QOM methods of the parent in children classes, for clarity. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157650847239.354886.2782881118916307978.stgit@bahia.lan> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: Define VMSTATE_INSTANCE_ID_ANYPeter Xu2020-01-201-1/+2
| | | | | | | | | | | Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to auto-generate the vmstate instance ID. Previously it was hard coded as -1 instead of this macro. It helps to change this default value in the follow up patches. No functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* usb-redir: remove 'remote wakeup' flag from configuration descriptorYuri Benditovich2020-01-131-0/+1
| | | | | | | | | | | | | | | | If the redirected device has this capability, Windows guest may place the device into D2 and expect it to wake when the device becomes active, but this will never happen. For example, when internal Bluetooth adapter is redirected, keyboards and mice connected to it do not work. Current commit removes this capability (starting from machine 5.0) Set 'usb-redir.suppress-remote-wake' property to 'off' to keep 'remote wake' as is or to 'on' to remove 'remote wake' on 4.2 or earlier. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Message-id: 20200108091044.18055-3-yuri.benditovich@daynix.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* usb-host: remove 'remote wakeup' flag from configuration descriptorYuri Benditovich2020-01-131-0/+1
| | | | | | | | | | | | | | | | If the redirected device has this capability, Windows guest may place the device into D2 and expect it to wake when the device becomes active, but this will never happen. For example, when internal Bluetooth adapter is redirected, keyboards and mice connected to it do not work. Current commit removes this capability (starting from machine 5.0) Set 'usb-host.suppress-remote-wake' property to 'off' to keep 'remote wake' as is or to 'on' to remove 'remote wake' on 4.2 or earlier. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Message-id: 20200108091044.18055-2-yuri.benditovich@daynix.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* Merge remote-tracking branch 'remotes/elmarco/tags/prop-ptr-pull-request' ↵Peter Maydell2020-01-073-63/+2Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging Clean-ups: qom-ify serial and remove QDEV_PROP_PTR Hi, QDEV_PROP_PTR is marked in multiple places as "FIXME/TODO/remove me". In most cases, it can be easily replaced with QDEV_PROP_LINK when the pointer points to an Object. There are a few places where such substitution isn't possible. For those places, it seems reasonable to use a specific setter method instead, and keep the user_creatable = false. In other places, proper usage of qdev or other facilies is the solution. The serial code wasn't converted to qdev, which makes it a bit more archaic to deal with. Let's convert it first, so we can more easily embed it from other devices, and re-export some properties and drop QDEV_PROP_PTR usage. # gpg: Signature made Tue 07 Jan 2020 15:01:26 GMT # gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5 # gpg: issuer "marcandre.lureau@redhat.com" # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full] # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full] # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/elmarco/tags/prop-ptr-pull-request: (37 commits) qdev/qom: remove some TODO limitations now that PROP_PTR is gone qdev: remove QDEV_PROP_PTR qdev: remove PROP_MEMORY_REGION omap-gpio: remove PROP_PTR omap-i2c: remove PROP_PTR omap-intc: remove PROP_PTR smbus-eeprom: remove PROP_PTR cris: improve passing PIC interrupt vector to the CPU mips/cps: fix setting saar property qdev: use g_strcmp0() instead of open-coding it leon3: use qdev gpio facilities for the PIL leon3: use qemu_irq framework instead of callback as property dp8393x: replace PROP_PTR with PROP_LINK etraxfs: remove PROP_PTR usage lance: replace PROP_PTR with PROP_LINK vmmouse: replace PROP_PTR with PROP_LINK sm501: make SerialMM a child, export chardev property mips: use sysbus_mmio_get_region() instead of internal fields mips: use sysbus_add_io() mips: baudbase is 115200 by default ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * qdev/qom: remove some TODO limitations now that PROP_PTR is goneMarc-André Lureau2020-01-071-8/+0Star
| | | | | | | | | | Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
| * qdev: remove QDEV_PROP_PTRMarc-André Lureau2020-01-071-18/+0Star
| | | | | | | | | | | | | | | | No longer used in the tree. The comment about user_creatable is still quite relevant, but there is already a similar comment in qdev-core.h. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
| * qdev: use g_strcmp0() instead of open-coding itMarc-André Lureau2020-01-071-5/+2Star
| | | | | | | | | | | | | | Minor code simplification. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
| * sysbus: remove unused sysbus_try_create*Marc-André Lureau2020-01-071-32/+0Star
| | | | | | | | | | | | Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
* | Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell2020-01-072-0/+365
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio, pci, pc: fixes, features Bugfixes all over the place. HMAT support. New flags for vhost-user-blk utility. Auto-tuning of seg max for virtio storage. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Mon 06 Jan 2020 17:05:05 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (32 commits) intel_iommu: add present bit check for pasid table entries intel_iommu: a fix to vtd_find_as_from_bus_num() virtio-net: delete also control queue when TX/RX deleted virtio: reset region cache when on queue deletion virtio-mmio: update queue size on guest write tests: add virtio-scsi and virtio-blk seg_max_adjust test virtio: make seg_max virtqueue size dependent hw: fix using 4.2 compat in 5.0 machine types for i440fx/q35 vhost-user-scsi: reset the device if supported vhost-user: add VHOST_USER_RESET_DEVICE to reset devices hw/pci/pci_host: Let pci_data_[read/write] use unsigned 'size' argument hw/pci/pci_host: Remove redundant PCI_DPRINTF() virtio-mmio: Clear v2 transport state on soft reset ACPI: add expected files for HMAT tests (acpihmat) tests/bios-tables-test: add test cases for ACPI HMAT tests/numa: Add case for QMP build HMAT hmat acpi: Build Memory Side Cache Information Structure(s) hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) hmat acpi: Build Memory Proximity Domain Attributes Structure(s) numa: Extend CLI to provide memory side cache information ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * virtio: make seg_max virtqueue size dependentDenis Plotnikov2020-01-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the patch, seg_max parameter was immutable and hardcoded to 126 (128 - 2) without respect to queue size. This has two negative effects: 1. when queue size is < 128, we have Virtio 1.1 specfication violation: (2.6.5.3.1 Driver Requirements) seq_max must be <= queue_size. This violation affects the old Linux guests (ver < 4.14). These guests crash on these queue_size setups. 2. when queue_size > 128, as was pointed out by Denis Lunev <den@virtuozzo.com>, seg_max restrics guest's block request length which affects guests' performance making them issues more block request than needed. https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html To mitigate this two effects, the patch adds the property adjusting seg_max to queue size automaticaly. Since seg_max is a guest visible parameter, the property is machine type managable and allows to choose between old (seg_max = 126 always) and new (seg_max = queue_size - 2) behaviors. Not to change the behavior of the older VMs, prevent setting the default seg_max_adjust value for older machine types. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Message-Id: <20191220140905.1718-2-dplotnikov@virtuozzo.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * numa: Extend CLI to provide memory side cache informationLiu Jingqi2020-01-051-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add -numa hmat-cache option to provide Memory Side Cache Information. These memory attributes help to build Memory Side Cache Information Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT). Before using hmat-cache option, enable HMAT with -machine hmat=on. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191213011929.2520-4-tao3.xu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
| * numa: Extend CLI to provide memory latency and bandwidth informationLiu Jingqi2020-01-051-0/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add -numa hmat-lb option to provide System Locality Latency and Bandwidth Information. These memory attributes help to build System Locality Latency and Bandwidth Information Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT). Before using hmat-lb option, enable HMAT with -machine hmat=on. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191213011929.2520-3-tao3.xu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
| * numa: Extend CLI to provide initiator information for numa nodesTao Xu2020-01-052-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT), The initiator represents processor which access to memory. And in 5.2.27.3 Memory Proximity Domain Attributes Structure, the attached initiator is defined as where the memory controller responsible for a memory proximity domain. With attached initiator information, the topology of heterogeneous memory can be described. Add new machine property 'hmat' to enable all HMAT specific options. Extend CLI of "-numa node" option to indicate the initiator numa node-id. In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report the platform's HMAT tables. Before using initiator option, enable HMAT with -machine hmat=on. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191213011929.2520-2-tao3.xu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-pci: disable vring processing when bus-mastering is disabledMichael Roth2020-01-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the SLOF firmware for pseries guests will disable/re-enable a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between. In QEMU, when PCI_COMMAND_MASTER is disabled we disable the corresponding IOMMU memory region, so DMA accesses (including to vring fields like idx/flags) will no longer undergo the necessary translation. Normally we wouldn't expect this to happen since it would be misbehavior on the driver side to continue driving DMA requests. However, in the case of pseries, with iommu_platform=on, we trigger the following sequence when tearing down the virtio-blk dataplane ioeventfd in response to the guest unsetting PCI_COMMAND_MASTER: #2 0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757 #3 0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950 #4 0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0) at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255 #5 0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660) at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776 #6 0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550 #7 0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546 #8 0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527 #9 0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8) at /home/mdroth/w/qemu.git/util/aio-posix.c:520 #10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0) at /home/mdroth/w/qemu.git/util/aio-posix.c:607 #11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true) at /home/mdroth/w/qemu.git/util/aio-posix.c:639 #12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0) at /home/mdroth/w/qemu.git/util/aio-wait.c:71 #13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288 #14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38) at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245 #15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38) at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237 #16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40) at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292 #17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>) at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613 I.e. the calling code is only scheduling a one-shot BH for virtio_blk_data_plane_stop_bh, but somehow we end up trying to process an additional virtqueue entry before we get there. This is likely due to the following check in virtio_queue_host_notifier_aio_poll: static bool virtio_queue_host_notifier_aio_poll(void *opaque) { EventNotifier *n = opaque; VirtQueue *vq = container_of(n, VirtQueue, host_notifier); bool progress; if (!vq->vring.desc || virtio_queue_empty(vq)) { return false; } progress = virtio_queue_notify_aio_vq(vq); namely the call to virtio_queue_empty(). In this case, since no new requests have actually been issued, shadow_avail_idx == last_avail_idx, so we actually try to access the vring via vring_avail_idx() to get the latest non-shadowed idx: int virtio_queue_empty(VirtQueue *vq) { bool empty; ... if (vq->shadow_avail_idx != vq->last_avail_idx) { return 0; } rcu_read_lock(); empty = vring_avail_idx(vq) == vq->last_avail_idx; rcu_read_unlock(); return empty; but since the IOMMU region has been disabled we get a bogus value (0 usually), which causes virtio_queue_empty() to falsely report that there are entries to be processed, which causes errors such as: "virtio: zero sized buffers are not allowed" or "virtio-blk missing headers" and puts the device in an error state. This patch works around the issue by introducing virtio_set_disabled(), which sets a 'disabled' flag to bypass checks like virtio_queue_empty() when bus-mastering is disabled. Since we'd check this flag at all the same sites as vdev->broken, we replace those checks with an inline function which checks for either vdev->broken or vdev->disabled. The 'disabled' flag is only migrated when set, which should be fairly rare, but to maintain migration compatibility we disable it's use for older machine types. Users requiring the use of the flag in conjunction with older machine types can set it explicitly as a virtio-device option. NOTES: - This leaves some other oddities in play, like the fact that DRIVER_OK also gets unset in response to bus-mastering being disabled, but not restored (however the device seems to continue working) - Similarly, we disable the host notifier via virtio_bus_stop_ioeventfd(), which seems to move the handling out of virtio-blk dataplane and back into the main IO thread, and it ends up staying there till a reset (but otherwise continues working normally) Cc: David Gibson <david@gibson.dropbear.id.au>, Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | Merge remote-tracking branch ↵Peter Maydell2020-01-063-3/+42
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/elmarco/tags/dbus-vmstate7-pull-request' into staging Add dbus-vmstate Hi, With external processes or helpers participating to the VM support, it becomes necessary to handle their migration. Various options exist to transfer their state: 1) as the VM memory, RAM or devices (we could say that's how vhost-user devices can be handled today, they are expected to restore from ring state) 2) other "vmstate" (as with TPM emulator state blobs) 3) left to be handled by management layer 1) is not practical, since an external processes may legitimatelly need arbitrary state date to back a device or a service, or may not even have an associated device. 2) needs ad-hoc code for each helper, but is simple and working 3) is complicated for management layer, QEMU has the migration timing The proposed "dbus-vmstate" object will connect to a given D-Bus address, and save/load from org.qemu.VMState1 owners on migration. Thus helpers can easily have their state migrated with QEMU, without implementing ad-hoc support (such as done for TPM emulation) D-Bus is ubiquitous on Linux (it is systemd IPC), and can be made to work on various other OSes. There are several implementations and good bindings for various languages. (the tests/dbus-vmstate-test.c is a good example of how simple the implementation of services can be, even in C) dbus-vmstate is put into use by the libvirt series "[PATCH 00/23] Use a slirp helper process". v2: - fix build with broken mingw-glib # gpg: Signature made Mon 06 Jan 2020 14:43:35 GMT # gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5 # gpg: issuer "marcandre.lureau@redhat.com" # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full] # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full] # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/elmarco/tags/dbus-vmstate7-pull-request: tests: add dbus-vmstate-test tests: add migration-helpers unit dockerfiles: add dbus-daemon to some of latest distributions configure: add GDBUS_CODEGEN Add dbus-vmstate object util: add dbus helper unit docs: start a document to describe D-Bus usage vmstate: replace DeviceState with VMStateIf vmstate: add qom interface to get id Signed-off-by: Peter Maydell <peter.maydell@linaro.org>