summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* pci: Add root bus parameter to pci_nic_init()David Gibson2013-07-071-2/+4
| | | | | | | | | | | | | At present, pci_nic_init() and pci_nic_init_nofail() assume that they will only create a NIC under the primary PCI root. As we add support for multiple PCI roots, that may no longer be the case. This patch adds a root bus parameter to pci_nic_init() (and updates callers accordingly) to allow the machine init code using it to specify the right PCI root for NICs created by old-style -net nic parameters. NICs created new-style, with -device can of course be put anywhere. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pci: Add root bus argument to pci_get_bus_devfn()David Gibson2013-07-071-1/+1
| | | | | | | | | | | | | pci_get_bus_devfn() interprets a full PCI address string to give a PCIBus * and device/function number within that bus. Currently it assumes it is working on an address under the primary PCI root bus. This patch extends it to allow the caller to specify a root bus. This might seem a little odd since the supplied address can (theoretically) include a PCI domain number. However, attempting to use a non-zero domain number there is currently an error, so that shouldn't really cause problems. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pci: Replace pci_find_domain() with more general pci_root_bus_path()David Gibson2013-07-072-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pci_find_domain() is used in a number of places where we want an id for a whole PCI domain (i.e. the subtree under a PCI root bus). The trouble is that many platforms may support multiple independent host bridges with no hardware supplied notion of domain number. This patch, therefore, replaces calls to pci_find_domain() with calls to a new pci_root_bus_path() returning a string. The new call is implemented in terms of a new callback in the host bridge class, so it can be defined in some way that's well defined for the platform. When no callback is available we fall back on the qbus name. Most current uses of pci_find_domain() are for error or informational messages, so the change in identifiers should be harmless. The exception is pci_get_dev_path(), whose results form part of migration streams. To maintain compatibility with old migration streams, the PIIX PCI host is altered to always supply "0000" for this path, which matches the old domain number (since the code didn't actually support domains other than 0). For the pseries (spapr) PCI bridge we use a different platform-unique identifier (pseries machines can routinely have dozens of PCI host bridges). Theoretically that breaks migration streams, but given that we don't yet have migration support for pseries, it doesn't matter. Any other machines that have working migration support including PCI devices will need to be updated to maintain migration stream compatibility. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pci: Use helper to find device's root bus in pci_find_domain()David Gibson2013-07-071-1/+2
| | | | | | | | | | | | | Currently pci_find_domain() performs two functions - it locates the PCI root bus above the given bus, then looks up that root bus's domain number. This patch adds a helper function to perform the first task, finding the root bus for a given PCI device. This is then used in pci_find_domain(). This changes pci_find_domain()'s signature slightly, taking a PCIDevice instead of a PCIBus - since all callers passed something of the form dev->bus, this simplifies things slightly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pci: Abolish pci_find_root_bus()David Gibson2013-07-071-1/+1
| | | | | | | | | | | | | pci_find_root_bus() takes a domain parameter. Currently PCI root buses with domain other than 0 can't be created, so this is more or less a long winded way of retrieving the main PCI root bus. Numbered domains don't actually properly cover the (non x86) possibilities for multiple PCI root buses, so this patch for now enforces the domain == 0 restriction in other places to replace pci_find_root_bus() with an explicit pci_find_primary_bus(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pci: Move pci_read_devaddr to pci-hotplug-old.cDavid Gibson2013-07-041-2/+2
| | | | | | | | | pci_read_devaddr() is only used by the legacy functions for the old PCI hotplug interface in pci-hotplug-old.c. So we move the function there, and make it static. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pvpanic: initialization cleanupMichael S. Tsirkin2013-07-041-1/+1
| | | | | | | | | | | | | | | | | Avoid use of static variables: PC systems initialize pvpanic device through pvpanic_init, so we can simply create the fw_cfg file at that point. This also makes it possible to skip device creation completely if fw_cfg is not there, e.g. for xen - so the ports it reserves are not discoverable by guests. Also, make pvpanic_init void since callers ignore return status anyway. Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Paul Durrant <Paul.Durrant@citrix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pc: pass PCI hole ranges to GuestsMichael S. Tsirkin2013-07-041-0/+1
| | | | | | | | | Guest currently has to jump through lots of hoops to guess the PCI hole ranges. It's fragile, and makes us change BIOS each time we add a new chipset. Let's report the window in a ROM file, to make BIOS do exactly what QEMU intends. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* pci: store PCI hole ranges in guestinfo structureMichael S. Tsirkin2013-07-043-1/+21
| | | | | | Will be used to pass hole ranges to guests. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* range: add Range structureMichael S. Tsirkin2013-07-041-0/+16
| | | | | | | | | | | | | Sometimes we need to pass ranges around, add a handy structure for this purpose. Note: memory.c defines its own concept of AddrRange structure for working with 128 addresses. It's necessary there for doing range math. This is not needed for most users: struct Range is much simpler, and is only used for passing the range around. Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* PPC: Add clock-frequency export for Mac machinesAlexander Graf2013-07-011-0/+1
| | | | | | | | | | | Support in fwcfg has been around for exposure of the clock-frequency CPU property. OpenBIOS reads it, we just never exposed it. Since Mac OS X is very picky about its clock frequency values, let's just take a known good value and always expose that. Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Alexander Graf <agraf@suse.de>
* spapr-rtas: add CPU argument to RTAS callsAnthony Liguori2013-07-011-2/+3
| | | | | | | | | | | | | | | | | | | | | RTAS is a hypervisor provided binary blob that a guest loads and calls into to execute certain functions. It's similar to the vsyscall page in Linux or the short lived VMCI paravirt interface from VMware. The QEMU implementation of the RTAS blob is simply a passthrough that proxies all RTAS calls to the hypervisor via an hypercall. While we pass a CPU argument for hypercall handling in QEMU, we don't pass it for RTAS calls. Since some RTAs calls require making hypercalls (normally RTAS is implemented as guest code) we have nasty hacks to allow that. Add a CPU argument to RTAS call handling so we can more easily invoke hypercalls just as guest code would. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* intc/openpic_kvm: Fix QOM and build issuesAndreas Färber2013-07-011-0/+1
| | | | | Signed-off-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Alexander Graf <agraf@suse.de>
* intc/openpic: QOM'ifyAndreas Färber2013-07-011-0/+2
| | | | | | | | Introduce type constant and cast macro. Signed-off-by: Andreas Färber <afaerber@suse.de> Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* kvm/openpic: in-kernel mpic supportScott Wood2013-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Enables support for the in-kernel MPIC that thas been merged into the KVM next branch. This includes irqfd/KVM_IRQ_LINE support from Alex Graf (along with some other improvements). Note from Alex regarding kvm_irqchip_create(): On x86, one would call kvm_irqchip_create() to initialize an in-kernel interrupt controller. That function then goes ahead and initializes global capability variables as well as the default irq routing table. On ppc, we can't call kvm_irqchip_create() because we can have different types of interrupt controllers. So we want to do all the things that function would do for us in the in-kernel device init handler. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: squash in kvm_irqchip_commit_routes patch, fix non-kvm build, fix ppcemb] Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PIC: Only commit irq routing when necessaryAlexander Graf2013-07-011-0/+1
| | | | | | | | | | | | | The current logic updates KVM's view of our interrupt map every time we change it. While this is nice and bullet proof, it slows things down badly for me. QEMU spends about 3 seconds on every start telling KVM what news it has on its routing maps. Instead, let's just synchronize the whole irq routing map as a whole when we're done constructing it. For things that change during runtime, we can still update the routing table on demand. Signed-off-by: Alexander Graf <agraf@suse.de>
* openpic: factor out some common defines into openpic.hScott Wood2013-07-011-0/+11
| | | | | | | ...for use by the KVM in-kernel irqchip stub. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: Export kvm_init_irq_routingAlexander Graf2013-07-011-0/+1
| | | | | | | | | | On PPC, we can have different types of interrupt controllers, so we really only know that we are going to use one when we created it. Export kvm_init_irq_routing() to common code, so that we don't have to call kvm_irqchip_create(). Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: Don't assume that mpstate exists with in-kernel PIC alwaysAlexander Graf2013-07-011-0/+10
| | | | | | | | | | | | | | On PPC, we don't support MP state. So far it's not necessary and I'm not convinced yet that we really need to support it ever. However, the current idle logic in QEMU assumes that an in-kernel PIC also means we support MP state. This assumption is not true anymore. Let's split up the two cases into two different variables. That way PPC can expose an in-kernel PIC, while not implementing MP state. Signed-off-by: Alexander Graf <agraf@suse.de> CC: Jan Kiszka <jan.kiszka@siemens.com>
* Merge remote-tracking branch 'mjt/trivial-patches' into stagingAnthony Liguori2013-06-281-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | # By Gerd Hoffmann (13) and Michael Tokarev (1) # Via Michael Tokarev * mjt/trivial-patches: doc: we use seabios, not bochs bios qemu-socket: don't leak opts on error qemu-char: report udp backend errors qemu-char: add -chardev mux support qemu-char: minor mux chardev fixes qemu-char: use ChardevBackendKind in CharDriver qemu-char: don't leak opts on error qemu-char: fix documentation for telnet+wait socket flags qemu-char: print notification to stderr qemu-char: use more specific error_setg_* variants qemu-char: check optional fields using has_* qemu-socket: catch monitor_get_fd failures qemu-socket: drop pointless allocation qemu-socket: zero-initialize SocketAddress Message-id: 1372443465-22384-1-git-send-email-mjt@msgid.tls.msk.ru Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
| * qemu-char: use ChardevBackendKind in CharDriverGerd Hoffmann2013-06-281-1/+1
| | | | | | | | | | | | Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* | Merge remote-tracking branch 'afaerber/qom-cpu' into stagingAnthony Liguori2013-06-2810-32/+142
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | # By Andreas Färber # Via Andreas Färber * afaerber/qom-cpu: (24 commits) cpu: Turn cpu_unassigned_access() into a CPUState hook hwaddr: Make hwaddr type usable beyond softmmu cpu: Change qemu_init_vcpu() argument to CPUState cpus: Change qemu_dummy_start_vcpu() argument to CPUState cpus: Change qemu_kvm_start_vcpu() argument to CPUState cpus: Change cpu_handle_guest_debug() argument to CPUState gdbstub: Set gdb_set_stop_cpu() argument to CPUState kvm: Change kvm_cpu_exec() argument to CPUState kvm: Change kvm_handle_internal_error() argument to CPUState cpu: Turn cpu_dump_{state,statistics}() into CPUState hooks cpus: Change qemu_kvm_init_cpu_signals() argument to CPUState kvm: Change kvm_set_signal_mask() argument to CPUState cpus: Change qemu_kvm_wait_io_event() argument to CPUState cpus: Change cpu_thread_is_idle() argument to CPUState cpu: Change cpu_exit() argument to CPUState kvm: Change cpu_synchronize_state() argument to CPUState kvm: Change kvm_cpu_synchronize_state() argument to CPUState gdbstub: Simplify find_cpu() cpu: Guard cpu_{save,load}() definitions target-openrisc: Register VMStateDescription for OpenRISCCPU ...
| * cpu: Turn cpu_unassigned_access() into a CPUState hookAndreas Färber2013-06-281-0/+33
| | | | | | | | | | | | | | Use it for all targets, but be careful not to pass invalid CPUState. cpu_single_env can be NULL, e.g. on Xen. Signed-off-by: Andreas Färber <afaerber@suse.de>
| * hwaddr: Make hwaddr type usable beyond softmmuAndreas Färber2013-06-284-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | While not normally needed for *-user, it can safely be used there since always based on uint64_t, to avoid ifdeffery. To avoid accidental uses, move the guards from exec/hwaddr.h to its inclusion sites. No need for them in include/hw/. Prepares for hwaddr use in qom/cpu.h. Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Change qemu_init_vcpu() argument to CPUStateAndreas Färber2013-06-282-8/+8
| | | | | | | | | | | | | | | | This allows to move the call into CPUState's realizefn. Therefore move the stub into libqemustub.a. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * gdbstub: Set gdb_set_stop_cpu() argument to CPUStateAndreas Färber2013-06-281-1/+1
| | | | | | | | | | | | | | | | | | Use CPUState::env_ptr for now. Prepares for changing cpu_handle_guest_debug() argument to CPUState. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * kvm: Change kvm_cpu_exec() argument to CPUStateAndreas Färber2013-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | It no longer uses CPUArchState. Prepares for changing qemu_kvm_cpu_thread_fn() opaque to CPUState. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Turn cpu_dump_{state,statistics}() into CPUState hooksAndreas Färber2013-06-283-11/+43
| | | | | | | | | | | | | | | | | | Make cpustats monitor command available unconditionally. Prepares for changing kvm_handle_internal_error() and kvm_cpu_exec() arguments to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
| * kvm: Change kvm_set_signal_mask() argument to CPUStateAndreas Färber2013-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | CPUArchState is no longer needed. Prepares for changing qemu_kvm_init_cpu_signals() argument to CPUState. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Change cpu_exit() argument to CPUStateAndreas Färber2013-06-282-2/+8
| | | | | | | | | | | | | | | | It no longer depends on CPUArchState, so move it to qom/cpu.c. Prepares for changing GDBState::c_cpu to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
| * kvm: Change cpu_synchronize_state() argument to CPUStateAndreas Färber2013-06-281-2/+2
| | | | | | | | | | | | | | | | Change Monitor::mon_cpu to CPUState as well. Reviewed-by: liguang <lig.fnst@cn.fujitsu.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * kvm: Change kvm_cpu_synchronize_state() argument to CPUStateAndreas Färber2013-06-281-2/+2
| | | | | | | | | | | | | | | | | | It no longer relies on CPUArchState since 20d695a. Reviewed-by: liguang <lig.fnst@cn.fujitsu.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Guard cpu_{save,load}() definitionsAndreas Färber2013-06-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | A few targets already managed to implement cpu_save() and cpu_load() without defining CPU_SAVE_VERSION that causes them to be registered. Guard the prototypes with CPU_SAVE_VERSION to avoid this happening again until all targets are converted to VMState (or QIDL). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Introduce VMSTATE_CPU() macro for CPUStateAndreas Färber2013-06-281-0/+14
| | | | | | | | | | | | | | To be used to embed common CPU state into CPU subclasses. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Introduce device_class_set_vmsd() helperAndreas Färber2013-06-281-0/+21
| | | | | | | | | | | | | | | | | | It's the equivalent to cpu_class_set_vmsd(), to assign DeviceClass::vmsd. It wasn't needed before since only static, unmigratable VMStateDescriptions were assigned so far. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
| * cpu: Fix cpu_class_set_vmsd() documentationAndreas Färber2013-06-281-1/+1
| | | | | | | | | | | | | | | | It's CPUClass::vmsd, not CPUState::vmsd. Reviewed-by: liguang <lig.fnst@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
* | block: change default of .has_zero_init to 0Peter Lieven2013-06-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .has_zero_init defaults to 1 for all formats and protocols. this is a dangerous default since this means that all new added drivers need to manually overwrite it to 0 if they do not ensure that a device is zero initialized after bdrv_create(). if a driver needs to explicitly set this value to 1 its easier to verify the correctness in the review process. during review of the existing drivers it turned out that ssh and gluster had a wrong default of 1. both protocols support host_devices as backend which are not by default zero initialized. this wrong assumption will lead to possible corruption if qemu-img convert is used to write to such a backend. vpc and vmdk also defaulted to 1 altough they support fixed respectively flat extends. this has to be addresses in separate patches. both formats as well as the mentioned ssh and gluster are turned to the default of 0 with this patch for safety. a similar problem with the wrong default existed for iscsi most likely because the driver developer did oversee the default value of 1. Signed-off-by: Peter Lieven <pl@kamp.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: add basic backup support to block driverDietmar Maurer2013-06-281-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes. [I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list: * Drop BackupDumpFunc interface in favor of a target block device * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes() * Use 0 delay instead of 1us, like other block jobs * Unify creation/start functions into backup_start() * Simplify cleanup, free bitmap in backup_run() instead of cb * function * Use HBitmap to avoid duplicating bitmap code * Use bdrv_getlength() instead of accessing ->total_sectors * directly * Delete the backup.h header file, it is no longer necessary * Move ./backup.c to block/backup.c * Remove #ifdefed out code * Coding style and whitespace cleanups * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size. * Add on_source_error and on_target_error error handling. * Use trace events instead of DPRINTF() -- stefanha] Signed-off-by: Dietmar Maurer <dietmar@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: add bdrv_add_before_write_notifier()Stefan Hajnoczi2013-06-281-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bdrv_add_before_write_notifier() function installs a callback that is invoked before a write request is processed. This will be used to implement copy-on-write point-in-time snapshots where we need to copy out old data before overwriting it. Note that BdrvTrackedRequest is moved to block_int.h since it is passed to .notify() functions. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | notify: add NotiferWithReturn so notifier list can abortStefan Hajnoczi2013-06-281-0/+29
|/ | | | | | | | | | | | | | notifier_list_notify() has no return value. This is fine when we just want to invoke side-effects. Sometimes it's useful for notifiers to produce a return value. This allows notifiers to "veto" an operation and will be used by the block layer before-write notifier. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* linux-user: Fix compilation failurePeter Maydell2013-06-272-1/+2
| | | | | | | | | Fix compilation failures for linux-user targets following recent migration related commits bd2fa51fcd and 43487c67. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1372362818-4740-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* rdma: introduce capability x-rdma-pin-allMichael R. Hines2013-06-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This capability allows you to disable dynamic chunk registration for better throughput on high-performance links. For example, using an 8GB RAM virtual machine with all 8GB of memory in active use and the VM itself is completely idle using a 40 gbps infiniband link: 1. x-rdma-pin-all disabled total time: approximately 7.5 seconds @ 9.5 Gbps 2. x-rdma-pin-all enabled total time: approximately 4 seconds @ 26 Gbps These numbers would of course scale up to whatever size virtual machine you have to migrate using RDMA. Enabling this feature does *not* have any measurable affect on migration *downtime*. This is because, without this feature, all of the memory will have already been registered already in advance during the bulk round and does not need to be re-registered during the successive iteration rounds. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: new QEMUFileOps hooksMichael R. Hines2013-06-272-0/+49
| | | | | | | | | | | | | | | | | | | | | These are the prototypes and implementation of new hooks that RDMA takes advantage of to perform dynamic page registration. An optional hook is also introduced for a custom function to be able to override the default save_page function. Also included are the prototypes and accessor methods used by arch_init.c which invoke funtions inside savevm.c to call out to the hooks that may or may not have been overridden inside of QEMUFileOps. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: introduce qemu_ram_foreach_block()Michael R. Hines2013-06-271-0/+5
| | | | | | | | | | | | | | This is used during RDMA initialization in order to transmit a description of all the RAM blocks to the peer for later dynamic chunk registration purposes. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: export qemu_fflush()Michael R. Hines2013-06-271-0/+1
| | | | | | | | | | | | | RDMA uses this to flush the control channel before sending its own message to handle page registrations. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: introduce qemu_file_mode_is_not_valid()Michael R. Hines2013-06-271-0/+1
| | | | | | | | | | | | | QEMUFileRDMA also has read and write modes. This function is now shared to reduce code duplication. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: export throughput w/ MigrationStats QMPMichael R. Hines2013-06-271-0/+1
| | | | | | | | | | | | This exposes throughput (in megabits/sec) through QMP. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: export yield_until_fd_readable()Michael R. Hines2013-06-271-0/+6
| | | | | | | | | | | | | | The RDMA event channel can be made non-blocking just like a TCP socket. Exporting this function allows us to yield so that the QEMU monitor remains available. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* rdma: introduce qemu_update_position()Michael R. Hines2013-06-272-0/+3
| | | | | | | | | | | | | | RDMA writes happen asynchronously, and thus the performance accounting also needs to be able to occur asynchronously. This allows anybody to call into savevm.c to update both f->pos as well as into arch_init.c to update the acct_info structure with up-to-date values when the RDMA transfer actually completes. Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Tested-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* Revert "xen: start PCI hole at 0xe0000000 (same as pc_init1 and ↵Stefano Stabellini2013-06-251-3/+0Star
| | | | | | | | | | | | qemu-xen-traditional)" This reverts commit 9f24a8030a70ea4954b5b8c48f606012f086f65f. The start of the PCI hole is actually set to 0xf0000000 by hvmloader. In order to retain ABI compatibility with Xen we leave the start of the PCI hole at 0xf0000000 in QEMU (for Xen) too. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>