summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* throttle: Make LeakyBucket.avg and LeakyBucket.max integer typesAlberto Garcia2017-08-293-7/+7
| | | | | | | | | | | | | | | | | | | | | | Both the throttling limits set with the throttling.iops-* and throttling.bps-* options and their QMP equivalents defined in the BlockIOThrottle struct are integer values. Those limits are also reported in the BlockDeviceInfo struct and they are integers there as well. Therefore there's no reason to store them internally as double and do the conversion everytime we're setting or querying them, so this patch uses uint64_t for those types. Let's also use an unsigned type because we don't allow negative values anyway. LeakyBucket.level and LeakyBucket.burst_level do however remain double because their value changes depending on the fraction of time elapsed since the previous I/O operation. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: f29b840422767b5be2c41c2dfdbbbf6c5f8fedf8.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* throttle: Remove throttle_fix_bucket() / throttle_unfix_bucket()Alberto Garcia2017-08-291-39/+23Star
| | | | | | | | | | | | | | The throttling code can change internally the value of bkt->max if it hasn't been set by the user. The problem with this is that if we want to retrieve the original value we have to undo this change first. This is ugly and unnecessary: this patch removes the throttle_fix_bucket() and throttle_unfix_bucket() functions completely and moves the logic to throttle_compute_wait(). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 5b0b9e1ac6eb208d709eddc7b09e7669a523bff3.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* throttle: Make throttle_is_valid() a bit less verboseAlberto Garcia2017-08-291-8/+7Star
| | | | | | | | | | Use a pointer to the bucket instead of repeating cfg->buckets[i] all the time. This makes the code more concise and will help us expand the checks later and save a few line breaks. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 763ffc40a26b17d54cf93f5a999e4656049fcf0c.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* throttle: Update the throttle_fix_bucket() documentationAlberto Garcia2017-08-291-8/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The way the throttling algorithm works is that requests start being throttled once the bucket level exceeds the burst limit. When we get there the bucket leaks at the level set by the user (bkt->avg), and that leak rate is what prevents guest I/O from exceeding the desired limit. If we don't allow bursts (i.e. bkt->max == 0) then we can start throttling requests immediately. The problem with keeping the threshold at 0 is that it only allows one request at a time, and as soon as there's a bit of I/O from the guest every other request will be throttled and performance will suffer considerably. That can even make the guest unable to reach the throttle limit if that limit is high enough, and that happens regardless of the block scheduler used by the guest. Increasing that threshold gives flexibility to the guest, allowing it to perform short bursts of I/O before being throttled. Increasing the threshold too much does not make a difference in the long run (because it's the leak rate what defines the actual throughput) but it does allow the guest to perform longer initial bursts and exceed the throttle limit for a short while. A burst value of bkt->avg / 10 allows the guest to perform 100ms' worth of I/O at the target rate without being throttled. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 31aae6645f0d1fbf3860fb2b528b757236f0c0a7.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* throttle: Fix wrong variable name in the header documentationAlberto Garcia2017-08-291-1/+1
| | | | | | | | | | The level of the burst bucket is stored in bkt.burst_level, not bkt.burst_length. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 49aab2711d02f285567f3b3b13a113847af33812.1503580370.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* nvme: Fix get/set number of queues feature, againDan Aloni2017-08-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The number of queues that should be return by the admin command should: 1) Only mention the number of non-admin queues. 2) It is zero-based, meaning that '0 == one non-admin queue', '1 == two non-admin queues', and so forth. Because our `num_queues` means the number of queues _plus_ the admin queue, then the right calculation for the number returned from the admin command is `num_queues - 2`, combining the two requirements mentioned. The issue was discovered by reducing num_queues from 64 to 8 and running a Linux VM with an SMP parameter larger than that (e.g. 22). It tries to utilize all queues, and therefore fails with an invalid queue number when trying to queue I/Os on the last queue. Signed-off-by: Dan Aloni <dan@kernelim.com> CC: Alex Friedman <alex@e8storage.com> CC: Keith Busch <keith.busch@intel.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* Update version for v2.10.0-rc4 releasePeter Maydell2017-08-241-1/+1
| | | | Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-08-23' into ↵Peter Maydell2017-08-2310-25/+170
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging nbd patches for 2017-08-23 - Fam Zheng: 0/4 block: Fix non-shared storage migration - Stefan Hajnoczi: qemu-iotests: add 194 non-shared storage migration test - Stefan Hajnoczi: nbd-client: avoid spurious qio_channel_yield() re-entry # gpg: Signature made Wed 23 Aug 2017 17:22:53 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-08-23: nbd-client: avoid spurious qio_channel_yield() re-entry qemu-iotests: add 194 non-shared storage migration test block: Update open_flags after ->inactivate() callback mirror: Mark target BB as "force allow inactivate" block-backend: Allow more "can inactivate" cases block-backend: Refactor inactivate check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * nbd-client: avoid spurious qio_channel_yield() re-entryStefan Hajnoczi2017-08-232-14/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following scenario leads to an assertion failure in qio_channel_yield(): 1. Request coroutine calls qio_channel_yield() successfully when sending would block on the socket. It is now yielded. 2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because nbd_receive_reply() failed. 3. Request coroutine is entered and returns from qio_channel_yield(). Note that the socket fd handler has not fired yet so ioc->write_coroutine is still set. 4. Request coroutine attempts to send the request body with nbd_rwv() but the socket would still block. qio_channel_yield() is called again and assert(!ioc->write_coroutine) is hit. The problem is that nbd_read_reply_entry() does not distinguish between request coroutines that are waiting to receive a reply and those that are not. This patch adds a per-request bool receiving flag so nbd_read_reply_entry() can avoid spurious aio_wake() calls. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170822125113.5025-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * qemu-iotests: add 194 non-shared storage migration testStefan Hajnoczi2017-08-234-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | Non-shared storage migration with NBD and drive-mirror is currently not tested by qemu-iotests. This test case covers the basic migration scenario. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Based-on: <20170823134242.12080-1-famz@redhat.com> Message-Id: <20170823140506.28723-1-stefanha@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * block: Update open_flags after ->inactivate() callbackStefan Hajnoczi2017-08-231-4/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the ->inactivate() callbacks, permissions are updated, which typically involves a recursive check of the whole graph. Setting BDRV_O_INACTIVE right before doing that creates a state that bdrv_is_writable() returns false, which causes permission update failure. Reorder them so the flag is updated after calling the function. Note that this doesn't break the assert in bdrv_child_cb_inactivate() because for any specific BDS, we still update its flags first before calling ->inactivate() on it one level deeper in the recursion. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-5-famz@redhat.com> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * mirror: Mark target BB as "force allow inactivate"Fam Zheng2017-08-231-2/+12
| | | | | | | | | | | | | | Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-4-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * block-backend: Allow more "can inactivate" casesFam Zheng2017-08-232-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | These two conditions corresponds to mirror job's source and target, which need to be allowed as they are part of the non-shared storage migration workflow: failing to inactivate either will result in a failure during migration completion. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-3-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: improve comment grammar] Signed-off-by: Eric Blake <eblake@redhat.com>
| * block-backend: Refactor inactivate checkFam Zheng2017-08-231-5/+14
|/ | | | | | | | | The logic will be fixed (extended), move it to a separate function. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170823134242.12080-2-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* numa: Move numa_legacy_auto_assign_ram to pc-i440fx-2.9Eduardo Habkost2017-08-231-1/+1
| | | | | | | | | | | | | | | The 'm->numa_auto_assign_ram = numa_legacy_auto_assign_ram;' line was supposed to be in pc_i440fx_2_9_machine_options() (see commit 3bfe5716 "numa: equally distribute memory on nodes"), but the merge commit adb354dd ("Merge remote-tracking branch 'mst/tags/for_upstream' into staging") moved it to the pc_i440fx_2_10_machine_options(). Move the line back to pc_i440fx_2_9_machine_options(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-id: 20170818190943.23858-1-ehabkost@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* fix build failure in nbd_read_reply_entry()Igor Mammedov2017-08-231-1/+1
| | | | | | | | | | | | | travis builds fail at HEAD at rc3 master with block/nbd-client.c: In function ‘nbd_read_reply_entry’: block/nbd-client.c:110:8: error: ‘ret’ may be used uninitialized in this function [-Werror=uninitialized] fix it by initializing 'ret' to 0 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170823' ↵Peter Maydell2017-08-2313-31/+153
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging ppc patch queue 2017-08-23 This is identical to the pull request from yesterday (20180822), except that a bug in one patch is fixed so that it doesn't break TCG on a ppc host. Last minute ppc related fixes for qemu-2.10. I'm not sure if these are critical enough to prompt another rc, but I'm submitting them for consideration. First, is Cornelia's fix for 480bc11e6 which meant "make check" would always fail on a ppc host. Tracking that down delayed submission of the rest of these patches, sorry. The rest are all fairly important bugfixes for qemu crashes or guest behaviour regression on ppc. Patches 2-4 specifically are fixes for regressions from qemu-2.9, caused by the compatibility mode and hotplug handling cleanups for the pseries machine type. # gpg: Signature made Wed 23 Aug 2017 01:31:47 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170823: hw/ppc/spapr_iommu: Fix crash when removing the "spapr-tce-table" device hw/ppc/spapr_rtc: Mark the RTC device with user_creatable = false hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev' spapr: Allow configure-connector to be called multiple times ppc: fix ppc_set_compat() with KVM PR target/ppc: 'PVR != host PVR' in KVM_SET_SREGS workaround boot-serial-test: prefer tcg accelerator Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * hw/ppc/spapr_iommu: Fix crash when removing the "spapr-tce-table" deviceThomas Huth2017-08-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU currently aborts unexpectedly when the user tries to add and remove a "spapr-tce-table" device: $ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio QEMU 2.9.92 monitor - type 'help' for more information (qemu) device_add spapr-tce-table,id=x (qemu) device_del x ** ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl) Aborted (core dumped) The device should not be accessable for the users at all, it's just used internally, so mark it with user_creatable = false. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * hw/ppc/spapr_rtc: Mark the RTC device with user_creatable = falseThomas Huth2017-08-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU currently aborts unexpectedly when a user tries to do something like this: $ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio QEMU 2.9.92 monitor - type 'help' for more information (qemu) device_add spapr-rtc,id=spapr-rtc (qemu) device_del spapr-rtc ** ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl) Aborted (core dumped) The RTC device is not meant to be hot-pluggable - it's an internal device only and it even should not be possible to create it a second time with the "-device" parameter, so let's mark this with "user_creatable = false". Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev'Thomas Huth2017-08-225-19/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * spapr: Allow configure-connector to be called multiple timesBharata B Rao2017-08-221-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of in-kernel memory hot unplug, when the guest is not able to remove all the LMBs that are requested for removal, it will add back any LMBs that have been successfully removed. The DR Connectors of these LMBs wouldn't have been unconfigured and hence the addition of these LMBs will result in configure-connector call being issued on LMB DR connectors that are already in configured state. Such configure-connector calls will fail resulting in a DIMM which is partially unplugged. This however worked till recently before we overhauled the DRC implementation in QEMU. Commit 9d4c0f4f0a71e: "spapr: Consolidate DRC state variables" is the first commit where this problem shows up as per git bisect. Ideally guest shouldn't be issuing configure-connector call on an already configured DR connector. However for now, work around this in QEMU by allowing configure-connector to be called multiple times for all types of DR connectors. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Corrected buglet that would have initialized fdt pointers ready for reading on a device not present at reset] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * ppc: fix ppc_set_compat() with KVM PRGreg Kurz2017-08-221-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running in KVM PR mode, kvmppc_set_compat() always fail because the current PR implementation doesn't handle KVM_REG_PPC_ARCH_COMPAT. Now that the machine code inconditionally calls ppc_set_compat_all() at reset time to restore the compat mode default value (commit 66d5c492dd3a9), it is impossible to start a guest with PR: qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid argument A tentative patch [1] was recently sent by Suraj to address the issue, but it would prevent the compat mode to be turned off on reset. And we really don't want to explicitely check for KVM PR. During the patch's review, David suggested that we should only call the KVM ioctl() if the compat PVR changes. This allows at least to run with KVM PR, provided no compat mode is requested from the command line (which should be the case when running PR nested). This is what this patch does. While here, we also fix the side effect where KVM would fail but we would change the CPU state in QEMU anyway. [1] http://patchwork.ozlabs.org/patch/782039/ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * target/ppc: 'PVR != host PVR' in KVM_SET_SREGS workaroundDaniel Henrique Barboza2017-08-223-0/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d5fc133eed ("ppc: Rework CPU compatibility testing across migration") changed the way cpu_post_load behaves with the PVR setting, causing an unexpected bug in KVM-HV migrations between hosts that are compatible (POWER8 and POWER8E, for example). Even with pvr_match() returning true, the guest freezes right after cpu_post_load. The reason is that the guest kernel can't handle a different PVR value other that the running host in KVM_SET_SREGS. In [1] it was discussed the possibility of a new KVM capability that would indicate that the guest kernel can handle a different PVR in KVM_SET_SREGS. Even if such feature is implemented, there is still the problem with older kernels that will not have this capability and will fail to migrate. This patch implements a workaround for that scenario. If running with KVM, check if the guest kernel does not have the capability (named here as 'cap_ppc_pvr_compat'). If it doesn't, calls kvmppc_is_pr() to see if the guest is running in KVM-HV. If all this happens, set env->spr[SPR_PVR] to the same value as the current host PVR. This ensures that we allow migrations with 'close enough' PVRs to still work in KVM-HV but also makes the code ready for this new KVM capability when it is done. A new function called 'kvmppc_pvr_workaround_required' was created to encapsulate the conditions said above and to avoid calling too many kvm.c internals inside cpu_post_load. [1] https://lists.gnu.org/archive/html/qemu-ppc/2017-06/msg00503.html Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Fix for the case of using TCG on a PPC host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * boot-serial-test: prefer tcg acceleratorCornelia Huck2017-08-221-1/+5
|/ | | | | | | | | | Prefer to use the tcg accelarator if it is available: This is our only real smoke test for tcg, and fast enough to use it for that. Fixes: 480bc11e6 ("boot-serial-test: fallback to kvm accelerator") Reported-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Update version for v2.10.0-rc3 releasePeter Maydell2017-08-151-1/+1
| | | | Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-08-15' into ↵Peter Maydell2017-08-1511-14/+157
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging nbd patches for 2017-08-15 - Eric Blake: nbd: Fix trace message for disconnect - Stefan Hajnoczi: qemu-iotests: step clock after each test iteration - Fam Zheng: 0/4 block: Fix non-shared storage migration - Eric Blake: nbd-client: Fix regression when server sends garbage # gpg: Signature made Tue 15 Aug 2017 16:06:02 BST # gpg: using RSA key 0xA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2017-08-15: nbd-client: Fix regression when server sends garbage iotests: Add non-shared storage migration case 192 block-backend: Defer shared_perm tightening migration completion nbd: Fix order of bdrv_set_perm and bdrv_invalidate_cache stubs: Add vm state change handler stubs qemu-iotests: step clock after each test iteration nbd: Fix trace message for disconnect Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * nbd-client: Fix regression when server sends garbageEric Blake2017-08-152-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we switched NBD to use coroutines for qemu 2.9 (in particular, commit a12a712a), we introduced a regression: if a server sends us garbage (such as a corrupted magic number), we quit the read loop but do not stop sending further queued commands, resulting in the client hanging when it never reads the response to those additional commands. In qemu 2.8, we properly detected that the server is no longer reliable, and cancelled all existing pending commands with EIO, then tore down the socket so that all further command attempts get EPIPE. Restore the proper behavior of quitting (almost) all communication with a broken server: Once we know we are out of sync or otherwise can't trust the server, we must assume that any further incoming data is unreliable and therefore end all pending commands with EIO, and quit trying to send any further commands. As an exception, we still (try to) send NBD_CMD_DISC to let the server know we are going away (in part, because it is easier to do that than to further refactor nbd_teardown_connection, and in part because it is the only command where we do not have to wait for a reply). Based on a patch by Vladimir Sementsov-Ogievskiy. A malicious server can be created with the following hack, followed by setting NBD_SERVER_DEBUG to a non-zero value in the environment when running qemu-nbd: | --- a/nbd/server.c | +++ b/nbd/server.c | @@ -919,6 +919,17 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp) | stl_be_p(buf + 4, reply->error); | stq_be_p(buf + 8, reply->handle); | | + static int debug; | + static int count; | + if (!count++) { | + const char *str = getenv("NBD_SERVER_DEBUG"); | + if (str) { | + debug = atoi(str); | + } | + } | + if (debug && !(count % debug)) { | + buf[0] = 0; | + } | return nbd_write(ioc, buf, sizeof(buf), errp); | } Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170814213426.24681-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
| * iotests: Add non-shared storage migration case 192Fam Zheng2017-08-153-0/+71
| | | | | | | | | | | | | | Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-5-famz@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * block-backend: Defer shared_perm tightening migration completionFam Zheng2017-08-151-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | As in the case of nbd_export_new(), bdrv_invalidate_cache() can be called when migration is still in progress. In this case we are not ready to tighten the shared permissions fenced by blk->disable_perm. Defer to a VM state change handler. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-4-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * nbd: Fix order of bdrv_set_perm and bdrv_invalidate_cacheKevin Wolf2017-08-151-9/+11
| | | | | | | | | | | | | | | | | | | | | | The "inactive" state of BDS affects whether the permissions can be granted, we must call bdrv_invalidate_cache before bdrv_set_perm to support "-incoming defer" case. Reported-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-3-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * stubs: Add vm state change handler stubsFam Zheng2017-08-152-0/+15
| | | | | | | | | | | | | | | | | | They will be used by BlockBackend code in block-obj-y, which doesn't always get linked with common-obj-y. Add stubs to keep ld happy. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170815130740.31229-2-famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * qemu-iotests: step clock after each test iterationStefan Hajnoczi2017-08-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 093 throttling test submits twice as many requests as the throttle limit in order to ensure that we reach the limit. The remaining requests are left in-flight at the end of each test iteration. Commit 452589b6b47e8dc6353df257fc803dfc1383bed8 ("vl.c/exit: pause cpus before closing block devices") exposed a hang in 093. This happens because requests are still in flight when QEMU terminates but QEMU_CLOCK_VIRTUAL time is frozen. bdrv_drain_all() hangs forever since throttled requests cannot complete. Step the clock at the end of each test iteration so in-flight requests actually finish. This solves the hang and is cleaner than leaving tests in-flight. Note that this could also be "fixed" by disabling throttling when drives are closed in QEMU. That approach has two issues: 1. We must drain requests before disabling throttling, so the hang cannot be easily avoided! 2. Any time QEMU disables throttling internally there is a chance that malicious users can abuse the code path to bypass throttling limits. Therefore it makes more sense to fix the test case than to modify QEMU. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170815130502.8736-1-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * nbd: Fix trace message for disconnectEric Blake2017-08-151-1/+1
| | | | | | | | | | | | | | | | NBD_CMD_DISC is a disconnect request, not a data discard request. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170811015749.20365-1-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* | mmio-interface: Mark as not user creatablePeter Maydell2017-08-151-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mmio-interface device is not something we want to allow users to create on the command line: * it is intended as an implementation detail of the memory subsystem, which gets created and deleted by that subsystem on demand; it makes no sense to create it by hand on the command line * it uses a pointer property 'host_ptr' which can't be set on the command line Mark the device as not user_creatable to avoid confusion. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1502807418-9994-1-git-send-email-peter.maydell@linaro.org Reviewed-by: Thomas Huth <thuth@redhat.com>
* | target/arm: Require alignment for load exclusiveAlistair Francis2017-08-151-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the ARM ARM exclusive loads require the same alignment as exclusive stores. Let's update the memops used for the load to match that of the store. This adds the alignment requirement to the memops. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20170815145714.17635-4-richard.henderson@linaro.org [rth: Require 16-byte alignment for 64-bit LDXP.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | target/arm: Correct load exclusive pair atomicityRichard Henderson2017-08-151-23/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are not providing the required single-copy atomic semantics for the 64-bit operation that is the 32-bit paired load. At the same time, leave the entire 64-bit value in cpu_exclusive_val and stop writing to cpu_exclusive_high. This means that we do not have to re-assemble the 64-bit quantity when it comes time to store. At the same time, drop a redundant temporary and perform all loads directly into the cpu_exclusive_* globals. Tested-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20170815145714.17635-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | target/arm: Correct exclusive store cmpxchg memop maskAlistair Francis2017-08-151-1/+1
|/ | | | | | | | | | | | | | | | | | When we perform the atomic_cmpxchg operation we want to perform the operation on a pair of 32-bit registers. Previously we were just passing the register size in which was set to MO_32. This would result in the high register to be ignored. To fix this issue we hardcode the size to be 64-bits long when operating on 32-bit pairs. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Tested-by: Portia Stephens <portia.stephens@xilinx.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20170815145714.17635-2-richard.henderson@linaro.org Message-Id: <bc18dddca56e8c2ea4a3def48d33ceb5d21d1fff.1502488636.git.alistair.francis@xilinx.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* qxl: call qemu_spice_display_init_common for secondary devicesPaolo Bonzini2017-08-151-1/+1
| | | | | | | | | | | | | | | | Fixes this 2.10 regression: $ qemu-system-x86_64 -cpu host -m 6144 -vga qxl -device qxl qemu-system-x86_64: util/qemu-thread-posix.c:64: qemu_mutex_lock: Assertion `mutex->initialized' failed. Reported-by: ademaria@redhat.com Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20170814231552.24593-1-pbonzini@redhat.com Cc: kraxel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' ↵Peter Maydell2017-08-152-35/+20Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging # gpg: Signature made Tue 15 Aug 2017 14:27:34 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/tracing-pull-request: simpletrace: fix flight recorder --no-header option trace: use static event ID mapping in simpletrace.stp Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * simpletrace: fix flight recorder --no-header optionStefan Hajnoczi2017-08-151-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The simpletrace.py script can pretty-print flight recorder ring buffers. These are not full simpletrace binary trace files but just the end of a trace file. There is no header and the event ID mapping information is often unavailable since the ring buffer may have filled up and discarded event ID mapping records. The simpletrace.stp script that generates ring buffer traces uses the same trace-events-all input file as simpletrace.py. Therefore both scripts have the same global ordering of trace events. A dynamic event ID mapping isn't necessary: just use the trace-events-all file as the reference for how event IDs are numbered. It is now possible to analyze simpletrace.stp ring buffers again using: $ ./simpletrace.py trace-events-all path/to/ring-buffer Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170815084430.7128-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * trace: use static event ID mapping in simpletrace.stpStefan Hajnoczi2017-08-151-29/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a partial revert of commit 7f1b588f20d027730676e627713ae3bbf6baab04 ("trace: emit name <-> ID mapping in simpletrace header"), which broke the SystemTap flight recorder because event mapping records may not be present in the ring buffer when the trace is analyzed. This means simpletrace.py --no-header does not know the event ID mapping needed to pretty-print the trace. Instead of numbering events dynamically, use a static event ID mapping as dictated by the event order in the trace-events-all file. The simpletrace.py script also uses trace-events-all so the next patch will fix the simpletrace.py --no-header option to take advantage of this knowledge. Cc: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170815084430.7128-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* | Merge remote-tracking branch 'remotes/famz/tags/build-and-test-pull-request' ↵Peter Maydell2017-08-154-28/+82
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging # gpg: Signature made Tue 15 Aug 2017 11:50:36 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/build-and-test-pull-request: docker: add centos7 image docker: install more packages on CentOS to extend code coverage docker: add Xen libs to centos6 image docker: use one package per line in CentOS config Makefile: Let "make check-help" work without running ./configure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * docker: add centos7 imagePhilippe Mathieu-Daudé2017-08-151-0/+31
| | | | | | | | | | | | Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170728233316.13352-5-f4bug@amsat.org> Signed-off-by: Fam Zheng <famz@redhat.com>
| * docker: install more packages on CentOS to extend code coveragePhilippe Mathieu-Daudé2017-08-151-0/+10
| | | | | | | | | | | | Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170728233316.13352-4-f4bug@amsat.org> Signed-off-by: Fam Zheng <famz@redhat.com>
| * docker: add Xen libs to centos6 imagePhilippe Mathieu-Daudé2017-08-151-1/+2
| | | | | | | | | | | | Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170728233316.13352-3-f4bug@amsat.org> Signed-off-by: Fam Zheng <famz@redhat.com>
| * docker: use one package per line in CentOS configPhilippe Mathieu-Daudé2017-08-151-4/+14
| | | | | | | | | | | | | | | | | | This ease rebase/cherry-pick, also it is faster to visually find if a package is here. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170728233316.13352-2-f4bug@amsat.org> Signed-off-by: Fam Zheng <famz@redhat.com>
| * Makefile: Let "make check-help" work without running ./configureFam Zheng2017-08-152-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if you do "make check-help" in a fresh checkout, only an error is printed which is not nice: $ make check-help V=1 cc -nostdlib -o check-help.mo cc: fatal error: no input files compilation terminated. rules.mak:115: recipe for target 'check-help.mo' failed make: *** [check-help.mo] Error 1 Move the config-host.mak condition into the body of tests/Makefile.include and always include the rule for check-help. Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170810085025.14076-1-famz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Fam Zheng <famz@redhat.com>
* | Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170815' into stagingPeter Maydell2017-08-151-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Regression fix for 'make clean' on s390x. # gpg: Signature made Tue 15 Aug 2017 08:24:54 BST # gpg: using RSA key 0xDECF6B93C6F02FAF # gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" # gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" # gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" # gpg: aka "Cornelia Huck <cohuck@kernel.org>" # gpg: aka "Cornelia Huck <cohuck@redhat.com>" # Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF * remotes/cohuck/tags/s390x-20170815: pc-bios/s390-ccw: Use rm command during make clean Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * pc-bios/s390-ccw: Use rm command during make cleanEric Farman2017-08-151-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | This reverts a change that replaced the "rm -f" command with the undefined variable RM (expected to be set by make), and causes the "make clean" command to fail for a s390 target: make[1]: Entering directory '/usr/src/qemu/build/pc-bios/s390-ccw' rm -f *.timestamp *.o *.d *.img *.elf *~ *.a /bin/sh: *.o: command not found Makefile:39: recipe for target 'clean' failed make[1]: *** [clean] Error 127 make[1]: Leaving directory '/usr/src/qemu/build/pc-bios/s390-ccw' Makefile:489: recipe for target 'clean' failed make: *** [clean] Error 1 Fixes: 3e4415a751d2 ("pc-bios/s390-ccw: Add core files for the network bootloading program") Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170814204450.24118-2-farman@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
* xlnx-qspi: add a property for mmio-executionKONRAD Frederic2017-08-141-3/+33
| | | | | | | | | | | This adds mmio-exec property to workaround the migration bug. When enabled the migration is blocked and will return an error. Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Message-id: 1502438052-5117-1-git-send-email-frederic.konrad@adacore.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>