summaryrefslogtreecommitdiffstats
path: root/include/block
Commit message (Collapse)AuthorAgeFilesLines
...
* block/amend: add 'force' optionMaxim Levitsky2020-07-062-0/+2
| | | | | | | | | | | | | | 'force' option will be used for some unsafe amend operations. This includes things like erasing last keyslot in luks based formats (which destroys the data, unless the master key is backed up by external means), but that _might_ be desired result. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200608094030.670121-4-mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* osdep: Make MIN/MAX evaluate arguments only onceEric Blake2020-06-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not aware of any immediate bugs in qemu where a second runtime evaluation of the arguments to MIN() or MAX() causes a problem, but proactively preventing such abuse is easier than falling prey to an unintended case down the road. At any rate, here's the conversation that sparked the current patch: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg05718.html Update the MIN/MAX macros to only evaluate their argument once at runtime; this uses typeof(1 ? (a) : (b)) to ensure that we are promoting the temporaries to the same type as the final comparison (we have to trigger type promotion, as typeof(bitfield) won't compile; and we can't use typeof((a) + (b)) or even typeof((a) + 0), as some of our uses of MAX are on void* pointers where such addition is undefined). However, we are unable to work around gcc refusing to compile ({}) in a constant context (such as the array length of a static variable), even when only used in the dead branch of a __builtin_choose_expr(), so we have to provide a second macro pair MIN_CONST and MAX_CONST for use when both arguments are known to be compile-time constants and where the result must also be usable as a constant; this second form evaluates arguments multiple times but that doesn't matter for constants. By using a void expression as the expansion if a non-constant is presented to this second form, we can enlist the compiler to ensure the double evaluation is not attempted on non-constants. Alas, as both macros now rely on compiler intrinsics, they are no longer usable in preprocessor #if conditions; those will just have to be open-coded or the logic rewritten into #define or runtime 'if' conditions (but where the compiler dead-code-elimination will probably still apply). I tested that both gcc 10.1.1 and clang 10.0.0 produce errors for all forms of macro mis-use. As the errors can sometimes be cryptic, I'm demonstrating the gcc output: Use of MIN when MIN_CONST is needed: In file included from /home/eblake/qemu/qemu-img.c:25: /home/eblake/qemu/include/qemu/osdep.h:249:5: error: braced-group within expression allowed only inside a function 249 | ({ \ | ^ /home/eblake/qemu/qemu-img.c:92:12: note: in expansion of macro ‘MIN’ 92 | char array[MIN(1, 2)] = ""; | ^~~ Use of MIN_CONST when MIN is needed: /home/eblake/qemu/qemu-img.c: In function ‘is_allocated_sectors’: /home/eblake/qemu/qemu-img.c:1225:15: error: void value not ignored as it ought to be 1225 | i = MIN_CONST(i, n); | ^ Use of MIN in the preprocessor: In file included from /home/eblake/qemu/accel/tcg/translate-all.c:20: /home/eblake/qemu/accel/tcg/translate-all.c: In function ‘page_check_range’: /home/eblake/qemu/include/qemu/osdep.h:249:6: error: token "{" is not valid in preprocessor expressions 249 | ({ \ | ^ Fix the resulting callsites that used #if or computed a compile-time constant min or max to use the new macros. cpu-defs.h is interesting, as CPU_TLB_DYN_MAX_BITS is sometimes used as a constant and sometimes dynamic. It may be worth improving glib's MIN/MAX definitions to be saner, but that is a task for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200625162602.700741-1-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw/block/nvme: use constants in identifyKlaus Jensen2020-06-171-0/+8
| | | | | | | | | Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Message-Id: <20200609190333.59390-6-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/dirty-bitmap: add bdrv_has_named_bitmaps helperVladimir Sementsov-Ogievskiy2020-05-281-0/+1
| | | | | | | | | | To be used for bitmap migration in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* blockdev: Promote several bitmap functions to non-staticEric Blake2020-05-191-0/+12
| | | | | | | | | | | | | | | The next patch will split blockdev.c, which will require accessing some previously-static functions from more than one .c file. But part of promoting a function to public is picking a naming scheme that does not reek of exposing too many internals (two of the three functions were named starting with 'do_'). To make future code motion easier, perform the function rename and non-static promotion into its own patch. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513011648.166876-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
* block: Make it easier to learn which BDS support bitmapsEric Blake2020-05-192-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upcoming patches will enhance bitmap support in qemu-img, but in doing so, it turns out to be nice to suppress output when persistent bitmaps make no sense (such as on a qcow2 v2 image). Add a hook to make this easier to query. This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap, rather than trying to shoehorn the answer in via existing callbacks. In particular, while it might have been possible to overload .bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to answer whether any persistent bitmaps are supported, that is at odds with whether a particular bitmap can be stored (for example, even on an image that supports persistent bitmaps but has currently filled up the maximum number of bitmaps, attempts to store another one should fail); and the new functionality doesn't require coroutine safety. Similarly, we could have added one more piece of information to .bdrv_get_info, but then again, most callers to that function tend to already discard extraneous information, and making it a catch-all rather than a series of dedicated scalar queries hasn't really simplified life. In the future, when we improve the ability to look up bitmaps through a filter, we will probably also want to teach the block layer to automatically let filters pass this request on through. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513011648.166876-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
* Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell2020-05-191-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging Pull request # gpg: Signature made Tue 19 May 2020 09:00:32 BST # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: aio-posix: disable fdmon-io_uring when GSource is used aio-posix: don't duplicate fd handler deletion in fdmon_io_uring_destroy() tests/fuzz: Extract ioport_fuzz_qtest() method tests/fuzz: Extract pciconfig_fuzz_qos() method tests/fuzz: Remove unuseful/unused typedefs tests/fuzz: Add missing space in test description Makefile: List fuzz targets in 'make help' tests/fuzz/Makefile: Do not link code using unavailable devices Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * aio-posix: disable fdmon-io_uring when GSource is usedStefan Hajnoczi2020-05-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The glib event loop does not call fdmon_io_uring_wait() so fd handlers waiting to be submitted build up in the list. There is no benefit is using io_uring when the glib GSource is being used, so disable it instead of implementing a more complex fix. This fixes a memory leak where AioHandlers would build up and increasing amounts of CPU time were spent iterating them in aio_pending(). The symptom is that guests become slow when QEMU is built with io_uring support. Buglink: https://bugs.launchpad.net/qemu/+bug/1877716 Fixes: 73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* | block: Drop @child_class from bdrv_child_perm()Max Reitz2020-05-181-3/+1Star
| | | | | | | | | | | | | | | | | | Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Drop child_fileMax Reitz2020-05-181-1/+0Star
| | | | | | | | | | | | | | Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-33-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Drop bdrv_format_default_perms()Max Reitz2020-05-181-11/+0Star
| | | | | | | | | | | | | | Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-32-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Make bdrv_filter_default_perms() staticMax Reitz2020-05-181-10/+0Star
| | | | | | | | | | | | | | Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-31-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Drop child_backingMax Reitz2020-05-181-1/+0Star
| | | | | | | | | | | | | | Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-25-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Drop child_formatMax Reitz2020-05-181-1/+0Star
| | | | | | | | | | | | | | Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-23-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Add bdrv_default_perms()Max Reitz2020-05-181-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This callback can be used by BDSs that use child_of_bds with the appropriate BdrvChildRole for their children. Also, make bdrv_format_default_perms() use it for child_of_bds children (just a temporary solution until we can drop bdrv_format_default_perms() altogether). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-20-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Add child_of_bdsMax Reitz2020-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Any current user of child_file, child_format, and child_backing can and should use this generic BdrvChildClass instead, as it can handle all of these cases. However, to be able to do so, the users must pass the appropriate BdrvChildRole when the child is created/attached. (The following commits will take care of that.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-15-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Pass parent_is_format to .inherit_options()Max Reitz2020-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We plan to unify the generic .inherit_options() functions. The resulting common function will need to decide whether to force-enable format probing, force-disable it, or leave it as-is. To make this decision, it will need to know whether the parent node is a format node or not (because we never want format probing if the parent is a format node already (except for the backing chain)). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Pass BdrvChildRole to .inherit_options()Max Reitz2020-05-181-1/+2
| | | | | | | | | | | | | | | | | | | | For now, all callers (effectively) pass 0 and no callee evaluates thie value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Pass BdrvChildRole to bdrv_child_perm()Max Reitz2020-05-181-1/+4
| | | | | | | | | | | | | | | | | | | | For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Add BdrvChildRole to BdrvChildMax Reitz2020-05-182-0/+4
| | | | | | | | | | | | | | | | | | | | For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Add BdrvChildRole and BdrvChildRoleBitsMax Reitz2020-05-181-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | This mask will supplement BdrvChildClass when it comes to what role (or combination of roles) a child takes for its parent. It consists of BdrvChildRoleBits values (which is an enum). Because empty enums are not allowed, let us just start with it filled. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-5-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Rename BdrvChildRole to BdrvChildClassMax Reitz2020-05-182-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Add BlockDriver.is_formatMax Reitz2020-05-181-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to unify child_format and child_file at some point. One of the important things that set format drivers apart from other drivers is that they do not expect other format nodes under them (except in the backing chain), i.e. we must not probe formats inside of formats. That means we need something on which to distinguish format drivers from others, and hence this flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: Add bdrv_make_empty()Max Reitz2020-05-181-0/+1
|/ | | | | | | | | | | | | | | Right now, all users of bdrv_make_empty() call the BlockDriver method directly. That is not only bad style, it is also wrong, unless the caller has a BdrvChild with a WRITE or WRITE_UNCHANGED permission. (WRITE_UNCHANGED suffices, because callers generally use this function to clear a node with a backing file after a commit operation.) Introduce bdrv_make_empty() that verifies that it does. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: introduce compression type featureDenis Plotnikov2020-05-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds some preparation parts for incompatible compression type feature to qcow2 allowing the use different compression methods for image clusters (de)compressing. It is implied that the compression type is set on the image creation and can be changed only later by image conversion, thus compression type defines the only compression algorithm used for the image, and thus, for all image clusters. The goal of the feature is to add support of other compression methods to qcow2. For example, ZSTD which is more effective on compression than ZLIB. The default compression is ZLIB. Images created with ZLIB compression type are backward compatible with older qemu versions. Adding of the compression type breaks a number of tests because now the compression type is reported on image creation and there are some changes in the qcow2 header in size and offsets. The tests are fixed in the following ways: * filter out compression_type for many tests * fix header size, feature table size and backing file offset affected tests: 031, 036, 061, 080 header_size +=8: 1 byte compression type 7 bytes padding feature_table += 48: incompatible feature compression type backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change) * add "compression type" for test output matching when it isn't filtered affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206, 242, 255, 274, 280 Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Drop unused .bdrv_has_zero_init_truncateEric Blake2020-05-082-8/+0Star
| | | | | | | | | | | | | | | | | | Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in ceaca56f and implemented in commit 1dcaf527 claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* nvme: introduce PMR support from NVMe 1.4 specAndrzej Jakowski2020-04-301-0/+172
| | | | | | | | | | | | | | | This patch introduces support for PMR that has been defined as part of NVMe 1.4 spec. User can now specify a pmrdev option that should point to HostMemoryBackend. pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe device. Guest OS can perform mmio read and writes to the PMR region that will stay persistent across system reboot. Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200330164656.9348-1-andrzej.jakowski@linux.intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add flags to bdrv(_co)_truncate()Kevin Wolf2020-04-301-2/+3
| | | | | | | | | | | | | Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add flags to BlockDriver.bdrv_co_truncate()Kevin Wolf2020-04-301-1/+9
| | | | | | | | | | | | | | | | This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate() driver callbacks, and a supported_truncate_flags field in BlockDriverState that allows drivers to advertise support for request flags in the context of truncate. For now, we always pass 0 and no drivers declare support for any flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* aio-wait: delegate polling of main AioContext if BQL not heldPaolo Bonzini2020-04-092-19/+32
| | | | | | | | | | | | | | | | | | | | | | Any thread that is not a iothread returns NULL for qemu_get_current_aio_context(). As a result, it would also return true for in_aio_context_home_thread(qemu_get_aio_context()), causing AIO_WAIT_WHILE to invoke aio_poll() directly. This is incorrect if the BQL is not held, because aio_poll() does not expect to run concurrently from multiple threads, and it can actually happen when savevm writes to the vmstate file from the migration thread. Therefore, restrict in_aio_context_home_thread to return true for the main AioContext only if the BQL is held. The function is moved to aio-wait.h because it is mostly used there and to avoid a circular reference between main-loop.h and block/aio.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200407140746.8041-5-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: trickle down the fallback image creation function use to the block ↵Maxim Levitsky2020-03-262-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers Instead of checking the .bdrv_co_create_opts to see if we need the fallback, just implement the .bdrv_co_create_opts in the drivers that need it. This way we don't break various places that need to know if the underlying protocol/format really supports image creation, and this way we still allow some drivers to not support image creation. Fixes: fd17146cd93d1704cd96d7c2757b325fc7aac6fd Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007 Note that technically this driver reverts the image creation fallback for the vxhs driver since I don't have a means to test it, and IMHO it is better to leave it not supported as it was prior to generic image creation patches. Also drop iscsi_create_opts which was left accidentally. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> [mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and bdrv_create_opts_simple from block.h into block_int.h] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: pass BlockDriver reference to the .bdrv_co_createMaxim Levitsky2020-03-261-1/+2
| | | | | | | | | | | This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/dirty-bitmap: improve _next_dirty_area APIVladimir Sementsov-Ogievskiy2020-03-181-1/+2
| | | | | | | | | | | | | | | | | Firstly, _next_dirty_area is for scenarios when we may contiguously search for next dirty area inside some limited region, so it is more comfortable to specify "end" which should not be recalculated on each iteration. Secondly, let's add a possibility to limit resulting area size, not limiting searching area. This will be used in NBD code in further commit. (Note that now bdrv_dirty_bitmap_next_dirty_area is unused) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-8-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/dirty-bitmap: add _next_dirty APIVladimir Sementsov-Ogievskiy2020-03-181-0/+2
| | | | | | | | | | | | | | | We have bdrv_dirty_bitmap_next_zero, let's add corresponding bdrv_dirty_bitmap_next_dirty, which is more comfortable to use than bitmap iterators in some cases. For test modify test_hbitmap_next_zero_check_range to check both next_zero and next_dirty and add some new checks. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/dirty-bitmap: switch _next_dirty_area and _next_zero to int64_tVladimir Sementsov-Ogievskiy2020-03-181-3/+3
| | | | | | | | | | | | | | | | | | | We are going to introduce bdrv_dirty_bitmap_next_dirty so that same variable may be used to store its return value and to be its parameter, so it would int64_t. Similarly, we are going to refactor hbitmap_next_dirty_area to use hbitmap_next_dirty together with hbitmap_next_zero, therefore we want hbitmap_next_zero parameter type to be int64_t too. So, for convenience update all parameters of *_next_zero and *_next_dirty_area to be int64_t. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20200205112041.6003-6-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell2020-03-122-0/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Block layer patches: - Relax restrictions for blockdev-snapshot (allows libvirt to do live storage migration with blockdev-mirror) - luks: Delete created files when block_crypto_co_create_opts_luks fails - Fix memleaks in qmp_object_add # gpg: Signature made Wed 11 Mar 2020 15:38:59 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: qemu-iotests: adding LUKS cleanup for non-UTF8 secret error crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails block.c: adding bdrv_co_delete_file block: introducing 'bdrv_co_delete_file' interface tests/qemu-iotests: Fix socket_scm_helper build path qapi: Add '@allow-write-only-overlay' feature for 'blockdev-snapshot' iotests: Add iothread cases to 155 block: Fix cross-AioContext blockdev-snapshot iotests: Test mirror with temporarily disabled target backing file iotests: Fix run_job() with use_log=False block: Relax restrictions for blockdev-snapshot block: Make bdrv_get_cumulative_perm() public qom-qmp-cmds: fix two memleaks in qmp_object_add Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * block.c: adding bdrv_co_delete_fileDaniel Henrique Barboza2020-03-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using the new 'bdrv_co_delete_file' interface, a pure co_routine function 'bdrv_co_delete_file' inside block.c can can be used in a way similar of the existing bdrv_create_file to to clean up a created file. We're creating a pure co_routine because the only caller of 'bdrv_co_delete_file' will be already in co_routine context, thus there is no need to add all the machinery to check for qemu_in_coroutine() and create a separated co_routine to do the job. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-3-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * block: introducing 'bdrv_co_delete_file' interfaceDaniel Henrique Barboza2020-03-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding to Block Drivers the capability of being able to clean up its created files can be useful in certain situations. For the LUKS driver, for instance, a failure in one of its authentication steps can leave files in the host that weren't there before. This patch adds the 'bdrv_co_delete_file' interface to block drivers and add it to the 'file' driver in file-posix.c. The implementation is given by 'raw_co_delete_file'. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200130213907.2830642-2-danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * block: Make bdrv_get_cumulative_perm() publicKevin Wolf2020-03-111-0/+3
| | | | | | | | | | | | | | Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200310113831.27293-2-kwolf@redhat.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-03-11' ↵Peter Maydell2020-03-111-57/+8Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging Block patches for the 5.0 softfreeze: - qemu-img measure for LUKS - Improve block-copy's performance by reducing inter-request dependencies - Make curl's detection of accept-ranges more robust - Memleak fixes - iotest fix # gpg: Signature made Wed 11 Mar 2020 13:19:01 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-03-11: block/block-copy: hide structure definitions block/block-copy: reduce intersecting request lock block/block-copy: rename start to offset in interfaces block/block-copy: refactor interfaces to use bytes instead of end block/block-copy: factor out find_conflicting_inflight_req block/block-copy: use block_status block/block-copy: specialcase first copy_range request block/block-copy: fix progress calculation job: refactor progress to separate object block/qcow2-threads: fix qcow2_decompress qemu-img: free memory before re-assign block/qcow2: do free crypto_opts in qcow2_close() iotests: Fix nonportable use of od --endian block/curl: HTTP header field names are case insensitive block/curl: HTTP header fields allow whitespace around values iotests: add 288 luks qemu-img measure test qemu-img: allow qemu-img measure --object without a filename luks: implement .bdrv_measure() luks: extract qcrypto_block_calculate_payload_offset() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | block/block-copy: hide structure definitionsVladimir Sementsov-Ogievskiy2020-03-111-48/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hide structure definitions and add explicit API instead, to keep an eye on the scope of the shared fields. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
| * | block/block-copy: rename start to offset in interfacesVladimir Sementsov-Ogievskiy2020-03-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | offset/bytes pair is more usual naming in block layer, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
| * | block/block-copy: refactor interfaces to use bytes instead of endVladimir Sementsov-Ogievskiy2020-03-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a lot of "chunk_end - start" invocations, let's switch to bytes/cur_bytes scheme instead. While being here, improve check on block_copy_do_copy parameters to not overflow when calculating nbytes and use int64_t for bytes in block_copy for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
| * | block/block-copy: fix progress calculationVladimir Sementsov-Ogievskiy2020-03-111-10/+5Star
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assume we have two regions, A and B, and region B is in-flight now, region A is not yet touched, but it is unallocated and should be skipped. Correspondingly, as progress we have total = A + B current = 0 If we reset unallocated region A and call progress_reset_callback, it will calculate 0 bytes dirty in the bitmap and call job_progress_set_remaining, which will set total = current + 0 = 0 + 0 = 0 So, B bytes are actually removed from total accounting. When job finishes we'll have total = 0 current = B , which doesn't sound good. This is because we didn't considered in-flight bytes, actually when calculating remaining, we should have set (in_flight + dirty_bytes) as remaining, not only dirty_bytes. To fix it, let's refactor progress calculation, moving it to block-copy itself instead of fixing callback. And, of course, track in_flight bytes count. We still have to keep one callback, to maintain backup job bytes_read calculation, but it will go on soon, when we turn the whole backup process into one block_copy call. Cc: qemu-stable@nongnu.org Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* | Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell2020-03-111-2/+69
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging Pull request # gpg: Signature made Wed 11 Mar 2020 12:40:36 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: aio-posix: remove idle poll handlers to improve scalability aio-posix: support userspace polling of fd monitoring aio-posix: add io_uring fd monitoring implementation aio-posix: simplify FDMonOps->update() prototype aio-posix: extract ppoll(2) and epoll(7) fd monitoring aio-posix: move RCU_READ_LOCK() into run_poll_handlers() aio-posix: completely stop polling when disabled aio-posix: remove confusing QLIST_SAFE_REMOVE() qemu/queue.h: clear linked list pointers on remove Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * aio-posix: remove idle poll handlers to improve scalabilityStefan Hajnoczi2020-03-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
| * aio-posix: support userspace polling of fd monitoringStefan Hajnoczi2020-03-091-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
| * aio-posix: add io_uring fd monitoring implementationStefan Hajnoczi2020-03-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
| * aio-posix: simplify FDMonOps->update() prototypeStefan Hajnoczi2020-03-091-7/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The AioHandler *node, bool is_new arguments are more complicated to think about than simply being given AioHandler *old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
| * aio-posix: extract ppoll(2) and epoll(7) fd monitoringStefan Hajnoczi2020-03-091-2/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>