summaryrefslogtreecommitdiffstats
path: root/block/qcow2.c
Commit message (Collapse)AuthorAgeFilesLines
...
* qcow2: add zstd cluster compressionDenis Plotnikov2020-05-131-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zstd significantly reduces cluster compression time. It provides better compression performance maintaining the same level of the compression ratio in comparison with zlib, which, at the moment, is the only compression method available. The performance test results: Test compresses and decompresses qemu qcow2 image with just installed rhel-7.6 guest. Image cluster size: 64K. Image on disk size: 2.2G The test was conducted with brd disk to reduce the influence of disk subsystem to the test results. The results is given in seconds. compress cmd: time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd] src.img [zlib|zstd]_compressed.img decompress cmd time ./qemu-img convert -O qcow2 [zlib|zstd]_compressed.img uncompressed.img compression decompression zlib zstd zlib zstd ------------------------------------------------------------ real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %) user 65.0 15.8 5.3 2.5 sys 3.3 0.2 2.0 2.0 Both ZLIB and ZSTD gave the same compression ratio: 1.57 compressed image size in both cases: 1.4G Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-4-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: introduce compression type featureDenis Plotnikov2020-05-131-0/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds some preparation parts for incompatible compression type feature to qcow2 allowing the use different compression methods for image clusters (de)compressing. It is implied that the compression type is set on the image creation and can be changed only later by image conversion, thus compression type defines the only compression algorithm used for the image, and thus, for all image clusters. The goal of the feature is to add support of other compression methods to qcow2. For example, ZSTD which is more effective on compression than ZLIB. The default compression is ZLIB. Images created with ZLIB compression type are backward compatible with older qemu versions. Adding of the compression type breaks a number of tests because now the compression type is reported on image creation and there are some changes in the qcow2 header in size and offsets. The tests are fixed in the following ways: * filter out compression_type for many tests * fix header size, feature table size and backing file offset affected tests: 031, 036, 061, 080 header_size +=8: 1 byte compression type 7 bytes padding feature_table += 48: incompatible feature compression type backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change) * add "compression type" for test output matching when it isn't filtered affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206, 242, 255, 274, 280 Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Drop unused .bdrv_has_zero_init_truncateEric Blake2020-05-081-1/+0Star
| | | | | | | | | | | | | | | | | | Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in ceaca56f and implemented in commit 1dcaf527 claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Fix preallocation on block devicesMax Reitz2020-05-081-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling bdrv_getlength() to get the pre-truncate file size will not really work on block devices, because they have always the same length, and trying to write beyond it will fail with a rather cryptic error message. Instead, we should use qcow2_get_last_cluster() and bdrv_getlength() only as a fallback. Before this patch: $ truncate -s 1G test.img $ sudo losetup -f --show test.img /dev/loop0 $ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16 qemu-img: /dev/loop0: Could not resize image: Failed to resize refcount structures: No space left on device With this patch: $ sudo qemu-img create -f qcow2 -o preallocation=full /dev/loop0 64M Formatting '/dev/loop0', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16 qemu-img: /dev/loop0: Could not resize image: Failed to resize underlying file: Preallocation mode 'full' unsupported for this non-regular file So as you can see, it still fails, but now the problem is missing support on the block device level, so we at least get a better error message. Note that we cannot preallocate block devices on truncate by design, because we do not know what area to preallocate. Their length is always the same, the truncate operation does not change it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200505141801.1096763-1-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Avoid integer wraparound in qcow2_co_truncate()Alberto Garcia2020-05-081-5/+7
| | | | | | | | | | | | | | | | | | | | | | After commit f01643fb8b47e8a70c04bbf45e0f12a9e5bc54de when an image is extended and BDRV_REQ_ZERO_WRITE is set then the new clusters are zeroized. The code however does not detect correctly situations when the old and the new end of the image are within the same cluster. The problem can be reproduced with these steps: qemu-img create -f qcow2 backing.qcow2 1M qemu-img create -f qcow2 -F qcow2 -b backing.qcow2 top.qcow2 qemu-img resize --shrink top.qcow2 520k qemu-img resize top.qcow2 567k In the last step offset - zero_start causes an integer wraparound. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200504155217.10325-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Tweak comment about bitmaps vs. resizeEric Blake2020-05-051-1/+1
| | | | | | | | | | | Our comment did not actually match the code. Rewrite the comment to be less sensitive to any future changes to qcow2-bitmap.c that might implement scenarios that we currently reject. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-4-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Allow resize of images with internal snapshotsEric Blake2020-05-051-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We originally refused to allow resize of images with internal snapshots because the v2 image format did not require the tracking of snapshot size, making it impossible to safely revert to a snapshot with a different size than the current view of the image. But the snapshot size tracking was rectified in v3, and our recent fixes to qemu-img amend (see 0a85af35) guarantee that we always have a valid snapshot size. Thus, we no longer need to artificially limit image resizes, but it does become one more thing that would prevent a downgrade back to v2. And now that we support different-sized snapshots, it's also easy to fix reverting to a snapshot to apply the new size. Upgrade iotest 61 to cover this (we previously had NO coverage of refusal to resize while snapshots exist). Note that the amend process can fail but still have effects: in particular, since we break things into upgrade, resize, downgrade, a failure during resize does not roll back changes made during upgrade, nor does failure in downgrade roll back a resize. But this situation is pre-existing even without this patch; and without journaling, the best we could do is minimize the chance of partial failure by collecting all changes prior to doing any writes - which adds a lot of complexity but could still fail with EIO. On the other hand, we are careful that even if we have partial modification but then fail, the image is left viable (that is, we are careful to sequence things so that after each successful cluster write, there may be transient leaked clusters but no corrupt metadata). And complicating the code to make it more transaction-like is not worth the effort: a user can always request multiple 'qemu-img amend' changing one thing each, if they need finer-grained control over detecting the first failure than what they get by letting qemu decide how to sequence multiple changes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-3-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Add blk_new_with_bs() helperEric Blake2020-05-051-10/+8Star
| | | | | | | | | | | | | | | There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200424190903.522087-2-eblake@redhat.com> [mreitz: Set @ret only in error paths, see https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html] Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Forward ZERO_WRITE flag for full preallocationKevin Wolf2020-04-301-3/+19
| | | | | | | | | | | | | | | | | | | | | The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the image is possibly preallocated and then the zero flag is added to all clusters. This means that a copy-on-write operation may be needed when writing to these clusters, despite having used preallocation, negating one of the major benefits of preallocation. Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver, and if the protocol driver can ensure that the new area reads as zeros, we can skip setting the zero flag in the qcow2 layer. Unfortunately, the same approach doesn't work for metadata preallocation, so we'll still set the zero flag there. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424142701.67053-1-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Support BDRV_REQ_ZERO_WRITE for truncateKevin Wolf2020-04-301-0/+34
| | | | | | | | | | | | | | If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't undo any previous preallocation, but just adds the zero flag to all relevant L2 entries. If an external data file is in use, a write_zeroes request to the data file is made instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200424125448.63318-5-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block-backend: Add flags to blk_truncate()Kevin Wolf2020-04-301-2/+2
| | | | | | | | | | | | Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add flags to bdrv(_co)_truncate()Kevin Wolf2020-04-301-6/+9
| | | | | | | | | | | | | Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add flags to BlockDriver.bdrv_co_truncate()Kevin Wolf2020-04-301-1/+1
| | | | | | | | | | | | | | | | This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate() driver callbacks, and a supported_truncate_flags field in BlockDriverState that allows drivers to advertise support for request flags in the context of truncate. For now, we always pass 0 and no drivers declare support for any flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Check request size in qcow2_co_pwritev_compressed_part()Alberto Garcia2020-04-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | When issuing a compressed write request the number of bytes must be a multiple of the cluster size or reach the end of the last cluster. With the current code such requests are allowed and we hit an assertion: $ qemu-img create -f qcow2 img.qcow2 1M $ qemu-io -c 'write -c 0 32k' img.qcow2 qemu-io: block/qcow2.c:4257: qcow2_co_pwritev_compressed_task: Assertion `bytes == s->cluster_size || (bytes < s->cluster_size && (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS))' failed. Aborted This patch fixes a regression introduced in 0d483dce38 Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200406143401.26854-1-berto@igalia.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Forbid discard in qcow2 v2 images with backing filesAlberto Garcia2020-04-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | A discard request deallocates the selected clusters so they read back as zeroes. This is done by clearing the cluster offset field and setting QCOW_OFLAG_ZERO in the L2 entry. This flag is however only supported when qcow_version >= 3. In older images the cluster is simply deallocated, exposing any possible stale data from the backing file. Since discard is an advisory operation it's safer to simply forbid it in this scenario. Note that we are adding this check to qcow2_co_pdiscard() and not to qcow2_cluster_discard() or discard_in_l2_slice() because the last two are also used by qcow2_snapshot_create() to discard the clusters used by the VM state. In this case there's no risk of exposing stale data to the guest and we really want that the clusters are always discarded. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200331114345.29993-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Avoid feature name extension on small cluster sizeEric Blake2020-03-261-2/+9
| | | | | | | | | | | | | | | | | | | As the feature name table can be quite large (over 9k if all 64 bits of all three feature fields have names; a mere 8 features leaves only 8 bytes for a backing file name in a 512-byte cluster), it is unwise to emit this optional header in images with small cluster sizes. Update iotest 036 to skip running on small cluster sizes; meanwhile, note that iotest 061 never passed on alternative cluster sizes (however, I limited this patch to tests with output affected by adding feature names, rather than auditing for other tests that are not robust to alternative cluster sizes). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200324174233.1622067-4-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: List autoclear bit names in headerEric Blake2020-03-261-1/+11
| | | | | | | | | | | | | | | The feature table is supposed to advertise the name of all feature bits that we support; however, we forgot to update the table for autoclear bits. While at it, move the table to read-only memory in code, and tweak the qcow2 spec to name the second autoclear bit. Update iotests that are affected by the longer header length. Fixes: 88ddffae Fixes: 93c24936 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324174233.1622067-3-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Comment typo fixesEric Blake2020-03-261-3/+3
| | | | | | | | | | Various trivial typos noticed while working on this file. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200324174233.1622067-2-eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: pass BlockDriver reference to the .bdrv_co_createMaxim Levitsky2020-03-261-1/+3
| | | | | | | | | | | This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/qcow2: zero data_file child after freeVladimir Sementsov-Ogievskiy2020-03-241-0/+2
| | | | | | | | | | data_file being NULL doesn't seem to be a correct state, but it's better than dead pointer and simpler to debug. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200316060631.30052-3-vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Avoid memleak on qcow2 image info failureEric Blake2020-03-241-0/+1
| | | | | | | | | | | | | If we fail to get bitmap info, we must not leak the encryption info. Fixes: b8968c875f403 Fixes: Coverity CID 1421894 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200320183620.1112123-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Tested-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/qcow2: do free crypto_opts in qcow2_close()Pan Nengyuan2020-03-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'crypto_opts' forgot to free in qcow2_close(), this patch fix the bellow leak stack: Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295 #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49 #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290 #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163 #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148 #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232 #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512 #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792 #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819 #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317 #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575 #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126 #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219 #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397 #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534 #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668 #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485 #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* luks: extract qcrypto_block_calculate_payload_offset()Stefan Hajnoczi2020-03-111-55/+19Star
| | | | | | | | | | | | | | | The qcow2 .bdrv_measure() code calculates the crypto payload offset. This logic really belongs in crypto/block.c where it can be reused by other image formats. The "luks" block driver will need this same logic in order to implement .bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset() function now. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200221112522.1497712-2-stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/qcow2: Move bitmap reopen into bdrv_reopen_commit_postPeter Krempa2020-03-061-1/+6
| | | | | | | | | | | | | | | | | | | | | | The bitmap code requires writing the 'file' child when the qcow2 driver is reopened in read-write mode. If the 'file' child is being reopened due to a permissions change, the modification is commited yet when qcow2_reopen_commit is called. This means that any attempt to write the 'file' child will end with EBADFD as the original fd was already closed. Moving bitmap reopening to the new callback which is called after permission modifications are commited fixes this as the file descriptor will be replaced with the correct one. The above problem manifests itself when reopening 'qcow2' format layer which uses a 'file-posix' file child which was opened with the 'auto-read-only' property set. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <db118dbafe1955afbc0a18d3dd220931074ce349.1582893284.git.pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: always fill entire LUKS header space with zerosDaniel P. Berrangé2020-02-201-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When initializing the LUKS header the size with default encryption parameters will currently be 2068480 bytes. This is rounded up to a multiple of the cluster size, 2081792, with 64k sectors. If the end of the header is not the same as the end of the cluster we fill the extra space with zeros. This was forgetting that not even the space allocated for the header will be fully initialized, as we only write key material for the first key slot. The space left for the other 7 slots is never written to. An optimization to the ref count checking code: commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 (refs/bisect/bad) Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Wed Feb 27 16:14:30 2019 +0300 qcow2-refcount: avoid eating RAM made the assumption that every cluster which was allocated would have at least some data written to it. This was violated by way the LUKS header is only partially written, with much space simply reserved for future use. Depending on the cluster size this problem was masked by the logic which wrote zeros between the end of the LUKS header and the end of the cluster. $ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \ -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\ encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \ cluster_size_check.qcow2 100M Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600 encrypt.format=luks encrypt.key-secret=cluster_encrypt0 encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16 $ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \ 'json:{"driver": "qcow2", "encrypt.format": "luks", \ "encrypt.key-secret": "cluster_encrypt0", \ "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000 Leaked cluster 4 refcount=1 reference=0 ...snip... Leaked cluster 130 refcount=1 reference=0 1 errors were found on the image. Data may be corrupted, or further writes to the image may corrupt it. 127 leaked clusters were found on the image. This means waste of disk space, but no harm to data. Image end offset: 268288 The problem only exists when the disk image is entirely empty. Writing data to the disk image payload will solve the problem by causing the end of the file to be extended further. The change fixes it by ensuring that the entire allocated LUKS header region is fully initialized with zeros. The qemu-img check will still fail for any pre-existing disk images created prior to this change, unless at least 1 byte of the payload is written to. Fully writing zeros to the entire LUKS header is a good idea regardless as it ensures that space has been allocated on the host filesystem (or whatever block storage backend is used). Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200207135520.2669430-1-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Fix alignment checks in encrypted imagesAlberto Garcia2020-02-181-2/+0Star
| | | | | | | | | | | | | | | | I/O requests to encrypted media should be aligned to the sector size used by the underlying encryption method, not to BDRV_SECTOR_SIZE. Fortunately this doesn't break anything at the moment because both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as BDRV_SECTOR_SIZE. The checks in qcow2_co_preadv_encrypted() are also unnecessary because they are repeated immediately afterwards in qcow2_co_encdec(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200213171646.15876-1-berto@igalia.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded valueAlberto Garcia2020-02-061-3/+5
| | | | | | | | | This replaces all remaining instances in the qcow2 code. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: b5f74b606c2d9873b12d29acdb7fd498029c4025.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Don't require aligned offsets in qcow2_co_copy_range_from()Alberto Garcia2020-02-061-4/+0Star
| | | | | | | | | | | | qemu-img's convert_co_copy_range() operates at the sector level and block_copy() operates at the cluster level so this condition is always true, but it is not necessary to restrict this here, so let's leave it to the driver implementation return an error if there is any. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: a4264aaee656910c84161a2965f7a501437379ca.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Tighten cluster_offset alignment assertionsAlberto Garcia2020-02-061-6/+3Star
| | | | | | | | | | | | | | qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always return offsets that are cluster-aligned so don't just check that they are sector-aligned. The check in qcow2_co_preadv_task() is also replaced by an assertion for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 558ba339965f858bede4c73ce3f50f0c0493597d.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Don't round the L1 table allocation up to the sector sizeAlberto Garcia2020-02-061-1/+1
| | | | | | | | | | | The L1 table is read from disk using the byte-based bdrv_pread() and is never accessed beyond its last element, so there's no need to allocate more memory than that. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Use a GString in report_unsupported_feature()Alberto Garcia2020-02-061-12/+11Star
| | | | | | | | | | | | | | | | This is a bit more efficient than having to allocate and free memory for each item. The default size (60) is enough for all the existing incompatible features or the "Unknown incompatible feature" message. Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20200115135626.19442-1-berto@igalia.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Allow writing compressed data of multiple clustersAndrey Shinkevich2020-01-061-27/+75
| | | | | | | | | | | | | | | QEMU currently supports writing compressed data of the size equal to one cluster. This patch allows writing QCOW2 compressed data that exceed one cluster. Now, we split buffered data into separate clusters and write them compressed using the block/aio_task API. Suggested-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1575288906-551879-3-git-send-email-andrey.shinkevich@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Move error check of local_err near its assignmentTuguoyi2019-12-191-5/+5
| | | | | | | | | | | | | | | The local_err check outside of the if block was necessary when it was introduced in commit d1258dd0c87 because it needed to be executed even if qcow2_load_autoloading_dirty_bitmaps() returned false. After some modifications that all required the error check to remain where it is, commit 9c98f145dfb finally moved the qcow2_load_dirty_bitmaps() call into the if block, so now the error check should be there, too. Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Use offset_into_cluster()Alberto Garcia2019-12-181-5/+3Star
| | | | | | | | | There's a couple of places left in the qcow2 code that still do the calculation manually, so let's replace them. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Declare BDRV_REQ_NO_FALLBACK supportedKevin Wolf2019-12-181-1/+2
| | | | | | | | | | | | | | | | | In the common case, qcow2_co_pwrite_zeroes() already only modifies metadata case, so we're fine with or without BDRV_REQ_NO_FALLBACK set. The only exception is when using an external data file, where the request is passed down to the block driver of the external data file. We are forwarding the BDRV_REQ_NO_FALLBACK flag there, though, so this is fine, too. Declare the flag supported therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
* block: Pass truncate exact=true where reasonableMax Reitz2019-10-281-1/+5
| | | | | | | | | | | | | | | | | This is a change in behavior, so all instances need a good justification. The comments added here should explain my reasoning. qed already had a comment that suggests it always expected bdrv_truncate()/blk_truncate() to behave as if exact=true were passed (c743849bee7 came eight months before 55b949c8476), so it was simply broken until now. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-8-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [mreitz: Changed comment in qed.c to explain why a new QED file must be empty, as requested and suggested by Maxim] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Let format drivers pass @exactMax Reitz2019-10-281-1/+14
| | | | | | | | | | | | | | | | | | | | | When truncating a format node, the @exact parameter is generally handled simply by virtue of the format storing the new size in the image metadata. Such formats do not need to pass on the parameter to their file nodes. There are exceptions, though: - raw and crypto cannot store the image size, and thus must pass on @exact. - When using qcow2 with an external data file, it just makes sense to keep its size in sync with the qcow2 virtual disk (because the external data file is the virtual disk). Therefore, we should pass @exact when truncating it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-7-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Add @exact parameter to bdrv_co_truncate()Max Reitz2019-10-281-10/+12
| | | | | | | | | | | | | | | | | | | | We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Do not truncate file node when formattingMax Reitz2019-10-281-6/+0Star
| | | | | | | | | | | | | | | | | | | There is no reason why the format drivers need to truncate the protocol node when formatting it. When using the old .bdrv_co_create_ops() interface, the file will be created with no size option anyway, which generally gives it a size of 0. (Exceptions are block devices, which cannot be truncated anyway.) When using blockdev-create, the user must have given the file node some size anyway, so there is no reason why we should override that. qed is an exception, it needs the file to start completely empty (as explained by c743849bee7333c7ef256b7e12e34ed6f907064f). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-4-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Add qcow2_check_fix_snapshot_table()Max Reitz2019-10-281-1/+8
| | | | | | | | | | | | | | | | | | | qcow2_check_read_snapshot_table() can perform consistency checks, but it cannot fix everything. Specifically, it cannot allocate new clusters, because that should wait until the refcount structures are known to be consistent (i.e., after qcow2_check_refcounts()). Thus, it cannot call qcow2_write_snapshots(). Do that in qcow2_check_fix_snapshot_table(), which is called after qcow2_check_refcounts(). Currently, there is nothing that would set result->corruptions, so this is a no-op. A follow-up patch will change that. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Separate qcow2_check_read_snapshot_table()Max Reitz2019-10-281-18/+58
| | | | | | | | | | | | | | | Reading the snapshot table can fail. That is a problem when we want to repair the image. Therefore, stop reading the snapshot table in qcow2_do_open() in check mode. Instead, add a new function qcow2_check_read_snapshot_table() that reads the snapshot table at a later point. In the future, we want to handle errors here and fix them. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Write v3-compliant snapshot list on upgradeMax Reitz2019-10-281-2/+30
| | | | | | | | | | | | | | | | qcow2 v3 requires every snapshot table entry to have two extra data fields: The 64-bit VM state size, and the virtual disk size. Both are optional for v2 images, so they may not be present. qcow2_upgrade() therefore should update the snapshot table to ensure all entries have these extra data fields. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1727347 Reported-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Put qcow2_upgrade() into its own functionMax Reitz2019-10-281-5/+38
| | | | | | | | | | This does not make sense right now, but it will make sense once we need to do more than to just update s->qcow_version. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Add Error ** to qcow2_read_snapshots()Max Reitz2019-10-281-2/+1Star
| | | | | | | Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20191011152814.14791-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()Kevin Wolf2019-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which requires s->lock to be taken to protect its accesses to the refcount table and refcount blocks. However, nothing in this code path actually took the lock. This could cause the same cache entry to be used by two requests at the same time, for different tables at different offsets, resulting in image corruption. As it would be preferable to base the detection on consistent data (even though it's just heuristics), let's take the lock not only around the qcow2_get_refcount() calls, but around the whole function. This patch takes the lock in qcow2_co_block_status() earlier and asserts in qcow2_detect_metadata_preallocation() that we hold the lock. Fixes: 69f47505ee66afaa513305de0c1895a224e52c45 Cc: qemu-stable@nongnu.org Reported-by: Michael Weiser <michael.weiser@gmx.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Michael Weiser <michael.weiser@gmx.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
* qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commitVladimir Sementsov-Ogievskiy2019-10-171-1/+14
| | | | | | | | | | | | | | | | | The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/qcow2-bitmap: do not remove bitmaps on reopen-roVladimir Sementsov-Ogievskiy2019-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qcow2_reopen_bitmaps_ro wants to store bitmaps and then mark them all readonly. But the latter don't work, as qcow2_store_persistent_dirty_bitmaps removes bitmaps after storing. It's OK for inactivation but bad idea for reopen-ro. And this leads to the following bug: Assume we have persistent bitmap 'bitmap0'. Create external snapshot bitmap0 is stored and therefore removed Commit snapshot now we have no bitmaps Do some writes from guest (*) they are not marked in bitmap Shutdown Start bitmap0 is loaded as valid, but it is actually broken! It misses writes (*) Incremental backup it will be inconsistent So, let's stop removing bitmaps on reopen-ro. But don't rejoice: reopening bitmaps to rw is broken too, so the whole scenario will not work after this patch and we can't enable corresponding test cases in 260 iotests still. Reopening bitmaps rw will be fixed in the following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-7-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/qcow2: proper locking on bitmap add/remove pathsVladimir Sementsov-Ogievskiy2019-10-171-2/+3
| | | | | | | | | | | | | | | | | | qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire aio context since 0a6c86d024c52b. But this is not enough: we also must lock qcow2 mutex when access in-image metadata. Especially it concerns freeing qcow2 clusters. To achieve this, move qcow2_can_store_new_dirty_bitmap and qcow2_remove_persistent_dirty_bitmap to coroutine context. Since we work in coroutines in correct aio context, we don't need context acquiring in blockdev.c anymore, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/qcow2: introduce parallel subrequest handling in read and writeVladimir Sementsov-Ogievskiy2019-10-101-12/+113
| | | | | | | | It improves performance for fragmented qcow2 images. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190916175324.18478-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/qcow2: refactor qcow2_co_pwritev_partVladimir Sementsov-Ogievskiy2019-10-101-64/+90
| | | | | | | | | | Similarly to previous commit, prepare for parallelizing write-loop iterations. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190916175324.18478-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>