summaryrefslogtreecommitdiffstats
path: root/block/io.c
Commit message (Collapse)AuthorAgeFilesLines
...
* block: Remove unused bdrv_requests_pendingFam Zheng2017-12-221-18/+0Star
| | | | | Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Unify order in drain functionsKevin Wolf2017-12-221-4/+8
| | | | | | | | | | | | | | | | | | Drain requests are propagated to child nodes, parent nodes and directly to the AioContext. The order in which this happened was different between all combinations of drain/drain_all and begin/end. The correct order is to keep children only drained when their parents are also drained. This means that at the start of a drained section, the AioContext needs to be drained first, the parents second and only then the children. The correct order for the end of a drained section is the opposite. This patch changes the three other functions to follow the example of bdrv_drained_begin(), which is the only one that got it right. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Don't wait for requests in bdrv_drain*_end()Kevin Wolf2017-12-221-2/+0Star
| | | | | | | | | | | | | | The device is drained, so there is no point in waiting for requests at the end of the drained section. Remove the bdrv_drain_recurse() calls there. The bdrv_drain_recurse() calls were introduced in commit 481cad48e5e in order to call the .bdrv_co_drain_end() driver callback. This is now done by a separate bdrv_drain_invoke() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: bdrv_drain_recurse(): Remove unused begin parameterKevin Wolf2017-12-221-6/+6
| | | | | | | | Now that the bdrv_drain_invoke() calls are pulled up to the callers of bdrv_drain_recurse(), the 'begin' parameter isn't needed any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Call .drain_begin only once in bdrv_drain_all_begin()Kevin Wolf2017-12-221-2/+1Star
| | | | | | | | | | | | | | | | bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver callback inside its polling loop. This means that how many times it got called for each node depended on long it had to poll the event loop. This is obviously not right and results in nodes that stay drained even after bdrv_drain_all_end(), which calls .bdrv_co_drain_begin() once per node. Fix bdrv_drain_all_begin() to call the callback only once, too. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Make bdrv_drain_invoke() recursiveKevin Wolf2017-12-221-3/+11
| | | | | | | | | | | | | | | | | | | | | This change separates bdrv_drain_invoke(), which calls the BlockDriver drain callbacks, from bdrv_drain_recurse(). Instead, the function performs its own recursion now. One reason for this is that bdrv_drain_recurse() can be called multiple times by bdrv_drain_all_begin(), but the callbacks may only be called once. The separation is necessary to fix this bug. The other reason is that we intend to go to a model where we call all driver callbacks first, and only then start polling. This is not fully achieved yet with this patch, as bdrv_drain_invoke() contains a BDRV_POLL_WHILE() loop for the block driver callbacks, which can still call callbacks for any unrelated event. It's a step in this direction anyway. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Expect graph changes in bdrv_parent_drained_begin/endKevin Wolf2017-11-291-4/+4
| | | | | | | | | | | | | | | | | | The .drained_begin/end callbacks can (directly or indirectly via aio_poll()) cause block nodes to be removed or the current BdrvChild to point to a different child node. Use QLIST_FOREACH_SAFE() to make sure we don't access invalid BlockDriverStates or accidentally continue iterating the parents of the new child node instead of the node we actually came from. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Guard against NULL bs->drvMax Reitz2017-11-171-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently do not guard everywhere against a NULL bs->drv where we should be doing so. Most of the places fixed here just do not care about that case at all. Some care implicitly, e.g. through a prior function call to bdrv_getlength() which would always fail for an ejected BDS. Add an assert there to make it more obvious. Other places seem to care, but do so insufficiently: Freeing clusters in a qcow2 image is an error-free operation, but it may leave the image in an unusable state anyway. Giving qcow2_free_clusters() an error code is not really viable, it is much easier to note that bs->drv may be NULL even after a successful driver call. This concerns bdrv_co_flush(), and the way the check is added to bdrv_co_pdiscard() (in every iteration instead of only once). Finally, some places employ at least an assert(bs->drv); somewhere, that may be reasonable (such as in the reopen code), but in bdrv_has_zero_init(), it is definitely not. Returning 0 there in case of an ejected BDS saves us much headache instead. Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com> Buglink: https://bugs.launchpad.net/qemu/+bug/1728660 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-4-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Reduce bdrv_aligned_preadv() roundingEric Blake2017-10-261-6/+2Star
| | | | | | | | | Now that bdrv_is_allocated accepts non-aligned inputs, we can remove the TODO added in commit d6a644bb. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Align block status requestsEric Blake2017-10-261-26/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Any device that has request_alignment greater than 512 should be unable to report status at a finer granularity; it may also be simpler for such devices to be guaranteed that the block layer has rounded things out to the granularity boundary (the way the block layer already rounds all other I/O out). Besides, getting the code correct for super-sector alignment also benefits us for the fact that our public interface now has byte granularity, even though none of our drivers have byte-level callbacks. Add an assertion in blkdebug that proves that the block layer never requests status of unaligned sections, similar to what it does on other requests (while still keeping the generic helper in place for when future patches add a throttle driver). Note that iotest 177 already covers this (it would fail if you use just the blkdebug.c hunk without the io.c changes). Meanwhile, we can drop assertions in callers that no longer have to pass in sector-aligned addresses. There is a mid-function scope added for 'count' and 'longret', for a couple of reasons: first, an upcoming patch will add an 'if' statement that checks whether a driver has an old- or new-style callback, and can conveniently use the same scope for less indentation churn at that time. Second, since we are trying to get rid of sector-based computations, wrapping things in a scope makes it easier to group and see what will be deleted in a final cleanup patch once all drivers have been converted to the new-style callback. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Convert bdrv_get_block_status_above() to bytesEric Blake2017-10-261-47/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status_above() to bdrv_block_status_above() ensures that the compiler enforces that all callers are updated. Likewise, since it a byte interface allows an offset mapping that might not be sector aligned, split the mapping out of the return value and into a pass-by-reference parameter. For now, the io.c layer still assert()s that all uses are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), plus updates for the new split return interface. But some code, particularly bdrv_block_status(), gets a lot simpler because it no longer has to mess with sectors. Likewise, mirror code no longer computes s->granularity >> BDRV_SECTOR_BITS, and can therefore drop an assertion about alignment because the loop no longer depends on alignment (never mind that we don't really have a driver that reports sub-sector alignments, so it's not really possible to test the effect of sub-sector mirroring). Fix a neighboring assertion to use is_power_of_2 while there. For ease of review, bdrv_get_block_status() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Switch bdrv_co_get_block_status_above() to byte-basedEric Blake2017-10-261-44/+24Star
| | | | | | | | | | We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal type (no semantic change), and rename it to match the corresponding public function rename. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Switch bdrv_common_block_status_above() to byte-basedEric Blake2017-10-261-31/+30Star
| | | | | | | | | We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Switch BdrvCoGetBlockStatusData to byte-basedEric Blake2017-10-261-18/+38
| | | | | | | | | | We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal type (no semantic change), and rename it to match the corresponding public function rename. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Switch bdrv_co_get_block_status() to byte-basedEric Blake2017-10-261-43/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Convert another internal function (no semantic change); and as with its public counterpart, rename to bdrv_co_block_status() and split the offset return, to make the compiler enforce that we catch all uses. For now, we assert that callers and the return value still use aligned data, but ultimately, this will be the function where we hand off to a byte-based driver callback, and will eventually need to add logic to ensure we round calls according to the driver's request_alignment then touch up the result handed back to the caller, to start permitting a caller to pass unaligned offsets. Note that we are now prepared to accepts 'bytes' larger than INT_MAX; this is okay as long as we clamp things internally before violating any 32-bit limits, and makes no difference to how a client will use the information (clients looping over the entire file must already be prepared for consecutive calls to return the same status, as drivers are already free to return shorter-than-maximal status due to any other convenient split points, such as when the L2 table crosses cluster boundaries in qcow2). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Convert bdrv_get_block_status() to bytesEric Blake2017-10-261-13/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the name of the function from bdrv_get_block_status() to bdrv_block_status() ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status in the drivers. There was an inherent limitation in returning the offset via the return value: we only have room for BDRV_BLOCK_OFFSET_MASK bits, which means an offset can only be mapped for sector-aligned queries (or, if we declare that non-aligned input is at the same relative position modulo 512 of the answer), so the new interface also changes things to return the offset via output through a parameter by reference rather than mashed into the return value. We'll have some glue code that munges between the two styles until we finish converting all uses. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_block_status(), coupled with the tweak in calling convention. But some code, particularly bdrv_is_allocated(), gets a lot simpler because it no longer has to mess with sectors. For ease of review, bdrv_get_block_status_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Switch bdrv_make_zero() to byte-basedEric Blake2017-10-261-16/+16
| | | | | | | | | | | | | We are gradually converting to byte-based interfaces, as they are easier to reason about than sector-based. Change the internal loop iteration of zeroing a device to track by bytes instead of sectors (although we are still guaranteed that we iterate by steps that are sector-aligned). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Make bdrv_round_to_clusters() signature more usefulEric Blake2017-10-261-3/+3
| | | | | | | | | | | | | | | | | | In the process of converting sector-based interfaces to bytes, I'm finding it easier to represent a byte count as a 64-bit integer at the block layer (even if we are internally capped by SIZE_MAX or even INT_MAX for individual transactions, it's still nicer to not have to worry about truncation/overflow issues on as many variables). Update the signature of bdrv_round_to_clusters() to uniformly use int64_t, matching the signature already chosen for bdrv_is_allocated and the fact that off_t is also a signed type, then adjust clients according to the required fallout (even where the result could now exceed 32 bits, no client is directly assigning the result into a 32-bit value without breaking things into a loop first). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add flag to avoid wasted work in bdrv_is_allocated()Eric Blake2017-10-261-16/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Not all callers care about which BDS owns the mapping for a given range of the file, or where the zeroes lie within that mapping. In particular, bdrv_is_allocated() cares more about finding the largest run of allocated data from the guest perspective, whether or not that data is consecutive from the host perspective, and whether or not the data reads as zero. Therefore, doing subsequent refinements such as checking how much of the format-layer allocation also satisfies BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best case, it just costs extra CPU cycles during a single bdrv_is_allocated(), but in the worst case, it results in a smaller *pnum, and forces callers to iterate through more status probes when visiting the entire file for even more extra CPU cycles. This patch only optimizes the block layer (no behavior change when want_zero is true, but skip unnecessary effort when it is false). Then when subsequent patches tweak the driver callback to be byte-based, we can also pass this hint through to the driver. Tweak BdrvCoGetBlockStatusData to declare arguments in parameter order, rather than mixing things up (minimizing padding is not necessary here). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Allow NULL file for bdrv_get_block_status()Eric Blake2017-10-261-22/+27
| | | | | | | | | | | | | | | | | | | | | | | | Not all callers care about which BDS owns the mapping for a given range of the file. This patch merely simplifies the callers by consolidating the logic in the common call point, while guaranteeing a non-NULL file to all the driver callbacks, for no semantic change. The only caller that does not care about pnum is bdrv_is_allocated, as invoked by vvfat; we can likewise add assertions that the rest of the stack does not have to worry about a NULL pnum. Furthermore, this will also set the stage for a future cleanup: when a caller does not care about which BDS owns an offset, it would be nice to allow the driver to optimize things to not have to return BDRV_BLOCK_OFFSET_VALID in the first place. In the case of fragmented allocation (for example, it's fairly easy to create a qcow2 image where consecutive guest addresses are not at consecutive host addresses), the current contract requires bdrv_get_block_status() to clamp *pnum to the limit where host addresses are no longer consecutive, but allowing a NULL file means that *pnum could be set to the full length of known-allocated data. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: rename bdrv_co_drain to bdrv_co_drain_beginManos Pitsidianakis2017-10-131-2/+2
| | | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: add bdrv_co_drain_end callbackManos Pitsidianakis2017-10-131-14/+34
| | | | | | | | | | | | BlockDriverState has a bdrv_co_drain() callback but no equivalent for the end of the drain. The throttle driver (block/throttle.c) needs a way to mark the end of the drain in order to toggle io_limits_disabled correctly, thus bdrv_co_drain_end is needed. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Perform copy-on-read in loopEric Blake2017-10-061-38/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve our braindead copy-on-read implementation. Pre-patch, we have multiple issues: - we create a bounce buffer and perform a write for the entire request, even if the active image already has 99% of the clusters occupied, and really only needs to copy-on-read the remaining 1% of the clusters - our bounce buffer was as large as the read request, and can needlessly exhaust our memory by using double the memory of the request size (the original request plus our bounce buffer), rather than a capped maximum overhead beyond the original - if a driver has a max_transfer limit, we are bypassing the normal code in bdrv_aligned_preadv() that fragments to that limit, and instead attempt to read the entire buffer from the driver in one go, which some drivers may assert on - a client can request a large request of nearly 2G such that rounding the request out to cluster boundaries results in a byte count larger than 2G. While this cannot exceed 32 bits, it DOES have some follow-on problems: -- the call to bdrv_driver_pread() can assert for exceeding BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks .bdrv_co_preadv -- if the buffer is all zeroes, the subsequent call to bdrv_co_do_pwrite_zeroes is a no-op due to a negative size, which means we did not actually copy on read Fix all of these issues by breaking up the action into a loop, where each iteration is capped to sane limits. Also, querying the allocation status allows us to optimize: when data is already present in the active layer, we don't need to bounce. Note that the code has a telling comment that copy-on-read should probably be a filter driver rather than a bolt-on hack in io.c; but that remains a task for another day. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add blkdebug hook for copy-on-readEric Blake2017-10-061-0/+1
| | | | | | | | | | | | Make it possible to inject errors on writes performed during a read operation due to copy-on-read semantics. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Uniform handling of 0-length bdrv_get_block_status()Eric Blake2017-10-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Handle a 0-length block status request up front, with a uniform return value claiming the area is not allocated. Most callers don't pass a length of 0 to bdrv_get_block_status() and friends; but it definitely happens with a 0-length read when copy-on-read is enabled. While we could audit all callers to ensure that they never make a 0-length request, and then assert that fact, it was just as easy to fix things to always report success (as long as the callers are careful to not go into an infinite loop). However, we had inconsistent behavior on whether the status is reported as allocated or defers to the backing layer, depending on what callbacks the driver implements, and possibly wasting quite a few CPU cycles to get to that answer. Consistently reporting unallocated up front doesn't really hurt anything, and makes it easier both for callers (0-length requests now have well-defined behavior) and for drivers (drivers don't have to deal with 0-length requests). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* dirty-bitmap: Switch bdrv_set_dirty() to bytesEric Blake2017-10-061-4/+2Star
| | | | | | | | | | | | | | | Both callers already had bytes available, but were scaling to sectors. Move the scaling to internal code. In the case of bdrv_aligned_pwritev(), we are now passing the exact offset rather than a rounded sector-aligned value, but that's okay as long as dirty bitmap widens start/bytes to granularity boundaries. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Typo fix in copy_on_readv()Eric Blake2017-10-061-1/+1
| | | | | Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: add default implementations for bdrv_co_get_block_status()Manos Pitsidianakis2017-09-041-0/+26
| | | | | | | | | | | | | | | bdrv_co_get_block_status_from_file() and bdrv_co_get_block_status_from_backing() set *file to bs->file and bs->backing respectively, so that bdrv_co_get_block_status() can recurse to them. Future block drivers won't have to duplicate code to implement this. Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: move trace probes into bdrv_co_preadv|pwritevDaniel P. Berrange2017-08-071-4/+4
| | | | | | | | | | | | | There are trace probes in bdrv_co_readv|writev, however, the block drivers are being gradually moved over to using the bdrv_co_preadv|pwritev functions instead. As a result some block drivers miss the current probes. Move the probes into bdrv_co_preadv|pwritev instead, so that they are triggered by more (all?) I/O code paths. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170804105036.11879-1-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell2017-07-181-1/+0Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging # gpg: Signature made Mon 17 Jul 2017 16:40:18 BST # gpg: using RSA key 0x9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: block: fix shadowed variable in bdrv_co_pdiscard util/aio-win32: Only select on what we are actually waiting for Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * block: fix shadowed variable in bdrv_co_pdiscardDenis V. Lunev2017-07-171-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We've had a shadowed 'ret' variable, which risks returning the wrong value, introduced in commit b9c64947. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170710150559.30163-1-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* | block: invoke .bdrv_drain callback in coroutine context and from AioContextPaolo Bonzini2017-07-171-9/+33
|/ | | | | | | | | | This will let the callback take a CoMutex in the next patch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-8-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block/dirty-bitmap: add readonly field to BdrvDirtyBitmapVladimir Sementsov-Ogievskiy2017-07-111-0/+8
| | | | | | | | | | | It will be needed in following commits for persistent bitmaps. If bitmap is loaded from read-only storage (and we can't mark it "in use" in this storage) corresponding BdrvDirtyBitmap should be read-only. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-11-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Make bdrv_is_allocated_above() byte-basedEric Blake2017-07-101-20/+18Star
| | | | | | | | | | | | | | | | | | | | | | | | | We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status. Therefore, for the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly stream_run(), gets a lot simpler because it no longer has to mess with sectors. Leave comments where we can further simplify by switching to byte-based iterations, once later patches eliminate the need for sector-aligned operations. For ease of review, bdrv_is_allocated() was tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Minimize raw use of bds->total_sectorsEric Blake2017-07-101-8/+6Star
| | | | | | | | | | | | | | | | | | bdrv_is_allocated_above() was relying on intermediate->total_sectors, which is a field that can have stale contents depending on the value of intermediate->has_variable_length. An audit shows that we are safe (we were first calling through bdrv_co_get_block_status() which in turn calls bdrv_nb_sectors() and therefore just refreshed the current length), but it's nicer to favor our accessor functions to avoid having to repeat such an audit, even if it means refresh_total_sectors() is called more frequently. Suggested-by: John Snow <jsnow@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Make bdrv_is_allocated() byte-basedEric Blake2017-07-101-18/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned on input and that *pnum is sector-aligned on return to the caller, but that can be relaxed when a later patch implements byte-based block status. Therefore, this code adds usages like DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned values, where the call might reasonbly give non-aligned results in the future; on the other hand, no rounding is needed for callers that should just continue to work with byte alignment. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly bdrv_commit(), gets a lot simpler because it no longer has to mess with sectors; also, it is now possible to pass NULL if the caller does not care how much of the image is allocated beyond the initial offset. Leave comments where we can further simplify once a later patch eliminates the need for sector-aligned requests through bdrv_is_allocated(). For ease of review, bdrv_is_allocated_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Drop unused bdrv_round_sectors_to_clusters()Eric Blake2017-07-101-21/+0Star
| | | | | | | | | | | Now that the last user [mirror_iteration()] has converted to using bytes, we no longer need a function to round sectors to clusters. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Guarantee that *file is set on bdrv_get_block_status()Eric Blake2017-07-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We document that *file is valid if the return is not an error and includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract when a driver (such as blkdebug) lacks a callback. Messed up in commit 67a0fd2 (v2.6), when we added the file parameter. Enhance qemu-iotest 177 to cover this, using a sequence that would print garbage or even SEGV, because it was dererefencing through uninitialized memory. [The resulting test output shows that we have less-than-ideal block status from the blkdebug driver, but that's a separate fix coming up soon.] Setting *file on all paths that return BDRV_BLOCK_OFFSET_VALID is enough to fix the crash, but we can go one step further: always setting *file, even on error, means that a broken caller that blindly dereferences file without checking for error is now more likely to get a reliable SEGV instead of randomly acting on garbage, making it easier to diagnose such buggy callers. Adding an assertion that file is set where expected doesn't hurt either. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Exploit BDRV_BLOCK_EOF for larger zero blocksEric Blake2017-06-301-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When we have a BDS with unallocated clusters, but asking the status of its underlying bs->file or backing layer encounters an end-of-file condition, we know that the rest of the unallocated area will read as zeroes. However, pre-patch, this required two separate calls to bdrv_get_block_status(), as the first call stops at the point where the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen the results of the primary status if the secondary status already includes BDRV_BLOCK_ZERO. In turn, this fixes a TODO mentioned in iotest 154, where we can now see that all sectors in a partial cluster at the end of a file read as zero when coupling the shorter backing file's status along with our knowledge that the remaining sectors came from an unallocated cluster. Also, note that the loop in bdrv_co_get_block_status_above() had an inefficent exit: in cases where the active layer sets BDRV_BLOCK_ZERO but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read zeroes merely because our unallocated clusters lie beyond the backing file's shorter length), we still ended up probing the backing layer even though we already had a good answer. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-3-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: Add BDRV_BLOCK_EOF to bdrv_get_block_status()Eric Blake2017-06-301-4/+11
| | | | | | | | | | | | | | | | Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a shortcut for subsequent operations, there are also some optimizations that are made easier if we can quickly tell that *pnum will advance us to the end of a file, via a new BDRV_BLOCK_EOF which gets set by the block layer. This just plumbs up the new bit; subsequent patches will make use of it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: change variable names in BlockDriverStateManos Pitsidianakis2017-06-261-24/+24
| | | | | | | | | | | Change the 'int count' parameter in *pwrite_zeros, *pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Remove bdrv_aio_readv/writev/flush()Kevin Wolf2017-06-261-171/+0Star
| | | | | | | These functions are unused now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()Stefan Hajnoczi2017-06-261-3/+1Star
| | | | | | | | | | | | | | | | | | Calling aio_poll() directly may have been fine previously, but this is the future, man! The difference between an aio_poll() loop and BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext around aio_poll(). This allows the IOThread to run fd handlers or BHs to complete the request. Failure to release the AioContext causes deadlocks. Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: count bdrv_co_rw_vmstate() requestsStefan Hajnoczi2017-06-261-5/+12
| | | | | | | | | | | | | | | | Call bdrv_inc/dec_in_flight() for vmstate reads/writes. This seems unnecessary at first glance because vmstate reads/writes are done synchronously while the guest is stopped. But we need the bdrv_wakeup() in bdrv_dec_in_flight() so the main loop sees request completion. Besides, it's cleaner to count vmstate reads/writes like ordinary read/write requests. The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: protect tracked_requests and flush_queue with reqs_lockPaolo Bonzini2017-06-161-2/+14
| | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-14-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: access write_gen with atomicsPaolo Bonzini2017-06-161-3/+3
| | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-13-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: use Stat64 for wr_highest_offsetPaolo Bonzini2017-06-161-3/+1Star
| | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-12-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: access io_plugged with atomic opsPaolo Bonzini2017-06-161-2/+2
| | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-7-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: access wakeup with atomic opsPaolo Bonzini2017-06-161-1/+2
| | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-6-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: access serialising_in_flight with atomic opsPaolo Bonzini2017-06-161-3/+3
| | | | | | | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-5-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>