summaryrefslogtreecommitdiffstats
path: root/block/io.c
Commit message (Collapse)AuthorAgeFilesLines
* block: Start/end drain on correct AioContextHanna Reitz2022-11-101-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | bdrv_parent_drained_{begin,end}_single() are supposed to operate on the parent, not on the child, so they should not attempt to get the context to poll from the child but the parent instead. BDRV_POLL_WHILE(c->bs) does get the context from the child, so we should replace it with AIO_WAIT_WHILE() on the parent's context instead. This problem becomes apparent when bdrv_replace_child_noperm() invokes bdrv_parent_drained_end_single() after removing a child from a subgraph that is in an I/O thread. By the time bdrv_parent_drained_end_single() is called, child->bs is NULL, and so BDRV_POLL_WHILE(c->bs, ...) will poll the main loop instead of the I/O thread; but anything that bdrv_parent_drained_end_single_no_poll() may have scheduled is going to want to run in the I/O thread, but because we poll the main loop, the I/O thread is never unpaused, and nothing is run, resulting in a deadlock. Closes: https://gitlab.com/qemu-project/qemu/-/issues/1215 Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20221107151321.211175-4-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into stagingStefan Hajnoczi2022-10-301-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Block layer patches - Cleanup bs->backing and bs->file handling - Refactor bdrv_try_set_aio_context using transactions - Changes for improved coroutine_fn consistency - vhost-user-blk: fix the resize crash - io_uring: Use of io_uring_register_ring_fd() led to breakage, revert - vvfat: Fix some problems with r/w mode - Code cleanup - MAINTAINERS: Fold "Block QAPI, monitor, ..." into "Block layer core" # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmNazhIRHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9ZyTw/8Dfck/SuxfyeLlnQItkjaV4cnqWOU8vHs # 9x0KhlptCs+HXdF/3iicpA0lHojn7mNnbdFGjPRY4E0LriQv91TQ5ycdEmrseFPf # sgeQlgdKCVU/pHjZ2wYarm2pE43Cx85a5xuufmw+7w49dNNZn14l4t+DgviuClVM # nuVaogfZFbYyetre+Qd2TgLl+gJ+0d4o7Zs5lSWLrT8t0L9AGkcWPA7Nrbl6loIE # dOautV4G7jLjuMiCeJZOGcnuRVe3gCQ5rCGBFzzH4DUtz4BmiYx4hd3LMEsP0PMM # CrsfDZS04Ztybl9M7TmJuwkAm1gx1JDMOuJuh18lbJocIOBvhkKKxY2wI5LIdZVI # ZntmU36RowkX+GGu/PYpYyMjBDClJppZCl7vnjyLYsVt6r0Vu6SmlHpJhcRYabhe # 96Kv1LXH9A6+ogKPU3Layw6JGjg01GNr1ALuT7PO3pGto/JshmOuBEJJDucoF84M # 5AfxFCohMROVldwblA6M0eKnlQBgtr5BvtgbV54BBo88VlFJgDJFQn7R09cTFUEo # UwaJoS+nIaiZ0bQQVZhZloVppUaTdVJojzfVRCZZctga96/tu1HSFnGLnbEFpUN3 # KOf+XnVNS6Ro+nPSDf9bMjbIom2JicGFfV+6yMgIoxY/d5UA2dTZfefil4TAlSod # 6PsTgg+jrm8= # =/Fw0 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 27 Oct 2022 14:29:38 EDT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (58 commits) block/block-backend: blk_set_enable_write_cache is IO_CODE monitor: switch to *_co_* functions vmdk: switch to *_co_* functions vhdx: switch to *_co_* functions vdi: switch to *_co_* functions qed: switch to *_co_* functions qcow2: switch to *_co_* functions qcow: switch to *_co_* functions parallels: switch to *_co_* functions mirror: switch to *_co_* functions block: switch to *_co_* functions commit: switch to *_co_* functions vmdk: manually add more coroutine_fn annotations qcow2: manually add more coroutine_fn annotations qcow: manually add more coroutine_fn annotations blkdebug: add missing coroutine_fn annotation for indirect-called functions qcow2: add coroutine_fn annotation for indirect-called functions block: add missing coroutine_fn annotation to BlockDriverState callbacks coroutine-io: add missing coroutine_fn annotation to prototypes coroutine-lock: add missing coroutine_fn annotation to prototypes ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * block: switch to *_co_* functionsAlberto Faria2022-10-271-2/+2
| | | | | | | | | | | | | | | | Signed-off-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221013123711.620631-16-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * block: remove incorrect coroutine_fn annotationAlberto Faria2022-10-271-2/+2
| | | | | | | | | | | | | | | | Signed-off-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221013123711.620631-3-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: return errors from bdrv_register_buf()Stefan Hajnoczi2022-10-261-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | Registering an I/O buffer is only a performance optimization hint but it is still necessary to return errors when it fails. Later patches will need to detect errors when registering buffers but an immediate advantage is that error_report() calls are no longer needed in block driver .bdrv_register_buf() functions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* | block: add BDRV_REQ_REGISTERED_BUF request flagStefan Hajnoczi2022-10-261-23/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Block drivers may optimize I/O requests accessing buffers previously registered with bdrv_register_buf(). Checking whether all elements of a request's QEMUIOVector are within previously registered buffers is expensive, so we need a hint from the user to avoid costly checks. Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all QEMUIOVector elements in an I/O request are known to be within previously registered buffers. Always pass the flag through to driver read/write functions. There is little harm in passing the flag to a driver that does not use it. Passing the flag to drivers avoids changes across many block drivers. Filter drivers would need to explicitly support the flag and pass through to their children when the children support it. That's a lot of code changes and it's hard to remember to do that everywhere, leading to silent reduced performance when the flag is accidentally dropped. The only problematic scenario with the approach in this patch is when a driver passes the flag through to internal I/O requests that don't use the same I/O buffer. In that case the hint may be set when it should actually be clear. This is a rare case though so the risk is low. Some drivers have assert(!flags), which no longer works when BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very useful anyway since the functions are called almost exclusively by bdrv_driver_preadv/pwritev() so if we get flags handling right there then the assertion is not needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* | block: pass size to bdrv_unregister_buf()Stefan Hajnoczi2022-10-261-3/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | The only implementor of bdrv_register_buf() is block/nvme.c, where the size is not needed when unregistering a buffer. This is because util/vfio-helpers.c can look up mappings by address. Future block drivers that implement bdrv_register_buf() may not be able to do their job given only the buffer address. Add a size argument to bdrv_unregister_buf(). Also document the assumptions about bdrv_register_buf()/bdrv_unregister_buf() calls. The same <host, size> values that were given to bdrv_register_buf() must be given to bdrv_unregister_buf(). gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local variable might be uninitialized, so it's necessary to silence the compiler. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: add missing coroutine_fn annotationsPaolo Bonzini2022-10-071-11/+11
| | | | | | | | | | | | Callers of coroutine_fn must be coroutine_fn themselves, or the call must be within "if (qemu_in_coroutine())". Apply coroutine_fn to functions where this holds. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-3-pbonzini@redhat.com> [kwolf: Fixed up coding style] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: move bdrv_qiov_is_aligned to file-posixKeith Busch2022-09-301-21/+0Star
| | | | | | | | | | There is only user of bdrv_qiov_is_aligned(), so move the alignment function to there and make it static. Signed-off-by: Keith Busch <kbusch@kernel.org> Message-Id: <20220929200523.3218710-2-kbusch@meta.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: make serializing requests functions 'void'Denis V. Lunev2022-09-301-16/+7Star
| | | | | | | | | | | | | | | | | | | | | Return codes of the following functions are never used in the code: * bdrv_wait_serialising_requests_locked * bdrv_wait_serialising_requests * bdrv_make_request_serialising Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <fam@euphon.net> CC: Ronnie Sahlberg <ronniesahlberg@gmail.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Peter Lieven <pl@kamp.de> CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220817083736.40981-3-den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add bdrv_co_pwrite_sync()Alberto Faria2022-07-121-4/+5
| | | | | | | | | | | | Also convert bdrv_pwrite_sync() to being implemented using generated_co_wrapper. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-9-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: Implement bdrv_{pread,pwrite,pwrite_zeroes}() using generated_co_wrapperAlberto Faria2022-07-121-41/+0Star
| | | | | | | | | | | | | | | | bdrv_{pread,pwrite}() now return -EIO instead of -EINVAL when 'bytes' is negative, making them consistent with bdrv_{preadv,pwritev}() and bdrv_co_{pread,pwrite,preadv,pwritev}(). bdrv_pwrite_zeroes() now also calls trace_bdrv_co_pwrite_zeroes() and clears the BDRV_REQ_MAY_UNMAP flag when appropriate, which it didn't previously. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-8-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: Make bdrv_{pread,pwrite}() return 0 on successAlberto Faria2022-07-121-8/+2Star
| | | | | | | | | | | | | | | | | | They currently return the value of their 'bytes' parameter on success. Make them return 0 instead, for consistency with other I/O functions and in preparation to implement them using generated_co_wrapper. This also makes it clear that short reads/writes are not possible. The few callers that rely on the previous behavior are adjusted accordingly by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-4-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: Change bdrv_{pread,pwrite,pwrite_sync}() param orderAlberto Faria2022-07-121-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Swap 'buf' and 'bytes' around for consistency with bdrv_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pread(child, offset, buf, bytes, flags) + bdrv_pread(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite(child, offset, buf, bytes, flags) + bdrv_pwrite(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite_sync(child, offset, buf, bytes, flags) + bdrv_pwrite_sync(child, offset, bytes, buf, flags) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-3-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: Add a 'flags' param to bdrv_{pread,pwrite,pwrite_sync}()Alberto Faria2022-07-121-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistency with other I/O functions, and in preparation to implement them using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes; @@ - bdrv_pread(child, offset, buf, bytes) + bdrv_pread(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite(child, offset, buf, bytes) + bdrv_pwrite(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite_sync(child, offset, buf, bytes) + bdrv_pwrite_sync(child, offset, buf, bytes, 0) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-2-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: drop unused bdrv_co_drain() APIStefan Hajnoczi2022-06-241-15/+0Star
| | | | | | | | | | | | | | | | | bdrv_co_drain() has not been used since commit 9a0cec664eef ("mirror: use bdrv_drained_begin/bdrv_drained_end") in 2016. Remove it so there are fewer drain scenarios to worry about. Use bdrv_drained_begin()/bdrv_drained_end() instead. They are "mixed" functions that can be called from coroutine context. Unlike bdrv_co_drain(), these functions provide control of the length of the drained section, which is usually the right thing. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220521122714.3837731-1-stefanha@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_allPaolo Bonzini2022-05-121-1/+1
| | | | | | | | | | | | | | | | | | | | qemu_co_queue_restart_all is basically the same as qemu_co_enter_all but without a QemuLockable argument. That's perfectly fine, but only as long as the function is marked coroutine_fn. If used outside coroutine context, qemu_co_queue_wait will attempt to take the lock and that is just broken: if you are calling qemu_co_queue_restart_all outside coroutine context, the lock is going to be a QemuMutex which cannot be taken twice by the same thread. The patch adds the marker to qemu_co_queue_restart_all and to its sole non-coroutine_fn caller; it then reimplements the function in terms of qemu_co_enter_all_impl, to remove duplicated code and to clarify that the latter also works in coroutine context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20220427130830.150180-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau2022-04-061-1/+1
| | | | | | | | | | | | Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge remote-tracking branch ↵Peter Maydell2022-03-081-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/pmaydell/tags/pull-target-arm-20220307' into staging target-arm queue: * cleanups of qemu_oom_check() and qemu_memalign() * target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero * target/arm/translate-neon: Simplify align field check for VLD3 * GICv3 ITS: add more trace events * GICv3 ITS: implement 8-byte accesses properly * GICv3: fix minor issues with some trace/log messages * ui/cocoa: Use the standard about panel * target/arm: Provide cpu property for controling FEAT_LPA2 * hw/arm/virt: Disable LPA2 for -machine virt-6.2 # gpg: Signature made Mon 07 Mar 2022 16:46:06 GMT # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20220307: hw/arm/virt: Disable LPA2 for -machine virt-6.2 target/arm: Provide cpu property for controling FEAT_LPA2 ui/cocoa: Use the standard about panel hw/intc/arm_gicv3_cpuif: Fix register names in ICV_HPPIR read trace event hw/intc/arm_gicv3: Fix missing spaces in error log messages hw/intc/arm_gicv3: Specify valid and impl in MemoryRegionOps hw/intc/arm_gicv3_its: Add trace events for table reads and writes hw/intc/arm_gicv3_its: Add trace events for commands target/arm/translate-neon: Simplify align field check for VLD3 target/arm/translate-neon: UNDEF if VLD1/VST1 stride bits are non-zero osdep: Move memalign-related functions to their own header util: Put qemu_vfree() in memalign.c util: Use meson checks for valloc() and memalign() presence util: Share qemu_try_memalign() implementation between POSIX and Windows meson.build: Don't misdetect posix_memalign() on Windows util: Return valid allocation for qemu_try_memalign() with zero size util: Unify implementations of qemu_memalign() util: Make qemu_oom_check() a static function Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * osdep: Move memalign-related functions to their own headerPeter Maydell2022-03-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
* | block/io: introduce block driver snapshot-access APIVladimir Sementsov-Ogievskiy2022-03-071-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new block driver handlers and corresponding generic wrappers. It will be used to allow copy-before-write filter to provide reach fleecing interface in further commit. In future this approach may be used to allow reading qcow2 internal snapshots, for example to export them through NBD. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-11-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* | block: fix preallocate filter: don't do unaligned preallocate requestsVladimir Sementsov-Ogievskiy2022-03-071-0/+4
|/ | | | | | | | | | | | | | | | | | | | | | There is a bug in handling BDRV_REQ_NO_WAIT flag: we still may wait in wait_serialising_requests() if request is unaligned. And this is possible for the only user of this flag (preallocate filter) if underlying file is unaligned to its request_alignment on start. So, we have to fix preallocate filter to do only aligned preallocate requests. Next, we should fix generic block/io.c somehow. Keeping in mind that preallocate is the only user of BDRV_REQ_NO_WAIT and that we have to fix its behavior now, it seems more safe to just assert that we never use BDRV_REQ_NO_WAIT with unaligned requests and add corresponding comment. Let's do so. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20220215121609.38570-1-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: Make bdrv_refresh_limits() non-recursiveHanna Reitz2022-03-041-4/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdrv_refresh_limits() recurses down to the node's children. That does not seem necessary: When we refresh limits on some node, and then recurse down and were to change one of its children's BlockLimits, then that would mean we noticed the changed limits by pure chance. The fact that we refresh the parent's limits has nothing to do with it, so the reason for the change probably happened before this point in time, and we should have refreshed the limits then. Consequently, we should actually propagate block limits changes upwards, not downwards.  That is a separate and pre-existing issue, though, and so will not be addressed in this patch. The problem with recursing is that bdrv_refresh_limits() is not atomic. It begins with zeroing BDS.bl, and only then sets proper, valid limits. If we do not drain all nodes whose limits are refreshed, then concurrent I/O requests can encounter invalid request_alignment values and crash qemu. Therefore, a recursing bdrv_refresh_limits() requires the whole subtree to be drained, which is currently not ensured by most callers. A non-recursive bdrv_refresh_limits() only requires the node in question to not receive I/O requests, and this is done by most callers in some way or another: - bdrv_open_driver() deals with a new node with no parents yet - bdrv_set_file_or_backing_noperm() acts on a drained node - bdrv_reopen_commit() acts only on drained nodes - bdrv_append() should in theory require the node to be drained; in practice most callers just lock the AioContext, which should at least be enough to prevent concurrent I/O requests from accessing invalid limits So we can resolve the bug by making bdrv_refresh_limits() non-recursive. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20220216105355.30729-2-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/coroutines: I/O and "I/O or GS" APIEmanuele Giuseppe Esposito2022-03-041-0/+3
| | | | | | | | | | | | block coroutines functions run in different aiocontext, and are not protected by the BQL. Therefore are I/O. On the other side, generated_co_wrapper functions use BDRV_POLL_WHILE, meaning the caller can either be the main loop or a specific iothread. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-25-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* IO_CODE and IO_OR_GS_CODE for block_int I/O APIEmanuele Giuseppe Esposito2022-03-041-0/+13
| | | | | | | | | Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-14-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* assertions for block_int global state APIEmanuele Giuseppe Esposito2022-03-041-0/+1
| | | | | | Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-13-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* IO_CODE and IO_OR_GS_CODE for block I/O APIEmanuele Giuseppe Esposito2022-03-041-2/+41
| | | | | | | | | Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-6-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* assertions for block global state APIEmanuele Giuseppe Esposito2022-03-041-0/+11
| | | | | | | | | | | All the global state (GS) API functions will check that qemu_in_main_thread() returns true. If not, it means that the safety of BQL cannot be guaranteed, and they need to be moved to I/O. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-5-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/io: Update BSC only if want_zero is trueHanna Reitz2022-01-281-1/+5
| | | | | | | | | | | | | | | | | | | We update the block-status cache whenever we get new information from a bdrv_co_block_status() call to the block driver. However, if we have passed want_zero=false to that call, it may flag areas containing zeroes as data, and so we would update the block-status cache with wrong information. Therefore, we should not update the cache with want_zero=false. Reported-by: Nir Soffer <nsoffer@redhat.com> Fixes: 0bc329fbb00 ("block: block-status cache for data regions") Reviewed-by: Nir Soffer <nsoffer@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220118170000.49423-2-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block: introduce max_hw_iov for use in scsi-genericPaolo Bonzini2021-10-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel sources, IOV_MAX in POSIX). Because of this, on some host adapters requests with many iovecs are rejected with -EINVAL by the io_submit() or readv()/writev() system calls. In fact, the same limit applies to SG_IO as well. To fix both the EINVAL and the possible performance issues from using fewer iovecs than allowed by Linux (some HBAs have max_segments as low as 128), introduce a separate entry in BlockLimits to hold the max_segments value from sysfs. This new limit is used only for SG_IO and clamped to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to bs->bl.max_transfer. Reported-by: Halil Pasic <pasic@linux.ibm.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-block@nongnu.org Cc: qemu-stable@nongnu.org Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not round to power of 2", 2021-06-25) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210923130436.1187591-1-pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/io: allow 64bit discard requestsVladimir Sementsov-Ogievskiy2021-09-291-1/+1
| | | | | | | | | | | | | | Now that all drivers are updated by the previous commit, we can drop the last limiter on pdiscard path: INT_MAX in bdrv_co_pdiscard(). Now everything is prepared for implementing incredibly cool and fast big-discard requests in NBD and qcow2. And any other driver which wants it of course. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-12-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block: make BlockLimits::max_pdiscard 64bitVladimir Sementsov-Ogievskiy2021-09-291-1/+2
| | | | | | | | | | | | | | | | | | | | We are going to support 64 bit discard requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_pdiscard(). Update also max_pdiscard variable in bdrv_co_pdiscard(), so that bdrv_co_pdiscard() is now prepared for 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_pdiscard variable to INT_MAX in bdrv_co_pdiscard(). We'll drop this limitation after updating all block drivers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-10-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block/io: allow 64bit write-zeroes requestsVladimir Sementsov-Ogievskiy2021-09-291-2/+7
| | | | | | | | | | | | | | | | Now that all drivers are updated by previous commit, we can drop two last limiters on write-zeroes path: INT_MAX in bdrv_co_do_pwrite_zeroes() and bdrv_check_request32() in bdrv_co_pwritev_part(). Now everything is prepared for implementing incredibly cool and fast big-write-zeroes in NBD and qcow2. And any other driver which wants it of course. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-9-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block: make BlockLimits::max_pwrite_zeroes 64bitVladimir Sementsov-Ogievskiy2021-09-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | We are going to support 64 bit write-zeroes requests. Now update the limit variable. It's absolutely safe. The variable is set in some drivers, and used in bdrv_co_do_pwrite_zeroes(). Update also max_write_zeroes variable in bdrv_co_do_pwrite_zeroes(), so that bdrv_co_do_pwrite_zeroes() is now prepared to 64bit requests. The remaining logic including num, offset and bytes variables is already supporting 64bit requests. So the only thing that prevents 64 bit requests is limiting max_write_zeroes variable to INT_MAX in bdrv_co_do_pwrite_zeroes(). We'll drop this limitation after updating all block drivers. Ah, we also have bdrv_check_request32() in bdrv_co_pwritev_part(). It will be modified to do bdrv_check_request() for write-zeroes path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block: use int64_t instead of uint64_t in driver write handlersVladimir Sementsov-Ogievskiy2021-09-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert driver write handlers parameters which are already 64bit to signed type. While being here, convert also flags parameter to be BdrvRequestFlags. Now let's consider all callers. Simple git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?' shows that's there three callers of driver function: bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative. qcow2_save_vmstate() does bdrv_check_qiov_request(). Still, the functions may be called directly, not only by drv->... Let's check: git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \ awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \ while read func; do git grep "$func(" | \ grep -v "$func(BlockDriverState"; done shows several callers: qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* qcow2: check request on vmstate save/load pathVladimir Sementsov-Ogievskiy2021-09-291-3/+3
| | | | | | | | | | | We modify the request by adding an offset to vmstate. Let's check the modified request. It will help us to safely move .bdrv_co_preadv_part and .bdrv_co_pwritev_part to int64_t type of offset and bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block/io: bring request check to bdrv_co_(read,write)v_vmstateVladimir Sementsov-Ogievskiy2021-09-291-2/+16
| | | | | | | | | | | | | | Only qcow2 driver supports vmstate. In qcow2 these requests go through .bdrv_co_p{read,write}v_part handlers. So, let's do our basic check for the request on vmstate generic handlers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210903102807.27127-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block: block-status cache for data regionsHanna Reitz2021-09-151-3/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
* block: Fix in_flight leak in request padding error pathKevin Wolf2021-08-031-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight has been increased, but is never decreased again. This leads to a hang when trying to drain the block node. This bug was observed with Windows guests which issue a request that fully uses IOV_MAX during installation, so that when padding is necessary (O_DIRECT with a 4k sector size block device on the host), adding another entry causes failure. Call bdrv_dec_in_flight() to fix this. There is a larger problem to solve here because this request shouldn't even fail, but Windows doesn't seem to care and with this minimal fix the installation succeeds. So given that we're already in freeze, let's take this minimal fix for 6.1. Fixes: 98ca45494fcd6bf0336ecd559e440b6de6ea4cd3 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079 Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210727154923.91067-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/io: Merge discard request alignmentsAkihiko Odaki2021-07-061-0/+2
| | | | | | Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Move read-only check during truncation earlierEric Blake2021-06-291-5/+5
| | | | | | | | | | | | No need to start a tracked request that will always fail. The choice to check read-only after bdrv_inc_in_flight() predates 1bc5f09f2e (block: Use tracked request for truncate), but waiting for serializing requests can make the effect more noticeable. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210609163034.997943-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: add max_hw_transfer to BlockLimitsPaolo Bonzini2021-06-251-0/+2
| | | | | | | | | | | | | | | | | | For block host devices, I/O can happen through either the kernel file descriptor I/O system calls (preadv/pwritev, io_submit, io_uring) or the SCSI passthrough ioctl SG_IO. In the latter case, the size of each transfer can be limited by the HBA, while for file descriptor I/O the kernel is able to split and merge I/O in smaller pieces as needed. Applying the HBA limits to file descriptor I/O results in more system calls and suboptimal performance, so this patch splits the max_transfer limit in two: max_transfer remains valid and is used in general, while max_hw_transfer is limited to the maximum hardware size. max_hw_transfer can then be included by the scsi-generic driver in the block limits page, to ensure that the stricter hardware limit is used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* block: consistently use bdrv_is_read_only()Vladimir Sementsov-Ogievskiy2021-06-021-2/+2
| | | | | | | | | | | | | | It's better to use accessor function instead of bs->read_only directly. In some places use bdrv_is_writable() instead of checking both BDRV_O_RDWR set and BDRV_O_INACTIVE not set. In bdrv_open_common() it's a bit strange to add one more variable, but we are going to drop bs->read_only in the next patch, so new ro local variable substitutes it here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210527154056.70294-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: drop write notifiersVladimir Sementsov-Ogievskiy2021-05-141-6/+0Star
| | | | | | | | | | They are unused now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-3-vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/write-threshold: don't use write notifiersVladimir Sementsov-Ogievskiy2021-05-141-2/+3
| | | | | | | | | | | | | | | | | | | | write-notifiers are used only for write-threshold. New code for such purpose should create filters. Let's better special-case write-threshold and drop write notifiers at all. (Actually, write-threshold is special-cased anyway, as the only user of write-notifiers) So, create a new direct interface for bdrv_co_write_req_prepare() and drop all write-notifier related logic from write-threshold.c. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210506090621.11848-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [mreitz: Adjusted comment as per Eric's suggestion] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: make bdrv_refresh_limits() to be a transaction actionVladimir Sementsov-Ogievskiy2021-04-301-2/+29
| | | | | | | | | To be used in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210428151804.439460-28-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: add new BlockDriver handler: bdrv_cancel_in_flightVladimir Sementsov-Ogievskiy2021-02-121-0/+11
| | | | | | | | | It will be used to stop retrying NBD requests on mirror cancel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210205163720.887197-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block/io: use int64_t bytes in copy_rangeVladimir Sementsov-Ogievskiy2021-02-031-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, convert now copy_range parameters which are already 64bit to signed type. It's safe as we don't work with requests overflowing BDRV_MAX_LENGTH (which is less than INT64_MAX), and do check the requests in bdrv_co_copy_range_internal() (by bdrv_check_request32(), which calls bdrv_check_request()). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-17-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* block/io: support int64_t bytes in read/write wrappersVladimir Sementsov-Ogievskiy2021-02-031-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). Now, since bdrv_co_preadv_part() and bdrv_co_pwritev_part() have been updated, update all their wrappers. For all of them type of 'bytes' is widening, so callers are safe. We have update request_fn in blkverify.c simultaneously. Still it's just a pointer to one of bdrv_co_pwritev() or bdrv_co_preadv(), and type is widening for callers of the request_fn anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-16-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
* block/io: support int64_t bytes in bdrv_co_p{read,write}v_part()Vladimir Sementsov-Ogievskiy2021-02-031-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are generally moving to int64_t for both offset and bytes parameters on all io paths. Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk. We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error). So, prepare bdrv_co_preadv_part() and bdrv_co_pwritev_part() and their remaining dependencies now. bdrv_pad_request() is updated simultaneously, as pointer to bytes passed to it both from bdrv_co_pwritev_part() and bdrv_co_preadv_part(). So, all callers of bdrv_pad_request() are updated to pass 64bit bytes. bdrv_pad_request() is already good for 64bit requests, add corresponding assertion. Look at bdrv_co_preadv_part() and bdrv_co_pwritev_part(). Type is widening, so callers are safe. Let's look inside the functions. In bdrv_co_preadv_part() and bdrv_aligned_pwritev() we only pass bytes to other already int64_t interfaces (and some obviously safe calculations), it's OK. In bdrv_co_do_zero_pwritev() aligned_bytes may become large now, still it's passed to bdrv_aligned_pwritev which supports int64_t bytes. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201211183934.169161-15-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>