summaryrefslogtreecommitdiffstats
path: root/include/block
Commit message (Collapse)AuthorAgeFilesLines
...
| * | block/block-copy: rename start to offset in interfacesVladimir Sementsov-Ogievskiy2020-03-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | offset/bytes pair is more usual naming in block layer, let's use it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
| * | block/block-copy: refactor interfaces to use bytes instead of endVladimir Sementsov-Ogievskiy2020-03-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a lot of "chunk_end - start" invocations, let's switch to bytes/cur_bytes scheme instead. While being here, improve check on block_copy_do_copy parameters to not overflow when calculating nbytes and use int64_t for bytes in block_copy for consistency. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
| * | block/block-copy: fix progress calculationVladimir Sementsov-Ogievskiy2020-03-111-10/+5Star
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assume we have two regions, A and B, and region B is in-flight now, region A is not yet touched, but it is unallocated and should be skipped. Correspondingly, as progress we have total = A + B current = 0 If we reset unallocated region A and call progress_reset_callback, it will calculate 0 bytes dirty in the bitmap and call job_progress_set_remaining, which will set total = current + 0 = 0 + 0 = 0 So, B bytes are actually removed from total accounting. When job finishes we'll have total = 0 current = B , which doesn't sound good. This is because we didn't considered in-flight bytes, actually when calculating remaining, we should have set (in_flight + dirty_bytes) as remaining, not only dirty_bytes. To fix it, let's refactor progress calculation, moving it to block-copy itself instead of fixing callback. And, of course, track in_flight bytes count. We still have to keep one callback, to maintain backup job bytes_read calculation, but it will go on soon, when we turn the whole backup process into one block_copy call. Cc: qemu-stable@nongnu.org Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* | Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into ↵Peter Maydell2020-03-111-2/+69
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging Pull request # gpg: Signature made Wed 11 Mar 2020 12:40:36 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: aio-posix: remove idle poll handlers to improve scalability aio-posix: support userspace polling of fd monitoring aio-posix: add io_uring fd monitoring implementation aio-posix: simplify FDMonOps->update() prototype aio-posix: extract ppoll(2) and epoll(7) fd monitoring aio-posix: move RCU_READ_LOCK() into run_poll_handlers() aio-posix: completely stop polling when disabled aio-posix: remove confusing QLIST_SAFE_REMOVE() qemu/queue.h: clear linked list pointers on remove Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * aio-posix: remove idle poll handlers to improve scalabilityStefan Hajnoczi2020-03-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
| * aio-posix: support userspace polling of fd monitoringStefan Hajnoczi2020-03-091-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
| * aio-posix: add io_uring fd monitoring implementationStefan Hajnoczi2020-03-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
| * aio-posix: simplify FDMonOps->update() prototypeStefan Hajnoczi2020-03-091-7/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The AioHandler *node, bool is_new arguments are more complicated to think about than simply being given AioHandler *old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
| * aio-posix: extract ppoll(2) and epoll(7) fd monitoringStefan Hajnoczi2020-03-091-2/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>
* | monitor/hmp: Move hmp_drive_add_node to block-hmp-cmds.cMaxim Levitsky2020-03-091-2/+3
| | | | | | | | | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-12-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: move hmp_info_block* to block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+4
| | | | | | | | | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-11-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: move remaining hmp_block* functions to block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+9
| | | | | | | | | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-10-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: move hmp_nbd_server* to block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+5
| | | | | | | | | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-9-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: move hmp_snapshot_* to block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+4
| | | | | | | | | | | | | | | | | | | | hmp_snapshot_blkdev is from GPLv2 version of the hmp-cmds.c thus have to change the licence to GPLv2 Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-8-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: move hmp_block_job* to block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+6
| | | | | | | | | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-7-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: move hmp_drive_mirror and hmp_drive_backup to block-hmp-cmds.cMaxim Levitsky2020-03-091-3/+9
| | | | | | | | | | | | | | | | | | | | Moved code was added after 2012-01-13, thus under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-6-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed commit message
* | monitor/hmp: move hmp_drive_del and hmp_commit to block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+4
| | | | | | | | | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20200308092440.23564-5-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* | monitor/hmp: rename device-hotplug.c to block/monitor/block-hmp-cmds.cMaxim Levitsky2020-03-091-0/+16
|/ | | | | | | | | | | | | | | | | | These days device-hotplug.c only contains the hmp_drive_add In the next patch, rest of hmp_drive* functions will be moved there. Also add block-hmp-cmds.h to contain prototypes of these functions License for block-hmp-cmds.h since it contains the code moved from sysemu.h which lacks license and thus according to LICENSE is under GPLv2+ Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200308092440.23564-4-mlevitsk@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* qemu-storage-daemon: Add --nbd-server optionKevin Wolf2020-03-061-0/+1
| | | | | | | | | | | | | | | | | Add a --nbd-server option to qemu-storage-daemon to start the built-in NBD server right away. It maps the arguments for nbd-server-start to the command line, with the exception that it uses SocketAddress instead of SocketAddressLegacy: New interfaces shouldn't use legacy types, and the additional nesting would be nasty on the command line. Example (only with required options): --nbd-server addr.type=inet,addr.host=localhost,addr.port=10809 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200224143008.13362-10-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Introduce 'bdrv_reopen_commit_post' stepPeter Krempa2020-03-061-0/+1
| | | | | | | | | Add another step in the reopen process where driver can execute code after permission changes are comitted. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <adc02cf591c3cb34e98e33518eb1c540a0f27db1.1582893284.git.pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* aio-posix: make AioHandler deletion O(1)Stefan Hajnoczi2020-02-221-1/+5
| | | | | | | | | | | | | | It is not necessary to scan all AioHandlers for deletion. Keep a list of deleted handlers instead of scanning the full list of all handlers. The AioHandler->deleted field can be dropped. Let's check if the handler has been inserted into the deleted list instead. Add a new QLIST_IS_INSERTED() API for this check. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* util/async: make bh_aio_poll() O(1)Stefan Hajnoczi2020-02-221-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | The ctx->first_bh list contains all created BHs, including those that are not scheduled. The list is iterated by the event loop and therefore has O(n) time complexity with respected to the number of created BHs. Rewrite BHs so that only scheduled or deleted BHs are enqueued. Only BHs that actually require action will be iterated. One semantic change is required: qemu_bh_delete() enqueues the BH and therefore invokes aio_notify(). The tests/test-aio.c:test_source_bh_delete_from_cb() test case assumed that g_main_context_iteration(NULL, false) returns false after qemu_bh_delete() but it now returns true for one iteration. Fix up the test case. This patch makes aio_compute_timeout() and aio_bh_poll() drop from a CPU profile reported by perf-top(1). Previously they combined to 9% CPU utilization when AioContext polling is commented out and the guest has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200221093951.1414693-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qapi: Allow getting flat output from 'query-named-block-nodes'Peter Krempa2020-02-202-2/+4
| | | | | | | | | | | | | | When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Fixed coding style] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Remove bdrv_recurse_is_first_non_filter()Max Reitz2020-02-182-12/+0Star
| | | | | | | | | It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Add bdrv_recurse_can_replace()Max Reitz2020-02-181-0/+10
| | | | | | | | | | | | | | | | After a couple of follow-up patches, this function will replace bdrv_recurse_is_first_non_filter() in check_to_replace_node(). bdrv_recurse_is_first_non_filter() is both not sufficiently specific for check_to_replace_node() (it allows cases that should not be allowed, like replacing child nodes of quorum with dissenting data that have more parents than just quorum), and it is too restrictive (it is perfectly fine to replace filters). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Drop bdrv_is_first_non_filter()Max Reitz2020-02-181-1/+0Star
| | | | | | | | | | It is unused now. (And it was ugly because it needed to explore all BDS chains from the top.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* blockdev: adds bdrv_parse_aio to use io_uringAarushi Mehta2020-01-301-0/+1
| | | | | | | | | Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-8-stefanha@redhat.com Message-Id: <20200120141858.587874-8-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/io_uring: implements interfaces for io_uringAarushi Mehta2020-01-302-1/+27
| | | | | | | | | | | | Aborts when sqe fails to be set as sqes cannot be returned to the ring. Adds slow path for short reads for older kernels Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-5-stefanha@redhat.com Message-Id: <20200120141858.587874-5-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/block: add BDRV flag for io_uringAarushi Mehta2020-01-301-0/+1
| | | | | | | | | | Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-4-stefanha@redhat.com Message-Id: <20200120141858.587874-4-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/io: wait for serialising requests when a request becomes serialisingPaolo Bonzini2020-01-301-2/+1Star
| | | | | | | | | | | Marking without waiting would not result in actual serialising behavior. Thus, make a call bdrv_mark_request_serialising sufficient for serialisation to happen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: eliminate BDRV_REQ_NO_SERIALISINGPaolo Bonzini2020-01-301-12/+0Star
| | | | | | | | | | | | It is unused since commit 00e30f0 ("block/backup: use backup-top instead of write notifiers", 2019-10-01), drop it to simplify the code. While at it, drop redundant assertions on flags. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-2-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Add bdrv_qapi_perm_to_blk_perm()Max Reitz2020-01-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | We need some way to correlate QAPI BlockPermission values with BLK_PERM_* flags. We could: (1) have the same order in the QAPI definition as the the BLK_PERM_* flags are in LSb-first order. However, then there is no guarantee that they actually match (e.g. when someone modifies the QAPI schema without thinking of the BLK_PERM_* definitions). We could add static assertions, but these would break what’s good about this solution, namely its simplicity. (2) define the BLK_PERM_* flags based on the BlockPermission values. But this way whenever someone were to modify the QAPI order (perfectly sensible in theory), the BLK_PERM_* values would change. Because these values are used for file locking, this might break file locking between different qemu versions. Therefore, go the slightly more cumbersome way: Add a function to translate from the QAPI constants to the BLK_PERM_* flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191108123455.39445-2-mreitz@redhat.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/snapshot: rename Error ** parameter to more common errpVladimir Sementsov-Ogievskiy2019-12-181-1/+1
| | | | | | | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@Redhat.com> Message-Id: <20191205174635.18758-11-vsementsov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
* nbd: Don't send oversize stringsEric Blake2019-11-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qemu as server currently won't accept export names larger than 256 bytes, nor create dirty bitmap names longer than 1023 bytes, so most uses of qemu as client or server have no reason to get anywhere near the NBD spec maximum of a 4k limit per string. However, we weren't actually enforcing things, ignoring when the remote side violates the protocol on input, and also having several code paths where we send oversize strings on output (for example, qemu-nbd --description could easily send more than 4k). Tighten things up as follows: client: - Perform bounds check on export name and dirty bitmap request prior to handing it to server - Validate that copied server replies are not too long (ignoring NBD_INFO_* replies that are not copied is not too bad) server: - Perform bounds check on export name and description prior to advertising it to client - Reject client name or metadata query that is too long - Adjust things to allow full 4k name limit rather than previous 256 byte limit Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-4-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
* bitmap: Enforce maximum bitmap name lengthEric Blake2019-11-181-0/+2
| | | | | | | | | | | | | | We document that for qcow2 persistent bitmaps, the name cannot exceed 1023 bytes. It is inconsistent if transient bitmaps do not have to abide by the same limit, and it is unlikely that any existing client even cares about using bitmap names this long. It's time to codify that ALL bitmaps managed by qemu (whether persistent in qcow2 or not) have a documented maximum length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-3-eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
* nbd/server: Prefer heap over stack for parsing client namesEric Blake2019-11-181-5/+5
| | | | | | | | | | | | | | | | As long as we limit NBD names to 256 bytes (the bare minimum permitted by the standard), stack-allocation works for parsing a name received from the client. But as mentioned in a comment, we eventually want to permit up to the 4k maximum of the NBD standard, which is too large for stack allocation; so switch everything in the server to use heap allocation. For now, there is no change in actually supported name length. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20191114024635.11363-2-eblake@redhat.com> [eblake: fix uninit variable compile failure] Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
* block: Add bdrv_co_get_self_request()Max Reitz2019-11-041-0/+1
| | | | | | | Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Make wait/mark serialising requests publicMax Reitz2019-11-041-0/+3
| | | | | | | | | | | Make both bdrv_mark_request_serialising() and bdrv_wait_serialising_requests() public so they can be used from block drivers. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191101152510.11719-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* nvme: fix NSSRS offset in CAP registerKlaus Jensen2019-11-041-1/+1
| | | | | | | | | | | | | | | | Fix the offset of the NSSRS field the CAP register. From NVME 1.4, section 3 ("Controller Registers"), subsection 3.1.1 ("Offset 0h: CAP – Controller Capabilities") CAP_NSSRS_SHIFT is bit 36, not 33. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reported-by: Javier Gonzalez <javier.gonz@samsung.com> Message-id: 20191023073315.446534-1-its@irrelevant.dk Reviewed-by: John Snow <jsnow@redhat.com> [mreitz: Added John's note on the location in the specification where this information can be found] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Add @exact parameter to bdrv_co_truncate()Max Reitz2019-10-282-4/+19
| | | | | | | | | | | | | | | | | | | | We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: add support for write zerosMaxim Levitsky2019-10-281-1/+18
| | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/block-copy: increase buffered copy requestVladimir Sementsov-Ogievskiy2019-10-281-1/+1
| | | | | | | | | | No reason to limit buffered copy to one cluster. Let's allow up to 1 MiB. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-7-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/block-copy: add memory limitVladimir Sementsov-Ogievskiy2019-10-281-0/+3
| | | | | | | | | | | | | Currently total allocation for parallel requests to block-copy instance is unlimited. Let's limit it to 128 MiB. For now block-copy is used only in backup, so actually we limit total allocation for backup job. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20191022111805.3432-6-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commitVladimir Sementsov-Ogievskiy2019-10-171-6/+0Star
| | | | | | | | | | | | | | | | | The only reason I can imagine for this strange code at the very-end of bdrv_reopen_commit is the fact that bs->read_only updated after calling drv->bdrv_reopen_commit in bdrv_reopen_commit. And in the same time, prior to previous commit, qcow2_reopen_bitmaps_rw did a wrong check for being writable, when actually it only need writable file child not self. So, as it's fixed, let's move things to correct place. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-10-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmapsVladimir Sementsov-Ogievskiy2019-10-171-1/+0Star
| | | | | | | | | | | | | Firstly, no reason to optimize failure path. Then, function name is ambiguous: it checks for readonly and similar things, but someone may think that it will ignore normal bitmaps which was just unchanged, and this is in bad relation with the fact that we should drop IN_USE flag for unchanged bitmaps in the image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190927122355.7344-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block: switch reopen queue from QSIMPLEQ to QTAILQVladimir Sementsov-Ogievskiy2019-10-171-1/+1
| | | | | | | | | | We'll need reverse-foreach in the following commit, QTAILQ support it, so move to QTAILQ. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190927122355.7344-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/dirty-bitmap: refactor bdrv_dirty_bitmap_nextVladimir Sementsov-Ogievskiy2019-10-171-2/+7
| | | | | | | | | | | bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/dirty-bitmap: add bs linkVladimir Sementsov-Ogievskiy2019-10-171-9/+5Star
| | | | | | | | | | | Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>
* block/dirty-bitmap: drop metaVladimir Sementsov-Ogievskiy2019-10-171-5/+0Star
| | | | | | | | | Drop meta bitmaps, as they are unused. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-2-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
* block/qcow2: proper locking on bitmap add/remove pathsVladimir Sementsov-Ogievskiy2019-10-171-5/+5
| | | | | | | | | | | | | | | | | | qmp_block_dirty_bitmap_add and do_block_dirty_bitmap_remove do acquire aio context since 0a6c86d024c52b. But this is not enough: we also must lock qcow2 mutex when access in-image metadata. Especially it concerns freeing qcow2 clusters. To achieve this, move qcow2_can_store_new_dirty_bitmap and qcow2_remove_persistent_dirty_bitmap to coroutine context. Since we work in coroutines in correct aio context, we don't need context acquiring in blockdev.c anymore, drop it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190920082543.23444-4-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>