summaryrefslogtreecommitdiffstats
path: root/block
Commit message (Collapse)AuthorAgeFilesLines
* block: introduce preallocate filterVladimir Sementsov-Ogievskiy2020-12-182-0/+560
| | | | | | | | | | | | | It's intended to be inserted between format and protocol nodes to preallocate additional space (expanding protocol file) on writes crossing EOF. It improves performance for file-systems with slow allocation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: introduce BDRV_REQ_NO_WAIT flagVladimir Sementsov-Ogievskiy2020-12-181-1/+10
| | | | | | | | | | | Add flag to make serialising request no wait: if there are conflicting requests, just return error immediately. It's will be used in upcoming preallocate filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: bdrv_mark_request_serialising: split non-waiting functionVladimir Sementsov-Ogievskiy2020-12-182-13/+24
| | | | | | | | | | | | | | | | | | We'll need a separate function, which will only "mark" request serialising with specified align but not wait for conflicting requests. So, it will be like old bdrv_mark_request_serialising(), before merging bdrv_wait_serialising_requests_locked() into it. To reduce the possible mess, let's do the following: Public function that does both marking and waiting will be called bdrv_make_request_serialising, and private function which will only "mark" will be called tracked_request_set_serialising(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/io: bdrv_wait_serialising_requests_locked: drop extra bs argVladimir Sementsov-Ogievskiy2020-12-181-5/+5
| | | | | | | | | | | | | bs is linked in req, so no needs to pass it separately. Most of tracked-requests API doesn't have bs argument. Actually, after this patch only tracked_request_begin has it, but it's for purpose. While being here, also add a comment about what "_locked" is. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/io: split out bdrv_find_conflicting_requestVladimir Sementsov-Ogievskiy2020-12-181-30/+41
| | | | | | | | | To be reused in separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/io.c: drop assertion on double waiting for request serialisationVladimir Sementsov-Ogievskiy2020-12-181-10/+1Star
| | | | | | | | | | | | | | | | | The comments states, that on misaligned request we should have already been waiting. But for bdrv_padding_rmw_read, we called bdrv_mark_request_serialising with align = request_alignment, and now we serialise with align = cluster_size. So we may have to wait again with larger alignment. Note, that the only user of BDRV_REQ_SERIALISING is backup which issues cluster-aligned requests, so seems the assertion should not fire for now. But it's wrong anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/file-posix: fix a possible undefined behaviorPan Nengyuan2020-12-131-1/+1
| | | | | | | | | | | | | | | local_err is not initialized to NULL, it will cause a assert error as below: qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed. Fixes: c6447510690 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* block: Fix deadlock in bdrv_co_yield_to_drain()Kevin Wolf2020-12-111-17/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | If bdrv_co_yield_to_drain() is called for draining a block node that runs in a different AioContext, it keeps that AioContext locked while it yields and schedules a BH in the AioContext to do the actual drain. As long as executing the BH is the very next thing that the event loop of the node's AioContext does, this actually happens to work, but when it tries to execute something else that wants to take the AioContext lock, it will deadlock. (In the bug report, this other thing is a virtio-scsi device running virtio_scsi_data_plane_handle_cmd().) Instead, always drop the AioContext lock across the yield and reacquire it only when the coroutine is reentered. The BH needs to unconditionally take the lock for itself now. This fixes the 'block_resize' QMP command on a block node that runs in an iothread. Cc: qemu-stable@nongnu.org Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201203172311.68232-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: introduce BDRV_MAX_LENGTHVladimir Sementsov-Ogievskiy2020-12-112-12/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are going to modify block layer to work with 64bit requests. And first step is moving to int64_t type for both offset and bytes arguments in all block request related functions. It's mostly safe (when widening signed or unsigned int to int64_t), but switching from uint64_t is questionable. So, let's first establish the set of requests we want to work with. First signed int64_t should be enough, as off_t is signed anyway. Then, obviously offset + bytes should not overflow. And most interesting: (offset + bytes) being aligned up should not overflow as well. Aligned to what alignment? First thing that comes in mind is bs->bl.request_alignment, as we align up request to this alignment. But there is another thing: look at bdrv_mark_request_serialising(). It aligns request up to some given alignment. And this parameter may be bdrv_get_cluster_size(), which is often a lot greater than bs->bl.request_alignment. Note also, that bdrv_mark_request_serialising() uses signed int64_t for calculations. So, actually, we already depend on some restrictions. Happily, bdrv_get_cluster_size() returns int and bs->bl.request_alignment has 32bit unsigned type, but defined to be a power of 2 less than INT_MAX. So, we may establish, that INT_MAX is absolute maximum for any kind of alignment that may occur with the request. Note, that bdrv_get_cluster_size() is not documented to return power of 2, still bdrv_mark_request_serialising() behaves like it is. Also, backup uses bdi.cluster_size and is not prepared to it not being power of 2. So, let's establish that Qemu supports only power-of-2 clusters and alignments. So, alignment can't be greater than 2^30. Finally to be safe with calculations, to not calculate different maximums for different nodes (depending on cluster size and request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30) as absolute maximum bytes length for Qemu. Actually, it's not much less than INT64_MAX. OK, then, let's apply it to block/io. Let's consider all block/io entry points of offset/bytes: 4 bytes/offset interface functions: bdrv_co_preadv_part(), bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and bdrv_co_pdiscard() and we check them all with bdrv_check_request(). We also have one entry point with only offset: bdrv_co_truncate(). Check the offset. And one public structure: BdrvTrackedRequest. Happily, it has only three external users: file-posix.c: adopted by this patch write-threshold.c: only read fields test-write-threshold.c: sets obviously small constant values Better is to make the structure private and add corresponding interfaces.. Still it's not obvious what kind of interface is needed for file-posix.c. Let's keep it public but add corresponding assertions. After this patch we'll convert functions in block/io.c to int64_t bytes and offset parameters. We can assume that offset/bytes pair always satisfy new restrictions, and make corresponding assertions where needed. If we reach some offset/bytes point in block/io.c missing bdrv_check_request() it is considered a bug. As well, if block/io.c modifies a offset/bytes request, expanding it more then aligning up to request_alignment, it's a bug too. For all io requests except for discard we keep for now old restriction of 32bit request length. iotest 206 output error message changed, as now test disk size is larger than new limit. Add one more test case with new maximum disk size to cover too-big-L1 case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()Vladimir Sementsov-Ogievskiy2020-12-111-13/+12Star
| | | | | | | | | | | | | | | | | | | | | | | | | Move bdrv_is_inserted() calls into callers. We are going to make bdrv_check_byte_request() a clean thing. bdrv_is_inserted() is not about checking the request, it's about checking the bs. So, it should be separate. With this patch we probably change error path for some failure scenarios. But depending on the fact that querying too big request on empty cdrom (or corrupted qcow2 node with no drv) will result in EIO and not ENOMEDIUM would be very strange. More over, we are going to move to 64bit requests, so larger requests will be allowed anyway. More over, keeping in mind that cdrom is the only driver that has .bdrv_is_inserted() handler it's strange that we should care so much about it in generic block layer, intuitively we should just do read and write, and cdrom driver should return correct errors if it is not inserted. But it's a work for another series. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/io: bdrv_refresh_limits(): use ERRP_GUARDVladimir Sementsov-Ogievskiy2020-12-111-4/+3Star
| | | | | | | | | This simplifies following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/file-posix: fix workaround in raw_do_pwrite_zeroes()Vladimir Sementsov-Ogievskiy2020-12-111-1/+0Star
| | | | | | | | | | | | | | | | We should not set overlap_bytes: 1. Don't worry: it is calculated by bdrv_mark_request_serialising() and will be equal to or greater than bytes anyway. 2. If the request was already aligned up to some greater alignment, than we may break things: we reduce overlap_bytes, and further bdrv_mark_request_serialising() may not help, as it will not restore old bigger alignment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* file-posix: check the use_lock before setting the file lockLi Feng2020-12-111-1/+1
| | | | | | | | | | | | | The scenario is that when accessing a volume on an NFS filesystem without supporting the file lock, Qemu will complain "Failed to lock byte 100", even when setting the file.locking = off. We should do file lock related operations only when the file.locking is enabled, otherwise, the syscall of 'fcntl' will return non-zero. Signed-off-by: Li Feng <fengli@smartx.com> Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* fuse: Implement hole detection through lseekMax Reitz2020-12-111-0/+77
| | | | | | | | | | This is a relatively new feature in libfuse (available since 3.8.0, which was released in November 2019), so we have to add a dedicated check whether it is available before making use of it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* fuse: (Partially) implement fallocate()Max Reitz2020-12-111-0/+84
| | | | | | | | | This allows allocating areas after the (old) EOF as part of a growing resize, writing zeroes, and discarding. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* fuse: Allow growable exportsMax Reitz2020-12-111-8/+36
| | | | | | | | | | | | | | These will behave more like normal files in that writes beyond the EOF will automatically grow the export size. As an optimization, keep the RESIZE permission for growable exports so we do not have to take it for every post-EOF write. (This permission is not released when the export is destroyed, because at that point the BlockBackend is destroyed altogether anyway.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* fuse: Implement standard FUSE operationsMax Reitz2020-12-111-0/+242
| | | | | | | | | This makes the export actually useful instead of only producing errors whenever it is accessed. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* fuse: Allow exporting BDSs via FUSEMax Reitz2020-12-113-0/+301
| | | | | | | | | | | | | | | | | | | | | | | | block-export-add type=fuse allows mounting block graph nodes via FUSE on some existing regular file. That file should then appears like a raw disk image, and accesses to it result in accesses to the exported BDS. Right now, we only implement the necessary block export functions to set it up and shut it down. We do not implement any access functions, so accessing the mount point only results in errors. This will be addressed by a followup patch. We keep a hash table of exported mount points, because we want to be able to detect when users try to use a mount point twice. This is because we invoke stat() to check whether the given mount point is a regular file, but if that file is served by ourselves (because it is already used as a mount point), then this stat() would have to be served by ourselves, too, which is impossible to do while we (as the caller) are waiting for it to settle. Therefore, keep track of mount point paths to at least catch the most obvious instances of that problem. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/iscsi: Use lock guard macrosGan Qixin2020-12-111-26/+24Star
| | | | | | | | | Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-5-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/throttle-groups: Use lock guard macrosGan Qixin2020-12-111-25/+23Star
| | | | | | | | | Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-4-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/curl: Use lock guard macrosGan Qixin2020-12-111-14/+14
| | | | | | | | | | Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-3-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/accounting: Use lock guard macrosGan Qixin2020-12-111-17/+15Star
| | | | | | | | | | Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-2-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Tweak a few "Parameter 'NAME' expects THING" error messageMarkus Armbruster2020-12-101-1/+1
| | | | | | | Change to "expects a THING" where that's an obvious improvement Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113082626.2725812-11-armbru@redhat.com>
* block/export: avoid g_return_val_if() input validationStefan Hajnoczi2020-12-091-1/+3
| | | | | | | | | | | | | | Do not validate input with g_return_val_if(). This API is intended for checking programming errors and is compiled out with -DG_DISABLE_CHECKS. Use an explicit if statement for input validation so it cannot accidentally be compiled out. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201118091644.199527-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* libvhost-user: make it a meson subprojectMarc-André Lureau2020-12-081-1/+1
| | | | | | | | | | | | | By making libvhost-user a subproject, check it builds standalone (without the global QEMU cflags etc). Note that the library still relies on QEMU include/qemu/atomic.h and linux_headers/. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* qcow2: Fix corruption on write_zeroes with MAY_UNMAPMaxim Levitsky2020-11-241-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()") introduced a subtle change to code in zero_in_l2_slice: It swapped the order of 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); 2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); 3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); To 1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); 2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); 3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO); It seems harmless, however the call to qcow2_free_any_clusters can trigger a cache flush which can mark the L2 table as clean, and assuming that this was the last write to it, a stale version of it will remain on the disk. Now we have a valid L2 entry pointing to a freed cluster. Oops. Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [ kwolf: Fixed to restore the correct original order from before 205fa50750; added comments like in discard_in_l2_slice(). ] Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201124092815.39056-1-kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Merge remote-tracking branch ↵Peter Maydell2020-11-232-105/+134
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/stefanha-gitlab/tags/block-pull-request' into staging Pull request for 5.2 NVMe fixes to solve IOMMU issues on non-x86 and error message/tracing improvements. Elena Afanasova's ioeventfd fixes are also included. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> # gpg: Signature made Wed 04 Nov 2020 15:18:16 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha-gitlab/tags/block-pull-request: (33 commits) util/vfio-helpers: Assert offset is aligned to page size util/vfio-helpers: Convert vfio_dump_mapping to trace events util/vfio-helpers: Improve DMA trace events util/vfio-helpers: Trace where BARs are mapped util/vfio-helpers: Trace PCI BAR region info util/vfio-helpers: Trace PCI I/O config accesses util/vfio-helpers: Improve reporting unsupported IOMMU type block/nvme: Fix nvme_submit_command() on big-endian host block/nvme: Fix use of write-only doorbells page on Aarch64 arch block/nvme: Align iov's va and size on host page size block/nvme: Change size and alignment of prp_list_pages block/nvme: Change size and alignment of queue block/nvme: Change size and alignment of IDENTIFY response buffer block/nvme: Correct minimum device page size block/nvme: Set request_alignment at initialization block/nvme: Simplify nvme_cmd_sync() block/nvme: Simplify ADMIN queue access block/nvme: Correctly initialize Admin Queue Attributes block/nvme: Use definitions instead of magic values in add_io_queue() block/nvme: Introduce Completion Queue definitions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * block/nvme: Fix nvme_submit_command() on big-endian hostPhilippe Mathieu-Daudé2020-11-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Completion Queue Command Identifier is a 16-bit value, so nvme_submit_command() is unlikely to work on big-endian hosts, as the relevant bits are truncated. Fix by using the correct byte-swap function. Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-25-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Fix use of write-only doorbells page on Aarch64 archPhilippe Mathieu-Daudé2020-11-031-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states: 'offset' must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE). In commit f68453237b9 we started to use an offset of 4K which broke this contract on Aarch64 arch. Fix by mapping at offset 0, and and accessing doorbells at offset=4K. Fixes: f68453237b9 ("block/nvme: Map doorbells pages write-only") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-24-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Align iov's va and size on host page sizeEric Auger2020-11-031-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure iov's va and size are properly aligned on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-23-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Change size and alignment of prp_list_pagesEric Auger2020-11-031-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation of 64kB host page support, let's change the size and alignment of the prp_list_pages so that the VFIO DMA MAP succeeds with 64kB host page size. We align on the host page size. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-22-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Change size and alignment of queueEric Auger2020-11-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation of 64kB host page support, let's change the size and alignment of the queue so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-21-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Change size and alignment of IDENTIFY response bufferEric Auger2020-11-031-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation of 64kB host page support, let's change the size and alignment of the IDENTIFY command response buffer so that the VFIO DMA MAP succeeds. We align on the host page size. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-20-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Correct minimum device page sizePhilippe Mathieu-Daudé2020-11-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While trying to simplify the code using a macro, we forgot the 12-bit shift... Correct that. Fixes: fad1eb68862 ("block/nvme: Use register definitions from 'block/nvme.h'") Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-19-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Set request_alignment at initializationPhilippe Mathieu-Daudé2020-11-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bdd6a90a9e5 ("block: Add VFIO based NVMe driver") sets the request_alignment in nvme_refresh_limits(). For consistency, also set it during initialization. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-18-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Simplify nvme_cmd_sync()Philippe Mathieu-Daudé2020-11-031-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As all commands use the ADMIN queue, it is pointless to pass it as argument each time. Remove the argument, and rename the function as nvme_admin_cmd_sync() to make this new behavior clearer. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-17-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Simplify ADMIN queue accessPhilippe Mathieu-Daudé2020-11-031-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to dereference from BDRVNVMeState each time. Use a NVMeQueuePair pointer on the admin queue. The nvme_init() becomes easier to review, matching the style of nvme_add_io_queue(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-16-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Correctly initialize Admin Queue AttributesPhilippe Mathieu-Daudé2020-11-031-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From the specification chapter 3.1.8 "AQA - Admin Queue Attributes" the Admin Submission Queue Size field is a 0’s based value: Admin Submission Queue Size (ASQS): Defines the size of the Admin Submission Queue in entries. Enabling a controller while this field is cleared to 00h produces undefined results. The minimum size of the Admin Submission Queue is two entries. The maximum size of the Admin Submission Queue is 4096 entries. This is a 0’s based value. This bug has never been hit because the device initialization uses a single command synchronously :) Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-15-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Use definitions instead of magic values in add_io_queue()Philippe Mathieu-Daudé2020-11-031-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Replace magic values by definitions, and simplifiy since the number of queues will never reach 64K. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-14-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Make nvme_init_queue() return boolean indicating errorPhilippe Mathieu-Daudé2020-11-031-9/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just for consistency, following the example documented since commit e3fe3988d7 ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. This simplifies a bit nvme_create_queue_pair(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-12-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Make nvme_identify() return boolean indicating errorPhilippe Mathieu-Daudé2020-11-031-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just for consistency, following the example documented since commit e3fe3988d7 ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Directly pass errp as the local_err is not requested in our case. Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20201029093306.1063879-11-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Use unsigned integer for queue counter/sizePhilippe Mathieu-Daudé2020-11-032-25/+23Star
| | | | | | | | | | | | | | | | | | | | | | | | | | We can not have negative queue count/size/index, use unsigned type. Rename 'nr_queues' as 'queue_count' to match the spec naming. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-10-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Move definitions before structure declarationsPhilippe Mathieu-Daudé2020-11-031-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | To be able to use some definitions in structure declarations, move them earlier. No logical change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-9-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Trace queue pair creation/deletionPhilippe Mathieu-Daudé2020-11-032-0/+5
| | | | | | | | | | | | | | | | | | | | Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-8-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Improve nvme_free_req_queue_wait() trace informationPhilippe Mathieu-Daudé2020-11-032-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | What we want to trace is the block driver state and the queue index. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-7-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Trace nvme_poll_queue() per queuePhilippe Mathieu-Daudé2020-11-032-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As we want to enable multiple queues, report the event in each nvme_poll_queue() call, rather than once in the callback calling nvme_poll_queues(). Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-6-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Trace controller capabilitiesPhilippe Mathieu-Daudé2020-11-032-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Controllers have different capabilities and report them in the CAP register. We are particularly interested by the page size limits. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-5-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Report warning with warn_report()Philippe Mathieu-Daudé2020-11-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Instead of displaying warning on stderr, use warn_report() which also displays it on the monitor. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-4-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
| * block/nvme: Use hex format to display offset in trace eventsPhilippe Mathieu-Daudé2020-11-031-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Use the same format used for the hw/vfio/ trace events. Suggested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20201029093306.1063879-3-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com>
* | Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell2020-11-173-1/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patches for 5.2.0-rc2: - quorum: Fix crash with rewrite-corrupted and without read-write user - io_uring: do not use pointer after free - file-posix: Use fallback path for -EBUSY from FALLOC_FL_PUNCH_HOLE - iotests: Fix failure on Python 3.9 due to use of a deprecated function - char-stdio: Fix QMP default for 'signal' # gpg: Signature made Tue 17 Nov 2020 11:43:17 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests/081: Test rewrite-corrupted without WRITE iotests/081: Filter image format after testdir quorum: Require WRITE perm with rewrite-corrupted io_uring: do not use pointer after free file-posix: allow -EBUSY errors during write zeros on raw block devices iotests: Replace deprecated ConfigParser.readfp() char-stdio: Fix QMP default for 'signal' Signed-off-by: Peter Maydell <peter.maydell@linaro.org>