summaryrefslogtreecommitdiffstats
path: root/block/nvme.c
Commit message (Collapse)AuthorAgeFilesLines
...
* hw/block/nvme: bump spec data structures to v1.3Klaus Jensen2020-09-021-9/+9
| | | | | | | | | | | | | | Add missing fields in the Identify Controller and Identify Namespace data structures to bring them in line with NVMe v1.3. This also adds data structures and defines for SGL support which requires a couple of trivial changes to the nvme block driver as well. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Fam Zheng <fam@euphon.net> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-2-its@irrelevant.dk>
* block/nvme: support nested aio_poll()Stefan Hajnoczi2020-06-231-6/+57
| | | | | | | | | | | | | | | | | | | | QEMU block drivers are supposed to support aio_poll() from I/O completion callback functions. This means completion processing must be re-entrant. The standard approach is to schedule a BH during completion processing and cancel it at the end of processing. If aio_poll() is invoked by a callback function then the BH will run. The BH continues the suspended completion processing. All of this means that request A's cb() can synchronously wait for request B to complete. Previously the nvme block driver would hang because it didn't process completions from nested aio_poll(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: keep BDRVNVMeState pointer in NVMeQueuePairStefan Hajnoczi2020-06-231-32/+38
| | | | | | | | | | | | | Passing around both BDRVNVMeState and NVMeQueuePair is unwieldy. Reduce the number of function arguments by keeping the BDRVNVMeState pointer in NVMeQueuePair. This will come in handly when a BH is introduced in a later patch and only one argument can be passed to it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: clarify that free_req_queue is protected by q->lockStefan Hajnoczi2020-06-231-1/+1
| | | | | | | | | | Existing users access free_req_queue under q->lock. Document this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: switch to a NVMeRequest freelistStefan Hajnoczi2020-06-231-27/+54
| | | | | | | | | | | | | | | | | | | | | | There are three issues with the current NVMeRequest->busy field: 1. The busy field is accidentally accessed outside q->lock when request submission fails. 2. Waiters on free_req_queue are not woken when a request is returned early due to submission failure. 2. Finding a free request involves scanning all requests. This makes request submission O(n^2). Switch to an O(1) freelist that is always accessed under the lock. Also differentiate between NVME_QUEUE_SIZE, the actual SQ/CQ size, and NVME_NUM_REQS, the number of usable requests. This makes the code simpler than using NVME_QUEUE_SIZE everywhere and having to keep in mind that one slot is reserved. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: don't access CQE after moving cq.headStefan Hajnoczi2020-06-231-1/+4
| | | | | | | | | | | | | | | | | | | Do not access a CQE after incrementing q->cq.head and releasing q->lock. It is unlikely that this causes problems in practice but it's a latent bug. The reason why it should be safe at the moment is that completion processing is not re-entrant and the CQ doorbell isn't written until the end of nvme_process_completion(). Make this change now because QEMU expects completion processing to be re-entrant and later patches will do that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: drop tautologous assertionStefan Hajnoczi2020-06-231-1/+0Star
| | | | | | | | | | | | | | | | | nvme_process_completion() explicitly checks cid so the assertion that follows is always true: if (cid == 0 || cid > NVME_QUEUE_SIZE) { ... continue; } assert(cid <= NVME_QUEUE_SIZE); Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: poll queues without q->lockStefan Hajnoczi2020-06-231-0/+12
| | | | | | | | | | | A lot of CPU time is spent simply locking/unlocking q->lock during polling. Check for completion outside the lock to make q->lock disappear from the profile. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* Compress lines for immediate returnSimran Singhal2020-05-041-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Compress two lines into a single line if immediate return statement is found. It also remove variables progress, val, data, ret and sock as they are no longer needed. Remove space between function "mixer_load" and '(' to fix the checkpatch.pl error:- ERROR: space prohibited between function name and open parenthesis '(' Done using following coccinelle script: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Simran Singhal <singhalsimran0@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200401165314.GA3213@simran-Inspiron-5558> [lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef] Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* block: trickle down the fallback image creation function use to the block ↵Maxim Levitsky2020-03-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers Instead of checking the .bdrv_co_create_opts to see if we need the fallback, just implement the .bdrv_co_create_opts in the drivers that need it. This way we don't break various places that need to know if the underlying protocol/format really supports image creation, and this way we still allow some drivers to not support image creation. Fixes: fd17146cd93d1704cd96d7c2757b325fc7aac6fd Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1816007 Note that technically this driver reverts the image creation fallback for the vxhs driver since I don't have a means to test it, and IMHO it is better to leave it not supported as it was prior to generic image creation patches. Also drop iscsi_create_opts which was left accidentally. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-3-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> [mreitz: Fixed alignment, and moved bdrv_co_create_opts_simple() and bdrv_create_opts_simple from block.h into block_int.h] Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: add support for discardMaxim Levitsky2019-10-281-0/+83
| | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-3-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: add support for write zerosMaxim Levitsky2019-10-281-1/+71
| | | | | | | Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190913133627.28450-2-mlevitsk@redhat.com Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* replay: add BH oneshot event for block layerPavel Dovgalyuk2019-10-141-2/+4
| | | | | | | | | | | | Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Include qemu/main-loop.h lessMarkus Armbruster2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* nvme: Limit blkshift to 12 (for 4 kB blocks)Max Reitz2019-07-301-11/+11
| | | | | | | | | | | | | Linux does not support blocks greater than 4 kB anyway, so we might as well limit blkshift to 12 and thus save us from some potential trouble. Reported-by: Peter Maydell <peter.maydell@linaro.org> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190730114812.10493-1-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Coverity: CID 1403771 Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: don't touch the completion entriesMaxim Levitsky2019-07-221-4/+1Star
| | | | | | | | | | | Completion entries are meant to be only read by the host and written by the device. The driver is supposed to scan the completions from the last point where it left, and until it sees a completion with non flipped phase bit. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716163020.13383-4-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: support larger that 512 bytes sector devicesMaxim Levitsky2019-07-221-5/+40
| | | | | | | | | | | | Currently the driver hardcodes the sector size to 512, and doesn't check the underlying device. Fix that. Also fail if underlying nvme device is formatted with metadata as this needs special support. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-id: 20190716163020.13383-3-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: fix doorbell strideMaxim Levitsky2019-07-221-1/+1
| | | | | | | | | Fix the math involving non standard doorbell stride Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190716163020.13383-2-mlevitsk@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* nvme: Set number of queues later in nvme_init()Michal Privoznik2019-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating the admin queue in nvme_init() the variable that holds the number of queues created is modified before actual queue creation. This is a problem because if creating the queue fails then the variable is left in inconsistent state. This was actually observed when I tried to hotplug a nvme disk. The control got to nvme_file_open() which called nvme_init() which failed and thus nvme_close() was called which in turn called nvme_free_queue_pair() with queue being NULL. This lead to an instant crash: #0 0x000055d9507ec211 in nvme_free_queue_pair (bs=0x55d952ddb880, q=0x0) at block/nvme.c:164 #1 0x000055d9507ee180 in nvme_close (bs=0x55d952ddb880) at block/nvme.c:729 #2 0x000055d9507ee3d5 in nvme_file_open (bs=0x55d952ddb880, options=0x55d952bb1410, flags=147456, errp=0x7ffd8e19e200) at block/nvme.c:781 #3 0x000055d9507629f3 in bdrv_open_driver (bs=0x55d952ddb880, drv=0x55d95109c1e0 <bdrv_nvme>, node_name=0x0, options=0x55d952bb1410, open_flags=147456, errp=0x7ffd8e19e310) at block.c:1291 #4 0x000055d9507633d6 in bdrv_open_common (bs=0x55d952ddb880, file=0x0, options=0x55d952bb1410, errp=0x7ffd8e19e310) at block.c:1551 #5 0x000055d950766881 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d952bb1410, flags=32768, parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, errp=0x7ffd8e19e510) at block.c:3063 #6 0x000055d950765ae4 in bdrv_open_child_bs (filename=0x0, options=0x55d9541cdff0, bdref_key=0x55d950af33aa "file", parent=0x55d9538ce420, child_role=0x55d950eaade0 <child_file>, allow_none=true, errp=0x7ffd8e19e510) at block.c:2712 #7 0x000055d950766633 in bdrv_open_inherit (filename=0x0, reference=0x0, options=0x55d9541cdff0, flags=0, parent=0x0, child_role=0x0, errp=0x7ffd8e19e908) at block.c:3011 #8 0x000055d950766dba in bdrv_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block.c:3156 #9 0x000055d9507cb635 in blk_new_open (filename=0x0, reference=0x0, options=0x55d953d00390, flags=0, errp=0x7ffd8e19e908) at block/block-backend.c:389 #10 0x000055d950465ec5 in blockdev_init (file=0x0, bs_opts=0x55d953d00390, errp=0x7ffd8e19e908) at blockdev.c:602 Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Message-id: 927aae40b617ba7d4b6c7ffe74e6d7a2595f8e86.1562770546.git.mprivozn@redhat.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
* Include qemu/module.h where needed, drop it from qemu-common.hMarkus Armbruster2019-06-121-0/+1
| | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
* block/nvme: Fix bdrv_refresh_filename()Max Reitz2019-02-251-14/+9Star
| | | | | | | | | | | | | | | | Currently, nvme's bdrv_refresh_filename() is an exact copy of null's implementation. However, for null, "null-co://" and "null-aio://" are indeed valid filenames -- for nvme, they are not, as a device address is still required. The correct implementation should generate a filename of the form "nvme://[PCI address]/[namespace]" (as the comment above nvme_parse_filename() describes). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-27-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Purify .bdrv_refresh_filename()Max Reitz2019-02-251-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block: Add strong_runtime_opts to BlockDriverMax Reitz2019-02-251-0/+8
| | | | | | | | | | | | | | | | | | This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
* block/nvme: Remove QEMU_PACKED from naturally aligned NVMeRegs structThomas Huth2019-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | The QEMU_PACKED is causing a compiler warning/error with GCC 9: CC block/nvme.o block/nvme.c: In function ‘nvme_create_queue_pair’: block/nvme.c:209:22: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member] 209 | q->sq.doorbell = &s->regs->doorbells[idx * 2 * s->doorbell_scale]; All members of the struct are naturally aligned, so there should not be the need for QEMU_PACKED here, and the following QEMU_BUILD_BUG_ON also ensures that there is no padding. Thus simply remove the QEMU_PACKED here. Buglink: https://bugs.launchpad.net/qemu/+bug/1817525 Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Fix hangs in synchronous APIs with iothreadsKevin Wolf2019-02-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the block layer, synchronous APIs are often implemented by creating a coroutine that calls the asynchronous coroutine-based implementation and then waiting for completion with BDRV_POLL_WHILE(). For this to work with iothreads (more specifically, when the synchronous API is called in a thread that is not the home thread of the block device, so that the coroutine will run in a different thread), we must make sure to call aio_wait_kick() at the end of the operation. Many places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if the condition has long become false. Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is generally not enough for most other operations because they haven't set the return value in the coroutine entry stub yet. To avoid race conditions there, we need to kick after setting the return value. The race window is small enough that the problem doesn't usually surface in the common path. However, it does surface and causes easily reproducible hangs if the operation can return early before even calling bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op success paths). The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is slightly different: These functions even neglected to schedule the coroutine in the home thread of the node. This avoids the hang, but is obviously wrong, too. Fix those to schedule the coroutine in the right AioContext in addition to adding aio_wait_kick() calls. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/nvme: optimize the performance of nvme driver based on vfio-pciLi Feng2019-01-091-10/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the IO size is larger than 2 pages, we move the the pointer one by one in the pagelist, this is inefficient. This is a simple benchmark result: Before: $ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1 wrote 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:02.41 (424.504 MiB/sec and 0.4146 ops/sec) $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1 read 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:02.03 (503.055 MiB/sec and 0.4913 ops/sec) After: $ qemu-io -c 'write 0 1G' nvme://0000:00:04.0/1 wrote 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:02.17 (471.517 MiB/sec and 0.4605 ops/sec) $ qemu-io -c 'read 0 1G' nvme://0000:00:04.0/1 read 1073741824/1073741824 bytes at offset 0 1 GiB, 1 ops; 0:00:01.94 (526.770 MiB/sec and 0.5144 ops/sec) Signed-off-by: Li Feng <lifeng1519@gmail.com> Message-Id: <20181101103807.25862-1-lifeng1519@gmail.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* nvme: correct locking around completionPaolo Bonzini2018-10-121-2/+0Star
| | | | | | | | | | nvme_poll_queues is already protected by q->lock, and AIO callbacks are invoked outside the AioContext lock. So remove the acquire/release pair in nvme_handle_event. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180814062739.19640-1-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* nvme: simplify plug/unplugPaolo Bonzini2018-08-151-10/+10
| | | | | | | | | bdrv_io_plug/bdrv_io_unplug take care of keeping a nesting count, so change s->plugged to just a bool. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180813144320.12382-2-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* nvme: Fix nvme_init error handlingFam Zheng2018-08-151-25/+12Star
| | | | | | | | | | | | | | | | | | | | | It is wrong to leave this field as 1, as nvme_close() called in the error handling code in nvme_file_open() will use it and try to free s->queues again. Another problem is the cleaning ups are duplicated between the fail* labels of nvme_init() and nvme_file_open(), which calls nvme_close(). A third problem is nvme_close() misses g_free() and event_notifier_cleanup(). Fix all of them. Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180712025420.4932-1-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* qobject: Modify qobject_ref() to return objMarc-André Lureau2018-05-041-2/+1Star
| | | | | | | | | | | | | | | For convenience and clarity, make it possible to call qobject_ref() at the time when the reference is associated with a variable, or argument, by making qobject_ref() return the same pointer as given. Use that to simplify the callers. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-5-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Useless change to qobject_ref_impl() dropped, commit message improved slightly] Signed-off-by: Markus Armbruster <armbru@redhat.com>
* qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREFMarc-André Lureau2018-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | Now that we can safely call QOBJECT() on QObject * as well as its subtypes, we can have macros qobject_ref() / qobject_unref() that work everywhere instead of having to use QINCREF() / QDECREF() for QObject and qobject_incref() / qobject_decref() for its subtypes. The replacement is mechanical, except I broke a long line, and added a cast in monitor_qmp_cleanup_req_queue_locked(). Unlike qobject_decref(), qobject_unref() doesn't accept void *. Note that the new macros evaluate their argument exactly once, thus no need to shout them. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, semantic conflict resolved, commit message improved] Signed-off-by: Markus Armbruster <armbru@redhat.com>
* qdict: remove useless castLaurent Vivier2018-03-271-6/+5Star
| | | | | | | | | | | Re-run Coccinelle script scripts/coccinelle/qobject.cocci Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180323143202.28879-5-lvivier@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* nvme: Drop pointless .bdrv_co_get_block_status()Eric Blake2018-03-021-14/+0Star
| | | | | | | | | | | | | | | | | | | | Commit bdd6a90 has a bug: drivers should never directly set BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed). Instead, drivers should report BDRV_BLOCK_DATA if it knows that data comes from this BDS. But let's look at the bigger picture: semantically, the nvme driver is similar to the nbd, null, and raw drivers (no backing file, all data comes from this BDS). But while two of those other drivers have to supply the callback (null because it can special-case BDRV_BLOCK_ZERO, raw because it can special-case a different offset), in this case the block layer defaults are good enough without the callback at all (similar to nbd). So, fix the bug by deletion ;) Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/nvme: fix Coverity reportsPaolo Bonzini2018-03-011-3/+7
| | | | | | | | | | | | 1) string not null terminated in sysfs_find_group_file 2) NULL pointer dereference and dead local variable in nvme_init. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180213015240.9352-1-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster2018-02-091-0/+1
| | | | | | | | | | | | | | | | | | qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
* block: Move NVMe constants to a separate headerFam Zheng2018-02-081-6/+1Star
| | | | | | | Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-8-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block/nvme: Implement .bdrv_(un)register_bufFam Zheng2018-02-081-0/+24
| | | | | | | | | Forward these two calls to the IOVA manager. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-6-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* block: Add VFIO based NVMe driverFam Zheng2018-02-081-0/+1182
This is a new protocol driver that exclusively opens a host NVMe controller through VFIO. It achieves better latency than linux-aio by completely bypassing host kernel vfs/block layer. $rw-$bs-$iodepth linux-aio nvme:// ---------------------------------------- randread-4k-1 10.5k 21.6k randread-512k-1 745 1591 randwrite-4k-1 30.7k 37.0k randwrite-512k-1 1945 1980 (unit: IOPS) The driver also integrates with the polling mechanism of iothread. This patch is co-authored by Paolo and me. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20180116060901.17413-4-famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>