path: root/block/linux-aio.c
Commit message | Author | Age | Files | Lines
* block: explicitly acquire aiocontext in aio callbacks that need it | Paolo Bonzini | 2017-02-21 | 1 | -4/+1
  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Fam Zheng <famz@redhat.com>
  Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
  Message-id: 20170213135235.12274-16-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: explicitly acquire aiocontext in bottom halves that need it | Paolo Bonzini | 2017-02-21 | 1 | -6/+9
  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Fam Zheng <famz@redhat.com>
  Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
  Message-id: 20170213135235.12274-15-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: explicitly acquire aiocontext in callbacks that need it | Paolo Bonzini | 2017-02-21 | 1 | -0/+4
  This covers both file descriptor callbacks and polling callbacks, since they execute related code.

  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Fam Zheng <famz@redhat.com>
  Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
  Message-id: 20170213135235.12274-14-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: poll ring for completions | Stefan Hajnoczi | 2017-01-03 | 1 | -1/+16
  The Linux AIO userspace ABI includes a ring that is shared with the kernel. This allows userspace programs to process completions without system calls.

  Add an AioContext poll handler to check for completions in the ring.

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
  Message-id: 20161201192652.9509-6-stefanha@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* aio: add AioPollFn and io_poll() interface | Stefan Hajnoczi | 2017-01-03 | 1 | -2/+2
  The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch.

  Keep this code change separate due to the number of files it touches.

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
  Message-id: 20161201192652.9509-3-stefanha@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: fix re-entrant completion processing | Stefan Hajnoczi | 2016-09-28 | 1 | -3/+6
  Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process completions from ioq_submit()") added an optimization that processes completions each time ioq_submit() returns with requests in flight. This commit introduces a "Co-routine re-entered recursively" error which can be triggered with -drive format=qcow2,aio=native.

  Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I debugged the following backtrace:

  (gdb) bt
  #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
  #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
  #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
  #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
  #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
  #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
  #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
  #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
  #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
  #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
  #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
  #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
  #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
  #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
  #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

  It turned out that re-entrant ioq_submit() and completion processing between three requests caused this error. The following check is not sufficient to prevent recursively entering coroutines:

      if (laiocb->co != qemu_coroutine_self()) {
          qemu_coroutine_enter(laiocb->co);
      }

  As the following coroutine backtrace shows, not just the current coroutine (self) can be entered. There might also be other coroutines that are currently entered and transferred control due to the qcow2 lock (CoMutex):

  (gdb) qemu coroutine 0x5555583464d0
  #0  0x0000555555ac0c90 in qemu_coroutine_switch (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175
  #1  0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at util/qemu-coroutine.c:117
  #2  0x0000555555ac031c in qemu_co_queue_run_restart (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60
  #3  0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at util/qemu-coroutine.c:119
  #4  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #5  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #6  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, type=type@entry=1) at block/linux-aio.c:383
  #7  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at block/linux-aio.c:402
  #8  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2911477760, bytes=bytes@entry=8192, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804
  #9  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041
  #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, flags=flags@entry=0) at block/io.c:1133
  #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at block/qcow2.c:1509
  #12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804
  #13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x55555a339060, offset=offset@entry=6157475840, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:1041
  #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133
  #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at block/block-backend.c:783
  #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at block/block-backend.c:991
  #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

  Use the new qemu_coroutine_entered() function instead of comparing against qemu_coroutine_self(). This is correct because:

  1. If a coroutine is not entered then it must have yielded to wait for I/O completion. It is therefore safe to enter.

  2. If a coroutine is entered then it must be in ioq_submit()/qemu_laio_process_completions() because otherwise it would be yielded while waiting for I/O completion. Therefore it will check laio->ret and return from ioq_submit() instead of yielding, i.e. it's guaranteed not to hang.

  Reported-by: Fam Zheng <famz@redhat.com>
  Tested-by: Fam Zheng <famz@redhat.com>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Reviewed-by: Fam Zheng <famz@redhat.com>
  Message-id: 1474989516-18255-4-git-send-email-stefanha@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
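The difference between the two checks in the commit above can be shown with a small, self-contained model. This is a sketch, not QEMU code: `FakeCo`, `should_enter_old()`, `should_enter_new()` and `count_recursive_entries()` are hypothetical names, and a coroutine is reduced to a single `entered` flag standing in for what `qemu_coroutine_entered()` reports. The scenario mirrors the backtrace: coroutine `a` is the current one, `b` is entered but handed control away through the CoMutex, and `c` has yielded to wait for I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a coroutine: only models whether it is
 * currently entered (running, or suspended on the stack because it
 * entered another coroutine, e.g. via a CoMutex hand-off). */
typedef struct {
    const char *name;
    bool entered;
} FakeCo;

/* Pre-fix rule: enter every coroutine except the current one. */
static bool should_enter_old(const FakeCo *co, const FakeCo *self)
{
    return co != self;
}

/* Post-fix rule: only enter a coroutine that is not entered at all,
 * i.e. one that yielded to wait for its I/O to complete. */
static bool should_enter_new(const FakeCo *co)
{
    return !co->entered;
}

/* Count completions that would re-enter an already-entered coroutine,
 * the situation reported as "Co-routine re-entered recursively". */
static int count_recursive_entries(FakeCo *const *cos, size_t n,
                                   const FakeCo *self, bool use_new_rule)
{
    int recursive = 0;

    assert(cos != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool enter = use_new_rule ? should_enter_new(cos[i])
                                  : should_enter_old(cos[i], self);
        if (enter && cos[i]->entered) {
            recursive++;
        }
    }
    return recursive;
}
```

With the old rule, `b` is entered a second time; with the new rule only `c` is entered, matching point 1 of the commit message.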
* linux-aio: process completions from ioq_submit() | Roman Pen | 2016-09-13 | 1 | -2/+22
  In order to reduce completion latency, it makes sense to harvest completed requests as soon as possible. A very fast backend device can complete requests just after submission, so it is worth checking the ring buffer for completed requests directly after io_submit() has been called.

  Indeed, this patch reduces completion latencies and increases overall throughput. For example, these are the percentiles of the number of requests completed at once:

              1th  10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
    Before      2     4    42   112   128   128   128   128   128   128      128
    After       1     1     4    14    33    45    47    48    50    51      108

  That means that, before this patch is applied, the ring buffer is observed as full (128 requests consumed at once) in 60% of calls. After the patch is applied, the distribution of the number of completed requests is "smoother" and the queue (requests in flight) is almost never full.

  The fio read results are the following (write results are almost the same and are not shown here):

  Before
  ------
  job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
    Description : [Emulation of Storage Server Access Pattern]
    read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
      slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
      clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
       lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
      clat percentiles (usec):
       | 1.00th=[ 346], 5.00th=[ 596], 10.00th=[ 708], 20.00th=[ 852],
       | 30.00th=[ 932], 40.00th=[ 996], 50.00th=[ 1048], 60.00th=[ 1112],
       | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
       | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
       | 99.99th=[ 4704]
      bw (KB /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  ------
  job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
    Description : [Emulation of Storage Server Access Pattern]
    read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
      slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
      clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
       lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
      clat percentiles (usec):
       | 1.00th=[ 310], 5.00th=[ 580], 10.00th=[ 748], 20.00th=[ 876],
       | 30.00th=[ 908], 40.00th=[ 948], 50.00th=[ 1012], 60.00th=[ 1064],
       | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
       | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
       | 99.99th=[ 5408]
      bw (KB /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

  Throughput increased from 1822MB/s to 1921MB/s, and average completion latency decreased from 1051us to 988us.

  Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
  Message-id: 1468931263-32667-4-git-send-email-roman.penyaev@profitbricks.com
  Cc: Stefan Hajnoczi <stefanha@redhat.com>
  Cc: Paolo Bonzini <pbonzini@redhat.com>
  Cc: qemu-devel@nongnu.org
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
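The relative changes implied by the fio averages quoted in the commit above can be worked out directly. This is only arithmetic on the reported bw, iops and clat avg figures, not an additional measurement:

```latex
\begin{aligned}
\Delta_{\text{bw}}   &= \frac{1921.2 - 1822.7}{1822.7} = \frac{98.5}{1822.7} \approx +5.4\% \\
\Delta_{\text{iops}} &= \frac{189529 - 179779}{179779} = \frac{9750}{179779} \approx +5.4\% \\
\Delta_{\text{clat}} &= \frac{988.78 - 1051.45}{1051.45} = \frac{-62.67}{1051.45} \approx -6.0\%
\end{aligned}
```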
* linux-aio: split processing events function | Roman Pen | 2016-09-13 | 1 | -10/+21
  Prepare the event-processing function to be called from ioq_submit() by splitting it into two parts: the first harvests completed I/O requests, the second submits pending requests.

  Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
  Message-id: 1468931263-32667-3-git-send-email-roman.penyaev@profitbricks.com
  Cc: Stefan Hajnoczi <stefanha@redhat.com>
  Cc: Paolo Bonzini <pbonzini@redhat.com>
  Cc: qemu-devel@nongnu.org
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: consume events in userspace instead of calling io_getevents | Roman Pen | 2016-09-13 | 1 | -26/+99
  The AIO context is represented in userspace as a simple ring buffer, which can be consumed directly without entering the kernel; this can bring some performance gain. QEMU does not use a timeout value when waiting for event completions, so all events can be consumed from userspace.

  Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
  Message-id: 1468931263-32667-2-git-send-email-roman.penyaev@profitbricks.com
  Cc: Stefan Hajnoczi <stefanha@redhat.com>
  Cc: Paolo Bonzini <pbonzini@redhat.com>
  Cc: qemu-devel@nongnu.org
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
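Consuming completions from a shared ring, as described in the commit above, comes down to advancing a head index until it meets the tail. The sketch below is a simplified, single-threaded model under stated assumptions: `RingEvent`, `Ring` and `ring_consume()` are hypothetical names, the kernel's real ring layout has additional header fields, and a real consumer must also use memory barriers when reading `tail` and publishing `head`.

```c
#include <assert.h>

/* Simplified, hypothetical model of one completion event. */
typedef struct {
    unsigned long long data;  /* user cookie identifying the request */
    long long res;            /* bytes transferred, or a negative errno */
} RingEvent;

/* Simplified, hypothetical model of the shared completion ring. The
 * kernel advances tail when it posts an event; userspace advances head
 * when it consumes one. Memory barriers are omitted in this model. */
typedef struct {
    unsigned nr;       /* number of slots */
    unsigned head;     /* next slot userspace will read */
    unsigned tail;     /* next slot the kernel will write */
    RingEvent *events;
} Ring;

/* Copy up to max completed events into out[] without any system call.
 * Returns the number of events consumed. */
static unsigned ring_consume(Ring *r, RingEvent *out, unsigned max)
{
    unsigned n = 0;

    assert(r->nr > 0 && r->head < r->nr && r->tail < r->nr);
    while (r->head != r->tail && n < max) {
        out[n++] = r->events[r->head];
        r->head = (r->head + 1) % r->nr;   /* wrap around the ring */
    }
    return n;
}
```

For example, with 4 slots, `head == 3` and `tail == 1`, one call returns the events in slots 3 and 0 and leaves `head == 1`.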
* linux-aio: Handle io_submit() failure gracefully | Kevin Wolf | 2016-08-11 | 1 | -1/+7
  It is generally not expected that io_submit() fails other than with -EAGAIN, but corner cases like SELinux refusing I/O when permissions are revoked are still possible. In this case, we shouldn't abort, but just return an I/O error for the request.

  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  Message-id: 1470741619-23231-1-git-send-email-kwolf@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: prevent submitting more than MAX_EVENTS | Roman Pen | 2016-07-18 | 1 | -10/+16
  By invoking io_setup(MAX_EVENTS) we ask the kernel to create a ring buffer with the specified number of events. But the kernel's ring buffer allocation logic is a bit tricky (the ring buffer is page-size aligned and some per-CPU allocations are required), so eventually more events than requested are allocated.

  On the userspace side, we have to follow the convention and not try to io_submit() more than MAX_EVENTS requests; otherwise the logic that consumes completed events has to be changed accordingly. The pitfall is in the following sequence:

      MAX_EVENTS = 128
      io_setup(MAX_EVENTS)

      io_submit(MAX_EVENTS)
      io_submit(MAX_EVENTS)

      /* now 256 events are in-flight */

      io_getevents(MAX_EVENTS) = 128

      /* we can handle only 128 events at once; to be sure
       * that nothing is left pending, io_getevents(MAX_EVENTS)
       * must be called once more or a hang will happen. */

  To prevent the hang, or a repeated io_getevents() call, this patch limits the number of in-flight requests to MAX_EVENTS.

  Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
  Reviewed-by: Fam Zheng <famz@redhat.com>
  Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com
  Cc: Stefan Hajnoczi <stefanha@redhat.com>
  Cc: qemu-devel@nongnu.org
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
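The in-flight limit from the commit above can be sketched as simple queue accounting. `SubmitQueue`, `queue_submit()` and `queue_complete()` are hypothetical names, and decrementing counters stands in for the real io_submit() and completion harvesting; the point is only that `in_flight` can never exceed `MAX_EVENTS`.

```c
#include <assert.h>

#define MAX_EVENTS 128

/* Hypothetical, simplified queue accounting. */
typedef struct {
    unsigned in_queue;   /* requests waiting to be submitted */
    unsigned in_flight;  /* requests submitted but not yet completed */
} SubmitQueue;

/* Submit as many queued requests as possible without ever having more
 * than MAX_EVENTS in flight. Returns the number submitted this call. */
static unsigned queue_submit(SubmitQueue *q)
{
    unsigned room, n;

    assert(q->in_flight <= MAX_EVENTS);
    room = MAX_EVENTS - q->in_flight;
    n = q->in_queue < room ? q->in_queue : room;
    q->in_queue -= n;     /* stands in for a real io_submit() of n iocbs */
    q->in_flight += n;
    return n;
}

/* Record n completions harvested from the ring. */
static void queue_complete(SubmitQueue *q, unsigned n)
{
    assert(n <= q->in_flight);
    q->in_flight -= n;
}
```

Because at most `MAX_EVENTS` requests are outstanding, a single harvest of up to `MAX_EVENTS` events can drain everything that has completed, which avoids the second io_getevents() call from the sequence above.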
* linux-aio: share one LinuxAioState within an AioContext | Paolo Bonzini | 2016-07-18 | 1 | -4/+6
  This has better performance because it executes fewer system calls and does not use a bottom half per disk.

  Originally proposed by Ming Lei.

  [Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix build error as reported by Peter Maydell <peter.maydell@linaro.org>.
  --Stefan]

  Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

  squash! linux-aio: share one LinuxAioState within an AioContext
* coroutine: move entry argument to qemu_coroutine_create | Paolo Bonzini | 2016-07-13 | 1 | -1/+1
  In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new.

  Mostly done with the following semantic patch:

      @ entry1 @
      expression entry, arg, co;
      @@
      - co = qemu_coroutine_create(entry);
      + co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);

      @ entry2 @
      expression entry, arg;
      identifier co;
      @@
      - Coroutine *co = qemu_coroutine_create(entry);
      + Coroutine *co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);

      @ entry3 @
      expression entry, arg;
      @@
      - qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
      + qemu_coroutine_enter(qemu_coroutine_create(entry, arg));

      @ reentry @
      expression co;
      @@
      - qemu_coroutine_enter(co, NULL);
      + qemu_coroutine_enter(co);

  except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Fam Zheng <famz@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: fix return code for partial write for Linux AIO | Denis V. Lunev | 2016-07-05 | 1 | -1/+1
  A partial write most likely means that there is no space left, rather than that "something went wrong". Thus it is more natural to return ENOSPC rather than EINVAL.

  The problem actually shows up with the NBD server, which reported EINVAL rather than ENOSPC on the first error using its protocol, making the report to the user wrong.

  Signed-off-by: Denis V. Lunev <den@openvz.org>
  CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
  CC: Kevin Wolf <kwolf@redhat.com>
  CC: Max Reitz <mreitz@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
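The result mapping described in the commit above can be sketched as one function that turns the raw AIO result (bytes transferred or a negative errno) into a block-layer return code. `map_aio_result()` is a hypothetical name modeled loosely on the linux-aio completion path; the zero-fill of short reads is our understanding of QEMU's behaviour and is not stated in this commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Map the raw result of one AIO request (bytes transferred or -errno)
 * to a block-layer return code. Hypothetical helper for illustration. */
static int map_aio_result(ssize_t ret, char *buf, size_t nbytes, bool is_read)
{
    if (ret == (ssize_t)nbytes) {
        return 0;                       /* complete transfer */
    }
    if (ret < 0) {
        return (int)ret;                /* propagate the kernel's error */
    }
    assert((size_t)ret < nbytes);
    if (is_read) {
        /* Short read: treat as end of file and zero-fill the rest. */
        memset(buf + ret, 0, nbytes - (size_t)ret);
        return 0;
    }
    return -ENOSPC;                     /* short write: most likely out of space */
}
```

Returning `-ENOSPC` rather than `-EINVAL` for a short write is the change the commit makes; callers such as the NBD server can then report the more accurate error.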
* linux-aio: Cancel BH if not needed | Kevin Wolf | 2016-06-16 | 1 | -1/+3
  linux-aio uses a BH in order to make sure that the remaining completions are processed even in nested event loops of completion callbacks in order to avoid deadlocks.

  There is no need, however, to have the BH overhead for the first call into qemu_laio_completion_bh() or after all pending completions have already been processed. Therefore, this patch calls directly into qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels the BH after qemu_laio_completion_bh() has processed all pending completions.

  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
* raw-posix: Implement .bdrv_co_preadv/pwritev | Kevin Wolf | 2016-06-16 | 1 | -5/+2
  The raw-posix block driver actually supports byte-aligned requests now on non-O_DIRECT images, like it already (and previously incorrectly) claimed in bs->request_alignment.

  For some block drivers this means that a RMW cycle can be avoided when they write sub-sector metadata e.g. for cluster allocation.

  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  Reviewed-by: Eric Blake <eblake@redhat.com>
  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* raw-posix: Switch to bdrv_co_* interfaces | Kevin Wolf | 2016-06-16 | 1 | -22/+65
  In order to use the modern byte-based .bdrv_co_preadv/pwritev() interface, this patch switches raw-posix to coroutine-based interfaces as a first step. In terms of semantics and performance, it doesn't make a difference with the existing code whether we go from a coroutine to a callback-based interface already in block/io.c or only in linux-aio.c.

  As there have been concerns in the past that this change may be a step in the wrong direction with respect to a possible AIO fast path, the old callback-based interface for linux-aio is left around and can be reactivated when a fast path (e.g. directly from virtio-blk dataplane, bypassing the whole block layer) is implemented.

  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  Reviewed-by: Eric Blake <eblake@redhat.com>
  Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: make it more type safe | Paolo Bonzini | 2016-05-12 | 1 | -29/+17
  Replace void* with an opaque LinuxAioState type.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end | Paolo Bonzini | 2016-05-12 | 1 | -8/+5
  Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O.

  Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug.

  Reviewed-by: Fam Zheng <famz@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
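The io_plug "depth" mentioned in the commit above is a nesting counter: while it is non-zero, requests are only queued, and the outermost unplug submits them as one batch. The sketch below models that idea in isolation; `PlugState`, `io_plug()`, `io_unplug()`, `submit_request()` and `flush_queue()` are hypothetical names and are not QEMU's `bdrv_io_plug()`/`bdrv_io_unplug()` API.

```c
#include <assert.h>

/* Hypothetical model of depth-counted plugging: while plugged, new
 * requests are only queued; the last unplug submits them as a batch. */
typedef struct {
    unsigned plug_depth;
    unsigned queued;     /* requests waiting for the batch */
    unsigned submitted;  /* requests handed to the (simulated) kernel */
    unsigned batches;    /* number of simulated io_submit() calls */
} PlugState;

static void flush_queue(PlugState *s)
{
    if (s->queued > 0) {
        s->submitted += s->queued;   /* one io_submit() for the whole batch */
        s->queued = 0;
        s->batches++;
    }
}

static void io_plug(PlugState *s)
{
    s->plug_depth++;
}

static void io_unplug(PlugState *s)
{
    assert(s->plug_depth > 0);
    if (--s->plug_depth == 0) {
        flush_queue(s);              /* only the outermost unplug submits */
    }
}

static void submit_request(PlugState *s)
{
    s->queued++;
    if (s->plug_depth == 0) {
        flush_queue(s);              /* not plugged: submit immediately */
    }
}
```

Two nested plugs around two requests therefore produce a single batch, and only when the outer unplug runs.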
* block: Clean up includes | Peter Maydell | 2016-01-20 | 1 | -0/+1
  Clean up includes so that osdep.h is included first and headers which it implies are not included manually.

  This commit was created with scripts/clean-includes.

  Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  Reviewed-by: Eric Blake <eblake@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* aio: Add "is_external" flag for event handlers | Fam Zheng | 2015-10-23 | 1 | -2/+3
  All callers pass in false, and the real external ones will switch to true in coming patches.

  Signed-off-by: Fam Zheng <famz@redhat.com>
  Reviewed-by: Jeff Cody <jcody@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* linux-aio: simplify removal of completed iocbs from the list | Paolo Bonzini | 2014-12-12 | 1 | -6/+6
  There is no need to do another O(n) pass on the list; the iocb to split the list at is already available through the array we passed to io_submit.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>
  Message-id: 1418305950-30924-6-git-send-email-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: drop return code from laio_io_unplug and ioq_submit | Paolo Bonzini | 2014-12-12 | 1 | -10/+5
  These are unused.

  Suggested-by: Kevin Wolf <kwolf@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>
  Message-id: 1418305950-30924-5-git-send-email-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: rename LaioQueue idx field to "n" | Paolo Bonzini | 2014-12-12 | 1 | -6/+6
  It does not identify an index in an array anymore.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>
  Message-id: 1418305950-30924-4-git-send-email-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: track whether the queue is blocked | Paolo Bonzini | 2014-12-12 | 1 | -20/+27
  Prevent unplug from submitting requests when io_submit reported that it couldn't accept more; at the same time, try more io_submit calls if it could handle the whole set of requests that were passed, so that the "blocked" flag is reset as soon as possible.

  After the previous patch, laio_submit already tried to avoid submitting requests to a blocked queue, by comparing s->io_q.idx with "==" instead of the more natural ">=". Switch to the simpler expression now that we have the "blocked" flag.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>
  Message-id: 1418305950-30924-3-git-send-email-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
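The "blocked" flag from the commit above, together with a queue of requests that could not be submitted, can be modeled as follows. This is a hypothetical sketch: `PendingQueue`, `ioq_submit_model()`, `queue_request()` and `on_completion()` are illustrative names, `kernel_room` simulates how many more requests io_submit() would accept, and plugging is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pending-request queue with a "blocked" flag. */
typedef struct {
    unsigned in_queue;     /* requests not yet accepted by the kernel */
    unsigned in_flight;    /* requests the kernel accepted */
    unsigned kernel_room;  /* simulated: how many more io_submit() accepts */
    bool blocked;          /* last submission could not take everything */
} PendingQueue;

static void ioq_submit_model(PendingQueue *q)
{
    unsigned n = q->in_queue < q->kernel_room ? q->in_queue : q->kernel_room;

    q->in_queue -= n;
    q->kernel_room -= n;
    q->in_flight += n;
    q->blocked = q->in_queue > 0;    /* partial submission: stop trying */
}

/* A new request is queued; it is only submitted if the queue is not blocked. */
static void queue_request(PendingQueue *q)
{
    q->in_queue++;
    if (!q->blocked) {
        ioq_submit_model(q);
    }
}

/* A completion frees kernel capacity, so retry any pending requests. */
static void on_completion(PendingQueue *q, unsigned n)
{
    assert(n <= q->in_flight);
    q->in_flight -= n;
    q->kernel_room += n;
    if (q->in_queue > 0) {
        ioq_submit_model(q);
    }
}
```

New requests arriving while the queue is blocked are only queued; the flag is cleared as soon as a completion lets the whole backlog through.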
* linux-aio: queue requests that cannot be submitted | Paolo Bonzini | 2014-12-12 | 1 | -42/+33
  Keep a queue of requests that were not submitted; pass them to the kernel when a completion is reported, unless the queue is plugged.

  The array of iocbs is rebuilt every time from scratch. This avoids keeping the iocbs array and list synchronized.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Kevin Wolf <kwolf@redhat.com>
  Message-id: 1418305950-30924-2-git-send-email-pbonzini@redhat.com
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Rename BlockDriverCompletionFunc to BlockCompletionFunc | Markus Armbruster | 2014-10-20 | 1 | -1/+1
  I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing.

  Signed-off-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Max Reitz <mreitz@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Rename BlockDriverAIOCB* to BlockAIOCB* | Markus Armbruster | 2014-10-20 | 1 | -3/+3
  I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing.

  Signed-off-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Max Reitz <mreitz@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Rename qemu_aio_release -> qemu_aio_unref | Fam Zheng | 2014-09-22 | 1 | -2/+2
  Suggested-by: Benoît Canet <benoit.canet@irqsave.net>
  Signed-off-by: Fam Zheng <famz@redhat.com>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: Convert laio_aiocb_info.cancel to .cancel_async | Fam Zheng | 2014-09-22 | 1 | -22/+8
  Just call io_cancel(2). If it fails, the request has not been canceled, so the event loop will eventually call qemu_laio_process_completion.

  In qemu_laio_process_completion, change it to call the cb unconditionally, as required by bdrv_aio_cancel_async.

  Signed-off-by: Fam Zheng <famz@redhat.com>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: avoid deadlock in nested aio_poll() calls | Stefan Hajnoczi | 2014-08-29 | 1 | -16/+55
  If two Linux AIO request completions are fetched in the same io_getevents() call, QEMU will deadlock if request A's callback waits for request B to complete using an aio_poll() loop. This was reported to happen with the mirror blockjob.

  This patch moves completion processing into a BH and makes it resumable. Nested event loops can resume completion processing so that request B will complete and the deadlock will not occur.

  Cc: Kevin Wolf <kwolf@redhat.com>
  Cc: Paolo Bonzini <pbonzini@redhat.com>
  Cc: Ming Lei <ming.lei@canonical.com>
  Cc: Marcin Gibuła <m.gibula@beyond.pl>
  Reported-by: Marcin Gibuła <m.gibula@beyond.pl>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Tested-by: Marcin Gibuła <m.gibula@beyond.pl>
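The "resumable" part of the commit above means that the harvested events and the index of the next one to process live in shared state rather than in local variables, so a nested event loop started from a callback can continue where the outer loop stopped. The sketch below shows only that idea and leaves out the BH; `CompletionState`, `request_cb()` and `process_completions()` are hypothetical names, and request 0's callback plays the role of request A waiting for B.

```c
#include <assert.h>

#define MAX_COMPLETIONS 8

/* Hypothetical model of resumable completion processing. */
typedef struct {
    int events[MAX_COMPLETIONS];  /* ids of completed requests */
    int event_max;                /* number of valid entries in events[] */
    int event_idx;                /* next entry to process, shared by all loops */
    int order[MAX_COMPLETIONS];   /* processing order, for inspection */
    int norder;
} CompletionState;

static void process_completions(CompletionState *s);

/* Completion callback. Request 0's callback waits for the other requests
 * by running a nested loop, like aio_poll() inside a callback would. */
static void request_cb(CompletionState *s, int id)
{
    s->order[s->norder++] = id;
    if (id == 0) {
        process_completions(s);
    }
}

static void process_completions(CompletionState *s)
{
    while (s->event_idx < s->event_max) {
        /* Advance the shared index before the callback so that a nested
         * call neither re-runs this event nor skips the remaining ones. */
        int id = s->events[s->event_idx++];
        request_cb(s, id);
    }
}
```

With events `{0, 1, 2}`, the nested loop inside request 0's callback completes requests 1 and 2, and the outer loop then finds nothing left to do instead of deadlocking or processing an event twice.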
* linux-aio: Fix laio resource leak | Gonglei | 2014-07-15 | 1 | -0/+5
  When hotplugging virtio-scsi disks using laio, aio_nr is increased in laio_init() by io_setup(). The current number can be seen with:

      # cat /proc/sys/fs/aio-nr
      128

  If aio_nr reaches the maximum, which can be found with:

      # cat /proc/sys/fs/aio-max-nr
      65536

  the hotplug process will fail because of the leaked AIO contexts. Fix it by calling io_destroy() in laio_cleanup().

  Reported-by: daifulai <daifulai@huawei.com>
  Signed-off-by: Gonglei <arei.gonglei@huawei.com>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* linux-aio: implement io plug, unplug and flush io queue | Ming Lei | 2014-07-07 | 1 | -2/+94
  This patch implements .bdrv_io_plug, .bdrv_io_unplug and .bdrv_flush_io_queue callbacks for linux-aio Block Drivers, so that submitting I/O as a batch can be supported on linux-aio.

  [Unprocessed requests are completed with -EIO instead of a bogus ret value.
  --Stefan]

  Signed-off-by: Ming Lei <ming.lei@canonical.com>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/linux-aio: fix memory and fd leak | Stefan Hajnoczi | 2014-06-04 | 1 | -0/+8
  Hot unplugging -drive aio=native,file=test.img,format=raw images leaves the Linux AIO event notifier and struct qemu_laio_state allocated. Luckily nothing will use the event notifier after the BlockDriverState has been closed so the handler function is never called.

  It's still worth fixing this resource leak.

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/raw-posix: implement .bdrv_detach/attach_aio_context() | Stefan Hajnoczi | 2014-06-04 | 1 | -2/+14
  Drop the assumption that we're using the main AioContext for Linux AIO. Convert the Linux AIO event notifier to use aio_set_event_notifier().

  The .bdrv_detach/attach_aio_context() interfaces also need to be implemented to move the event notifier handler from the old to the new AioContext.

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* aio: drop io_flush argument | Stefan Hajnoczi | 2013-08-19 | 1 | -2/+1
  The .io_flush() handler no longer exists and has no users. Drop the io_flush argument to aio_set_fd_handler() and related functions.

  The AioFlushEventNotifierHandler and AioFlushHandler typedefs are no longer used and are dropped too.

  Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/linux-aio: drop qemu_laio_completion_cb() | Stefan Hajnoczi | 2013-08-19 | 1 | -15/+2
  .io_flush() is no longer called so drop qemu_laio_completion_cb(). It turns out that count is now unused so drop that too.

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* misc: move include files to include/qemu/ | Paolo Bonzini | 2012-12-19 | 1 | -2/+2
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* block: move include files to include/block/ | Paolo Bonzini | 2012-12-19 | 1 | -1/+1
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* aio: rename AIOPool to AIOCBInfo | Stefan Hajnoczi | 2012-11-14 | 1 | -2/+2
  Now that AIOPool no longer keeps a freelist, it isn't really a "pool" anymore. Rename it to AIOCBInfo and make it const since it no longer needs to be modified.

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* raw-posix: move linux-aio.c to block/ | Paolo Bonzini | 2012-10-31 | 1 | -0/+216
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>