summaryrefslogtreecommitdiffstats
path: root/migration/trace-events
Commit message (Collapse)AuthorAgeFilesLines
* migration: support UFFD write fault processing in ram_save_iterate()Andrey Gruzdev2021-02-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | In this particular implementation the same single migration thread is responsible for both normal linear dirty page migration and procesing UFFD page fault events. Processing write faults includes reading UFFD file descriptor, finding respective RAM block and saving faulting page to the migration stream. After page has been saved, write protection can be removed. Since asynchronous version of qemu_put_buffer() is expected to be used to save pages, we also have to flush migraion stream prior to un-protecting saved memory range. Write protection is being removed for any previously protected memory chunk that has hit the migration stream. That's valid for pages from linear page scan along with write fault pages. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> fixup pagefault.address cast for 32bit
* migration: Sync requested pages after postcopy recoveryPeter Xu2020-10-261-0/+1
| | | | | | | | | | | | We synchronize the requested pages right after a postcopy recovery happens. This helps to synchronize the prioritized pages on source so that the faulted threads can be served faster. Reported-by: Xiaohui Li <xiaohli@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-5-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Maintain postcopy faulted addressesPeter Xu2020-10-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Maintain a list of faulted addresses on the destination host for which we're waiting on. This is implemented using a GTree rather than a real list to make sure even there're plenty of vCPUs/threads that are faulting, the lookup will still be fast with O(log(N)) (because we'll do that after placing each page). It should bring a slight overhead, but ideally that shouldn't be a big problem simply because in most cases the requested page list will be short. Actually we did similar things for postcopy blocktime measurements. This patch didn't use that simply because: (1) blocktime measurement is towards vcpu threads only, but here we need to record all faulted addresses, including main thread and external thread (like, DPDK via vhost-user). (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we don't want to add that extra dependency on the kernel version since not necessary. E.g., we don't need to know which thread faulted on which page, we also don't care about multiple threads faulting on the same page. But we only care about what addresses are faulted so waiting for a page copying from src. (3) blocktime measurement is not enabled by default. However we need this by default especially for postcopy recover. Another thing to mention is that this patch introduced a new mutex to serialize the receivedmap and the page_requested tree, however that serialization does not cover other procedures like UFFDIO_COPY. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: using trace_ to replace DPRINTFBihong Yu2020-10-261-0/+13
| | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Message-Id: <1603179176-5360-1-git-send-email-yubihong@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/tls: add trace points for multifd-tlsChuan Zheng2020-09-251-0/+4
| | | | | | | | | | add trace points for multifd-tls for debug. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-7-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/dirtyrate: Add trace_calls to make it easier to debugChuan Zheng2020-09-251-0/+8
| | | | | | | | | | Add trace_calls to make it easier to debug Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <1600237327-33618-13-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* trace-events: Fix attribution of trace points to sourceMarkus Armbruster2020-09-091-17/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of scripts/cleanup-trace-events.pl. Funnies requiring manual post-processing: * accel/tcg/cputlb.c trace points are in trace-events. * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * hw/tpm/tpm_spapr.c uses pseudo trace point tpm_spapr_show_buffer to guard debug code. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * linux-user/trace-events abbreviates a tedious list of filenames to */signal.c. * net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806141334.3646302-5-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* trace-events: Delete unused trace pointsMarkus Armbruster2020-09-091-1/+0Star
| | | | | | | | Tracked down with the help of scripts/cleanup-trace-events.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200806141334.3646302-4-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* migration: Support QLIST migrationEric Auger2020-01-201-0/+5
| | | | | | | | | | | | | | | | Support QLIST migration using the same principle as QTAILQ: 94869d5c52 ("migration: migrate QTAILQ"). The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V. The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD and QLIST_RAW_REVERSE. Tests also are provided. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Rate limit inside host pagesDr. David Alan Gilbert2020-01-201-2/+2
| | | | | | | | | | | | | When using hugepages, rate limiting is necessary within each huge page, since a 1G huge page can take a significant time to send, so you end up with bursty behaviour. Fixes: 4c011c37ecb3 ("postcopy: Send whole huge pages") Reported-by: Lin Ma <LMa@suse.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Support gtree migrationEric Auger2019-10-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce support for GTree migration. A custom save/restore is implemented. Each item is made of a key and a data. If the key is a pointer to an object, 2 VMSDs are passed into the GTree VMStateField. When putting the items, the tree is traversed in sorted order by g_tree_foreach. On the get() path, gtrees must be allocated using the proper key compare, key destroy and value destroy. This must be handled beforehand, for example in a pre_load method. Tests are added to test save/dump of structs containing gtrees including the virtio-iommu domain/mappings scenario. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20191011121724.433-1-eric.auger@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> uintptr_t fixup for test on 32bit
* migration: remove sent parameter in get_queued_page_not_dirtyWei Yang2019-09-251-1/+1
| | | | | | | | | | | This is a cleanup for previous removal of unsentmap. The sent parameter is not necessary now. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190819061843.28642-4-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: add some multifd tracesJuan Quintela2019-08-141-0/+4
| | | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190814020218.1868-6-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Add traces for multifd terminate threadsJuan Quintela2019-08-141-0/+2
| | | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190814020218.1868-2-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Split log_clear() into smaller chunksPeter Xu2019-07-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we are doing log_clear() right after log_sync() which mostly keeps the old behavior when log_clear() was still part of log_sync(). This patch tries to further optimize the migration log_clear() code path to split huge log_clear()s into smaller chunks. We do this by spliting the whole guest memory region into memory chunks, whose size is decided by MigrationState.clear_bitmap_shift (an example will be given below). With that, we don't do the dirty bitmap clear operation on the remote node (e.g., KVM) when we fetch the dirty bitmap, instead we explicitly clear the dirty bitmap for the memory chunk for each of the first time we send a page in that chunk. Here comes an example. Assuming the guest has 64G memory, then before this patch the KVM ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory. If after the patch, let's assume when the clear bitmap shift is 18, then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB. Then instead of sending a big 64G ioctl, we'll send 64 small ioctls, each of the ioctl will cover 1G of the guest memory. For each of the 64 small ioctls, we'll only send if any of the page in that small chunk was going to be sent right away. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190603065056.25211-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: fix multifd_recv event typoJuan Quintela2019-07-151-1/+1
| | | | | | | | | It uses num in multifd_send(). Make it coherent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* Merge remote-tracking branch ↵Peter Maydell2019-03-251-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/juanquintela/tags/migration-pull-request' into staging Pull request - Rebase last pull request - Drop multifd - several other minor fixesLaLaLa # gpg: Signature made Mon 25 Mar 2019 17:46:29 GMT # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration-pull-request: migration/postcopy: Update the bandwidth during postcopy Migration/colo.c: Make user obtain the last COLO mode info after failover Migration/colo.c: Add the necessary checks for colo_do_failover Migration/colo.c: Add new COLOExitReason to handle all failover state Migration/colo.c: Fix COLO failover status error migration/rdma: Check qemu_rdma_init_one_block migration: add support for a "tls-authz" migration parameter multifd: Drop x- multifd: Add some padding multifd: Change default packet size multifd: Be flexible about packet size multifd: Drop x-multifd-page-count parameter multifd: Create new next_packet_size field multifd: Rename "size" member to pages_alloc multifd: Only send pages when packet are not empty Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * multifd: Create new next_packet_size fieldJuan Quintela2019-03-251-2/+2
| | | | | | | | | | | | | | We need to send this field when we add compression support. As we are still on x- stage, we can do this kind of changes. Signed-off-by: Juan Quintela <quintela@redhat.com>
* | trace-events: Fix attribution of trace points to sourceMarkus Armbruster2019-03-221-18/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of cleanup-trace-events.pl. Same funnies as in the previous commit, of course. Manually shorten its change to linux-user/trace-events to */signal.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-6-armbru@redhat.com Message-Id: <20190314180929.27722-6-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* | trace-events: Shorten file names in commentsMarkus Armbruster2019-03-221-13/+13
|/ | | | | | | | | | | | | | | We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* migration: Switch to using announce timerDr. David Alan Gilbert2019-03-051-1/+0Star
| | | | | | | | | | | | | | Switch the announcements to using the new announce timer. Move the code that does it to announce.c rather than savevm because it really has nothing to do with the actual migration. Migration starts the announce from bh's and so they're all in the main thread/bql, and so there's never any racing with the timers themselves. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* COLO: Flush memory data from ram cacheZhang Chen2018-10-191-0/+2
| | | | | | | | | | | | | | | | | | | During the time of VM's running, PVM may dirty some pages, we will transfer PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint time. So, the content of SVM's RAM cache will always be same with PVM's memory after checkpoint. Instead of flushing all content of PVM's RAM cache into SVM's MEMORY, we do this in a more efficient way: Only flush any page that dirtied by PVM since last checkpoint. In this way, we can ensure SVM's memory same with PVM's. Besides, we must ensure flush RAM cache before load device state. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* COLO: Remove colo_state migration structZhang Chen2018-10-191-0/+1
| | | | | | | | | | | | | | | We need to know if migration is going into COLO state for incoming side before start normal migration. Instead by using the VMStateDescription to send colo_state from source side to destination side, we use MIG_CMD_ENABLE_COLO to indicate whether COLO is enabled or not. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180627' ↵Peter Maydell2018-06-281-0/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging migration/next for 20180627 # gpg: Signature made Wed 27 Jun 2018 13:53:53 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180627: migration: fix crash in when incoming client channel setup fails postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() migration: Stop sending whole pages through main channel migration: Remove not needed semaphore and quit migration: Wait for blocking IO migration: Start sending messages migration: Create ram_save_multifd_page migration: Create multifd_bytes ram_counter migration: Synchronize multifd threads with main thread migration: Add block where to send/receive packets migration: Multifd channels always wait on the sem migration: Add multifd traces for start/end thread migration: Abstract the number of bytes sent migration: Calculate mbps only during transfer time migration: Create multifd packet migration: Create multipage support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * migration: Synchronize multifd threads with main threadJuan Quintela2018-06-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We synchronize all threads each RAM_SAVE_FLAG_EOS. Bitmap synchronizations don't happen inside a ram section, so we are safe about two channels trying to overwrite the same memory. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- seq needs to be atomic now, will also be accessed from main thread. Fix the if (true || ...) leftover We are back to non-atomics
| * migration: Add block where to send/receive packetsJuan Quintela2018-06-271-0/+2
| | | | | | | | | | | | | | Once there add tracepoints. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
| * migration: Add multifd traces for start/end threadJuan Quintela2018-06-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | We want to know how many pages/packets each channel has sent. Add counters for those. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- sort trace-events (dave)
* | trace: forbid floating point typesStefan Hajnoczi2018-06-271-1/+1
|/ | | | | | | | | | | | | | | | | | | Only one existing trace event uses a floating point type. Unfortunately float and double cannot be supported since SystemTap does not have floating point types. Remove float and double from the whitelist and document this limitation. Update the migrate_transferred trace event to use uint64_t instead of double. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20180621150254.4922-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* migration: Wake rate limiting for urgent requestsDr. David Alan Gilbert2018-06-151-0/+2
| | | | | | | | | | | | | | Rate limiting sleeps the migration thread for a while when it runs out of bandwidth; but sometimes we want to wake up to get on with something more urgent (like a postcopy request). Here we use a semaphore with a timedwait instead of a simple sleep; Incrementing the sempahore will wake it up sooner. Anything that consumes these urgent events must decrement the sempahore. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnectLidong Chen2018-06-041-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cancel migration during RDMA precopy, the source qemu main thread hangs sometime. The backtrace is: (gdb) bt #0 0x00007f249eabd43d in write () from /lib64/libpthread.so.0 #1 0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189 #2 0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296 #3 0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999 #4 0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273 #5 0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98 #6 0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334 #7 0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162 #8 0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90 #9 0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118 #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436 #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0) at util/async.c:261 #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215 #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263 #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522 #16 0x00000000005bc6a5 in main_loop () at vl.c:1944 #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752 It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime. According to IB Spec once active side send DREQ message, it should wait for DREP message and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped due to network issues. For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger. Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events. So it should not invoke rdma_get_cm_event which may hang forever, and the event channel is also destroyed in qemu_rdma_cleanup. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: setup ramstate for resumePeter Xu2018-05-151-0/+1
| | | | | | | | | | After we updated the dirty bitmaps of ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-18-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: synchronize dirty bitmap for resumePeter Xu2018-05-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-17-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: introduce SaveVMHandlers.resume_preparePeter Xu2018-05-151-0/+1
| | | | | | | | | | | This is hook function to be called when a postcopy migration wants to resume from a failure. For each module, it should provide its own recovery logic before we switch to the postcopy-active state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-16-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: new message MIG_RP_MSG_RESUME_ACKPeter Xu2018-05-151-0/+1
| | | | | | | | | | | Creating new message to reply for MIG_CMD_POSTCOPY_RESUME. One uint32_t is used as payload to let the source know whether destination is ready to continue the migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-15-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: new cmd MIG_CMD_POSTCOPY_RESUMEPeter Xu2018-05-151-0/+2
| | | | | | | | | | | Introducing this new command to be sent when the source VM is ready to resume the paused migration. What the destination does here is basically release the fault thread to continue service page faults. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-14-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: new message MIG_RP_MSG_RECV_BITMAPPeter Xu2018-05-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send received bitmap of ramblock back to source. This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only the header (including the ramblock name), and it was appended with the whole ramblock received bitmap on the destination side. When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP), it parses it, convert it to the dirty bitmap by inverting the bits. One thing to mention is that, when we send the recv bitmap, we are doing these things in extra: - converting the bitmap to little endian, to support when hosts are using different endianess on src/dst. - do proper alignment for 8 bytes, to support when hosts are using different word size (32/64 bits) on src/dst. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-13-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: new cmd MIG_CMD_RECV_BITMAPPeter Xu2018-05-151-0/+2
| | | | | | | | | | Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for one ramblock. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: allow fault thread to pausePeter Xu2018-05-151-0/+2
| | | | | | | | | | | | | | | Allows the fault thread to stop handling page faults temporarily. When network failure happened (and if we expect a recovery afterwards), we should not allow the fault thread to continue sending things to source, instead, it should halt for a while until the connection is rebuilt. When the dest main thread noticed the failure, it kicks the fault thread to switch to pause state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: allow src return path to pausePeter Xu2018-05-151-0/+2
| | | | | | | | | Let the thread pause for network issues. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: allow dst vm pause on postcopyPeter Xu2018-05-151-0/+2
| | | | | | | | | | | | | | | | | When there is IO error on the incoming channel (e.g., network down), instead of bailing out immediately, we allow the dst vm to switch to the new POSTCOPY_PAUSE state. Currently it is still simple - it waits the new semaphore, until someone poke it for another attempt. One note is that here on ram loading thread we cannot detect the POSTCOPY_ACTIVE state, but we need to detect the more specific POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all the device states. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-5-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: implement "postcopy-pause" src logicPeter Xu2018-05-151-0/+1
| | | | | | | | | | | | | | Now when network down for postcopy, the source side will not fail the migration. Instead we convert the status into this new paused state, and we will try to wait for a rescue in the future. If a recovery is detected, migration_thread() will reset its local variables to prepare for that. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-4-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: add postcopy total blocktime into query-migrateAlexey Perevalov2018-04-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Postcopy total blocktime is available on destination side only. But query-migrate was possible only for source. This patch adds ability to call query-migrate on destination. To be able to see postcopy blocktime, need to request postcopy-blocktime capability. The query-migrate command will show following sample result: {"return": "postcopy-vcpu-blocktime": [115, 100], "status": "completed", "postcopy-blocktime": 100 }} postcopy_vcpu_blocktime contains list, where the first item is the first vCPU in QEMU. This patch has a drawback, it combines states of incoming and outgoing migration. Ongoing migration state will overwrite incoming state. Looks like better to separate query-migrate for incoming and outgoing migration or add parameter to indicate type of migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1521742647-25550-7-git-send-email-a.perevalov@samsung.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: calculate vCPU blocktime on dst sideAlexey Perevalov2018-04-251-1/+4
| | | | | | | | | | | | | | | | | | | | | This patch provides blocktime calculation per vCPU, as a summary and as a overlapped value for all vCPUs. This approach was suggested by Peter Xu, as an improvements of previous approch where QEMU kept tree with faulted page address and cpus bitmask in it. Now QEMU is keeping array with faulted page address as value and vCPU as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps list for blocktime per vCPU (could be traced with page_fault_addr) Blocktime will not calculated if postcopy_blocktime field of MigrationIncomingState wasn't initialized. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1521742647-25550-4-git-send-email-a.perevalov@samsung.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell2018-03-201-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (51 commits) postcopy shared docs libvhost-user: Claim support for postcopy postcopy: Allow shared memory vhost: Huge page align and merge vhost+postcopy: Wire up POSTCOPY_END notify vhost-user: Add VHOST_USER_POSTCOPY_END message libvhost-user: mprotect & madvises for postcopy vhost+postcopy: Call wakeups vhost+postcopy: Add vhost waker postcopy: postcopy_notify_shared_wake postcopy: helper for waking shared vhost+postcopy: Resolve client address postcopy-ram: add a stub for postcopy_request_shared_page vhost+postcopy: Helper to send requests to source for shared pages vhost+postcopy: Stash RAMBlock and offset vhost+postcopy: Send address back to qemu libvhost-user+postcopy: Register new regions with the ufd migration/ram: ramblock_recv_bitmap_test_byte_offset postcopy+vhost-user: Split set_mem_table for postcopy vhost+postcopy: Transmit 'listen' to slave ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # scripts/update-linux-headers.sh
| * vhost+postcopy: Call wakeupsDr. David Alan Gilbert2018-03-201-0/+1
| | | | | | | | | | | | | | | | | | | | Cause the vhost-user client to be woken up whenever: a) We place a page in postcopy mode b) We get a fault and the page has already been received Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * postcopy: helper for waking sharedDr. David Alan Gilbert2018-03-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Provide a helper to send a 'wake' request on a userfaultfd for a shared process. The address in the clients address space is specified together with the RAMBlock it was resolved to. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost+postcopy: Helper to send requests to source for shared pagesDr. David Alan Gilbert2018-03-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Provide a helper to be used by shared waker functions to request shared pages from the source. The last_rb pointer is moved into the incoming state since this helper can update it as well as the main fault thread function. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * postcopy: Allow registering of fd handlerDr. David Alan Gilbert2018-03-201-0/+2
| | | | | | | | | | | | | | | | | | | | Allow other userfaultfd's to be registered into the fault thread so that handlers for shared memory can get responses. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | migration: add postcopy migration of dirty bitmapsVladimir Sementsov-Ogievskiy2018-03-131-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Postcopy migration of dirty bitmaps. Only named dirty bitmaps are migrated. If destination qemu is already containing a dirty bitmap with the same name as a migrated bitmap (for the same node), then, if their granularities are the same the migration will be done, otherwise the error will be generated. If destination qemu doesn't contain such bitmap it will be created. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-12-vsementsov@virtuozzo.com [Changed '+' to '*' as per list discussion. --js] Signed-off-by: John Snow <jsnow@redhat.com>
* | migration: introduce postcopy-only pendingVladimir Sementsov-Ogievskiy2018-03-131-1/+1
|/ | | | | | | | | There would be savevm states (dirty-bitmap) which can migrate only in postcopy stage. The corresponding pending is introduced here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-6-vsementsov@virtuozzo.com