summaryrefslogtreecommitdiffstats
path: root/migration/ram.c
Commit message (Collapse)AuthorAgeFilesLines
* migrate: remove QMP/HMP commands for speed, downtime and cache sizeDaniel P. Berrangé2021-03-181-1/+1
| | | | | | | | | | | | The generic 'migrate_set_parameters' command handle all types of param. Only the QMP commands were documented in the deprecations page, but the rationale for deprecating applies equally to HMP, and the replacements exist. Furthermore the HMP commands are just shims to the QMP commands, so removing the latter breaks the former unless they get re-implemented. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
* migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARDMahmoud Mandour2021-03-151-4/+2Star
| | | | | | | | | | | Replaced various qemu_mutex_lock calls and their respective qemu_mutex_unlock calls with QEMU_LOCK_GUARD macro. This simplifies the code by eliminating the respective qemu_mutex_unlock calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210311031538.5325-7-ma.mandourr@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: only check page size match if RAM postcopy is enabledStefan Reiter2021-02-081-1/+1
| | | | | | | | | | | | | Postcopy may also be advised for dirty-bitmap migration only, in which case the remote page size will not be available and we'll instead read bogus data, blocking migration with a mismatch error if the VM uses hugepages. Fixes: 58110f0acb ("migration: split common postcopy out of ram postcopy") Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20210204163522.13291-1-s.reiter@proxmox.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Clean up signed vs. unsigned XBZRLE cache-sizeMarkus Armbruster2021-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | 73af8dd8d7 "migration: Make xbzrle_cache_size a migration parameter" (v2.11.0) made the new parameter unsigned (QAPI type 'size', uint64_t in C). It neglected to update existing code, which continues to use int64_t. migrate_xbzrle_cache_size() returns the new parameter. Adjust its return type. QMP query-migrate-cache-size returns migrate_xbzrle_cache_size(). Adjust its return type. migrate-set-parameters passes the new parameter to xbzrle_cache_resize(). Adjust its parameter type. xbzrle_cache_resize() passes it on to cache_init(). Adjust its parameter type. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-3-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: support UFFD write fault processing in ram_save_iterate()Andrey Gruzdev2021-02-081-29/+295
| | | | | | | | | | | | | | | | | | | | | | | | In this particular implementation the same single migration thread is responsible for both normal linear dirty page migration and procesing UFFD page fault events. Processing write faults includes reading UFFD file descriptor, finding respective RAM block and saving faulting page to the migration stream. After page has been saved, write protection can be removed. Since asynchronous version of qemu_put_buffer() is expected to be used to save pages, we also have to flush migraion stream prior to un-protecting saved memory range. Write protection is being removed for any previously protected memory chunk that has hit the migration stream. That's valid for pages from linear page scan along with write fault pages. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> fixup pagefault.address cast for 32bit
* migration: introduce 'background-snapshot' migration capabilityAndrey Gruzdev2021-02-081-0/+21
| | | | | | | | | | | | | Add new capability to 'qapi/migration.json' schema. Update migrate_caps_check() to validate enabled capability set against introduced one. Perform checks for required kernel features and compatibility with guest memory backends. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/ram: Fix hexadecimal format string specifierPhilippe Mathieu-Daudé2020-11-121-1/+1
| | | | | | | | | | | | The '%u' conversion specifier is for decimal notation. When prefixing a format with '0x', we want the hexadecimal specifier ('%x'). Inspired-by: Dov Murik <dovmurik@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201103112558.2554390-5-philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* Reduce the time of checkpoint for COLORao, Lei2020-11-111-1/+13
| | | | | | | | | | | | | | we should set ram_bulk_stage to false after ram_state_init, otherwise the bitmap will be unused in migration_bitmap_find_dirty. all pages in ram cache will be flushed to the ram of secondary guest for each checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Derek Su <dereksu@qnap.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* migration: Do not initialise statics and globals to 0 or NULLBihong Yu2020-10-261-1/+1
| | | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-7-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Add braces {} for if statementBihong Yu2020-10-261-2/+4
| | | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-6-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Add spaces around operatorBihong Yu2020-10-261-1/+1
| | | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Don't use '#' flag of printf formatBihong Yu2020-10-261-2/+2
| | | | | | | | | Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1603163448-27122-3-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/dirtyrate: move RAMBLOCK_FOREACH_MIGRATABLE into ram.hChuan Zheng2020-09-251-10/+1Star
| | | | | | | | | | | | RAMBLOCK_FOREACH_MIGRATABLE is need in dirtyrate measure, move the existing definition up into migration/ram.h Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <1600237327-33618-6-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/: fix some comment spelling errorszhaolichang2020-09-171-5/+5
| | | | | | | | | | | I found that there are many spelling errors in the comments of qemu, so I used the spellcheck tool to check the spelling errors and finally found some spelling errors in the migration folder. Signed-off-by: zhaolichang <zhaolichang@huawei.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20200917075029.313-3-zhaolichang@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* cpu-throttle: new module, extracted from cpus.cClaudio Fontana2020-07-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | move the vcpu throttling functionality into its own module. This functionality is not specific to any accelerator, and it is used currently by migration to slow down guests to try to have migrations converge, and by the cocoa MacOS UI to throttle speed. cpu-throttle contains the controls to adjust and inspect throttle settings, start (set) and stop vcpu throttling, and the throttling function itself that is run periodically on vcpus to make them take a nap. Execution of the throttling function on all vcpus is triggered by a timer, registered at module initialization. No functionality change. Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20200629093504.3228-3-cfontana@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: Count new_dirty instead of real_dirtyKeqian Zhu2020-07-031-3/+5
| | | | | | | | | | | | | | real_dirty_pages becomes equal to total ram size after dirty log sync in ram_init_bitmaps, the reason is that the bitmap of ramblock is initialized to be all set, so old path counts them as "real dirty" at beginning. This causes wrong dirty rate and false positive throttling. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20200622032037.31112-1-zhukeqian1@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: fix xbzrle encoding rate calculationWei Wang2020-06-181-3/+1Star
| | | | | | | | | | | | | | | | It's reported an error of implicit conversion from "unsigned long" to "double" when compiling with Clang 10. Simply make the encoding rate 0 when the encoded_size is 0. Fixes: e460a4b1a4 Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reported-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20200617201309.1640952-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* migration/colo.c: Flush ram cache only after receiving device stateLukas Straub2020-06-011-4/+1Star
| | | | | | | | | | | | | If we suceed in receiving ram state, but fail receiving the device state, there will be a mismatch between the two. Fix this by flushing the ram cache only after the vmstate has been received. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <3289d007d494cb0e2f05b1cf4ae6a78d300fede3.1589193382.git.lukasstraub2@web.de> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/xbzrle: add encoding rateWei Wang2020-05-071-2/+37
| | | | | | | | | | | | Users may need to check the xbzrle encoding rate to know if the guest memory is xbzrle encoding-friendly, and dynamically turn off the encoding if the encoding rate is low. Signed-off-by: Yi Sun <yi.y.sun@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <1588208375-19556-1-git-send-email-wei.w.wang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/ram: Consolidate variable reset after placement in ram_load_postcopy()David Hildenbrand2020-05-071-5/+5
| | | | | | | | | | | | | Let's consolidate resetting the variables. Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20200421085300.7734-10-david@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixup for context conflicts with 91ba442
* migration/throttle: Add cpu-throttle-tailslow migration parameterKeqian Zhu2020-05-071-5/+20
| | | | | | | | | | | | | | | | | | | | | At the tail stage of throttling, the Guest is very sensitive to CPU percentage while the @cpu-throttle-increment is excessive usually at tail stage. If this parameter is true, we will compute the ideal CPU percentage used by the Guest, which may exactly make the dirty rate match the dirty rate threshold. Then we will choose a smaller throttle increment between the one specified by @cpu-throttle-increment and the one generated by ideal CPU percentage. Therefore, it is compatible to traditional throttling, meanwhile the throttle increment won't be excessive at tail stage. This may make migration time longer, and is disabled by default. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20200413101508.54793-1-zhukeqian1@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* Merge remote-tracking branch ↵Peter Maydell2020-05-051-3/+1Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes # gpg: Signature made Mon 04 May 2020 16:45:18 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.1-pull-request: hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning hw/timer/stm32f2xx_timer: Remove dead assignment hw/gpio/aspeed_gpio: Remove dead assignment hw/isa/i82378: Remove dead assignment hw/ide/sii3112: Remove dead assignment hw/input/adb-kbd: Remove dead assignment hw/i2c/pm_smbus: Remove dead assignment blockdev: Remove dead assignment block: Avoid dead assignment Compress lines for immediate return chardev: Add macOS to list of OSes that support -chardev serial MAINTAINERS: Update Keith Busch's email address elf_ops: Don't try to g_mapped_file_unref(NULL) hw/mem/pc-dimm: Fix line over 80 characters warning hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug() MAINTAINERS: Mark the LatticeMico32 target as orphan timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write() display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32() scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * Compress lines for immediate returnSimran Singhal2020-05-041-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compress two lines into a single line if immediate return statement is found. It also remove variables progress, val, data, ret and sock as they are no longer needed. Remove space between function "mixer_load" and '(' to fix the checkpatch.pl error:- ERROR: space prohibited between function name and open parenthesis '(' Done using following coccinelle script: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Simran Singhal <singhalsimran0@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200401165314.GA3213@simran-Inspiron-5558> [lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef] Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* | lockable: replaced locks with lock guard macros where appropriateDaniel Brodsky2020-05-041-2/+1Star
|/ | | | | | | | | | | - ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets - replaced result with QEMU_LOCK_GUARD if all unlocks at function end - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* migration/ram: fix use after free of local_errVladimir Sementsov-Ogievskiy2020-03-251-0/+1
| | | | | | | | | | | local_err is used again in migration_bitmap_sync_precopy() after precopy_notify(), so we must zero it. Otherwise try to set non-NULL local_err will crash. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324153630.11882-6-vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* ram/colo: only record bitmap of dirty pages in COLO stagezhanghailiang2020-03-131-4/+5
| | | | | | | | | | It is only need to record bitmap of dirty pages while goes into COLO stage. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <20200224065414.36524-6-zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* COLO: Optimize memory back-up processzhanghailiang2020-03-131-17/+49
| | | | | | | | | | | | | | | This patch will reduce the downtime of VM for the initial process, Previously, we copied all these memory in preparing stage of COLO while we need to stop VM, which is a time-consuming process. Here we optimize it by a trick, back-up every page while in migration process while COLO is enabled, though it affects the speed of the migration, but it obviously reduce the downtime of back-up all SVM'S memory in COLO preparing stage. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> minor typo fixes
* migration/throttle: Add throttle-trig-thres migration parameterKeqian Zhu2020-03-131-22/+30
| | | | | | | | | | | | | | | | | | Currently, if the bytes_dirty_period is more than the 50% of bytes_xfer_period, we start or increase throttling. If we make this percentage higher, then we can tolerate higher dirty rate during migration, which means less impact on guest. The side effect of higher percentage is longer migration time. We can make this parameter configurable to switch between mig- ration time first or guest performance first. The default value is 50 and valid range is 1 to 100. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20200224023142.39360-1-zhukeqian1@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Add zstd compression multifd supportJuan Quintela2020-02-281-1/+0Star
| | | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Make no compression operations into its own structureJuan Quintela2020-02-281-0/+1
| | | | | | | | | | | It will be used later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- No comp value needs to be zero.
* multifd: Split multifd code into its own fileJuan Quintela2020-01-291-987/+1Star
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Make multifd_load_setup() get an Error parameterJuan Quintela2020-01-291-1/+1
| | | | | | | We need to change the full chain to pass the Error parameter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Make multifd_save_setup() get an Error parameterJuan Quintela2020-01-291-1/+1
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Make checkpatch happy with commentsJuan Quintela2020-01-291-4/+6
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Use qemu_target_page_size()Juan Quintela2020-01-291-4/+5
| | | | | | | We will make it cpu independent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: multifd_send_sync_main only needs the qemufileJuan Quintela2020-01-291-6/+6
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: multifd_queue_page only needs the qemufileJuan Quintela2020-01-291-4/+4
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: multifd_send_pages only needs the qemufileJuan Quintela2020-01-291-4/+4
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/multifd: fix nullptr access in multifd_send_terminate_threadsZhimin Feng2020-01-291-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the multifd_send_threads is not created when migration is failed, multifd_save_cleanup would be called twice. In this senario, the multifd_send_state is accessed after it has been released, the result is that the source VM is crashing down. Here is the coredump stack: Program received signal SIGSEGV, Segmentation fault. 0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012 1012 MultiFDSendParams *p = &multifd_send_state->params[i]; #0 0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012 #1 0x00005629333ab8a9 in multifd_save_cleanup () at migration/ram.c:1028 #2 0x00005629333abaea in multifd_new_send_channel_async (task=0x562935450e70, opaque=<optimized out>) at migration/ram.c:1202 #3 0x000056293373a562 in qio_task_complete (task=task@entry=0x562935450e70) at io/task.c:196 #4 0x000056293373a6e0 in qio_task_thread_result (opaque=0x562935450e70) at io/task.c:111 #5 0x00007f475d4d75a7 in g_idle_dispatch () from /usr/lib64/libglib-2.0.so.0 #6 0x00007f475d4da9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 #7 0x0000562933785b33 in glib_pollfds_poll () at util/main-loop.c:219 #8 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 #9 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518 #10 0x00005629334c5acf in main_loop () at vl.c:1810 #11 0x000056293334d7bb in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4471 If the multifd_send_threads is not created when migration is failed. In this senario, we don't call multifd_save_cleanup in multifd_new_send_channel_async. Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Don't send data if we have stoppedJuan Quintela2020-01-291-1/+2
| | | | | | | | If we do a cancel, we got out without one error, but we can't do the rest of the output as in a normal situation. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Make sure that we don't do any IO after an errorJuan Quintela2020-01-291-9/+13
| | | | | Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Be consistent about using uint64_tJuan Quintela2020-01-201-5/+8
| | | | | | | | | We transmit ram_addr_t always as uint64_t. Be consistent in its use (on 64bit system, it is always uint64_t problem is 32bits). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* Bug #1829242 correction.Alexey Romko2020-01-201-11/+18
| | | | | | | | | | | Added type conversions to ram_addr_t before all left shifts of page indexes to TARGET_PAGE_BITS, to correct overflows when the page address was 4Gb and more. Signed-off-by: Alexey Romko <nevilad@yahoo.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: fix destroyed mutex access in terminating multifd threadsJiahui Cen2020-01-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | One multifd will lock all the other multifds' IOChannel mutex to inform them to quit by setting p->quit or shutting down p->c. In this senario, if some multifds had already been terminated and multifd_load_cleanup/multifd_save_cleanup had destroyed their mutex, it could cause destroyed mutex access when trying lock their mutex. Here is the coredump stack: #0 0x00007f81a2794437 in raise () from /usr/lib64/libc.so.6 #1 0x00007f81a2795b28 in abort () from /usr/lib64/libc.so.6 #2 0x00007f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6 #3 0x00007f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6 #4 0x000055eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, file=<optimized out>, line=<optimized out>) at util/qemu-thread-posix.c:64 #5 0x000055eb1bb4564a in multifd_send_terminate_threads (err=<optimized out>) at migration/ram.c:1015 #6 0x000055eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at migration/ram.c:1171 #7 0x000055eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at util/qemu-thread-posix.c:502 #8 0x00007f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0 #9 0x00007f81a286048d in clone () from /usr/lib64/libc.so.6 To fix it up, let's destroy the mutex after all the other multifd threads had been terminated. Signed-off-by: Jiahui Cen <cenjiahui@huawei.com> Signed-off-by: Ying Fang <fangying1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: fix nullptr access in terminating multifd threadsJiahui Cen2020-01-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One multifd channel will shutdown all the other multifd's IOChannel when it fails to receive an IOChannel. In this senario, if some multifds had not received its IOChannel yet, it would try to shutdown its IOChannel which could cause nullptr access at qio_channel_shutdown. Here is the coredump stack: #0 object_get_class (obj=obj@entry=0x0) at qom/object.c:908 #1 0x00005563fdbb8f4a in qio_channel_shutdown (ioc=0x0, how=QIO_CHANNEL_SHUTDOWN_BOTH, errp=0x0) at io/channel.c:355 #2 0x00005563fd7b4c5f in multifd_recv_terminate_threads (err=<optimized out>) at migration/ram.c:1280 #3 0x00005563fd7bc019 in multifd_recv_new_channel (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce00) at migration/ram.c:1478 #4 0x00005563fda82177 in migration_ioc_process_incoming (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce30) at migration/migration.c:605 #5 0x00005563fda8567d in migration_channel_process_incoming (ioc=0x556400255610) at migration/channel.c:44 #6 0x00005563fda83ee0 in socket_accept_incoming_migration (listener=0x5563fff6b920, cioc=0x556400255610, opaque=<optimized out>) at migration/socket.c:166 #7 0x00005563fdbc25cd in qio_net_listener_channel_func (ioc=<optimized out>, condition=<optimized out>, opaque=<optimized out>) at io/net-listener.c:54 #8 0x00007f895b6fe9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 #9 0x00005563fdc18136 in glib_pollfds_poll () at util/main-loop.c:218 #10 0x00005563fdc181b5 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241 #11 0x00005563fdc183a2 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517 #12 0x00005563fd8edb37 in main_loop () at vl.c:1791 #13 0x00005563fd74fd45 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4473 To fix it up, let's check p->c before calling qio_channel_shutdown. Signed-off-by: Jiahui Cen <cenjiahui@huawei.com> Signed-off-by: Ying Fang <fangying1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: not use multifd during postcopyWei Yang2020-01-201-3/+6
| | | | | | | | | | | We don't support multifd during postcopy, but user still could enable both multifd and postcopy. This leads to migration failure. Skip multifd during postcopy. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: clean pages after filling packetWei Yang2020-01-201-2/+4
| | | | | | | | | | | | | | | | | This is a preparation for the next patch: not use multifd during postcopy. Without enabling postcopy, everything looks good. While after enabling postcopy, migration may fail even not use multifd during postcopy. The reason is the pages is not properly cleared and *old* target page will continue to be transferred. After clean pages, migration succeeds. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/postcopy: enable compress during postcopyWei Yang2020-01-201-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | postcopy requires to place a whole host page, while migration thread migrate memory in target page size. This makes postcopy need to collect all target pages in one host page before placing via userfaultfd. To enable compress during postcopy, there are two problems to solve: 1. Random order for target page arrival 2. Target pages in one host page arrives without interrupt by target page from other host page The first one is handled by previous cleanup patch. This patch handles the second one by: 1. Flush compress thread for each host page 2. Wait for decompress thread for before placing host page Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/postcopy: enable random order target page arrivalWei Yang2020-01-201-8/+10
| | | | | | | | | | | | | After using number of target page received to track one host page, we could have the capability to handle random order target page arrival in one host page. This is a preparation for enabling compress during postcopy. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/postcopy: set all_zero to true on the first target pageWei Yang2020-01-201-1/+1
| | | | | | | | | | | | For the first target page, all_zero is set to true for this round check. After target_pages introduced, we could leverage this variable instead of checking the address offset. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>