<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bwlp/qemu.git/migration/trace-events, branch master</title>
<subtitle>Experimental fork of QEMU with video encoding patches</subtitle>
<id>https://git.openslx.org/bwlp/qemu.git/atom/migration/trace-events?h=master</id>
<link rel='self' href='https://git.openslx.org/bwlp/qemu.git/atom/migration/trace-events?h=master'/>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/'/>
<updated>2022-08-02T15:46:52+00:00</updated>
<entry>
<title>Revert "migration: Simplify unqueue_page()"</title>
<updated>2022-08-02T15:46:52+00:00</updated>
<author>
<name>Thomas Huth</name>
</author>
<published>2022-08-02T06:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=777f53c75983dd10756f5dbfc8af50fe11da81c1'/>
<id>urn:sha1:777f53c75983dd10756f5dbfc8af50fe11da81c1</id>
<content type='text'>
This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3.

The simplification of unqueue_page() introduced a bug that sometimes
breaks migration on s390x hosts.

The problem is not fully understood yet, but since we are already in
the freeze for QEMU 7.1 and we need something working there, let's
revert this patch for the upcoming release. The optimization can be
redone properly later if necessary.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099934
Signed-off-by: Thomas Huth &lt;thuth@redhat.com&gt;
Message-Id: &lt;20220802061949.331576-1-thuth@redhat.com&gt;
Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Enable TLS for preempt channel</title>
<updated>2022-07-20T11:15:09+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-07-07T18:55:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=f0afaf6ce4995d37cd411ae26cf8ca1d6dde0f93'/>
<id>urn:sha1:f0afaf6ce4995d37cd411ae26cf8ca1d6dde0f93</id>
<content type='text'>
This patch is based on the async preempt channel creation.  It continues
wiring up the new channel with a TLS handshake to the destination when enabled.

Note that only the src QEMU needs this; the dest QEMU does not need any
change for TLS support because all channels are established synchronously
there, so all the TLS magic is already properly handled by
migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange &lt;berrange@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220707185518.27529-1-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Postcopy recover with preempt enabled</title>
<updated>2022-07-20T11:15:08+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-07-07T18:55:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=60bb3c5871a7f7b7cfff5d0a30a035e30cce8e42'/>
<id>urn:sha1:60bb3c5871a7f7b7cfff5d0a30a035e30cce8e42</id>
<content type='text'>
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling for fault tolerance.  When ram_load_postcopy() fails,
instead of stopping, the thread halts on a semaphore, ready to be kicked
again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation on the
socket.  To keep it simple, the fast ram load thread takes the mutex for its
whole procedure and releases it only while paused.  With that mutex held, the
main loading thread can safely release the fast-path socket when there are
network failures during postcopy.
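
The locking rule above can be modeled with a short sketch (all names here are invented for illustration; the real code uses QemuMutex and the migration thread structures):

```python
import threading

# Minimal model of the rule described above (invented names): the fast
# ram load thread holds one mutex for its whole procedure and releases
# it only while paused; the main thread takes the same mutex before it
# touches the shared socket, so the two never race on the channel.
channel_lock = threading.Lock()

def fast_load_procedure(load_pages):
    # Held for the whole procedure, dropped only on pause/exit.
    with channel_lock:
        return load_pages()

def release_channel_on_failure(close_socket):
    # Safe: this can only run while the fast load thread is paused
    # (i.e. while the lock is free).
    with channel_lock:
        close_socket()
```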

Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220707185506.27257-1-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Postcopy preemption enablement</title>
<updated>2022-07-20T11:15:08+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-07-07T18:55:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=c01b16edf6a22f28c2a943652c82d18fccc527b7'/>
<id>urn:sha1:c01b16edf6a22f28c2a943652c82d18fccc527b7</id>
<content type='text'>
This patch enables postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from the precopy
    background migration stream, so they are isolated from very high page
    request delays.

(2) For huge-page-enabled hosts: when there are postcopy requests, they can
    now intercept a partial sending of huge host pages on the src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule on which channel to use, e.g., if a requested page is
already being transferred on the precopy channel, we keep using that channel
to transfer it even though it's explicitly requested.  In 99% of cases,
though, we prioritize the channels so that requested pages are sent via the
postcopy channel whenever possible.
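
As a rough sketch, the prioritization above boils down to something like this (illustrative names only, not the actual QEMU identifiers):

```python
PRECOPY, POSTCOPY = "precopy", "postcopy"

def pick_channel(page, in_flight_on_precopy, is_requested):
    # A page already mid-transfer on the precopy channel stays there,
    # even if it is explicitly requested.
    if page in in_flight_on_precopy:
        return PRECOPY
    # Otherwise, requested (urgent) pages go to the postcopy channel.
    if is_requested(page):
        return POSTCOPY
    # Background pages use the default precopy channel.
    return PRECOPY
```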

On the source QEMU, when we find a postcopy request, we interrupt the
PRECOPY channel's sending process and quickly switch to the POSTCOPY channel.
After we have serviced all the high-priority postcopy pages, we switch back to
the PRECOPY channel and continue sending the interrupted huge page.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending postcopy pages, we
assumed the guest would access the following pages, so we kept sending from
there.  Now that's changed: instead of carrying on from a postcopy-requested
page, we go back and continue sending the precopy huge page (which may have
been intercepted by a postcopy request and hence sent only partially before).

Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not hold when huge pages are used,
especially if the huge page is large (e.g. 1GB pages).  That locality hint
is mostly meaningless in the huge-page case.

Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220707185504.27203-1-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Postcopy preemption preparation on channel creation</title>
<updated>2022-07-20T11:15:08+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-07-07T18:55:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=36f62f11e44b8a0a653998da100be307ac98704d'/>
<id>urn:sha1:36f62f11e44b8a0a653998da100be307ac98704d</id>
<content type='text'>
Create a new socket for postcopy, ready to send postcopy-requested pages
via this dedicated channel so they don't get blocked behind precopy pages.

A new thread is also created on the dest QEMU to receive data from this new
channel, based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
function yet; that will be done in follow-up patches.

Clean up the new sockets on both src/dst QEMUs, and look after the new thread
too to make sure it'll be recycled properly.

Reviewed-by: Daniel P. Berrangé &lt;berrange@redhat.com&gt;
Reviewed-by: Juan Quintela &lt;quintela@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220707185502.27149-1-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
  dgilbert: With Peter's fix to quieten compiler warning on
       start_migration
</content>
</entry>
<entry>
<title>migration: Tracepoint change in postcopy-run bottom half</title>
<updated>2022-03-02T18:17:56+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-03-01T08:39:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=b9a040b93536b6b89a569b577ff22317c0287879'/>
<id>urn:sha1:b9a040b93536b6b89a569b577ff22317c0287879</id>
<content type='text'>
Remove the two old tracepoints, which even sit right next to each other:

    trace_loadvm_postcopy_handle_run_cpu_sync()
    trace_loadvm_postcopy_handle_run_vmstart()

Add trace_loadvm_postcopy_handle_run_bh() with finer-grained tracing.

Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220301083925.33483-4-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Finer grained tracepoints for POSTCOPY_LISTEN</title>
<updated>2022-03-02T18:17:56+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-03-01T08:39:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=c84f976e913187eecdd9db5e45cae2ea6dd1a82b'/>
<id>urn:sha1:c84f976e913187eecdd9db5e45cae2ea6dd1a82b</id>
<content type='text'>
Enabling postcopy listening takes a few steps; add a few tracepoints so that
basic measurements of each step are possible.

Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220301083925.33483-3-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Dump sub-cmd name in loadvm_process_command tp</title>
<updated>2022-03-02T18:17:56+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-03-01T08:39:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=a7060ba3ccc0ec6b8674988b46fc131d2f299458'/>
<id>urn:sha1:a7060ba3ccc0ec6b8674988b46fc131d2f299458</id>
<content type='text'>
It's easier to read the name of a sub-command than its index when debugging.

Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20220301083925.33483-2-peterx@redhat.com&gt;
Signed-off-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
</content>
</entry>
<entry>
<title>migration: Simplify unqueue_page()</title>
<updated>2022-01-28T14:38:23+00:00</updated>
<author>
<name>Peter Xu</name>
</author>
<published>2022-01-19T08:09:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=cfd66f30fb0f735df06ff4220e5000290a43dad3'/>
<id>urn:sha1:cfd66f30fb0f735df06ff4220e5000290a43dad3</id>
<content type='text'>
This patch simplifies unqueue_page() on both sides of it (itself, and its caller).

Firstly, right after unqueue_page() returns true, we'll definitely send a huge
page (see the ram_save_huge_page() call - it will _never_ exit before it
finishes sending that huge page), so unqueue_page() does not need to step in
small page size when huge pages are enabled on the ramblock.  IOW, it's
destined that only the 1st 4K page will be valid; when unqueuing the 2nd+ time
we'll notice the whole huge page has already been sent anyway.  Switching to
operating on huge pages removes a lot of redundant unqueue_page() loops.
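
A toy model of that observation (invented names; sizes are illustrative): queued offsets are rounded down to their huge-page start, and each huge page is unqueued at most once instead of once per 4K subpage.

```python
PAGE = 4096
HUGE = 512 * PAGE  # e.g. a 2M huge page made of 512 4K subpages

def unqueue_huge(queue, sent_huge_pages):
    # Pop requests until one falls in a huge page not yet sent;
    # requests inside an already-sent huge page are skipped wholesale.
    while queue:
        offset = queue.pop(0)
        huge_start = (offset // HUGE) * HUGE
        if huge_start in sent_huge_pages:
            continue
        sent_huge_pages.add(huge_start)
        return huge_start
    return None
```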

Meanwhile, drop the dirty check.  It's not helpful to call test_bit() every
time to skip over clean pages, as ram_save_host_page() already does so, and
in a faster way (see commit ba1b7c812c ("migration/ram: Optimize
ram_save_host_page()", 2021-05-13)).  So that's not necessary either.

Drop the two tracepoints along the way - based on the above analysis it's very
possible that no one is really using them.

Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;
Reviewed-by: Juan Quintela &lt;quintela@redhat.com&gt;
Signed-off-by: Juan Quintela &lt;quintela@redhat.com&gt;
</content>
</entry>
<entry>
<title>multifd: Use normal pages array on the send side</title>
<updated>2022-01-28T14:38:23+00:00</updated>
<author>
<name>Juan Quintela</name>
</author>
<published>2021-11-22T12:26:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.openslx.org/bwlp/qemu.git/commit/?id=815956f03902980c771da64b17f7f791c1cb57b0'/>
<id>urn:sha1:815956f03902980c771da64b17f7f791c1cb57b0</id>
<content type='text'>
We are only sending normal pages through multifd channels.
Later in this series, we are going to also send zero pages.
We are going to detect whether a page is zero or non-zero in the multifd
channel thread, not on the main thread.

So we receive an array of pages pages-&gt;offset[N]

And we will end with:

p-&gt;normal[N - zero_pages]
p-&gt;zero[zero_pages].

In this patch, we just copy all the pages in offset to normal.

for (i = 0; i &lt; pages-&gt;num; i++) {
    p-&gt;normal[p-&gt;normal_num] = pages-&gt;offset[i];
    p-&gt;normal_num++;
}

Later in the series this becomes:

for (i = 0; i &lt; pages-&gt;num; i++) {
    if (buffer_is_zero(pages-&gt;offset[i])) {
        p-&gt;zero[p-&gt;zero_num] = pages-&gt;offset[i];
        p-&gt;zero_num++;
    } else {
        p-&gt;normal[p-&gt;normal_num] = pages-&gt;offset[i];
        p-&gt;normal_num++;
    }
}
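
The same split can be expressed as a small runnable sketch (Python here purely for illustration; names are not the QEMU identifiers):

```python
def split_pages(offsets, buffer_is_zero):
    # Mirror of the loop above: zero pages and normal pages end up in
    # two separate arrays, counted independently.
    normal, zero = [], []
    for off in offsets:
        if buffer_is_zero(off):
            zero.append(off)
        else:
            normal.append(off)
    return normal, zero
```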

Signed-off-by: Juan Quintela &lt;quintela@redhat.com&gt;
Reviewed-by: Dr. David Alan Gilbert &lt;dgilbert@redhat.com&gt;

---

Improving comment (dave)
Renaming num_normal_pages to total_normal_pages (peter)
</content>
</entry>
</feed>
