summaryrefslogtreecommitdiffstats
path: root/docs
diff options
context:
space:
mode:
authorPeter Maydell2017-09-23 13:55:40 +0200
committerPeter Maydell2017-09-23 13:55:40 +0200
commit460b6c8e581aa06b86f59eebd9e52edfe7adf417 (patch)
tree9115b412fa484e5394ec05e27137849be5037daa /docs
parentMerge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request'... (diff)
parentchardev: remove context in chr_update_read_handler (diff)
downloadqemu-460b6c8e581aa06b86f59eebd9e52edfe7adf417.tar.gz
qemu-460b6c8e581aa06b86f59eebd9e52edfe7adf417.tar.xz
qemu-460b6c8e581aa06b86f59eebd9e52edfe7adf417.zip
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Speed up AddressSpaceDispatch creation (Alexey) * Fix kvm.c assert (David) * Memory fixes and further speedup (me) * Persistent reservation manager infrastructure (me) * virtio-serial: add enable_backend callback (Pavel) * chardev GMainContext fixes (Peter) # gpg: Signature made Fri 22 Sep 2017 20:07:33 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (32 commits) chardev: remove context in chr_update_read_handler chardev: use per-dev context for io_add_watch_poll chardev: add Chardev.gcontext field chardev: new qemu_chr_be_update_read_handlers() scsi: add persistent reservation manager using qemu-pr-helper scsi: add multipath support to qemu-pr-helper scsi: build qemu-pr-helper scsi, file-posix: add support for persistent reservation management memory: Share special empty FlatView memory: seek FlatView sharing candidates among children subregions memory: trace FlatView creation and destruction memory: Create FlatView directly memory: Get rid of address_space_init_shareable memory: Rework "info mtree" to print flat views and dispatch trees memory: Do not allocate FlatView in address_space_init memory: Share FlatView's and dispatch trees between address spaces memory: Move address_space_update_ioeventfds memory: Alloc dispatch tree where topology is generared memory: Store physical root MR in FlatView memory: Rename mem_begin/mem_commit/mem_add helpers ... # Conflicts: # configure
Diffstat (limited to 'docs')
-rw-r--r--docs/devel/atomics.txt14
-rw-r--r--docs/interop/pr-helper.rst83
-rw-r--r--docs/pr-manager.rst111
3 files changed, 207 insertions, 1 deletions
diff --git a/docs/devel/atomics.txt b/docs/devel/atomics.txt
index 3ef5d85b1b..10c5fa37e8 100644
--- a/docs/devel/atomics.txt
+++ b/docs/devel/atomics.txt
@@ -63,11 +63,23 @@ operations:
typeof(*ptr) atomic_fetch_sub(ptr, val)
typeof(*ptr) atomic_fetch_and(ptr, val)
typeof(*ptr) atomic_fetch_or(ptr, val)
+ typeof(*ptr) atomic_fetch_xor(ptr, val)
+ typeof(*ptr) atomic_fetch_inc_nonzero(ptr)
typeof(*ptr) atomic_xchg(ptr, val)
typeof(*ptr) atomic_cmpxchg(ptr, old, new)
all of which return the old value of *ptr. These operations are
-polymorphic; they operate on any type that is as wide as an int.
+polymorphic; they operate on any type that is as wide as a pointer.
+
+Similar operations return the new value of *ptr:
+
+ typeof(*ptr) atomic_inc_fetch(ptr)
+ typeof(*ptr) atomic_dec_fetch(ptr)
+ typeof(*ptr) atomic_add_fetch(ptr, val)
+ typeof(*ptr) atomic_sub_fetch(ptr, val)
+ typeof(*ptr) atomic_and_fetch(ptr, val)
+ typeof(*ptr) atomic_or_fetch(ptr, val)
+ typeof(*ptr) atomic_xor_fetch(ptr, val)
Sequentially consistent loads and stores can be done using:
diff --git a/docs/interop/pr-helper.rst b/docs/interop/pr-helper.rst
new file mode 100644
index 0000000000..9f76d5bcf9
--- /dev/null
+++ b/docs/interop/pr-helper.rst
@@ -0,0 +1,83 @@
+..
+
+======================================
+Persistent reservation helper protocol
+======================================
+
+QEMU's SCSI passthrough devices, ``scsi-block`` and ``scsi-generic``,
+can delegate implementation of persistent reservations to an external
+(and typically privileged) program. Persistent Reservations allow
+restricting access to block devices to specific initiators in a shared
+storage setup.
+
+For a more detailed reference please refer the the SCSI Primary
+Commands standard, specifically the section on Reservations and the
+"PERSISTENT RESERVE IN" and "PERSISTENT RESERVE OUT" commands.
+
+This document describes the socket protocol used between QEMU's
+``pr-manager-helper`` object and the external program.
+
+.. contents::
+
+Connection and initialization
+-----------------------------
+
+All data transmitted on the socket is big-endian.
+
+After connecting to the helper program's socket, the helper starts a simple
+feature negotiation process by writing four bytes corresponding to
+the features it exposes (``supported_features``). QEMU reads it,
+then writes four bytes corresponding to the desired features of the
+helper program (``requested_features``).
+
+If a bit is 1 in ``requested_features`` and 0 in ``supported_features``,
+the corresponding feature is not supported by the helper and the connection
+is closed. On the other hand, it is acceptable for a bit to be 0 in
+``requested_features`` and 1 in ``supported_features``; in this case,
+the helper will not enable the feature.
+
+Right now no feature is defined, so the two parties always write four
+zero bytes.
+
+Command format
+--------------
+
+It is invalid to send multiple commands concurrently on the same
+socket. It is however possible to connect multiple sockets to the
+helper and send multiple commands to the helper for one or more
+file descriptors.
+
+A command consists of a request and a response. A request consists
+of a 16-byte SCSI CDB. A file descriptor must be passed to the helper
+together with the SCSI CDB using ancillary data.
+
+The CDB has the following limitations:
+
+- the command (stored in the first byte) must be one of 0x5E
+ (PERSISTENT RESERVE IN) or 0x5F (PERSISTENT RESERVE OUT).
+
+- the allocation length (stored in bytes 7-8 of the CDB for PERSISTENT
+ RESERVE IN) or parameter list length (stored in bytes 5-8 of the CDB
+ for PERSISTENT RESERVE OUT) is limited to 8 KiB.
+
+For PERSISTENT RESERVE OUT, the parameter list is sent right after the
+CDB. The length of the parameter list is taken from the CDB itself.
+
+The helper's reply has the following structure:
+
+- 4 bytes for the SCSI status
+
+- 4 bytes for the payload size (nonzero only for PERSISTENT RESERVE IN
+ and only if the SCSI status is 0x00, i.e. GOOD)
+
+- 96 bytes for the SCSI sense data
+
+- if the size is nonzero, the payload follows
+
+The sense data is always sent to keep the protocol simple, even though
+it is only valid if the SCSI status is CHECK CONDITION (0x02).
+
+The payload size is always less than or equal to the allocation length
+specified in the CDB for the PERSISTENT RESERVE IN command.
+
+If the protocol is violated, the helper closes the socket.
diff --git a/docs/pr-manager.rst b/docs/pr-manager.rst
new file mode 100644
index 0000000000..9b1de198b1
--- /dev/null
+++ b/docs/pr-manager.rst
@@ -0,0 +1,111 @@
+======================================
+Persistent reservation managers
+======================================
+
+SCSI persistent Reservations allow restricting access to block devices
+to specific initiators in a shared storage setup. When implementing
+clustering of virtual machines, it is a common requirement for virtual
+machines to send persistent reservation SCSI commands. However,
+the operating system restricts sending these commands to unprivileged
+programs because incorrect usage can disrupt regular operation of the
+storage fabric.
+
+For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
+and ``scsi-generic`` (both are only available on Linux) can delegate
+implementation of persistent reservations to a separate object,
+the "persistent reservation manager". Only PERSISTENT RESERVE OUT and
+PERSISTENT RESERVE IN commands are passed to the persistent reservation
+manager object; other commands are processed by QEMU as usual.
+
+-----------------------------------------
+Defining a persistent reservation manager
+-----------------------------------------
+
+A persistent reservation manager is an instance of a subclass of the
+"pr-manager" QOM class.
+
+Right now only one subclass is defined, ``pr-manager-helper``, which
+forwards the commands to an external privileged helper program
+over Unix sockets. The helper program only allows sending persistent
+reservation commands to devices for which QEMU has a file descriptor,
+so that QEMU will not be able to effect persistent reservations
+unless it has access to both the socket and the device.
+
+``pr-manager-helper`` has a single string property, ``path``, which
+accepts the path to the helper program's Unix socket. For example,
+the following command line defines a ``pr-manager-helper`` object and
+attaches it to a SCSI passthrough device::
+
+ $ qemu-system-x86_64
+ -device virtio-scsi \
+ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
+ -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
+ -device scsi-block,drive=hd
+
+Alternatively, using ``-blockdev``::
+
+ $ qemu-system-x86_64
+ -device virtio-scsi \
+ -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
+ -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
+ -device scsi-block,drive=hd
+
+----------------------------------
+Invoking :program:`qemu-pr-helper`
+----------------------------------
+
+QEMU provides an implementation of the persistent reservation helper,
+called :program:`qemu-pr-helper`. The helper should be started as a
+system service and supports the following option:
+
+-d, --daemon run in the background
+-q, --quiet decrease verbosity
+-v, --verbose increase verbosity
+-f, --pidfile=path PID file when running as a daemon
+-k, --socket=path path to the socket
+-T, --trace=trace-opts tracing options
+
+By default, the socket and PID file are placed in the runtime state
+directory, for example :file:`/var/run/qemu-pr-helper.sock` and
+:file:`/var/run/qemu-pr-helper.pid`. The PID file is not created
+unless :option:`-d` is passed too.
+
+:program:`qemu-pr-helper` can also use the systemd socket activation
+protocol. In this case, the systemd socket unit should specify a
+Unix stream socket, like this::
+
+ [Socket]
+ ListenStream=/var/run/qemu-pr-helper.sock
+
+After connecting to the socket, :program:`qemu-pr-helper`` can optionally drop
+root privileges, except for those capabilities that are needed for
+its operation. To do this, add the following options:
+
+-u, --user=user user to drop privileges to
+-g, --group=group group to drop privileges to
+
+---------------------------------------------
+Multipath devices and persistent reservations
+---------------------------------------------
+
+Proper support of persistent reservation for multipath devices requires
+communication with the multipath daemon, so that the reservation is
+registered and applied when a path is newly discovered or becomes online
+again. :command:`qemu-pr-helper` can do this if the ``libmpathpersist``
+library was available on the system at build time.
+
+As of August 2017, a reservation key must be specified in ``multipath.conf``
+for ``multipathd`` to check for persistent reservation for newly
+discovered paths or reinstated paths. The attribute can be added
+to the ``defaults`` section or the ``multipaths`` section; for example::
+
+ multipaths {
+ multipath {
+ wwid XXXXXXXXXXXXXXXX
+ alias yellow
+ reservation_key 0x123abc
+ }
+ }
+
+Linking :program:`qemu-pr-helper` to ``libmpathpersist`` does not impede
+its usage on regular SCSI devices.