From 8915106c6950e43f0cf0dd5d3092450f22ce5418 Mon Sep 17 00:00:00 2001 From: Leonardo Garcia Date: Tue, 18 Jan 2022 12:56:30 +0100 Subject: docs: rSTify ppc-spapr-hotplug.txt. While working on this file, also removed and unused reference in the end of the file. The reference in the text was removed by commit 9f992cca93d (spapr: update spapr hotplug documentation), but the link in the end of the document was not removed then. Signed-off-by: Leonardo Garcia Reviewed-by: Daniel Henrique Barboza Message-Id: <50ed30232e0e6eafb580c17adec3fba17b873014.1641995058.git.lagarcia@br.ibm.com> Signed-off-by: Cédric Le Goater --- docs/specs/ppc-spapr-hotplug.txt | 759 ++++++++++++++++++++++----------------- 1 file changed, 430 insertions(+), 329 deletions(-) (limited to 'docs') diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt index d4fb2d46d9..f84dc55ad9 100644 --- a/docs/specs/ppc-spapr-hotplug.txt +++ b/docs/specs/ppc-spapr-hotplug.txt @@ -1,224 +1,316 @@ -= sPAPR Dynamic Reconfiguration = +============================= +sPAPR Dynamic Reconfiguration +============================= -sPAPR/"pseries" guests make use of a facility called dynamic-reconfiguration -to handle hotplugging of dynamic "physical" resources like PCI cards, or -"logical"/paravirtual resources like memory, CPUs, and "physical" +sPAPR or pSeries guests make use of a facility called dynamic reconfiguration +to handle hot plugging of dynamic "physical" resources like PCI cards, or +"logical"/para-virtual resources like memory, CPUs, and "physical" host-bridges, which are generally managed by the host/hypervisor and provided -to guests as virtualized resources. The specifics of dynamic-reconfiguration -are documented extensively in PAPR+ v2.7, Section 13.1. This document -provides a summary of that information as it applies to the implementation -within QEMU. +to guests as virtualized resources. 
The specifics of dynamic reconfiguration +are documented extensively in section 13 of the Linux on Power Architecture +Reference document ([LoPAR]_). This document provides a summary of that +information as it applies to the implementation within QEMU. -== Dynamic-reconfiguration Connectors == +Dynamic-reconfiguration Connectors +================================== -To manage hotplug/unplug of these resources, a firmware abstraction known as +To manage hot plug/unplug of these resources, a firmware abstraction known as a Dynamic Resource Connector (DRC) is used to assign a particular dynamic resource to the guest, and provide an interface for the guest to manage configuration/removal of the resource associated with it. -== Device-tree description of DRCs == +Device tree description of DRCs +=============================== -A set of 4 Open Firmware device tree array properties are used to describe +A set of four Open Firmware device tree array properties are used to describe the name/index/power-domain/type of each DRC allocated to a guest at -boot-time. There may be multiple sets of these arrays, rooted at different +boot time. There may be multiple sets of these arrays, rooted at different paths in the device tree depending on the type of resource the DRCs manage. In some cases, the DRCs themselves may be provided by a dynamic resource, -such as the DRCs managing PCI slots on a hotplugged PHB. In this case the +such as the DRCs managing PCI slots on a hot plugged PHB. In this case the arrays would be fetched as part of the device tree retrieval interfaces -for hotplugged resources described under "Guest->Host interface". +for hot plugged resources described under :ref:`guest-host-interface`. The array properties are described below. 
Each entry/element in an array describes the DRC identified by the element in the corresponding position -of ibm,drc-indexes: - -ibm,drc-names: - first 4-bytes: BE-encoded integer denoting the number of entries - each entry: a NULL-terminated <name> string encoded as a byte array - - <name> values for logical/virtual resources are defined in PAPR+ v2.7, - Section 13.5.2.4, and basically consist of the type of the resource - followed by a space and a numerical value that's unique across resources - of that type. - - <name> values for "physical" resources such as PCI or VIO devices are - defined as being "location codes", which are the "location labels" of - each encapsulating device, starting from the chassis down to the - individual slot for the device, concatenated by a hyphen. This provides - a mapping of resources to a physical location in a chassis for debugging - purposes. For QEMU, this mapping is less important, so we assign a - location code that conforms to naming specifications, but is simply a - location label for the slot by itself to simplify the implementation. - The naming convention for location labels is documented in detail in - PAPR+ v2.7, Section 12.3.1.5, and in our case amounts to using "C<n>" - for PCI/VIO device slots, where <n> is unique across all PCI/VIO - device slots. 
- -ibm,drc-indexes: - first 4-bytes: BE-encoded integer denoting the number of entries - each 4-byte entry: BE-encoded <index> integer that is unique across all DRCs - in the machine - - <index> is arbitrary, but in the case of QEMU we try to maintain the - convention used to assign them to pSeries guests on pHyp: - - bit[31:28]: integer encoding of <type>, where <type> is: - 1 for CPU resource - 2 for PHB resource - 3 for VIO resource - 4 for PCI resource - 8 for Memory resource - bit[27:0]: integer encoding of <id>, where <id> is unique across - all resources of specified type - -ibm,drc-power-domains: - first 4-bytes: BE-encoded integer denoting the number of entries - each 4-byte entry: 32-bit, BE-encoded <index> integer that specifies the - power domain the resource will be assigned to. In the case of QEMU - we associated all resources with a "live insertion" domain, where the - power is assumed to be managed automatically. The integer value for - this domain is a special value of -1. - - -ibm,drc-types: - first 4-bytes: BE-encoded integer denoting the number of entries - each entry: a NULL-terminated <type> string encoded as a byte array - - <type> is assigned as follows: - "CPU" for a CPU - "PHB" for a physical host-bridge - "SLOT" for a VIO slot - "28" for a PCI slot - "MEM" for memory resource - -== Guest->Host interface to manage dynamic resources == - -Each DRC is given a globally unique DRC Index, and resources associated with -a particular DRC are configured/managed by the guest via a number of RTAS -calls which reference individual DRCs based on the DRC index. This can be -considered the guest->host interface. 
- -rtas-set-power-level: - arg[0]: integer identifying power domain - arg[1]: new power level for the domain, 0-100 - output[0]: status, 0 on success - output[1]: power level after command - - Set the power level for a specified power domain - -rtas-get-power-level: - arg[0]: integer identifying power domain - output[0]: status, 0 on success - output[1]: current power level - - Get the power level for a specified power domain - -rtas-set-indicator: - arg[0]: integer identifying sensor/indicator type - arg[1]: index of sensor, for DR-related sensors this is generally the - DRC index - arg[2]: desired sensor value - output[0]: status, 0 on success - - Set the state of an indicator or sensor. For the purpose of this document we - focus on the indicator/sensor types associated with a DRC. The types are: - - 9001: isolation-state, controls/indicates whether a device has been made - accessible to a guest - - supported sensor values: - 0: isolate, device is made unaccessible by guest OS - 1: unisolate, device is made available to guest OS - - 9002: dr-indicator, controls "visual" indicator associated with device - - supported sensor values: - 0: inactive, resource may be safely removed - 1: active, resource is in use and cannot be safely removed - 2: identify, used to visually identify slot for interactive hotplug - 3: action, in most cases, used in the same manner as identify - - 9003: allocation-state, generally only used for "logical" DR resources to - request the allocation/deallocation of a resource prior to acquiring - it via isolation-state->unisolate, or after releasing it via - isolation-state->isolate, respectively. for "physical" DR (like PCI - hotplug/unplug) the pre-allocation of the resource is implied and - this sensor is unused. 
- - supported sensor values: - 0: unusable, tell firmware/system the resource can be - unallocated/reclaimed and added back to the system resource pool - 1: usable, request the resource be allocated/reserved for use by - guest OS - 2: exchange, used to allocate a spare resource to use for fail-over - in certain situations. unused in QEMU - 3: recover, used to reclaim a previously allocated resource that's - not currently allocated to the guest OS. unused in QEMU - -rtas-get-sensor-state: - arg[0]: integer identifying sensor/indicator type - arg[1]: index of sensor, for DR-related sensors this is generally the - DRC index - output[0]: status, 0 on success - - Used to read an indicator or sensor value. - - For DR-related operations, the only noteworthy sensor is dr-entity-sense, - which has a type value of 9003, as allocation-state does in the case of - rtas-set-indicator. The semantics/encodings of the sensor values are distinct - however: - - supported sensor values for dr-entity-sense (9003) sensor: - 0: empty, - for physical resources: DRC/slot is empty - for logical resources: unused - 1: present, - for physical resources: DRC/slot is populated with a device/resource - for logical resources: resource has been allocated to the DRC - 2: unusable, - for physical resources: unused - for logical resources: DRC has no resource allocated to it - 3: exchange, - for physical resources: unused - for logical resources: resource available for exchange (see - allocation-state sensor semantics above) - 4: recovery, - for physical resources: unused - for logical resources: resource available for recovery (see - allocation-state sensor semantics above) - -rtas-ibm-configure-connector: - arg[0]: guest physical address of 4096-byte work area buffer - arg[1]: 0, or address of additional 4096-byte work area buffer. 
only non-zero - if a prior RTAS response indicated a need for additional memory - output[0]: status: - 0: completed transmittal of device-tree node - 1: instruct guest to prepare for next DT sibling node - 2: instruct guest to prepare for next DT child node - 3: instruct guest to prepare for next DT property - 4: instruct guest to ascend to parent DT node - 5: instruct guest to provide additional work-area buffer - via arg[1] - 990x: instruct guest that operation took too long and to try - again later - - Used to fetch an OF device-tree description of the resource associated with - a particular DRC. The DRC index is encoded in the first 4-bytes of the first - work area buffer. - - Work area layout, using 4-byte offsets: - wa[0]: DRC index of the DRC to fetch device-tree nodes from - wa[1]: 0 (hard-coded) - wa[2]: for next-sibling/next-child response: - wa offset of null-terminated string denoting the new node's name - for next-property response: - wa offset of null-terminated string denoting new property's name - wa[3]: for next-property response (unused otherwise): - byte-length of new property's value - wa[4]: for next-property response (unused otherwise): - new property's value, encoded as an OFDT-compatible byte array - -== hotplug/unplug events == +of ``ibm,drc-indexes``: + +``ibm,drc-names`` +----------------- + + First 4-bytes: big-endian (BE) encoded integer denoting the number of entries. + + Each entry: a NULL-terminated ``<name>`` string encoded as a byte array. + + ``<name>`` values for logical/virtual resources are defined in the Linux on + Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically + consist of the type of the resource followed by a space and a numerical + value that's unique across resources of that type. 
+ + ``<name>`` values for "physical" resources such as PCI or VIO devices are + defined as being "location codes", which are the "location labels" of each + encapsulating device, starting from the chassis down to the individual slot + for the device, concatenated by a hyphen. This provides a mapping of + resources to a physical location in a chassis for debugging purposes. For + QEMU, this mapping is less important, so we assign a location code that + conforms to naming specifications, but is simply a location label for the + slot by itself to simplify the implementation. The naming convention for + location labels is documented in detail in the [LoPAR]_ section 12.3.1.5, + and in our case amounts to using ``C<n>`` for PCI/VIO device slots, where + ``<n>`` is unique across all PCI/VIO device slots. + +``ibm,drc-indexes`` +------------------- + + First 4-bytes: BE-encoded integer denoting the number of entries. + + Each 4-byte entry: BE-encoded ``<index>`` integer that is unique across all + DRCs in the machine. + + ``<index>`` is arbitrary, but in the case of QEMU we try to maintain the + convention used to assign them to pSeries guests on pHyp (the hypervisor + portion of PowerVM): + + ``bit[31:28]``: integer encoding of ``<type>``, where ``<type>`` is: + + ``1`` for CPU resource. + + ``2`` for PHB resource. + + ``3`` for VIO resource. + + ``4`` for PCI resource. + + ``8`` for memory resource. + + ``bit[27:0]``: integer encoding of ``<id>``, where ``<id>`` is unique + across all resources of specified type. + +``ibm,drc-power-domains`` +------------------------- + + First 4-bytes: BE-encoded integer denoting the number of entries. + + Each 4-byte entry: 32-bit, BE-encoded ``<index>`` integer that specifies the + power domain the resource will be assigned to. In the case of QEMU we + associated all resources with a "live insertion" domain, where the power is + assumed to be managed automatically. The integer value for this domain is a + special value of ``-1``. 
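The ``ibm,drc-indexes`` layout described above (a big-endian entry count followed by 4-byte big-endian entries, each carrying a resource type in bits 31:28 and an id in bits 27:0) can be decoded roughly as follows. This is a minimal sketch, not QEMU's implementation: the names ``read_be32``, ``drc_index_type``, ``drc_index_id``, ``parse_drc_indexes`` and the ``enum drc_type`` constants are hypothetical helpers introduced only for illustration, and the property is assumed to be available as a raw byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Resource type codes carried in bits 31:28 of a DRC index (per the text). */
enum drc_type {
    DRC_TYPE_CPU = 1,
    DRC_TYPE_PHB = 2,
    DRC_TYPE_VIO = 3,
    DRC_TYPE_PCI = 4,
    DRC_TYPE_MEM = 8,
};

/* Read a 32-bit big-endian integer from a byte buffer (hypothetical helper). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Bits 31:28 of a DRC index: the resource type. */
static uint32_t drc_index_type(uint32_t drc_index)
{
    return drc_index >> 28;
}

/* Bits 27:0 of a DRC index: the id, unique within its resource type. */
static uint32_t drc_index_id(uint32_t drc_index)
{
    return drc_index & 0x0fffffffu;
}

/*
 * Copy up to max entries of a raw ibm,drc-indexes property into out[].
 * Returns the number of entries stored, or -1 if the buffer is shorter
 * than its leading entry count claims.
 */
static int parse_drc_indexes(const uint8_t *prop, size_t len,
                             uint32_t *out, size_t max)
{
    if (len < 4) {
        return -1;
    }
    uint32_t count = read_be32(prop);
    if ((len - 4) / 4 < count) {
        return -1;
    }
    size_t n = count < max ? count : max;
    for (size_t i = 0; i < n; i++) {
        out[i] = read_be32(prop + 4 + 4 * i);
    }
    return (int)n;
}
```

Under this convention an entry of ``0x80000005`` would decode to type ``8`` (memory) with id ``5``. The same count-then-entries framing applies to ``ibm,drc-power-domains``, where QEMU's "live insertion" value of ``-1`` appears on the wire as ``0xffffffff``.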
+ + +``ibm,drc-types`` +----------------- + + First 4-bytes: BE-encoded integer denoting the number of entries. + + Each entry: a NULL-terminated ``<type>`` string encoded as a byte array. + ``<type>`` is assigned as follows: + + "CPU" for a CPU. + + "PHB" for a physical host-bridge. + + "SLOT" for a VIO slot. + + "28" for a PCI slot. + + "MEM" for memory resource. + +.. _guest-host-interface: + +Guest->Host interface to manage dynamic resources +================================================= + +Each DRC is given a globally unique DRC index, and resources associated with a +particular DRC are configured/managed by the guest via a number of RTAS calls +which reference individual DRCs based on the DRC index. This can be considered +the guest->host interface. + +``rtas-set-power-level`` +------------------------ + +Set the power level for a specified power domain. + + ``arg[0]``: integer identifying power domain. + + ``arg[1]``: new power level for the domain, ``0-100``. + + ``output[0]``: status, ``0`` on success. + + ``output[1]``: power level after command. + +``rtas-get-power-level`` +------------------------ + +Get the power level for a specified power domain. + + ``arg[0]``: integer identifying power domain. + + ``output[0]``: status, ``0`` on success. + + ``output[1]``: current power level. + +``rtas-set-indicator`` +---------------------- + +Set the state of an indicator or sensor. + + ``arg[0]``: integer identifying sensor/indicator type. + + ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC + index. + + ``arg[2]``: desired sensor value. + + ``output[0]``: status, ``0`` on success. + +For the purpose of this document we focus on the indicator/sensor types +associated with a DRC. The types are: + +* ``9001``: ``isolation-state``, controls/indicates whether a device has been + made accessible to a guest. Supported sensor values: + + ``0``: ``isolate``, device is made inaccessible by guest OS. 
+ + ``1``: ``unisolate``, device is made available to guest OS. + +* ``9002``: ``dr-indicator``, controls "visual" indicator associated with + device. Supported sensor values: + + ``0``: ``inactive``, resource may be safely removed. + + ``1``: ``active``, resource is in use and cannot be safely removed. + + ``2``: ``identify``, used to visually identify slot for interactive hot plug. + + ``3``: ``action``, in most cases, used in the same manner as identify. + +* ``9003``: ``allocation-state``, generally only used for "logical" DR resources + to request the allocation/deallocation of a resource prior to acquiring it via + ``isolation-state->unisolate``, or after releasing it via + ``isolation-state->isolate``, respectively. For "physical" DR (like PCI + hot plug/unplug) the pre-allocation of the resource is implied and this sensor + is unused. Supported sensor values: + + ``0``: ``unusable``, tell firmware/system the resource can be + unallocated/reclaimed and added back to the system resource pool. + + ``1``: ``usable``, request the resource be allocated/reserved for use by + guest OS. + + ``2``: ``exchange``, used to allocate a spare resource to use for fail-over + in certain situations. Unused in QEMU. + + ``3``: ``recover``, used to reclaim a previously allocated resource that's + not currently allocated to the guest OS. Unused in QEMU. + +``rtas-get-sensor-state:`` +-------------------------- + +Used to read an indicator or sensor value. + + ``arg[0]``: integer identifying sensor/indicator type. + + ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC + index + + ``output[0]``: status, 0 on success + +For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``, +which has a type value of ``9003``, as ``allocation-state`` does in the case of +``rtas-set-indicator``. The semantics/encodings of the sensor values are +distinct however. + +Supported sensor values for ``dr-entity-sense`` (``9003``) sensor: + + ``0``: empty. 
+ + For physical resources: DRC/slot is empty. + + For logical resources: unused. + + ``1``: present. + + For physical resources: DRC/slot is populated with a device/resource. + + For logical resources: resource has been allocated to the DRC. + + ``2``: unusable. + + For physical resources: unused. + + For logical resources: DRC has no resource allocated to it. + + ``3``: exchange. + + For physical resources: unused. + + For logical resources: resource available for exchange (see + ``allocation-state`` sensor semantics above). + + ``4``: recovery. + + For physical resources: unused. + + For logical resources: resource available for recovery (see + ``allocation-state`` sensor semantics above). + +``rtas-ibm-configure-connector`` +-------------------------------- + +Used to fetch an OpenFirmware device tree description of the resource associated +with a particular DRC. + + ``arg[0]``: guest physical address of 4096-byte work area buffer. + + ``arg[1]``: 0, or address of additional 4096-byte work area buffer; only + non-zero if a prior RTAS response indicated a need for additional memory. + + ``output[0]``: status: + + ``0``: completed transmittal of device tree node. + + ``1``: instruct guest to prepare for next device tree sibling node. + + ``2``: instruct guest to prepare for next device tree child node. + + ``3``: instruct guest to prepare for next device tree property. + + ``4``: instruct guest to ascend to parent device tree node. + + ``5``: instruct guest to provide additional work-area buffer via ``arg[1]``. + + ``990x``: instruct guest that operation took too long and to try again + later. + +The DRC index is encoded in the first 4-bytes of the first work area buffer. +Work area (``wa``) layout, using 4-byte offsets: + + ``wa[0]``: DRC index of the DRC to fetch device tree nodes from. + + ``wa[1]``: ``0`` (hard-coded). + + ``wa[2]``: + + For next-sibling/next-child response: + + ``wa`` offset of null-terminated string denoting the new node's name. 
+ + For next-property response: + + ``wa`` offset of null-terminated string denoting new property's name. + + ``wa[3]``: for next-property response (unused otherwise): + + Byte-length of new property's value. + + ``wa[4]``: for next-property response (unused otherwise): + + New property's value, encoded as an OFDT-compatible byte array. + +Hot plug/unplug events +====================== For most DR operations, the hypervisor will issue host->guest add/remove events using the EPOW/check-exception notification framework, where the host issues a @@ -230,130 +322,140 @@ requests via EPOW events. For DR, this framework has been extended to include hotplug events, which were previously unneeded due to direct manipulation of DR-related guest userspace tools by host-level management such as an HMC. This level of management is not -applicable to PowerKVM, hence the reason for extending the notification +applicable to KVM on Power, hence the reason for extending the notification framework to support hotplug events. The format for these EPOW-signalled events is described below under -"hotplug/unplug event structure". Note that these events are not -formally part of the PAPR+ specification, and have been superseded by a -newer format, also described below under "hotplug/unplug event structure", -and so are now deemed a "legacy" format. The formats are similar, but the -"modern" format contains additional fields/flags, which are denoted for the -purposes of this documentation with "#ifdef GUEST_SUPPORTS_MODERN" guards. +:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally +part of the PAPR+ specification, and have been superseded by a newer format, +also described below under :ref:`hot-plug-unplug-event-structure`, and so are +now deemed a "legacy" format. The formats are similar, but the "modern" format +contains additional fields/flags, which are denoted for the purposes of this +documentation with ``#ifdef GUEST_SUPPORTS_MODERN`` guards. 
QEMU should assume support only for "legacy" fields/flags unless the guest -advertises support for the "modern" format via ibm,client-architecture-support -hcall by setting byte 5, bit 6 of it's ibm,architecture-vec-5 option vector -structure (as described by LoPAPR v11, B.6.2.3). As with "legacy" format events, -"modern" format events are surfaced to the guest via check-exception RTAS calls, -but use a dedicated event source to signal the guest. This event source is -advertised to the guest by the addition of a "hot-plug-events" node under -"/event-sources" node of the guest's device tree using the standard format -described in LoPAPR v11, B.6.12.1. - -== hotplug/unplug event structure == - -The hotplug-specific payload in QEMU is implemented as follows (with all values +advertises support for the "modern" format via +``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of it's +``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_, +section B.5.2.3). As with "legacy" format events, "modern" format events are +surfaced to the guest via check-exception RTAS calls, but use a dedicated event +source to signal the guest. This event source is advertised to the guest by the +addition of a ``hot-plug-events`` node under ``/event-sources`` node of the +guest's device tree using the standard format described in [LoPAR]_, +section B.5.12.2. + +.. 
_hot-plug-unplug-event-structure: + +Hot plug/unplug event structure +=============================== + +The hot plug specific payload in QEMU is implemented as follows (with all values encoded in big-endian format): -struct rtas_event_log_v6_hp { -#define SECTION_ID_HOTPLUG 0x4850 /* HP */ - struct section_header { - uint16_t section_id; /* set to SECTION_ID_HOTPLUG */ - uint16_t section_length; /* sizeof(rtas_event_log_v6_hp), - * plus the length of the DRC name - * if a DRC name identifier is - * specified for hotplug_identifier - */ - uint8_t section_version; /* version 1 */ - uint8_t section_subtype; /* unused */ - uint16_t creator_component_id; /* unused */ - } hdr; -#define RTAS_LOG_V6_HP_TYPE_CPU 1 -#define RTAS_LOG_V6_HP_TYPE_MEMORY 2 -#define RTAS_LOG_V6_HP_TYPE_SLOT 3 -#define RTAS_LOG_V6_HP_TYPE_PHB 4 -#define RTAS_LOG_V6_HP_TYPE_PCI 5 - uint8_t hotplug_type; /* type of resource/device */ -#define RTAS_LOG_V6_HP_ACTION_ADD 1 -#define RTAS_LOG_V6_HP_ACTION_REMOVE 2 - uint8_t hotplug_action; /* action (add/remove) */ -#define RTAS_LOG_V6_HP_ID_DRC_NAME 1 -#define RTAS_LOG_V6_HP_ID_DRC_INDEX 2 -#define RTAS_LOG_V6_HP_ID_DRC_COUNT 3 -#ifdef GUEST_SUPPORTS_MODERN -#define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4 -#endif - uint8_t hotplug_identifier; /* type of the resource identifier, - * which serves as the discriminator - * for the 'drc' union field below - */ -#ifdef GUEST_SUPPORTS_MODERN - uint8_t capabilities; /* capability flags, currently unused - * by QEMU - */ -#else - uint8_t reserved; -#endif - union { - uint32_t index; /* DRC index of resource to take action - * on - */ - uint32_t count; /* number of DR resources to take - * action on (guest chooses which) - */ -#ifdef GUEST_SUPPORTS_MODERN - struct { - uint32_t count; /* number of DR resources to take - * action on - */ - uint32_t index; /* DRC index of first resource to take - * action on. 
guest will take action - * on DRC index <index> through - * DRC index <index + count - 1> in - * sequential order - */ - } count_indexed; -#endif - char name[1]; /* string representing the name of the - * DRC to take action on - */ - } drc; -} QEMU_PACKED; - -== ibm,lrdr-capacity == - -ibm,lrdr-capacity is a property in the /rtas device tree node that identifies -the dynamic reconfiguration capabilities of the guest. It consists of a triple -consisting of <phys>, <size> and <maxcpus>. - - <phys>, encoded in BE format represents the maximum address in bytes and +.. code-block:: c + + struct rtas_event_log_v6_hp { + #define SECTION_ID_HOTPLUG 0x4850 /* HP */ + struct section_header { + uint16_t section_id; /* set to SECTION_ID_HOTPLUG */ + uint16_t section_length; /* sizeof(rtas_event_log_v6_hp), + * plus the length of the DRC name + * if a DRC name identifier is + * specified for hotplug_identifier + */ + uint8_t section_version; /* version 1 */ + uint8_t section_subtype; /* unused */ + uint16_t creator_component_id; /* unused */ + } hdr; + #define RTAS_LOG_V6_HP_TYPE_CPU 1 + #define RTAS_LOG_V6_HP_TYPE_MEMORY 2 + #define RTAS_LOG_V6_HP_TYPE_SLOT 3 + #define RTAS_LOG_V6_HP_TYPE_PHB 4 + #define RTAS_LOG_V6_HP_TYPE_PCI 5 + uint8_t hotplug_type; /* type of resource/device */ + #define RTAS_LOG_V6_HP_ACTION_ADD 1 + #define RTAS_LOG_V6_HP_ACTION_REMOVE 2 + uint8_t hotplug_action; /* action (add/remove) */ + #define RTAS_LOG_V6_HP_ID_DRC_NAME 1 + #define RTAS_LOG_V6_HP_ID_DRC_INDEX 2 + #define RTAS_LOG_V6_HP_ID_DRC_COUNT 3 + #ifdef GUEST_SUPPORTS_MODERN + #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4 + #endif + uint8_t hotplug_identifier; /* type of the resource identifier, + * which serves as the discriminator + * for the 'drc' union field below + */ + #ifdef GUEST_SUPPORTS_MODERN + uint8_t capabilities; /* capability flags, currently unused + * by QEMU + */ + #else + uint8_t reserved; + #endif + union { + uint32_t index; /* DRC index of resource to take action + * on + */ + uint32_t count; /* number of DR resources to take + 
* action on (guest chooses which) + */ + #ifdef GUEST_SUPPORTS_MODERN + struct { + uint32_t count; /* number of DR resources to take + * action on + */ + uint32_t index; /* DRC index of first resource to take + * action on. guest will take action + * on DRC index <index> through + * DRC index <index + count - 1> in + * sequential order + */ + } count_indexed; + #endif + char name[1]; /* string representing the name of the + * DRC to take action on + */ + } drc; + } QEMU_PACKED; + +``ibm,lrdr-capacity`` +===================== + +``ibm,lrdr-capacity`` is a property in the /rtas device tree node that +identifies the dynamic reconfiguration capabilities of the guest. It consists +of a triple consisting of ``<phys>``, ``<size>`` and ``<maxcpus>``. + + ``<phys>``, encoded in BE format represents the maximum address in bytes and hence the maximum memory that can be allocated to the guest. - <size>, encoded in BE format represents the size increments in which + ``<size>``, encoded in BE format represents the size increments in which memory can be hot-plugged to the guest. - <maxcpus>, a BE-encoded integer, represents the maximum number of + ``<maxcpus>``, a BE-encoded integer, represents the maximum number of processors that the guest can have. -pseries guests use this property to note the maximum allowed CPUs for the +``pseries`` guests use this property to note the maximum allowed CPUs for the guest. -== ibm,dynamic-reconfiguration-memory == +``ibm,dynamic-reconfiguration-memory`` +====================================== -ibm,dynamic-reconfiguration-memory is a device tree node that represents -dynamically reconfigurable logical memory blocks (LMB). This node -is generated only when the guest advertises the support for it via -ibm,client-architecture-support call. Memory that is not dynamically -reconfigurable is represented by /memory nodes. The properties of this -node that are of interest to the sPAPR memory hotplug implementation -in QEMU are described here. 
+``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents +dynamically reconfigurable logical memory blocks (LMB). This node is generated +only when the guest advertises the support for it via +``ibm,client-architecture-support`` call. Memory that is not dynamically +reconfigurable is represented by ``/memory`` nodes. The properties of this node +that are of interest to the sPAPR memory hotplug implementation in QEMU are +described here. -ibm,lmb-size +``ibm,lmb-size`` +---------------- -This 64bit integer defines the size of each dynamically reconfigurable LMB. +This 64-bit integer defines the size of each dynamically reconfigurable LMB. -ibm,associativity-lookup-arrays +``ibm,associativity-lookup-arrays`` +----------------------------------- This property defines a lookup array in which the NUMA associativity information for each LMB can be found. It is a property encoded array @@ -361,13 +463,14 @@ that begins with an integer M, the number of associativity lists followed by an integer N, the number of entries per associativity list and terminated by M associativity lists each of length N integers. -This property provides the same information as given by ibm,associativity -property in a /memory node. Each assigned LMB has an index value between +This property provides the same information as given by ``ibm,associativity`` +property in a ``/memory`` node. Each assigned LMB has an index value between 0 and M-1 which is used as an index into this table to select which -associativity list to use for the LMB. This index value for each LMB -is defined in ibm,dynamic-memory property. +associativity list to use for the LMB. This index value for each LMB is defined +in ``ibm,dynamic-memory`` property. -ibm,dynamic-memory +``ibm,dynamic-memory`` +---------------------- This property describes the dynamically reconfigurable memory. 
It is a property encoded array that has an integer N, the number of LMBs followed @@ -375,19 +478,19 @@ by N LMB list entries. Each LMB list entry consists of the following elements: -- Logical address of the start of the LMB encoded as a 64bit integer. This - corresponds to reg property in /memory node. -- DRC index of the LMB that corresponds to ibm,my-drc-index property - in a /memory node. +- Logical address of the start of the LMB encoded as a 64-bit integer. This + corresponds to ``reg`` property in ``/memory`` node. +- DRC index of the LMB that corresponds to ``ibm,my-drc-index`` property + in a ``/memory`` node. - Four bytes reserved for expansion. - Associativity list index for the LMB that is used as an index into - ibm,associativity-lookup-arrays property described earlier. This - is used to retrieve the right associativity list to be used for this - LMB. -- A 32bit flags word. The bit at bit position 0x00000008 defines whether + ``ibm,associativity-lookup-arrays`` property described earlier. This is used + to retrieve the right associativity list to be used for this LMB. +- A 32-bit flags word. The bit at bit position ``0x00000008`` defines whether the LMB is assigned to the partition as of boot time. -ibm,dynamic-memory-v2 +``ibm,dynamic-memory-v2`` +------------------------- This property describes the dynamically reconfigurable memory. This is an alternate and newer way to describe dynamically reconfigurable memory. @@ -397,13 +500,11 @@ for each sequential group of LMBs that share common attributes. Each LMB set entry consists of the following elements: -- Number of sequential LMBs in the entry represented by a 32bit integer. -- Logical address of the first LMB in the set encoded as a 64bit integer. +- Number of sequential LMBs in the entry represented by a 32-bit integer. +- Logical address of the first LMB in the set encoded as a 64-bit integer. - DRC index of the first LMB in the set. 
- Associativity list index that is used as an index into - ibm,associativity-lookup-arrays property described earlier. This + ``ibm,associativity-lookup-arrays`` property described earlier. This is used to retrieve the right associativity list to be used for all the LMBs in this set. -- A 32bit flags word that applies to all the LMBs in the set. - -[1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867 +- A 32-bit flags word that applies to all the LMBs in the set. -- cgit v1.2.3-55-g7522 From 55ff468f7816ff40e4058153127c9d19ffd36261 Mon Sep 17 00:00:00 2001 From: Leonardo Garcia Date: Tue, 18 Jan 2022 12:56:30 +0100 Subject: docs: Rename ppc-spapr-hotplug.txt to ppc-spapr-hotplug.rst. Signed-off-by: Leonardo Garcia Reviewed-by: Daniel Henrique Barboza Message-Id: <1f5860217273f272fddadc68b5d205b4090f6b04.1641995058.git.lagarcia@br.ibm.com> Signed-off-by: Cédric Le Goater --- docs/specs/ppc-spapr-hotplug.rst | 510 +++++++++++++++++++++++++++++++++++++++ docs/specs/ppc-spapr-hotplug.txt | 510 --------------------------------------- 2 files changed, 510 insertions(+), 510 deletions(-) create mode 100644 docs/specs/ppc-spapr-hotplug.rst delete mode 100644 docs/specs/ppc-spapr-hotplug.txt (limited to 'docs') diff --git a/docs/specs/ppc-spapr-hotplug.rst b/docs/specs/ppc-spapr-hotplug.rst new file mode 100644 index 0000000000..f84dc55ad9 --- /dev/null +++ b/docs/specs/ppc-spapr-hotplug.rst @@ -0,0 +1,510 @@ +============================= +sPAPR Dynamic Reconfiguration +============================= + +sPAPR or pSeries guests make use of a facility called dynamic reconfiguration +to handle hot plugging of dynamic "physical" resources like PCI cards, or +"logical"/para-virtual resources like memory, CPUs, and "physical" +host-bridges, which are generally managed by the host/hypervisor and provided +to guests as virtualized resources. 
The specifics of dynamic reconfiguration +are documented extensively in section 13 of the Linux on Power Architecture +Reference document ([LoPAR]_). This document provides a summary of that +information as it applies to the implementation within QEMU. + +Dynamic-reconfiguration Connectors +================================== + +To manage hot plug/unplug of these resources, a firmware abstraction known as +a Dynamic Resource Connector (DRC) is used to assign a particular dynamic +resource to the guest, and provide an interface for the guest to manage +configuration/removal of the resource associated with it. + +Device tree description of DRCs +=============================== + +A set of four Open Firmware device tree array properties are used to describe +the name/index/power-domain/type of each DRC allocated to a guest at +boot time. There may be multiple sets of these arrays, rooted at different +paths in the device tree depending on the type of resource the DRCs manage. + +In some cases, the DRCs themselves may be provided by a dynamic resource, +such as the DRCs managing PCI slots on a hot plugged PHB. In this case the +arrays would be fetched as part of the device tree retrieval interfaces +for hot plugged resources described under :ref:`guest-host-interface`. + +The array properties are described below. Each entry/element in an array +describes the DRC identified by the element in the corresponding position +of ``ibm,drc-indexes``: + +``ibm,drc-names`` +----------------- + + First 4-bytes: big-endian (BE) encoded integer denoting the number of entries. + + Each entry: a NULL-terminated ```` string encoded as a byte array. + + ```` values for logical/virtual resources are defined in the Linux on + Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically + consist of the type of the resource followed by a space and a numerical + value that's unique across resources of that type. 
+ + ```` values for "physical" resources such as PCI or VIO devices are + defined as being "location codes", which are the "location labels" of each + encapsulating device, starting from the chassis down to the individual slot + for the device, concatenated by a hyphen. This provides a mapping of + resources to a physical location in a chassis for debugging purposes. For + QEMU, this mapping is less important, so we assign a location code that + conforms to naming specifications, but is simply a location label for the + slot by itself to simplify the implementation. The naming convention for + location labels is documented in detail in the [LoPAR]_ section 12.3.1.5, + and in our case amounts to using ``C`` for PCI/VIO device slots, where + ```` is unique across all PCI/VIO device slots. + +``ibm,drc-indexes`` +------------------- + + First 4-bytes: BE-encoded integer denoting the number of entries. + + Each 4-byte entry: BE-encoded ```` integer that is unique across all + DRCs in the machine. + + ```` is arbitrary, but in the case of QEMU we try to maintain the + convention used to assign them to pSeries guests on pHyp (the hypervisor + portion of PowerVM): + + ``bit[31:28]``: integer encoding of ````, where ```` is: + + ``1`` for CPU resource. + + ``2`` for PHB resource. + + ``3`` for VIO resource. + + ``4`` for PCI resource. + + ``8`` for memory resource. + + ``bit[27:0]``: integer encoding of ````, where ```` is unique + across all resources of specified type. + +``ibm,drc-power-domains`` +------------------------- + + First 4-bytes: BE-encoded integer denoting the number of entries. + + Each 4-byte entry: 32-bit, BE-encoded ```` integer that specifies the + power domain the resource will be assigned to. In the case of QEMU we + associated all resources with a "live insertion" domain, where the power is + assumed to be managed automatically. The integer value for this domain is a + special value of ``-1``. 
+ + +``ibm,drc-types`` +----------------- + + First 4-bytes: BE-encoded integer denoting the number of entries. + + Each entry: a NULL-terminated ```` string encoded as a byte array. + ```` is assigned as follows: + + "CPU" for a CPU. + + "PHB" for a physical host-bridge. + + "SLOT" for a VIO slot. + + "28" for a PCI slot. + + "MEM" for memory resource. + +.. _guest-host-interface: + +Guest->Host interface to manage dynamic resources +================================================= + +Each DRC is given a globally unique DRC index, and resources associated with a +particular DRC are configured/managed by the guest via a number of RTAS calls +which reference individual DRCs based on the DRC index. This can be considered +the guest->host interface. + +``rtas-set-power-level`` +------------------------ + +Set the power level for a specified power domain. + + ``arg[0]``: integer identifying power domain. + + ``arg[1]``: new power level for the domain, ``0-100``. + + ``output[0]``: status, ``0`` on success. + + ``output[1]``: power level after command. + +``rtas-get-power-level`` +------------------------ + +Get the power level for a specified power domain. + + ``arg[0]``: integer identifying power domain. + + ``output[0]``: status, ``0`` on success. + + ``output[1]``: current power level. + +``rtas-set-indicator`` +---------------------- + +Set the state of an indicator or sensor. + + ``arg[0]``: integer identifying sensor/indicator type. + + ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC + index. + + ``arg[2]``: desired sensor value. + + ``output[0]``: status, ``0`` on success. + +For the purpose of this document we focus on the indicator/sensor types +associated with a DRC. The types are: + +* ``9001``: ``isolation-state``, controls/indicates whether a device has been + made accessible to a guest. Supported sensor values: + + ``0``: ``isolate``, device is made inaccessible by guest OS. 
+ + ``1``: ``unisolate``, device is made available to guest OS. + +* ``9002``: ``dr-indicator``, controls "visual" indicator associated with + device. Supported sensor values: + + ``0``: ``inactive``, resource may be safely removed. + + ``1``: ``active``, resource is in use and cannot be safely removed. + + ``2``: ``identify``, used to visually identify slot for interactive hot plug. + + ``3``: ``action``, in most cases, used in the same manner as identify. + +* ``9003``: ``allocation-state``, generally only used for "logical" DR resources + to request the allocation/deallocation of a resource prior to acquiring it via + ``isolation-state->unisolate``, or after releasing it via + ``isolation-state->isolate``, respectively. For "physical" DR (like PCI + hot plug/unplug) the pre-allocation of the resource is implied and this sensor + is unused. Supported sensor values: + + ``0``: ``unusable``, tell firmware/system the resource can be + unallocated/reclaimed and added back to the system resource pool. + + ``1``: ``usable``, request the resource be allocated/reserved for use by + guest OS. + + ``2``: ``exchange``, used to allocate a spare resource to use for fail-over + in certain situations. Unused in QEMU. + + ``3``: ``recover``, used to reclaim a previously allocated resource that's + not currently allocated to the guest OS. Unused in QEMU. + +``rtas-get-sensor-state:`` +-------------------------- + +Used to read an indicator or sensor value. + + ``arg[0]``: integer identifying sensor/indicator type. + + ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC + index + + ``output[0]``: status, 0 on success + +For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``, +which has a type value of ``9003``, as ``allocation-state`` does in the case of +``rtas-set-indicator``. The semantics/encodings of the sensor values are +distinct however. + +Supported sensor values for ``dr-entity-sense`` (``9003``) sensor: + + ``0``: empty. 
+ + For physical resources: DRC/slot is empty. + + For logical resources: unused. + + ``1``: present. + + For physical resources: DRC/slot is populated with a device/resource. + + For logical resources: resource has been allocated to the DRC. + + ``2``: unusable. + + For physical resources: unused. + + For logical resources: DRC has no resource allocated to it. + + ``3``: exchange. + + For physical resources: unused. + + For logical resources: resource available for exchange (see + ``allocation-state`` sensor semantics above). + + ``4``: recovery. + + For physical resources: unused. + + For logical resources: resource available for recovery (see + ``allocation-state`` sensor semantics above). + +``rtas-ibm-configure-connector`` +-------------------------------- + +Used to fetch an OpenFirmware device tree description of the resource associated +with a particular DRC. + + ``arg[0]``: guest physical address of 4096-byte work area buffer. + + ``arg[1]``: 0, or address of additional 4096-byte work area buffer; only + non-zero if a prior RTAS response indicated a need for additional memory. + + ``output[0]``: status: + + ``0``: completed transmittal of device tree node. + + ``1``: instruct guest to prepare for next device tree sibling node. + + ``2``: instruct guest to prepare for next device tree child node. + + ``3``: instruct guest to prepare for next device tree property. + + ``4``: instruct guest to ascend to parent device tree node. + + ``5``: instruct guest to provide additional work-area buffer via ``arg[1]``. + + ``990x``: instruct guest that operation took too long and to try again + later. + +The DRC index is encoded in the first 4-bytes of the first work area buffer. +Work area (``wa``) layout, using 4-byte offsets: + + ``wa[0]``: DRC index of the DRC to fetch device tree nodes from. + + ``wa[1]``: ``0`` (hard-coded). + + ``wa[2]``: + + For next-sibling/next-child response: + + ``wa`` offset of null-terminated string denoting the new node's name. 
+ + For next-property response: + + ``wa`` offset of null-terminated string denoting new property's name. + + ``wa[3]``: for next-property response (unused otherwise): + + Byte-length of new property's value. + + ``wa[4]``: for next-property response (unused otherwise): + + New property's value, encoded as an OFDT-compatible byte array. + +Hot plug/unplug events +====================== + +For most DR operations, the hypervisor will issue host->guest add/remove events +using the EPOW/check-exception notification framework, where the host issues a +check-exception interrupt, then provides an RTAS event log via an +rtas-check-exception call issued by the guest in response. This framework is +documented by PAPR+ v2.7, and already in use by QEMU for generating powerdown +requests via EPOW events. + +For DR, this framework has been extended to include hotplug events, which were +previously unneeded due to direct manipulation of DR-related guest userspace +tools by host-level management such as an HMC. This level of management is not +applicable to KVM on Power, hence the reason for extending the notification +framework to support hotplug events. + +The format for these EPOW-signalled events is described below under +:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally +part of the PAPR+ specification, and have been superseded by a newer format, +also described below under :ref:`hot-plug-unplug-event-structure`, and so are +now deemed a "legacy" format. The formats are similar, but the "modern" format +contains additional fields/flags, which are denoted for the purposes of this +documentation with ``#ifdef GUEST_SUPPORTS_MODERN`` guards. + +QEMU should assume support only for "legacy" fields/flags unless the guest +advertises support for the "modern" format via +``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of its +``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_, +section B.5.2.3). 
As with "legacy" format events, "modern" format events are +surfaced to the guest via check-exception RTAS calls, but use a dedicated event +source to signal the guest. This event source is advertised to the guest by the +addition of a ``hot-plug-events`` node under ``/event-sources`` node of the +guest's device tree using the standard format described in [LoPAR]_, +section B.5.12.2. + +.. _hot-plug-unplug-event-structure: + +Hot plug/unplug event structure +=============================== + +The hot plug specific payload in QEMU is implemented as follows (with all values +encoded in big-endian format): + +.. code-block:: c + + struct rtas_event_log_v6_hp { + #define SECTION_ID_HOTPLUG 0x4850 /* HP */ + struct section_header { + uint16_t section_id; /* set to SECTION_ID_HOTPLUG */ + uint16_t section_length; /* sizeof(rtas_event_log_v6_hp), + * plus the length of the DRC name + * if a DRC name identifier is + * specified for hotplug_identifier + */ + uint8_t section_version; /* version 1 */ + uint8_t section_subtype; /* unused */ + uint16_t creator_component_id; /* unused */ + } hdr; + #define RTAS_LOG_V6_HP_TYPE_CPU 1 + #define RTAS_LOG_V6_HP_TYPE_MEMORY 2 + #define RTAS_LOG_V6_HP_TYPE_SLOT 3 + #define RTAS_LOG_V6_HP_TYPE_PHB 4 + #define RTAS_LOG_V6_HP_TYPE_PCI 5 + uint8_t hotplug_type; /* type of resource/device */ + #define RTAS_LOG_V6_HP_ACTION_ADD 1 + #define RTAS_LOG_V6_HP_ACTION_REMOVE 2 + uint8_t hotplug_action; /* action (add/remove) */ + #define RTAS_LOG_V6_HP_ID_DRC_NAME 1 + #define RTAS_LOG_V6_HP_ID_DRC_INDEX 2 + #define RTAS_LOG_V6_HP_ID_DRC_COUNT 3 + #ifdef GUEST_SUPPORTS_MODERN + #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4 + #endif + uint8_t hotplug_identifier; /* type of the resource identifier, + * which serves as the discriminator + * for the 'drc' union field below + */ + #ifdef GUEST_SUPPORTS_MODERN + uint8_t capabilities; /* capability flags, currently unused + * by QEMU + */ + #else + uint8_t reserved; + #endif + union { + uint32_t index; /* 
DRC index of resource to take action + * on + */ + uint32_t count; /* number of DR resources to take + * action on (guest chooses which) + */ + #ifdef GUEST_SUPPORTS_MODERN + struct { + uint32_t count; /* number of DR resources to take + * action on + */ + uint32_t index; /* DRC index of first resource to take + * action on. guest will take action + * on DRC index through + * DRC index in + * sequential order + */ + } count_indexed; + #endif + char name[1]; /* string representing the name of the + * DRC to take action on + */ + } drc; + } QEMU_PACKED; + +``ibm,lrdr-capacity`` +===================== + +``ibm,lrdr-capacity`` is a property in the /rtas device tree node that +identifies the dynamic reconfiguration capabilities of the guest. It consists +of a triple consisting of ````, ```` and ````. + + ````, encoded in BE format represents the maximum address in bytes and + hence the maximum memory that can be allocated to the guest. + + ````, encoded in BE format represents the size increments in which + memory can be hot-plugged to the guest. + + ````, a BE-encoded integer, represents the maximum number of + processors that the guest can have. + +``pseries`` guests use this property to note the maximum allowed CPUs for the +guest. + +``ibm,dynamic-reconfiguration-memory`` +====================================== + +``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents +dynamically reconfigurable logical memory blocks (LMB). This node is generated +only when the guest advertises the support for it via +``ibm,client-architecture-support`` call. Memory that is not dynamically +reconfigurable is represented by ``/memory`` nodes. The properties of this node +that are of interest to the sPAPR memory hotplug implementation in QEMU are +described here. + +``ibm,lmb-size`` +---------------- + +This 64-bit integer defines the size of each dynamically reconfigurable LMB. 
+ +``ibm,associativity-lookup-arrays`` +----------------------------------- + +This property defines a lookup array in which the NUMA associativity +information for each LMB can be found. It is a property encoded array +that begins with an integer M, the number of associativity lists followed +by an integer N, the number of entries per associativity list and terminated +by M associativity lists each of length N integers. + +This property provides the same information as given by ``ibm,associativity`` +property in a ``/memory`` node. Each assigned LMB has an index value between +0 and M-1 which is used as an index into this table to select which +associativity list to use for the LMB. This index value for each LMB is defined +in ``ibm,dynamic-memory`` property. + +``ibm,dynamic-memory`` +---------------------- + +This property describes the dynamically reconfigurable memory. It is a +property encoded array that has an integer N, the number of LMBs followed +by N LMB list entries. + +Each LMB list entry consists of the following elements: + +- Logical address of the start of the LMB encoded as a 64-bit integer. This + corresponds to ``reg`` property in ``/memory`` node. +- DRC index of the LMB that corresponds to ``ibm,my-drc-index`` property + in a ``/memory`` node. +- Four bytes reserved for expansion. +- Associativity list index for the LMB that is used as an index into + ``ibm,associativity-lookup-arrays`` property described earlier. This is used + to retrieve the right associativity list to be used for this LMB. +- A 32-bit flags word. The bit at bit position ``0x00000008`` defines whether + the LMB is assigned to the partition as of boot time. + +``ibm,dynamic-memory-v2`` +------------------------- + +This property describes the dynamically reconfigurable memory. This is +an alternate and newer way to describe dynamically reconfigurable memory. +It is a property encoded array that has an integer N (the number of +LMB set entries) followed by N LMB set entries. 
There is an LMB set entry +for each sequential group of LMBs that share common attributes. + +Each LMB set entry consists of the following elements: + +- Number of sequential LMBs in the entry represented by a 32-bit integer. +- Logical address of the first LMB in the set encoded as a 64-bit integer. +- DRC index of the first LMB in the set. +- Associativity list index that is used as an index into + ``ibm,associativity-lookup-arrays`` property described earlier. This + is used to retrieve the right associativity list to be used for all + the LMBs in this set. +- A 32-bit flags word that applies to all the LMBs in the set. diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt deleted file mode 100644 index f84dc55ad9..0000000000 --- a/docs/specs/ppc-spapr-hotplug.txt +++ /dev/null @@ -1,510 +0,0 @@ -============================= -sPAPR Dynamic Reconfiguration -============================= - -sPAPR or pSeries guests make use of a facility called dynamic reconfiguration -to handle hot plugging of dynamic "physical" resources like PCI cards, or -"logical"/para-virtual resources like memory, CPUs, and "physical" -host-bridges, which are generally managed by the host/hypervisor and provided -to guests as virtualized resources. The specifics of dynamic reconfiguration -are documented extensively in section 13 of the Linux on Power Architecture -Reference document ([LoPAR]_). This document provides a summary of that -information as it applies to the implementation within QEMU. - -Dynamic-reconfiguration Connectors -================================== - -To manage hot plug/unplug of these resources, a firmware abstraction known as -a Dynamic Resource Connector (DRC) is used to assign a particular dynamic -resource to the guest, and provide an interface for the guest to manage -configuration/removal of the resource associated with it. 
- -Device tree description of DRCs -=============================== - -A set of four Open Firmware device tree array properties are used to describe -the name/index/power-domain/type of each DRC allocated to a guest at -boot time. There may be multiple sets of these arrays, rooted at different -paths in the device tree depending on the type of resource the DRCs manage. - -In some cases, the DRCs themselves may be provided by a dynamic resource, -such as the DRCs managing PCI slots on a hot plugged PHB. In this case the -arrays would be fetched as part of the device tree retrieval interfaces -for hot plugged resources described under :ref:`guest-host-interface`. - -The array properties are described below. Each entry/element in an array -describes the DRC identified by the element in the corresponding position -of ``ibm,drc-indexes``: - -``ibm,drc-names`` ------------------ - - First 4-bytes: big-endian (BE) encoded integer denoting the number of entries. - - Each entry: a NULL-terminated ```` string encoded as a byte array. - - ```` values for logical/virtual resources are defined in the Linux on - Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically - consist of the type of the resource followed by a space and a numerical - value that's unique across resources of that type. - - ```` values for "physical" resources such as PCI or VIO devices are - defined as being "location codes", which are the "location labels" of each - encapsulating device, starting from the chassis down to the individual slot - for the device, concatenated by a hyphen. This provides a mapping of - resources to a physical location in a chassis for debugging purposes. For - QEMU, this mapping is less important, so we assign a location code that - conforms to naming specifications, but is simply a location label for the - slot by itself to simplify the implementation. 
The naming convention for - location labels is documented in detail in the [LoPAR]_ section 12.3.1.5, - and in our case amounts to using ``C`` for PCI/VIO device slots, where - ```` is unique across all PCI/VIO device slots. - -``ibm,drc-indexes`` -------------------- - - First 4-bytes: BE-encoded integer denoting the number of entries. - - Each 4-byte entry: BE-encoded ```` integer that is unique across all - DRCs in the machine. - - ```` is arbitrary, but in the case of QEMU we try to maintain the - convention used to assign them to pSeries guests on pHyp (the hypervisor - portion of PowerVM): - - ``bit[31:28]``: integer encoding of ````, where ```` is: - - ``1`` for CPU resource. - - ``2`` for PHB resource. - - ``3`` for VIO resource. - - ``4`` for PCI resource. - - ``8`` for memory resource. - - ``bit[27:0]``: integer encoding of ````, where ```` is unique - across all resources of specified type. - -``ibm,drc-power-domains`` -------------------------- - - First 4-bytes: BE-encoded integer denoting the number of entries. - - Each 4-byte entry: 32-bit, BE-encoded ```` integer that specifies the - power domain the resource will be assigned to. In the case of QEMU we - associated all resources with a "live insertion" domain, where the power is - assumed to be managed automatically. The integer value for this domain is a - special value of ``-1``. - - -``ibm,drc-types`` ------------------ - - First 4-bytes: BE-encoded integer denoting the number of entries. - - Each entry: a NULL-terminated ```` string encoded as a byte array. - ```` is assigned as follows: - - "CPU" for a CPU. - - "PHB" for a physical host-bridge. - - "SLOT" for a VIO slot. - - "28" for a PCI slot. - - "MEM" for memory resource. - -.. 
_guest-host-interface: - -Guest->Host interface to manage dynamic resources -================================================= - -Each DRC is given a globally unique DRC index, and resources associated with a -particular DRC are configured/managed by the guest via a number of RTAS calls -which reference individual DRCs based on the DRC index. This can be considered -the guest->host interface. - -``rtas-set-power-level`` ------------------------- - -Set the power level for a specified power domain. - - ``arg[0]``: integer identifying power domain. - - ``arg[1]``: new power level for the domain, ``0-100``. - - ``output[0]``: status, ``0`` on success. - - ``output[1]``: power level after command. - -``rtas-get-power-level`` ------------------------- - -Get the power level for a specified power domain. - - ``arg[0]``: integer identifying power domain. - - ``output[0]``: status, ``0`` on success. - - ``output[1]``: current power level. - -``rtas-set-indicator`` ----------------------- - -Set the state of an indicator or sensor. - - ``arg[0]``: integer identifying sensor/indicator type. - - ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC - index. - - ``arg[2]``: desired sensor value. - - ``output[0]``: status, ``0`` on success. - -For the purpose of this document we focus on the indicator/sensor types -associated with a DRC. The types are: - -* ``9001``: ``isolation-state``, controls/indicates whether a device has been - made accessible to a guest. Supported sensor values: - - ``0``: ``isolate``, device is made inaccessible by guest OS. - - ``1``: ``unisolate``, device is made available to guest OS. - -* ``9002``: ``dr-indicator``, controls "visual" indicator associated with - device. Supported sensor values: - - ``0``: ``inactive``, resource may be safely removed. - - ``1``: ``active``, resource is in use and cannot be safely removed. - - ``2``: ``identify``, used to visually identify slot for interactive hot plug. 
- - ``3``: ``action``, in most cases, used in the same manner as identify. - -* ``9003``: ``allocation-state``, generally only used for "logical" DR resources - to request the allocation/deallocation of a resource prior to acquiring it via - ``isolation-state->unisolate``, or after releasing it via - ``isolation-state->isolate``, respectively. For "physical" DR (like PCI - hot plug/unplug) the pre-allocation of the resource is implied and this sensor - is unused. Supported sensor values: - - ``0``: ``unusable``, tell firmware/system the resource can be - unallocated/reclaimed and added back to the system resource pool. - - ``1``: ``usable``, request the resource be allocated/reserved for use by - guest OS. - - ``2``: ``exchange``, used to allocate a spare resource to use for fail-over - in certain situations. Unused in QEMU. - - ``3``: ``recover``, used to reclaim a previously allocated resource that's - not currently allocated to the guest OS. Unused in QEMU. - -``rtas-get-sensor-state:`` --------------------------- - -Used to read an indicator or sensor value. - - ``arg[0]``: integer identifying sensor/indicator type. - - ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC - index - - ``output[0]``: status, 0 on success - -For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``, -which has a type value of ``9003``, as ``allocation-state`` does in the case of -``rtas-set-indicator``. The semantics/encodings of the sensor values are -distinct however. - -Supported sensor values for ``dr-entity-sense`` (``9003``) sensor: - - ``0``: empty. - - For physical resources: DRC/slot is empty. - - For logical resources: unused. - - ``1``: present. - - For physical resources: DRC/slot is populated with a device/resource. - - For logical resources: resource has been allocated to the DRC. - - ``2``: unusable. - - For physical resources: unused. - - For logical resources: DRC has no resource allocated to it. - - ``3``: exchange. 
- - For physical resources: unused. - - For logical resources: resource available for exchange (see - ``allocation-state`` sensor semantics above). - - ``4``: recovery. - - For physical resources: unused. - - For logical resources: resource available for recovery (see - ``allocation-state`` sensor semantics above). - -``rtas-ibm-configure-connector`` --------------------------------- - -Used to fetch an OpenFirmware device tree description of the resource associated -with a particular DRC. - - ``arg[0]``: guest physical address of 4096-byte work area buffer. - - ``arg[1]``: 0, or address of additional 4096-byte work area buffer; only - non-zero if a prior RTAS response indicated a need for additional memory. - - ``output[0]``: status: - - ``0``: completed transmittal of device tree node. - - ``1``: instruct guest to prepare for next device tree sibling node. - - ``2``: instruct guest to prepare for next device tree child node. - - ``3``: instruct guest to prepare for next device tree property. - - ``4``: instruct guest to ascend to parent device tree node. - - ``5``: instruct guest to provide additional work-area buffer via ``arg[1]``. - - ``990x``: instruct guest that operation took too long and to try again - later. - -The DRC index is encoded in the first 4-bytes of the first work area buffer. -Work area (``wa``) layout, using 4-byte offsets: - - ``wa[0]``: DRC index of the DRC to fetch device tree nodes from. - - ``wa[1]``: ``0`` (hard-coded). - - ``wa[2]``: - - For next-sibling/next-child response: - - ``wa`` offset of null-terminated string denoting the new node's name. - - For next-property response: - - ``wa`` offset of null-terminated string denoting new property's name. - - ``wa[3]``: for next-property response (unused otherwise): - - Byte-length of new property's value. - - ``wa[4]``: for next-property response (unused otherwise): - - New property's value, encoded as an OFDT-compatible byte array. 
- -Hot plug/unplug events -====================== - -For most DR operations, the hypervisor will issue host->guest add/remove events -using the EPOW/check-exception notification framework, where the host issues a -check-exception interrupt, then provides an RTAS event log via an -rtas-check-exception call issued by the guest in response. This framework is -documented by PAPR+ v2.7, and already in use by QEMU for generating powerdown -requests via EPOW events. - -For DR, this framework has been extended to include hotplug events, which were -previously unneeded due to direct manipulation of DR-related guest userspace -tools by host-level management such as an HMC. This level of management is not -applicable to KVM on Power, hence the reason for extending the notification -framework to support hotplug events. - -The format for these EPOW-signalled events is described below under -:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally -part of the PAPR+ specification, and have been superseded by a newer format, -also described below under :ref:`hot-plug-unplug-event-structure`, and so are -now deemed a "legacy" format. The formats are similar, but the "modern" format -contains additional fields/flags, which are denoted for the purposes of this -documentation with ``#ifdef GUEST_SUPPORTS_MODERN`` guards. - -QEMU should assume support only for "legacy" fields/flags unless the guest -advertises support for the "modern" format via -``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of its -``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_, -section B.5.2.3). As with "legacy" format events, "modern" format events are -surfaced to the guest via check-exception RTAS calls, but use a dedicated event -source to signal the guest. 
This event source is advertised to the guest by the -addition of a ``hot-plug-events`` node under ``/event-sources`` node of the -guest's device tree using the standard format described in [LoPAR]_, -section B.5.12.2. - -.. _hot-plug-unplug-event-structure: - -Hot plug/unplug event structure -=============================== - -The hot plug specific payload in QEMU is implemented as follows (with all values -encoded in big-endian format): - -.. code-block:: c - - struct rtas_event_log_v6_hp { - #define SECTION_ID_HOTPLUG 0x4850 /* HP */ - struct section_header { - uint16_t section_id; /* set to SECTION_ID_HOTPLUG */ - uint16_t section_length; /* sizeof(rtas_event_log_v6_hp), - * plus the length of the DRC name - * if a DRC name identifier is - * specified for hotplug_identifier - */ - uint8_t section_version; /* version 1 */ - uint8_t section_subtype; /* unused */ - uint16_t creator_component_id; /* unused */ - } hdr; - #define RTAS_LOG_V6_HP_TYPE_CPU 1 - #define RTAS_LOG_V6_HP_TYPE_MEMORY 2 - #define RTAS_LOG_V6_HP_TYPE_SLOT 3 - #define RTAS_LOG_V6_HP_TYPE_PHB 4 - #define RTAS_LOG_V6_HP_TYPE_PCI 5 - uint8_t hotplug_type; /* type of resource/device */ - #define RTAS_LOG_V6_HP_ACTION_ADD 1 - #define RTAS_LOG_V6_HP_ACTION_REMOVE 2 - uint8_t hotplug_action; /* action (add/remove) */ - #define RTAS_LOG_V6_HP_ID_DRC_NAME 1 - #define RTAS_LOG_V6_HP_ID_DRC_INDEX 2 - #define RTAS_LOG_V6_HP_ID_DRC_COUNT 3 - #ifdef GUEST_SUPPORTS_MODERN - #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4 - #endif - uint8_t hotplug_identifier; /* type of the resource identifier, - * which serves as the discriminator - * for the 'drc' union field below - */ - #ifdef GUEST_SUPPORTS_MODERN - uint8_t capabilities; /* capability flags, currently unused - * by QEMU - */ - #else - uint8_t reserved; - #endif - union { - uint32_t index; /* DRC index of resource to take action - * on - */ - uint32_t count; /* number of DR resources to take - * action on (guest chooses which) - */ - #ifdef 
GUEST_SUPPORTS_MODERN - struct { - uint32_t count; /* number of DR resources to take - * action on - */ - uint32_t index; /* DRC index of first resource to take - * action on. guest will take action - * on DRC index <index> through - * DRC index <index + count - 1> in - * sequential order - */ - } count_indexed; - #endif - char name[1]; /* string representing the name of the - * DRC to take action on - */ - } drc; - } QEMU_PACKED; - -``ibm,lrdr-capacity`` -===================== - -``ibm,lrdr-capacity`` is a property in the /rtas device tree node that -identifies the dynamic reconfiguration capabilities of the guest. It consists -of a triple consisting of ``<phys>``, ``<size>`` and ``<maxcpus>``. - - ``<phys>``, encoded in BE format represents the maximum address in bytes and - hence the maximum memory that can be allocated to the guest. - - ``<size>``, encoded in BE format represents the size increments in which - memory can be hot-plugged to the guest. - - ``<maxcpus>``, a BE-encoded integer, represents the maximum number of - processors that the guest can have. - -``pseries`` guests use this property to note the maximum allowed CPUs for the -guest. - -``ibm,dynamic-reconfiguration-memory`` -====================================== - -``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents -dynamically reconfigurable logical memory blocks (LMB). This node is generated -only when the guest advertises the support for it via -``ibm,client-architecture-support`` call. Memory that is not dynamically -reconfigurable is represented by ``/memory`` nodes. The properties of this node -that are of interest to the sPAPR memory hotplug implementation in QEMU are -described here. - -``ibm,lmb-size`` ----------------- - -This 64-bit integer defines the size of each dynamically reconfigurable LMB. - -``ibm,associativity-lookup-arrays`` ------------------------------------ - -This property defines a lookup array in which the NUMA associativity -information for each LMB can be found.
It is a property encoded array -that begins with an integer M, the number of associativity lists followed -by an integer N, the number of entries per associativity list and terminated -by M associativity lists each of length N integers. - -This property provides the same information as given by ``ibm,associativity`` -property in a ``/memory`` node. Each assigned LMB has an index value between -0 and M-1 which is used as an index into this table to select which -associativity list to use for the LMB. This index value for each LMB is defined -in ``ibm,dynamic-memory`` property. - -``ibm,dynamic-memory`` ----------------------- - -This property describes the dynamically reconfigurable memory. It is a -property encoded array that has an integer N, the number of LMBs followed -by N LMB list entries. - -Each LMB list entry consists of the following elements: - -- Logical address of the start of the LMB encoded as a 64-bit integer. This - corresponds to ``reg`` property in ``/memory`` node. -- DRC index of the LMB that corresponds to ``ibm,my-drc-index`` property - in a ``/memory`` node. -- Four bytes reserved for expansion. -- Associativity list index for the LMB that is used as an index into - ``ibm,associativity-lookup-arrays`` property described earlier. This is used - to retrieve the right associativity list to be used for this LMB. -- A 32-bit flags word. The bit at bit position ``0x00000008`` defines whether - the LMB is assigned to the partition as of boot time. - -``ibm,dynamic-memory-v2`` -------------------------- - -This property describes the dynamically reconfigurable memory. This is -an alternate and newer way to describe dynamically reconfigurable memory. -It is a property encoded array that has an integer N (the number of -LMB set entries) followed by N LMB set entries. There is an LMB set entry -for each sequential group of LMBs that share common attributes. 
- -Each LMB set entry consists of the following elements: - -- Number of sequential LMBs in the entry represented by a 32-bit integer. -- Logical address of the first LMB in the set encoded as a 64-bit integer. -- DRC index of the first LMB in the set. -- Associativity list index that is used as an index into - ``ibm,associativity-lookup-arrays`` property described earlier. This - is used to retrieve the right associativity list to be used for all - the LMBs in this set. -- A 32-bit flags word that applies to all the LMBs in the set. -- cgit v1.2.3-55-g7522 From 22beb38b78b80e17d70b4562625557cafaedda11 Mon Sep 17 00:00:00 2001 From: Leonardo Garcia Date: Tue, 18 Jan 2022 12:56:30 +0100 Subject: Link new ppc-spapr-hotplug.rst file to pseries.rst. Signed-off-by: Leonardo Garcia Reviewed-by: Daniel Henrique Barboza Message-Id: Signed-off-by: Cédric Le Goater --- docs/system/ppc/pseries.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'docs') diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst index 1689324815..d0aade3a31 100644 --- a/docs/system/ppc/pseries.rst +++ b/docs/system/ppc/pseries.rst @@ -110,13 +110,13 @@ can also be found in QEMU documentation: .. toctree:: :maxdepth: 1 + ../../specs/ppc-spapr-hotplug.rst ../../specs/ppc-spapr-hcalls.rst ../../specs/ppc-spapr-numa.rst ../../specs/ppc-spapr-xive.rst Other documentation available in QEMU docs directory: -* Hot plug (``/docs/specs/ppc-spapr-hotplug.txt``). * Hypervisor calls needed by the Ultravisor (``/docs/specs/ppc-spapr-uv-hcalls.txt``). -- cgit v1.2.3-55-g7522 From 2084b44d7afa2e604c52a31ee89f46a01835131b Mon Sep 17 00:00:00 2001 From: Leonardo Garcia Date: Tue, 18 Jan 2022 12:56:30 +0100 Subject: rSTify ppc-spapr-uv-hcalls.txt. 
Signed-off-by: Leonardo Garcia Reviewed-by: Daniel Henrique Barboza Message-Id: <243a714d3861f7539d29b02a899ffc376757d668.1642446876.git.lagarcia@br.ibm.com> Signed-off-by: Cédric Le Goater --- docs/specs/ppc-spapr-uv-hcalls.txt | 165 ++++++++++++++++++++----------------- 1 file changed, 89 insertions(+), 76 deletions(-) (limited to 'docs') diff --git a/docs/specs/ppc-spapr-uv-hcalls.txt b/docs/specs/ppc-spapr-uv-hcalls.txt index 389c2740d7..a00288deb3 100644 --- a/docs/specs/ppc-spapr-uv-hcalls.txt +++ b/docs/specs/ppc-spapr-uv-hcalls.txt @@ -1,76 +1,89 @@ -On PPC64 systems supporting Protected Execution Facility (PEF), system -memory can be placed in a secured region where only an "ultravisor" -running in firmware can provide to access it. pseries guests on such -systems can communicate with the ultravisor (via ultracalls) to switch to a -secure VM mode (SVM) where the guest's memory is relocated to this secured -region, making its memory inaccessible to normal processes/guests running on -the host. - -The various ultracalls/hypercalls relating to SVM mode are currently -only documented internally, but are planned for direct inclusion into the -public OpenPOWER version of the PAPR specification (LoPAPR/LoPAR). An internal -ACR has been filed to reserve a hypercall number range specific to this -use-case to avoid any future conflicts with the internally-maintained PAPR -specification. This document summarizes some of these details as they relate -to QEMU. - -== hypercalls needed by the ultravisor == - -Switching to SVM mode involves a number of hcalls issued by the ultravisor -to the hypervisor to orchestrate the movement of guest memory to secure -memory and various other aspects SVM mode. Numbers are assigned for these -hcalls within the reserved range 0xEF00-0xEF80. The below documents the -hcalls relevant to QEMU. 
- -- H_TPM_COMM (0xef10) - - For TPM_COMM_OP_EXECUTE operation: - Send a request to a TPM and receive a response, opening a new TPM session - if one has not already been opened. - - For TPM_COMM_OP_CLOSE_SESSION operation: - Close the existing TPM session, if any. - - Arguments: - - r3 : H_TPM_COMM (0xef10) - r4 : TPM operation, one of: - TPM_COMM_OP_EXECUTE (0x1) - TPM_COMM_OP_CLOSE_SESSION (0x2) - r5 : in_buffer, guest physical address of buffer containing the request - - Caller may use the same address for both request and response - r6 : in_size, size of the in buffer - - Must be less than or equal to 4KB - r7 : out_buffer, guest physical address of buffer to store the response - - Caller may use the same address for both request and response - r8 : out_size, size of the out buffer - - Must be at least 4KB, as this is the maximum request/response size - supported by most TPM implementations, including the TPM Resource - Manager in the linux kernel. - - Return values: - - r3 : H_Success request processed successfully - H_PARAMETER invalid TPM operation - H_P2 in_buffer is invalid - H_P3 in_size is invalid - H_P4 out_buffer is invalid - H_P5 out_size is invalid - H_RESOURCE problem communicating with TPM - H_FUNCTION TPM access is not currently allowed/configured - r4 : For TPM_COMM_OP_EXECUTE, the size of the response will be stored here - upon success. - - Use-case/notes: - - SVM filesystems are encrypted using a symmetric key. This key is then - wrapped/encrypted using the public key of a trusted system which has the - private key stored in the system's TPM. An Ultravisor will use this - hcall to unwrap/unseal the symmetric key using the system's TPM device - or a TPM Resource Manager associated with the device. - - The Ultravisor sets up a separate session key with the TPM in advance - during host system boot. All sensitive in and out values will be - encrypted using the session key. 
Though the hypervisor will see the 'in' - and 'out' buffers in raw form, any sensitive contents will generally be - encrypted using this session key. +=================================== +Hypervisor calls and the Ultravisor +=================================== + +On PPC64 systems supporting Protected Execution Facility (PEF), system memory +can be placed in a secured region where only an ultravisor running in firmware +can provide access to. pSeries guests on such systems can communicate with +the ultravisor (via ultracalls) to switch to a secure virtual machine (SVM) mode +where the guest's memory is relocated to this secured region, making its memory +inaccessible to normal processes/guests running on the host. + +The various ultracalls/hypercalls relating to SVM mode are currently only +documented internally, but are planned for direct inclusion into the Linux on +Power Architecture Reference document ([LoPAR]_). An internal ACR has been filed +to reserve a hypercall number range specific to this use case to avoid any +future conflicts with the IBM internally maintained Power Architecture Platform +Reference (PAPR+) documentation specification. This document summarizes some of +these details as they relate to QEMU. + +Hypercalls needed by the ultravisor +=================================== + +Switching to SVM mode involves a number of hcalls issued by the ultravisor to +the hypervisor to orchestrate the movement of guest memory to secure memory and +various other aspects of the SVM mode. Numbers are assigned for these hcalls +within the reserved range ``0xEF00-0xEF80``. The below documents the hcalls +relevant to QEMU. + +``H_TPM_COMM`` (``0xef10``) +--------------------------- + +SVM file systems are encrypted using a symmetric key. This key is then +wrapped/encrypted using the public key of a trusted system which has the private +key stored in the system's TPM. 
An Ultravisor will use this hcall to +unwrap/unseal the symmetric key using the system's TPM device or a TPM Resource +Manager associated with the device. + +The Ultravisor sets up a separate session key with the TPM in advance during +host system boot. All sensitive in and out values will be encrypted using the +session key. Though the hypervisor will see the in and out buffers in raw form, +any sensitive contents will generally be encrypted using this session key. + +Arguments: + + ``r3``: ``H_TPM_COMM`` (``0xef10``) + + ``r4``: ``TPM`` operation, one of: + + ``TPM_COMM_OP_EXECUTE`` (``0x1``): send a request to a TPM and receive a + response, opening a new TPM session if one has not already been opened. + + ``TPM_COMM_OP_CLOSE_SESSION`` (``0x2``): close the existing TPM session, if + any. + + ``r5``: ``in_buffer``, guest physical address of buffer containing the + request. Caller may use the same address for both request and response. + + ``r6``: ``in_size``, size of the in buffer. Must be less than or equal to + 4 KB. + + ``r7``: ``out_buffer``, guest physical address of buffer to store the + response. Caller may use the same address for both request and response. + + ``r8``: ``out_size``, size of the out buffer. Must be at least 4 KB, as this + is the maximum request/response size supported by most TPM implementations, + including the TPM Resource Manager in the linux kernel. + +Return values: + + ``r3``: one of the following values: + + ``H_Success``: request processed successfully. + + ``H_PARAMETER``: invalid TPM operation. + + ``H_P2``: ``in_buffer`` is invalid. + + ``H_P3``: ``in_size`` is invalid. + + ``H_P4``: ``out_buffer`` is invalid. + + ``H_P5``: ``out_size`` is invalid. + + ``H_RESOURCE``: problem communicating with TPM. + + ``H_FUNCTION``: TPM access is not currently allowed/configured. + + ``r4``: For ``TPM_COMM_OP_EXECUTE``, the size of the response will be stored + here upon success. 
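The size constraints on the ``H_TPM_COMM`` arguments listed above can be sketched as a small validation routine. This is only an illustration under stated assumptions: ``tpm_comm_check`` and the ``RC_*`` enumerators are hypothetical stand-ins for the ``H_*`` return codes (their real numeric values are not given in this document), 4 KB is taken to mean 4096 bytes, and the size checks are assumed to apply only to ``TPM_COMM_OP_EXECUTE``. Buffer address validation (``H_P2``, ``H_P4``) and the actual TPM exchange are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define TPM_COMM_OP_EXECUTE        0x1
#define TPM_COMM_OP_CLOSE_SESSION  0x2
#define TPM_BUF_MAX                4096 /* assumption: 4 KB == 4096 bytes */

/* Hypothetical stand-ins for the H_* return codes named in the text. */
enum tpm_comm_rc {
    RC_SUCCESS,   /* H_Success */
    RC_PARAMETER, /* H_PARAMETER: invalid TPM operation */
    RC_P3,        /* H_P3: in_size is invalid */
    RC_P5,        /* H_P5: out_size is invalid */
};

/* Check the operation code (r4) and the buffer sizes (r6, r8). */
static enum tpm_comm_rc tpm_comm_check(uint64_t op, uint64_t in_size,
                                       uint64_t out_size)
{
    switch (op) {
    case TPM_COMM_OP_EXECUTE:
        if (in_size > TPM_BUF_MAX) {
            return RC_P3;  /* request must be at most 4 KB */
        }
        if (out_size < TPM_BUF_MAX) {
            return RC_P5;  /* response buffer must be at least 4 KB */
        }
        return RC_SUCCESS;
    case TPM_COMM_OP_CLOSE_SESSION:
        return RC_SUCCESS; /* no buffers involved in closing a session */
    default:
        return RC_PARAMETER;
    }
}
```

For example, an execute request with a 4096-byte input and a 4096-byte output buffer passes the checks, while a 1024-byte output buffer would be rejected with the ``H_P5`` stand-in.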
-- cgit v1.2.3-55-g7522 From dedc5d79dae59562b2311301d27ecbf2234acf8a Mon Sep 17 00:00:00 2001 From: Leonardo Garcia Date: Tue, 18 Jan 2022 12:56:30 +0100 Subject: Rename ppc-spapr-uv-hcalls.txt to ppc-spapr-uv-hcalls.rst. Signed-off-by: Leonardo Garcia Reviewed-by: Daniel Henrique Barboza Message-Id: Signed-off-by: Cédric Le Goater --- docs/specs/ppc-spapr-uv-hcalls.rst | 89 ++++++++++++++++++++++++++++++++++++++ docs/specs/ppc-spapr-uv-hcalls.txt | 89 -------------------------------------- 2 files changed, 89 insertions(+), 89 deletions(-) create mode 100644 docs/specs/ppc-spapr-uv-hcalls.rst delete mode 100644 docs/specs/ppc-spapr-uv-hcalls.txt (limited to 'docs') diff --git a/docs/specs/ppc-spapr-uv-hcalls.rst b/docs/specs/ppc-spapr-uv-hcalls.rst new file mode 100644 index 0000000000..a00288deb3 --- /dev/null +++ b/docs/specs/ppc-spapr-uv-hcalls.rst @@ -0,0 +1,89 @@ +=================================== +Hypervisor calls and the Ultravisor +=================================== + +On PPC64 systems supporting Protected Execution Facility (PEF), system memory +can be placed in a secured region where only an ultravisor running in firmware +can provide access to. pSeries guests on such systems can communicate with +the ultravisor (via ultracalls) to switch to a secure virtual machine (SVM) mode +where the guest's memory is relocated to this secured region, making its memory +inaccessible to normal processes/guests running on the host. + +The various ultracalls/hypercalls relating to SVM mode are currently only +documented internally, but are planned for direct inclusion into the Linux on +Power Architecture Reference document ([LoPAR]_). An internal ACR has been filed +to reserve a hypercall number range specific to this use case to avoid any +future conflicts with the IBM internally maintained Power Architecture Platform +Reference (PAPR+) documentation specification. This document summarizes some of +these details as they relate to QEMU. 
+ +Hypercalls needed by the ultravisor +=================================== + +Switching to SVM mode involves a number of hcalls issued by the ultravisor to +the hypervisor to orchestrate the movement of guest memory to secure memory and +various other aspects of the SVM mode. Numbers are assigned for these hcalls +within the reserved range ``0xEF00-0xEF80``. The below documents the hcalls +relevant to QEMU. + +``H_TPM_COMM`` (``0xef10``) +--------------------------- + +SVM file systems are encrypted using a symmetric key. This key is then +wrapped/encrypted using the public key of a trusted system which has the private +key stored in the system's TPM. An Ultravisor will use this hcall to +unwrap/unseal the symmetric key using the system's TPM device or a TPM Resource +Manager associated with the device. + +The Ultravisor sets up a separate session key with the TPM in advance during +host system boot. All sensitive in and out values will be encrypted using the +session key. Though the hypervisor will see the in and out buffers in raw form, +any sensitive contents will generally be encrypted using this session key. + +Arguments: + + ``r3``: ``H_TPM_COMM`` (``0xef10``) + + ``r4``: ``TPM`` operation, one of: + + ``TPM_COMM_OP_EXECUTE`` (``0x1``): send a request to a TPM and receive a + response, opening a new TPM session if one has not already been opened. + + ``TPM_COMM_OP_CLOSE_SESSION`` (``0x2``): close the existing TPM session, if + any. + + ``r5``: ``in_buffer``, guest physical address of buffer containing the + request. Caller may use the same address for both request and response. + + ``r6``: ``in_size``, size of the in buffer. Must be less than or equal to + 4 KB. + + ``r7``: ``out_buffer``, guest physical address of buffer to store the + response. Caller may use the same address for both request and response. + + ``r8``: ``out_size``, size of the out buffer. 
Must be at least 4 KB, as this + is the maximum request/response size supported by most TPM implementations, + including the TPM Resource Manager in the linux kernel. + +Return values: + + ``r3``: one of the following values: + + ``H_Success``: request processed successfully. + + ``H_PARAMETER``: invalid TPM operation. + + ``H_P2``: ``in_buffer`` is invalid. + + ``H_P3``: ``in_size`` is invalid. + + ``H_P4``: ``out_buffer`` is invalid. + + ``H_P5``: ``out_size`` is invalid. + + ``H_RESOURCE``: problem communicating with TPM. + + ``H_FUNCTION``: TPM access is not currently allowed/configured. + + ``r4``: For ``TPM_COMM_OP_EXECUTE``, the size of the response will be stored + here upon success. diff --git a/docs/specs/ppc-spapr-uv-hcalls.txt b/docs/specs/ppc-spapr-uv-hcalls.txt deleted file mode 100644 index a00288deb3..0000000000 --- a/docs/specs/ppc-spapr-uv-hcalls.txt +++ /dev/null @@ -1,89 +0,0 @@ -=================================== -Hypervisor calls and the Ultravisor -=================================== - -On PPC64 systems supporting Protected Execution Facility (PEF), system memory -can be placed in a secured region where only an ultravisor running in firmware -can provide access to. pSeries guests on such systems can communicate with -the ultravisor (via ultracalls) to switch to a secure virtual machine (SVM) mode -where the guest's memory is relocated to this secured region, making its memory -inaccessible to normal processes/guests running on the host. - -The various ultracalls/hypercalls relating to SVM mode are currently only -documented internally, but are planned for direct inclusion into the Linux on -Power Architecture Reference document ([LoPAR]_). An internal ACR has been filed -to reserve a hypercall number range specific to this use case to avoid any -future conflicts with the IBM internally maintained Power Architecture Platform -Reference (PAPR+) documentation specification. This document summarizes some of -these details as they relate to QEMU. 
- -Hypercalls needed by the ultravisor -=================================== - -Switching to SVM mode involves a number of hcalls issued by the ultravisor to -the hypervisor to orchestrate the movement of guest memory to secure memory and -various other aspects of the SVM mode. Numbers are assigned for these hcalls -within the reserved range ``0xEF00-0xEF80``. The below documents the hcalls -relevant to QEMU. - -``H_TPM_COMM`` (``0xef10``) ---------------------------- - -SVM file systems are encrypted using a symmetric key. This key is then -wrapped/encrypted using the public key of a trusted system which has the private -key stored in the system's TPM. An Ultravisor will use this hcall to -unwrap/unseal the symmetric key using the system's TPM device or a TPM Resource -Manager associated with the device. - -The Ultravisor sets up a separate session key with the TPM in advance during -host system boot. All sensitive in and out values will be encrypted using the -session key. Though the hypervisor will see the in and out buffers in raw form, -any sensitive contents will generally be encrypted using this session key. - -Arguments: - - ``r3``: ``H_TPM_COMM`` (``0xef10``) - - ``r4``: ``TPM`` operation, one of: - - ``TPM_COMM_OP_EXECUTE`` (``0x1``): send a request to a TPM and receive a - response, opening a new TPM session if one has not already been opened. - - ``TPM_COMM_OP_CLOSE_SESSION`` (``0x2``): close the existing TPM session, if - any. - - ``r5``: ``in_buffer``, guest physical address of buffer containing the - request. Caller may use the same address for both request and response. - - ``r6``: ``in_size``, size of the in buffer. Must be less than or equal to - 4 KB. - - ``r7``: ``out_buffer``, guest physical address of buffer to store the - response. Caller may use the same address for both request and response. - - ``r8``: ``out_size``, size of the out buffer. 
Must be at least 4 KB, as this - is the maximum request/response size supported by most TPM implementations, - including the TPM Resource Manager in the linux kernel. - -Return values: - - ``r3``: one of the following values: - - ``H_Success``: request processed successfully. - - ``H_PARAMETER``: invalid TPM operation. - - ``H_P2``: ``in_buffer`` is invalid. - - ``H_P3``: ``in_size`` is invalid. - - ``H_P4``: ``out_buffer`` is invalid. - - ``H_P5``: ``out_size`` is invalid. - - ``H_RESOURCE``: problem communicating with TPM. - - ``H_FUNCTION``: TPM access is not currently allowed/configured. - - ``r4``: For ``TPM_COMM_OP_EXECUTE``, the size of the response will be stored - here upon success. -- cgit v1.2.3-55-g7522 From 8e12c012a79320041d49ea6162e944b2f7306b71 Mon Sep 17 00:00:00 2001 From: Leonardo Garcia Date: Tue, 18 Jan 2022 12:56:30 +0100 Subject: Link new ppc-spapr-uv-hcalls.rst to pseries.rst. Signed-off-by: Leonardo Garcia Reviewed-by: Daniel Henrique Barboza Message-Id: Signed-off-by: Cédric Le Goater --- docs/system/ppc/pseries.rst | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'docs') diff --git a/docs/system/ppc/pseries.rst b/docs/system/ppc/pseries.rst index d0aade3a31..569237dc0c 100644 --- a/docs/system/ppc/pseries.rst +++ b/docs/system/ppc/pseries.rst @@ -113,13 +113,9 @@ can also be found in QEMU documentation: ../../specs/ppc-spapr-hotplug.rst ../../specs/ppc-spapr-hcalls.rst ../../specs/ppc-spapr-numa.rst + ../../specs/ppc-spapr-uv-hcalls.rst ../../specs/ppc-spapr-xive.rst -Other documentation available in QEMU docs directory: - -* Hypervisor calls needed by the Ultravisor - (``/docs/specs/ppc-spapr-uv-hcalls.txt``). - Switching between the KVM-PR and KVM-HV kernel module ===================================================== -- cgit v1.2.3-55-g7522