summaryrefslogtreecommitdiffstats
path: root/hw/ppc/spapr.c
Commit message (Collapse)AuthorAgeFilesLines
* spapr.c: send QAPI event when memory hotunplug failsDaniel Henrique Barboza2021-03-091-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes allowed the pSeries machine to rollback the hotunplug process for the DIMM when the guest kernel signals, via a reconfiguration of the DR connector, that it's not going to release the LMBs. Let's also warn QAPI listerners about it. One place to do it would be right after the unplug state is cleaned up, spapr_clear_pending_dimm_unplug_state(). This would mean that the function is now doing more than cleaning up the pending dimm state though. This patch does the following changes in spapr.c: - send a QAPI event to inform that we experienced a failure in the hotunplug of the DIMM; - rename spapr_clear_pending_dimm_unplug_state() to spapr_memory_unplug_rollback(). This is a better fit for what the function is now doing, and it makes callers care more about what the function goal is and less about spapr.c internals such as clearing the pending dimm unplug state. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210302141019.153729-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr.c: remove duplicated assert in spapr_memory_unplug_request()Daniel Henrique Barboza2021-03-091-1/+0Star
| | | | | | | | | | | | We are asserting the existence of the first DRC LMB after sending unplug requests to all LMBs of the DIMM, where every DRC is being asserted inside the loop. This means that the first DRC is being asserted twice. Remove the duplicated assert. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210302141019.153729-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr.c: add 'unplug already in progress' message for PHB unplugDaniel Henrique Barboza2021-03-091-0/+4
| | | | | | | | | | | | | Both CPU hotunplug and PC_DIMM unplug reports an user warning, mentioning that the hotunplug is in progress, if consecutive 'device_del' are issued in quick succession. Do the same for PHBs in spapr_phb_unplug_request(). Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210226163301.419727-4-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug stateDaniel Henrique Barboza2021-03-091-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handling errors in memory hotunplug in the pSeries machine is more complex than any other device type, because there are all the complications that other devices has, and more. For instance, determining a timeout for a DIMM hotunplug must consider if it's a Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs. The size of the DIMM is also a factor, given that longer DIMMs naturally takes longer to be hotunplugged from the kernel. And there's also the guest memory usage to be considered: if there's a process that is consuming memory that would be lost by the DIMM unplug, the kernel will postpone the unplug process until the process finishes, and then initiate the regular hotunplug process. The first two considerations are manageable, but the last one is a deal breaker. There is no sane way for the pSeries machine to determine the memory load in the guest when attempting a DIMM hotunplug - and even if there was a way, the guest can start using all the RAM in the middle of the unplug process and invalidate our previous assumptions - and in result we can't even begin to calculate a timeout for the operation. This means that we can't implement a viable timeout mechanism for memory unplug in pSeries. Going back to why we would consider an unplug timeout, the reason is that we can't know if the kernel is giving up the unplug. Turns out that, sometimes, we can. Consider a failed memory hotunplug attempt where the kernel will error out with the following message: 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs' This happens when there is a LMB that the kernel gave up in removing, and the LMBs previously marked for removal are now being added back. This happens in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and after that update_lmb_associativity_index(). In this function, the kernel is configuring the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in section "ibm,configure-connector RTAS Call": 'A subsequent sequence of calls to ibm,configure-connector with the same entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of devices which were not completely configured.' We can use this kernel behavior in our favor. If a DRC connector reconfiguration for a LMB that we marked as unplug pending happens, this indicates that the kernel changed its mind about the unplug and is reasserting that it will keep using all the LMBs of the DIMM. In this case, it's safe to assume that the whole DIMM device unplug was cancelled. This patch hops into rtas_ibm_configure_connector() and, in the scenario described above, clear the unplug state for the DIMM device. This will not solve all the problems we still have with memory unplug, but it will cover this case where the kernel reconfigures LMBs after a failed unplug. We are a bit more resilient, without using an unreliable timeout, and we didn't make the remaining error cases any worse. [1] arch/powerpc/platforms/pseries/hotplug-memory.c Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_drc.c: add hotunplug timeout for CPUsDaniel Henrique Barboza2021-03-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a reliable way to make a CPU hotunplug fail in the pseries machine. Hotplug a CPU A, then offline all other CPUs inside the guest but A. When trying to hotunplug A the guest kernel will refuse to do it, because A is now the last online CPU of the guest. PAPR has no 'error callback' in this situation to report back to the platform, so the guest kernel will deny the unplug in silent and QEMU will never know what happened. The unplug pending state of A will remain until the guest is shutdown or rebooted. Previous attempts of fixing it (see [1] and [2]) were aimed at trying to mitigate the effects of the problem. In [1] we were trying to guess which guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU layer, avoiding the scenario described above because QEMU is now failing in behalf of the guest. This is not robust because the last online CPU of the guest can change while we're in the middle of the unplug process, and our initial assumptions are now invalid. In [2] we were accepting that our unplug process is uncertain and the user should be allowed to spam the IRQ hotunplug queue of the guest in case the CPU hotunplug fails. This patch presents another alternative, using the timeout infrastructure introduced in the previous patch. CPU hotunplugs in the pSeries machine will now timeout after 15 seconds. This is a long time for a single CPU unplug to occur, regardless of guest load - although the user is *strongly* encouraged to *not* hotunplug devices from a guest under high load - and we can be sure that something went wrong if it takes longer than that for the guest to release the CPU (the same can't be said about memory hotunplug - more on that in the next patch). Timing out the unplug operation will reset the unplug state of the CPU and allow the user to try it again, regardless of the error situation that prevented the hotunplug to occur. Of all the not so pretty fixes/mitigations for CPU hotunplug errors in pSeries, timing out the operation is an admission that we have no control in the process, and must assume the worst case if the operation doesn't succeed in a sensible time frame. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html [2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html Reported-by: Xujun Ma <xuma@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()Daniel Henrique Barboza2021-03-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | spapr_drc_detach() is not the best name for what the function does. The function does not detach the DRC, it makes an uncommited attempt to do it. It'll mark the DRC as pending unplug, via the 'unplug_request' flag, and only if the DRC state is drck->empty_state it will detach the DRC, via spapr_drc_release(). This is a contrast with its pair spapr_drc_attach(), where the function is indeed creating the DRC QOM object. If you know what spapr_drc_attach() does, you can be misled into thinking that spapr_drc_detach() is removing the DRC from QEMU internal state, which isn't true. The current role of this function is better described as a request for detach, since there's no guarantee that we're going to detach the DRC in the end. Rename the function to spapr_drc_unplug_request to reflect what is is doing. The initial idea was to change the name to spapr_drc_detach_request(), and later on change the unplug_request flag to detach_request. However, unplug_request is a migratable boolean for a long time now and renaming it is not worth the trouble. spapr_drc_unplug_request() setting drc->unplug_request is more natural than spapr_drc_detach_request setting drc->unplug_request. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helperDaniel Henrique Barboza2021-02-101-10/+1Star
| | | | | | | | | | | | We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easier access and to avoid repetition. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: move spapr_machine_using_legacy_numa() to spapr_numa.cDaniel Henrique Barboza2021-02-101-9/+0Star
| | | | | | | | | | This function is used only in spapr_numa.c. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Adjust firmware path of PCI devicesGreg Kurz2021-02-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is currently not possible to perform a strict boot from USB storage: $ qemu-system-ppc64 -accel kvm -nodefaults -nographic -serial stdio \ -boot strict=on \ -device qemu-xhci \ -device usb-storage,drive=disk,bootindex=0 \ -blockdev driver=file,node-name=disk,filename=fedora-ppc64le.qcow2 SLOF ********************************************************************** QEMU Starting Build Date = Jul 17 2020 11:15:24 FW Version = git-e18ddad8516ff2cf Press "s" to enter Open Firmware. Populating /vdevice methods Populating /vdevice/vty@71000000 Populating /vdevice/nvram@71000001 Populating /pci@800000020000000 00 0000 (D) : 1b36 000d serial bus [ usb-xhci ] No NVRAM common partition, re-initializing... Scanning USB XHCI: Initializing USB Storage SCSI: Looking for devices 101000000000000 DISK : "QEMU QEMU HARDDISK 2.5+" Using default console: /vdevice/vty@71000000 Welcome to Open Firmware Copyright (c) 2004, 2017 IBM Corporation All rights reserved. This program and the accompanying materials are made available under the terms of the BSD License available at http://www.opensource.org/licenses/bsd-license.php Trying to load: from: /pci@800000020000000/usb@0/storage@1/disk@101000000000000 ... E3405: No such device E3407: Load failed Type 'boot' and press return to continue booting the system. Type 'reset-all' and press return to reboot the system. Ready! 0 > The device tree handed over by QEMU to SLOF indeed contains: qemu,boot-list = "/pci@800000020000000/usb@0/storage@1/disk@101000000000000 HALT"; but the device node is named usb-xhci@0, not usb@0. This happens because the firmware names of PCI devices returned by get_boot_devices_list() come from pcibus_get_fw_dev_path(), while the sPAPR PHB code uses a different naming scheme for device nodes. This inconsistency has always been there but it was hidden for a long time because SLOF used to rename USB device nodes, until this commit, merged in QEMU 4.2.0 : commit 85164ad4ed9960cac842fa4cc067c6b6699b0994 Author: Alexey Kardashevskiy <aik@ozlabs.ru> Date: Wed Sep 11 16:24:32 2019 +1000 pseries: Update SLOF firmware image This fixes USB host bus adapter name in the device tree to match QEMU's one. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Fortunately, sPAPR implements the firmware path provider interface. This provides a way to override the default firmware paths. Just factor out the sPAPR PHB naming logic from spapr_dt_pci_device() to a helper, and use it in the sPAPR firmware path provider hook. Fixes: 85164ad4ed99 ("pseries: Update SLOF firmware image") Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210122170157.246374-1-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr.c: add 'name' property for hotplugged CPUs nodesDaniel Henrique Barboza2021-02-101-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the CPU hotunplug bug [1] the guest kernel throws a scary message in dmesg: pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16 The reason isn't related to the bug though. This happens because the kernel file arch/powerpc/platform/pseries/hotplug-cpu.c, function dlpar_cpu_remove(), is not finding the device_node.name of the offending CPU. We're not populating the 'name' property for hotplugged CPUs. Since the kernel relies on device_node.name for identifying CPU nodes, and the CPUs that are coldplugged has the 'name' property filled by SLOF, this is creating an unneeded inconsistency between hotplug and coldplug CPUs in the kernel. Let's fill the 'name' property for hotplugged CPUs as well. This will make the guest dmesg throws a less intimidating message when we try to unplug the last online CPU: pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9@1, rc: -16 [1] https://bugzilla.redhat.com/1911414 Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210120232305.241521-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr.c: use g_auto* with 'nodename' in CPU DT functionsDaniel Henrique Barboza2021-02-101-4/+2Star
| | | | | | | | | | | | | Next patch will use the 'nodename' string in spapr_core_dt_populate() after the point it's being freed today. Instead of moving 'g_free(nodename)' around, let's do a QoL change in both CPU DT functions where 'nodename' is being freed, and use g_autofree to avoid the 'g_free()' call altogether. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210120232305.241521-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add PEF based confidential guest supportDavid Gibson2021-02-081-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some upcoming POWER machines have a system called PEF (Protected Execution Facility) which uses a small ultravisor to allow guests to run in a way that they can't be eavesdropped by the hypervisor. The effect is roughly similar to AMD SEV, although the mechanisms are quite different. Most of the work of this is done between the guest, KVM and the ultravisor, with little need for involvement by qemu. However qemu does need to tell KVM to allow secure VMs. Because the availability of secure mode is a guest visible difference which depends on having the right hardware and firmware, we don't enable this by default. In order to run a secure guest you need to create a "pef-guest" object and set the confidential-guest-support property to point to it. Note that this just *allows* secure guests, the architecture of PEF is such that the guest still needs to talk to the ultravisor to enter secure mode. Qemu has no direct way of knowing if the guest is in secure mode, and certainly can't know until well after machine creation time. To start a PEF-capable guest, use the command line options: -object pef-guest,id=pef0 -machine confidential-guest-support=pef0 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: Improve handling of memory unplug with old guestsGreg Kurz2021-01-191-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 1e8b5b1aa16b ("spapr: Allow memory unplug to always succeed") trying to unplug memory from a guest that doesn't support it (eg. rhel6) no longer generates an error like it used to. Instead, it leaves the memory around : only a subsequent reboot or manual use of drmgr within the guest can complete the hot-unplug sequence. A flag was added to SpaprMachineClass so that this new behavior only applies to the default machine type. We can do better. CAS processes all pending hot-unplug requests. This means that we don't really care about what the guest supports if the hot-unplug request happens before CAS. All guests that we care for, even old ones, set enough bits in OV5 that lead to a non-empty bitmap in spapr->ov5_cas. Use that as a heuristic to decide if CAS has already occured or not. Always accept unplug requests that happen before CAS since CAS will process them. Restore the previous behavior of rejecting them after CAS when we know that the guest doesn't support memory hot-unplug. This behavior is suitable for all machine types : this allows to drop the pre_6_0_memory_unplug flag. Fixes: 1e8b5b1aa16b ("spapr: Allow memory unplug to always succeed") Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <161012708715.801107.11418801796987916516.stgit@bahia.lan> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Use spapr_drc_reset_all() at machine resetGreg Kurz2021-01-061-14/+1Star
| | | | | | | | | | | | | | | | | | Documentation of object_child_foreach_recursive() clearly stipulates that "it is forbidden to add or remove children from @obj from the @fn callback". But this is exactly what we do during machine reset. The call to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a PCI bridge, both of which will then in turn destroy their PCI DRCs. This could potentially invalidate the iterator used by do_object_child_foreach(). It is pure luck that this haven't caused any issues so far. Use spapr_drc_reset_all() since it can cope with DRC removal. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-5-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Allow memory unplug to always succeedGreg Kurz2021-01-061-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | It is currently impossible to hot-unplug a memory device between machine reset and CAS. (qemu) device_del dimm1 Error: Memory hot unplug not supported for this guest This limitation was introduced in order to provide an explicit error path for older guests that didn't support hot-plug event sources (and thus memory hot-unplug). The linux kernel has been supporting these since 4.11. All recent enough guests are thus capable of handling the removal of a memory device at all time, including during early boot. Lift the limitation for the latest machine type. This means that trying to unplug memory from a guest that doesn't support it will likely just do nothing and the memory will only get removed at next reboot. Such older guests can still get the existing behavior by using an older machine type. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160794035064.23292.17560963281911312439.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Fix DR properties of the root nodeGreg Kurz2021-01-061-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Section 13.5.2 of LoPAPR mandates various DR related indentifiers for all hot-pluggable entities to be exposed in the "ibm,drc-indexes", "ibm,drc-power-domains", "ibm,drc-names" and "ibm,drc-types" properties of their parent node. These properties are created with spapr_dt_drc(). PHBs and LMBs are both children of the machine. Their DR identifiers are thus supposed to be exposed in the afore mentioned properties of the root node. When PHB hot-plug support was added, an extra call to spapr_dt_drc() was introduced: this overwrites the existing properties, previously populated with the LMB identifiers, and they end up containing only PHB identifiers. This went unseen so far because linux doesn't care, but this is still not conformant with LoPAPR. Fortunately spapr_dt_drc() is able to handle multiple DR entity types at the same time. Use that to handle DR indentifiers for PHBs and LMBs with a single call to spapr_dt_drc(). While here also account for PMEM DR identifiers, which were forgotten when NVDIMM hot-plug support was added. Also add an assert to prevent further misuse of spapr_dt_drc(). With -m 1G,maxmem=2G,slots=8 passed on the QEMU command line we get: Without this patch: /proc/device-tree/ibm,drc-indexes 0000001f 20000001 20000002 20000003 20000000 20000005 20000006 20000007 20000004 20000009 20000008 20000010 20000011 20000012 20000013 20000014 20000015 20000016 20000017 20000018 20000019 2000000a 2000000b 2000000c 2000000d 2000000e 2000000f 2000001a 2000001b 2000001c 2000001d 2000001e These are the DRC indexes for the 31 possible PHBs. With this patch: /proc/device-tree/ibm,drc-indexes 0000002b 90000000 90000001 90000002 90000003 90000004 90000005 90000006 90000007 20000001 20000002 20000003 20000000 20000005 20000006 20000007 20000004 20000009 20000008 20000010 20000011 20000012 20000013 20000014 20000015 20000016 20000017 20000018 20000019 2000000a 2000000b 2000000c 2000000d 2000000e 2000000f 2000001a 2000001b 2000001c 2000001d 2000001e 80000004 80000005 80000006 80000007 And now we also have the 4 ((2G - 1G) / 256M) LMBs and the 8 (slots) PMEMs. Fixes: 3998ccd09298 ("spapr: populate PHB DRC entries for root DT node") Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160794479566.35245.17809158217760761558.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: DRC lookup cannot failGreg Kurz2021-01-061-0/+2
| | | | | | | | | | | | | | All memory DRC objects are created during machine init. It is thus safe to assume spapr_drc_by_id() cannot return NULL when hot-plug/unplugging memory. Make this clear with an assertion, like the code already does a few lines above when looping over memory DRCs. This fixes Coverity reports 1437757 and 1437758. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160805381160.228955.5388294067094240175.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/spapr: cleanup -machine pseries,nvdimm=X handlingIgor Mammedov2020-12-151-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Since NVDIMM support was introduced on pseries machine, it ignored machine's nvdimm=on|off option and effectively was always enabled on machines that support NVDIMM. Later on commit (28f5a716212 ppc/spapr_nvdimm: do not enable support with 'nvdimm=off') makes QEMU error out in case user explicitly set 'nvdimm=off' on CLI by peeking at machine_opts. However that's a workaround and leaves 'nvdimms_state->is_enabled' in inconsistent state (false) when it should be set true by default. Instead of using on machine_opts, set default to true for pseries machine in initfn time. If user sets manually 'nvdimm=off' it will overwrite default value to false and QEMU will error as expected without need to peek into machine_opts. That way pseries will have, nvdimm enabled by default and will honor user provided 'nvdimm=on|off'. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20201208164606.4109134-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* spapr.c: set a 'kvm-type' default value instead of relying on NULLDaniel Henrique Barboza2020-12-141-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | spapr_kvm_type() is considering 'vm_type=NULL' as a valid input, where the function returns 0. This is relying on the current QEMU machine options handling logic, where the absence of the 'kvm-type' option will be reflected as 'vm_type=NULL' in this function. This is not robust, and will break if QEMU options code decides to propagate something else in the case mentioned above (e.g. an empty string instead of NULL). Let's avoid this entirely by setting a non-NULL default value in case of no user input for 'kvm-type'. spapr_kvm_type() was changed to handle 3 fixed values of kvm-type: "auto", "hv", and "pr", with "auto" being the default if no kvm-type was set by the user. This allows us to always be predictable regardless of any enhancements/changes made in QEMU options mechanics. While we're at it, let's also document in 'kvm-type' description the already existing default mode, now named 'auto'. The information provided about it is based on how the pseries kernel handles the KVM_CREATE_VM ioctl(), where the default value '0' makes the kernel choose an available KVM module to use, giving precedence to kvm_hv. This logic is described in the kernel source file arch/powerpc/kvm/powerpc.c, function kvm_arch_init_vm(). Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201210145517.1532269-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: spapr_drc_attach() cannot failGreg Kurz2020-12-141-3/+3
| | | | | | | | | | All users are passing &error_abort already. Document the fact that spapr_drc_attach() should only be passed a free DRC, which is supposedly the case if appropriate checking is done earlier. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201201113728.885700-5-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Simplify error path of spapr_core_plug()Greg Kurz2020-12-141-11/+10Star
| | | | | | | | | | | | | | spapr_core_pre_plug() already guarantees that the slot for the given core ID is available. It is thus safe to assume that spapr_find_cpu_slot() returns a slot during plug. Turn the error path into an assertion. It is also safe to assume that no device is attached to the corresponding DRC and that spapr_drc_attach() shouldn't fail. Pass &error_abort to spapr_drc_attach() and simplify error handling. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201201113728.885700-4-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Abort if ppc_set_compat() fails for hot-plugged CPUsGreg Kurz2020-12-141-6/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When a CPU is hot-plugged, we set its compat mode to match the boot CPU, which was either set by machine reset or by CAS. This is currently handled in the plug handler after the core got realized. Potential errors of ppc_set_compat() are propagated to the hot-plug logic. Handling errors this late in the hot-plug sequence is generally frown upon. Ideally, we should do sanity checks in a pre-plug handler and pass &error_abort to ppc_set_compat() in the plug handler. We can filter out some error cases of ppc_set_compat() by calling ppc_check_compat() at pre-plug. But ppc_set_compat() also sets the compat register in KVM, and KVM doesn't provide any API that would allow to check valid compat mode settings beforehand. However, at this point we know that the compat mode was already successfully set for the boot CPU. Since this all boils down to setting a register with the very same value that was valid for the boot CPU, it should definitely not fail for hot-plugged CPUS. Pass &error_abort to ppc_set_compat(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201201113728.885700-3-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Fix pre-2.10 dummy ICP hackGreg Kurz2020-12-141-7/+7
| | | | | | | | | | | | | | | | | | | This hack registers dummy VMState entries of ICPs in order to support migration of old pseries machine types that used to create all smp.max_cpus possible ICPs at machine init. Part of the work is to unregister the dummy entries when plugging an actual vCPU core, and to register them back when unplugging the core. The code that unregisters the dummy ICPs in spapr_core_plug() is misplaced: if ppc_set_compat() fails afterwards, the hotplug operation will be cancelled and the dummy ICPs won't be registered back since the unplug handler isn't called. Unregister the dummy ICPs at the end of spapr_core_plug(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201201113728.885700-2-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Do TPM proxy hotplug sanity checks at pre-plugGreg Kurz2020-12-141-5/+18
| | | | | | | | | | | | There can be only one TPM proxy at a time. This is currently checked at plug time. But this can be detected at pre-plug in order to error out earlier. This allows to get rid of error handling in the plug handler. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120234208.683521-9-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Do PHB hoplug sanity check at pre-plugGreg Kurz2020-12-141-6/+11
| | | | | | | | | | | | We currently detect that a PHB index is already in use at plug time. But this can be decteted at pre-plug in order to error out earlier. This allows to pass &error_abort to spapr_drc_attach() and to end up with a plug handler that doesn't need to report errors anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120234208.683521-8-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Make PHB placement functions and spapr_pre_plug_phb() return statusGreg Kurz2020-12-141-17/+23
| | | | | | | | | | Read documentation in "qapi/error.h" and changelog of commit e3fe3988d785 ("error: Document Error API usage rules") for rationale. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120234208.683521-7-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Do NVDIMM/PC-DIMM device hotplug sanity checks at pre-plug onlyGreg Kurz2020-12-141-27/+13Star
| | | | | | | | | | | | | | | Pre-plug of a memory device, be it an NVDIMM or a PC-DIMM, ensures that the memory slot is available and that addresses don't overlap with existing memory regions. The corresponding DRCs in the LMB and PMEM namespaces are thus necessarily attachable at plug time. Pass &error_abort to spapr_drc_attach() in spapr_add_lmbs() and spapr_add_nvdimm(). This allows to greatly simplify error handling on the plug path. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120234208.683521-3-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* vl: remove serial_max_hdsPaolo Bonzini2020-12-101-4/+2Star
| | | | | | | | | serial_hd(i) is NULL if and only if i >= serial_max_hds(). Test serial_hd(i) instead of bounding the loop at serial_max_hds(), thus removing one more function that vl.c is expected to export. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* vl: extract softmmu/datadir.cPaolo Bonzini2020-12-101-0/+1
| | | | | Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* ppc: remove bios_namePaolo Bonzini2020-12-101-3/+1Star
| | | | | | | | | Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20201026143028.3034018-11-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw: add compat machines for 6.0Cornelia Huck2020-12-081-2/+13
| | | | | | | | | | | Add 6.0 machine types for arm/i440fx/q35/s390x/spapr. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20201109173928.1001764-1-cohuck@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* spapr: Drop dead code in spapr_reallocate_hpt()Greg Kurz2020-11-051-6/+0Star
| | | | | | | | | | | | | | | Sometimes QEMU needs to allocate the HPT in userspace, namely with TCG or PR KVM. This is performed with qemu_memalign() because of alignment requirements. Like glib's allocators, its behaviour is to abort on OOM instead of returning NULL. This could be changed to qemu_try_memalign(), but in the specific case of spapr_reallocate_hpt(), the outcome would be to terminate QEMU anyway since no HPT means no MMU for the guest. Drop the dead code instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160398562892.32380.15006707861753544263.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Improve spapr_reallocate_hpt() error reportingGreg Kurz2020-10-271-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | spapr_reallocate_hpt() has three users, two of which pass &error_fatal and the third one, htab_load(), passes &local_err, uses it to detect failures and simply propagates -EINVAL up to vmstate_load(), which will cause QEMU to exit. It is thus confusing that spapr_reallocate_hpt() doesn't return right away when an error is detected in some cases. Also, the comment suggesting that the caller is welcome to try to carry on seems like a remnant in this respect. This can be improved: - change spapr_reallocate_hpt() to always report a negative errno on failure, either as reported by KVM or -ENOSPC if the HPT is smaller than what was asked, - use that to detect failures in htab_load() which is preferred over checking &local_err, - propagate this negative errno to vmstate_load() because it is more accurate than propagating -EINVAL for all possible errors. [dwg: Fix compile error due to omitted prelim patch] Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160371605460.305923.5890143959901241157.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Fix kvmppc_load_htab_chunk() error reportingGreg Kurz2020-10-271-1/+3
| | | | | | | | | | | | | | | | | If kvmppc_load_htab_chunk() fails, its return value is propagated up to vmstate_load(). It should thus be a negative errno, not -1 (which maps to EPERM and would lure the user into thinking that the problem is necessarily related to a lack of privilege). Return the error reported by KVM or ENOSPC in case of short write. While here, propagate the error message through an @errp argument and have the caller to print it with error_report_err() instead of relying on fprintf(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160371604713.305923.5264900354159029580.stgit@bahia.lan> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Use error_append_hint() in spapr_reallocate_hpt()Greg Kurz2020-10-271-3/+5
| | | | | | | | | | Hints should be added with the dedicated error_append_hint() API because we don't want to print them when using QMP. This requires to insert ERRP_GUARD as explained in "qapi/error.h". Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160371604030.305923.17464161378167312662.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Simplify error handling in spapr_memory_plug()Greg Kurz2020-10-271-12/+10Star
| | | | | | | | | | | | | As recommended in "qapi/error.h", add a bool return value to spapr_add_lmbs() and spapr_add_nvdimm(), and use them instead of local_err in spapr_memory_plug(). This allows to get rid of the error propagation overhead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160309734178.2739814.3488437759887793902.stgit@bahia.lan> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Pass &error_abort when getting some PC DIMM propertiesGreg Kurz2020-10-271-14/+3Star
| | | | | | | | | | | | | | | | | | | | Both PC_DIMM_SLOT_PROP and PC_DIMM_ADDR_PROP are defined in the default property list of the PC DIMM device class: DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0), DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot, PC_DIMM_UNASSIGNED_SLOT), They should thus be always gettable for both PC DIMMs and NVDIMMs. An error in getting them can only be the result of a programming error. It doesn't make much sense to propagate the error in this case. Abort instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160309732180.2739814.7243774674998010907.stgit@bahia.lan> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Use appropriate getter for PC_DIMM_SLOT_PROPGreg Kurz2020-10-271-3/+6
| | | | | | | | | | | | | | | | | | The PC_DIMM_SLOT_PROP property is defined as: DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot, PC_DIMM_UNASSIGNED_SLOT), Use object_property_get_int() instead of object_property_get_uint(). Since spapr_memory_plug() only gets called if pc_dimm_pre_plug() succeeded, we expect to have a valid >= 0 slot number, either because the user passed a valid slot number or because pc_dimm_get_free_slot() picked one up for us. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160309730758.2739814.15821922745424652642.stgit@bahia.lan> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Use appropriate getter for PC_DIMM_ADDR_PROPGreg Kurz2020-10-271-2/+2
| | | | | | | | | | | | | The PC_DIMM_ADDR_PROP property is defined as: DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0), Use object_property_get_uint() instead of object_property_get_int(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160309729609.2739814.4996614957953215591.stgit@bahia.lan> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* pc-dimm: Drop @errp argument of pc_dimm_plug()Greg Kurz2020-10-271-5/+1Star
| | | | | | | | | | | | | pc_dimm_plug() doesn't use it. It only aborts on error. Drop @errp and adapt the callers accordingly. [dwg: Removed unused label to fix compile] Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160309728447.2739814.12831204841251148202.stgit@bahia.lan> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Move spapr_create_nvdimm_dr_connectors() to core machine codeGreg Kurz2020-10-271-0/+10
| | | | | | | | | | | | | | The spapr_create_nvdimm_dr_connectors() function doesn't need to access any internal details of the sPAPR NVDIMM implementation. Also, pretty much like for the LMBs, only spapr_machine_init() is responsible for the creation of DR connectors for NVDIMMs. Make this clear by making this function static in hw/ppc/spapr.c. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160249772183.757627.7396780936543977766.stgit@bahia.lan> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: add spapr_machine_using_legacy_numa() helperDaniel Henrique Barboza2020-10-091-0/+12
| | | | | | | | | | | | | | | | | | | | | | | The changes to come to NUMA support are all guest visible. In theory we could just create a new 5_1 class option flag to avoid the changes to cascade to 5.1 and under. The reality is that these changes are only relevant if the machine has more than one NUMA node. There is no need to change guest behavior that has been around for years needlesly. This new helper will be used by the next patches to determine whether we should retain the (soon to be) legacy NUMA behavior in the pSeries machine. The new behavior will only be exposed if: - machine is pseries-5.2 and newer; - more than one NUMA node is declared in NUMA state. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add a return value to spapr_check_pagesize()Greg Kurz2020-10-091-3/+1Star
| | | | | | | | | | | As recommended in "qapi/error.h", return true on success and false on failure. This allows to reduce error propagation overhead in the callers. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-14-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add a return value to spapr_nvdimm_validate()Greg Kurz2020-10-091-3/+1Star
| | | | | | | | | | | As recommended in "qapi/error.h", return true on success and false on failure. This allows to reduce error propagation overhead in the callers. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-13-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add a return value to spapr_set_vcpu_id()Greg Kurz2020-10-091-2/+3
| | | | | | | | | | | As recommended in "qapi/error.h", return true on success and false on failure. This allows to reduce error propagation overhead in the callers. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-11-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Add a return value to spapr_drc_attach()Greg Kurz2020-10-091-12/+3Star
| | | | | | | | | | | As recommended in "qapi/error.h", return true on success and false on failure. This allows to reduce error propagation overhead in the callers. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-9-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Simplify error handling in callers of ppc_set_compat()Greg Kurz2020-10-091-4/+3Star
| | | | | | | | | | | Now that ppc_set_compat() indicates success/failure with a return value, use it and reduce error propagation overhead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-5-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Handle HPT allocation failure in nested guestFabiano Rosas2020-10-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | The nested KVM code does not yet support HPT guests. Calling the KVM_CAP_PPC_ALLOC_HTAB ioctl currently leads to KVM setting the guest as HPT and erroneously executing code in L1 that should only run in hypervisor mode, leading to an exception in the L1 vcpu thread when it enters the nested guest. This can be reproduced with -machine max-cpu-compat=power8 in the L2 guest command line. The KVM code has since been modified to fail the ioctl when running in a nested environment so QEMU needs to be able to handle that. This patch provides an error message informing the user about the lack of support for HPT in nested guests. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20200911043123.204162-1-farosas@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* numa: drop support for '-numa node' (without memory specified)Igor Mammedov2020-09-301-1/+0Star
| | | | | | | | | | | | | | | | it was deprecated since 4.1 commit 4bb4a2732e (numa: deprecate implict memory distribution between nodes) Users of existing VMs, wishing to preserve the same RAM distribution, should configure it explicitly using ``-numa node,memdev`` options. Current RAM distribution can be retrieved using HMP command `info numa` and if separate memory devices (pc|nv-dimm) are present use `info memory-device` and subtract device memory from output of `info numa`. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20200911084410.788171-2-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* load_elf: Remove unused address variables from callersBALATON Zoltan2020-09-261-7/+4Star
| | | | | | | | | | | | | | | | Several callers of load_elf() pass pointers for lowaddr and highaddr parameters which are then not used for anything. This may stem from a misunderstanding that load_elf need a value here but in fact it can take NULL to ignore these values. Remove such unused variables and pass NULL instead from callers that don't need these. Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Message-Id: <20200705174020.BDD0174633F@zero.eik.bme.hu> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>