summaryrefslogtreecommitdiffstats
path: root/hw/ppc/spapr_hcall.c
Commit message (Collapse)AuthorAgeFilesLines
* spapr_hcall.c: make do_client_architecture_support staticDaniel Henrique Barboza2021-01-191-0/+1
| | | | | | | | The function is called only inside spapr_hcall.c. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210114180628.1675603-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Introduce spapr_drc_reset_all()Greg Kurz2021-01-061-34/+6Star
| | | | | | | | | | No need to expose the way DRCs are traversed outside of spapr_drc.c. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-4-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Fix reset of transient DR connectorsGreg Kurz2021-01-061-1/+7
| | | | | | | | | | | | | | | | | | | | | | Documentation of object_property_iter_init() clearly stipulates that "it is forbidden to modify the property list while iterating". But this is exactly what we do when resetting transient DR connectors during CAS. The call to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a PCI bridge, both of which will then in turn destroy their PCI DRCs. This could potentially invalidate the iterator. It is pure luck that this haven't caused any issues so far. Change spapr_drc_reset() to return true if it caused a device to be removed. Restart from scratch in this case. This can potentially increase the overall DRC reset time, especially with a high maxmem which generates a lot of LMB DRCs. But this kind of setup is rare, and so is the use case of rebooting a guest while doing hot-unplug. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-3-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Call spapr_drc_reset() for all DRCs at CASGreg Kurz2021-01-061-3/+4
| | | | | | | | | | | | | | Non-transient DRCs are either in the empty or the ready state, which means spapr_drc_reset() doesn't change their state. It is thus not needed to do any checking. Call spapr_drc_reset() unconditionally and squash spapr_drc_transient() into its only user, spapr_drc_needed(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-2-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Pass sPAPR machine state down to spapr_pci_switch_vga()Greg Kurz2020-12-141-3/+4
| | | | | | | | This allows to drop a user of qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201209170052.1431440-4-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Convert hpt_prepare_thread() to use qemu_try_memalign()Greg Kurz2020-11-051-1/+1
| | | | | | | | | | | | | | | HPT resizing is asynchronous: the guest first kicks off the creation of a new HPT, then it waits for that new HPT to be actually created and finally it asks the current HPT to be replaced by the new one. In the case of a userland allocated HPT, this currently relies on calling qemu_memalign() which aborts on OOM and never returns NULL. Since we seem to have path to report the failure to the guest with an H_NO_MEM return value, use qemu_try_memalign() instead of qemu_memalign(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160398563636.32380.1747166034877173994.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Simplify error handling in do_client_architecture_support()Greg Kurz2020-10-091-4/+3Star
| | | | | | | | | | | Use the return value of ppc_set_compat_all() to check failures, which is preferred over hijacking local_err. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-7-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Get rid of cas_check_pvr() error reportingGreg Kurz2020-10-091-15/+12Star
| | | | | | | | | | | | | | | | | | | | | The cas_check_pvr() function has two purposes: - finding the "best" logical PVR, ie. the most recent one supported by the guest for this CPU type - checking if the guest supports the real PVR of this CPU type, which is just an optional extra information to workaround the lack of support for "compat" mode in PR KVM This logic doesn't need error reporting, really. If we don't find a suitable logical PVR, we return the special value 0 which is definitely not a valid PVR. Let the caller decide on whether it should error out or not. This doesn't change the behavior. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20200914123505.612812-6-groug@kaod.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: move h_home_node_associativity to spapr_numa.cDaniel Henrique Barboza2020-09-081-40/+0Star
| | | | | | | | | | | The implementation of this hypercall will be modified to use spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes make more sense. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200904172422.617460-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* qom: Crash more nicely on object_property_get_link() failureMarkus Armbruster2020-07-101-1/+2
| | | | | | | | | | | Pass &error_abort instead of NULL where the returned value is dereferenced or asserted to be non-null. Drop a now redundant assertion. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-24-armbru@redhat.com>
* spapr: Drop CAS reboot flagGreg Kurz2020-05-071-19/+14Star
| | | | | | | | | | | | | | The CAS reboot flag is false by default and all the locations that could set it to true have been dropped. This means that all code blocks depending on the flag being set is dead code and the other code blocks should be executed always. Just do that and drop the now uneeded CAS reboot flag. Fix a comment on the way to make checkpatch happy. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994893.478799.11772512888322840990.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr/cas: Separate CAS handling from rebuilding the FDTAlexey Kardashevskiy2020-05-071-26/+41
| | | | | | | | | | | | | | | | | | | | | | | At the moment "ibm,client-architecture-support" ("CAS") is implemented in SLOF and QEMU assists via the custom H_CAS hypercall which copies an updated flatten device tree (FDT) blob to the SLOF memory which it then uses to update its internal tree. When we enable the OpenFirmware client interface in QEMU, we won't need to copy the FDT to the guest as the client is expected to fetch the device tree using the client interface. This moves FDT rebuild out to a separate helper which is going to be called from the "ibm,client-architecture-support" handler and leaves writing FDT to the guest in the H_CAS handler. This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-3-aik@ozlabs.ru> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994229.478799.2178881312094922324.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Simplify selection of radix/hash during CASGreg Kurz2020-05-071-5/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The guest can select the MMU mode by setting bits 0-1 of byte 24 in OV5 to to 0b00 for hash or 0b01 for radix. As required by the architecture, we terminate the boot process if any other value is found there. The usual way to negotiate features in OV5 is basically ANDing the bitfield provided by the guest and the bitfield of features supported by QEMU, previously populated at machine init. For some not documented reason, MMU is treated differently : bit 1 of byte 24 (the radix/hash bit) is cleared from the guest OV5 and explicitely set in the final negotiated OV5 if radix was requested. Since the only expected input from the guest is the radix/hash bit being set or not, it seems more appropriate to handle this like we do for XIVE. Set the radix bit in spapr->ov5 at machine init if it has a chance to work (ie. power9, either TCG or a radix capable KVM) and rely exclusively on spapr_ovec_intersect() to set the radix bit in spapr->ov5_cas. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993621.478799.4204740354545734293.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Don't check capabilities removed between CAS callsGreg Kurz2020-05-071-13/+1Star
| | | | | | | | | | | | | | | | | | | | | | We currently check if some capability in OV5 was removed by the guest since the previous CAS, and we trigger a CAS reboot in that case. This was required because it could call for a device-tree property or node removal, that we didn't support until recently (see commit 6787d27b04a7 "spapr: add option vector handling in CAS-generated resets" for details). Now that we render a full FDT at CAS and that SLOF is able to handle node removal, we don't need to do a CAS reset in this case anymore. Also, this check can only return true if the guest has already called CAS since the last full system reset (otherwise spapr->ov5_cas is empty). Linux doesn't do that so this can be considered as dead code for the vast majority of existing setups. Drop the check. Since the only use of the ov5_cas_old variable is precisely the check itself, drop the variable as well. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514993021.478799.10928618293640651819.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Fix memory leak in h_client_architecture_support()Greg Kurz2020-03-241-0/+1
| | | | | | | | | | | | This is the only error path that needs to free the previously allocated ov1. Reported-by: Coverity (CID 1421924) Fixes: cbd0d7f36322 "spapr: Fail CAS if option vector table cannot be parsed" Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158481206205.336182.16106097429336044843.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* spapr: Don't attempt to clamp RMA to VRMA constraintDavid Gibson2020-03-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode. The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit. There are several things wrong with this: 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will *often* work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel) 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest 3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it. In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years. So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds: * Change the host kernel to use 64kiB pages * Use radix MMU (RPT) guests instead of HPT * Use 64kiB hugepages on the host to back guest memory * Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit. * Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0). Previously, on KVM, we also temporarily reduced the rma_size to 256M so that the we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: Handle pending hot plug/unplug requests at CASGreg Kurz2020-03-161-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a hot plug or unplug request is pending at CAS, we currently trigger a CAS reboot, which severely increases the guest boot time. This is because SLOF doesn't handle hot plug events and we had no way to fix the FDT that gets presented to the guest. We can do better thanks to recent changes in QEMU and SLOF: - we now return a full FDT to SLOF during CAS - SLOF was fixed to correctly detect any device that was either added or removed since boot time and to update its internal DT accordingly. The right solution is to process all pending hot plug/unplug requests during CAS: convert hot plugged devices to cold plugged devices and remove the hot unplugged ones, which is exactly what spapr_drc_reset() does. Also clear all hot plug events that are currently queued since they're no longer relevant. Note that SLOF cannot currently populate hot plugged PCI bridges or PHBs at CAS. Until this limitation is lifted, SLOF will reset the machine when this scenario occurs : this will allow the FDT to be fully processed when SLOF is started again (ie. the same effect as the CAS reboot that would occur anyway without this patch). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158257222352.4102917.8984214333937947307.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Merge branch 'exec_rw_const_v4' of https://github.com/philmd/qemu into HEADPaolo Bonzini2020-02-251-2/+2
|\
| * Let cpu_[physical]_memory() calls pass a boolean 'is_write' argumentPhilippe Mathieu-Daudé2020-02-201-2/+2
| | | | | | | | | | | | | | | | | | Use an explicit boolean type. This commit was produced with the included Coccinelle script scripts/coccinelle/exec_rw_const. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* | spapr: Don't use spapr_drc_needed() in CAS codeGreg Kurz2020-02-201-5/+9
|/ | | | | | | | | | | | | | | | | | | | | We currently don't support hotplug of devices between boot and CAS. If this happens a CAS reboot is triggered. We detect this during CAS using the spapr_drc_needed() function which is essentially a VMStateDescription .needed callback. Even if the condition for CAS reboot happens to be the same as for DRC migration, it looks wrong to piggyback a migration helper for this. Introduce a helper with slightly more explicit name and use it in both CAS and DRC migration code. Since a subsequent patch will enhance this helper to cover the case of hot unplug, let's go for spapr_drc_transient(). While here convert spapr_hotplugged_dev_before_cas() to the "transient" wording as well. This doesn't change any behaviour. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158169248180.3465937.9531405453362718771.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Don't allow multiple active vCPUs at CASGreg Kurz2020-02-021-0/+12
| | | | | | | | | | | | | | | | | | | According to the description of "ibm,client-architecture-support" that can found in LoPAPR "B.6.2.3 Root Node Methods": If multiple partition processors or threads are active at the time of the ibm,client-architecture-support method call, or an error is detected in the format of the ibm,architecture.vec structure, the err? boolean shall be TRUE; else FALSE. We certainly don't want to temper with the platform or with the PCR of the other vCPUs if they happen to be active. Ensure we have only one active vCPU and fail CAS otherwise. This is just for conformance and robustness, it doesn't fix any known bugs. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157969867170.571404.12117797348882189656.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Fail CAS if option vector table cannot be parsedGreg Kurz2020-02-021-0/+8
| | | | | | | | | | | | | Most of the option vector helpers have assertions to check their arguments aren't null. The guest can provide an arbitrary address for the CAS structure that would result in such null arguments. Fail CAS with H_PARAMETER and print a warning instead of aborting QEMU. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157925255250.397143.10855183619366882459.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Simplify ovec diffDavid Gibson2019-12-171-6/+2Star
| | | | | | | | | | | | | | | | spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set to those bits which are in new but not old, and it returns as a boolean whether or not there are any bits in old but not new. It turns out that both callers only care about the second, not the first. This is basically equivalent to a bitmap subset operation, which is easier to understand and implement. So replace spapr_ovec_diff() with spapr_ovec_subset(). Cc: Mike Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>
* spapr: Fold h_cas_compose_response() into h_client_architecture_support()David Gibson2019-12-171-3/+52
| | | | | | | | | | | | | | | | | | spapr_h_cas_compose_response() handles the last piece of the PAPR feature negotiation process invoked via the ibm,client-architecture-support OF call. Its only caller is h_client_architecture_support() which handles most of the rest of that process. I believe it was placed in a separate file originally to handle some fiddly dependencies between functions, but mostly it's just confusing to have the CAS process split into two pieces like this. Now that compose response is simplified (by just generating the whole device tree anew), it's cleaner to just fold it into h_client_architecture_support(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeoverDavid Gibson2019-12-171-20/+13Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAPR allows the interrupt controller used on a POWER9 machine (XICS or XIVE) to be selected by the guest operating system, by using the ibm,client-architecture-support (CAS) feature negotiation call. Currently, if the guest selects an interrupt controller different from the one selected at initial boot, this causes the system to be reset with the new model and the boot starts again. This means we run through the SLOF boot process twice, as well as any other bootloader (e.g. grub) in use before the OS calls CAS. This can be confusing and/or inconvenient for users. Thanks to two fairly recent changes, we no longer need this reboot. 1) we now completely regenerate the device tree when CAS is called (meaning we don't need special case updates for all the device tree changes caused by the interrupt controller mode change), 2) we now have explicit code paths to activate and deactivate the different interrupt controllers, rather than just implicitly calling those at machine reset time. We can therefore eliminate the reboot for changing irq mode, simply by putting a call to spapr_irq_update_active_intc() before we call spapr_h_cas_compose_response() (which gives the updated device tree to the guest firmware and OS). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: Use less cryptic representation of which irq backends are supportedDavid Gibson2019-10-041-3/+3
| | | | | | | | | | | | | | | | SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector 5 which indicates whether XICS, XIVE or both interrupt controllers are available. As usual for PAPR, the encoding is kind of overly complicated and confusing (though to be fair there are some backwards compat things it has to handle). But to make our internal code clearer, have SpaprIrq encode more directly which backends are available as two booleans, and derive the OV5 value from that at the point we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
* spapr: Simplify handling of pre ISA 3.0 guest workaround handlingDavid Gibson2019-10-041-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | Certain old guest versions don't understand the radix MMU introduced with POWER ISA 3.0, but incorrectly select it if presented with the option at CAS time. We workaround this in qemu by explicitly excluding the radix (and other ISA 3.0 linked) options if the guest doesn't explicitly note support for ISA 3.0. This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for. In addition, we unnecessarily call spapr_populate_pa_features() with different options when initially constructing the device tree and when adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest is already false, so we can still use the flag, rather than explicitly overriding it to be false at the callsite. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* spapr: Use SHUTDOWN_CAUSE_SUBSYSTEM_RESET for CAS rebootsDavid Gibson2019-08-291-1/+1
| | | | | | | | | | | | | | | | The sPAPR platform includes feature negotiation between the guest and platform. That sometimes requires reconfiguring the virtual hardware, and in some cases that is a complex enough process that we trigger a system reset to handle it. That interacts badly with -no-reboot - we trigger the reboot, -no-reboot means we exit and so the guest never gets to try again. Eventually we want to get rid of CAS reboots entirely, since they're odd and irritating for the user. But in the meantime we can fix the -no-reboot problem by using SHUTDOWN_CAUSE_SUBSYSTEM_RESET which ignores -no-reboot and seems to be designed for this sort of faux-reset for internal purposes only. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxyMichael Roth2019-08-211-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | This implements the H_TPM_COMM hypercall, which is used by an Ultravisor to pass TPM commands directly to the host's TPM device, or a TPM Resource Manager associated with the device. This also introduces a new virtual device, spapr-tpm-proxy, which is used to configure the host TPM path to be used to service requests sent by H_TPM_COMM hcalls, for example: -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0 By default, no spapr-tpm-proxy will be created, and hcalls will return H_FUNCTION. The full specification for this hypercall can be found in docs/specs/ppc-spapr-uv-hcalls.txt Since SVM-related hcalls like H_TPM_COMM use a reserved range of 0xEF00-0xEF80, we introduce a separate hcall table here to handle them. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com> [dwg: Corrected #include for upstream change] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Implement H_JOINNicholas Piggin2019-08-211-13/+61
| | | | | | | | | | | | This has been useful to modify and test the Linux pseries suspend code but it requires modification to the guest to call it (due to being gated by other unimplemented features). It is not otherwise used by Linux yet, but work is slowly progressing there. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-5-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Implement H_CONFERNicholas Piggin2019-08-211-0/+67
| | | | | | | | | | | | This does not do directed yielding and is not quite as strict as PAPR specifies in terms of precise dispatch behaviour. This generally will mean suboptimal performance, rather than guest misbehaviour. Linux does not rely on exact dispatch behaviour. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-4-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Implement H_PRODNicholas Piggin2019-08-211-0/+32
| | | | | | | | | | H_PROD is added, and H_CEDE is modified to test the prod bit according to PAPR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-3-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Implement dispatch tracking for tcgNicholas Piggin2019-08-211-5/+0Star
| | | | | | | | | | | | Implement cpu_exec_enter/exit on ppc which calls into new methods of the same name in PPCVirtualHypervisorClass. These are used by spapr to implement the splpar VPA dispatch counter initially. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20190718034214.14948-2-npiggin@gmail.com> [dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz] Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc: fix leak in h_client_architecture_supportShivaprasad G Bhat2019-08-211-0/+2
| | | | | | | | | Free all SpaprOptionVector local pointers after use. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <156335160761.82682.11912058325777251614.stgit@lep8c.aus.stglabs.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* sysemu: Split sysemu/runstate.h off sysemu/sysemu.hMarkus Armbruster2019-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]
* Include qemu/main-loop.h lessMarkus Armbruster2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* Include qemu/module.h where needed, drop it from qemu-common.hMarkus Armbruster2019-06-121-0/+1
| | | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
* spapr: Print out extra hints when CAS negotiation of interrupt mode failsGreg Kurz2019-05-291-2/+4
| | | | | | | | | | | | | Let's suggest to the user how the machine should be configured to allow the guest to boot successfully. Suggested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155799221739.527449.14907564571096243745.stgit@bahia.lan> Reviewed-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> [dwg: Adjusted for style error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr/xive: Sanity checks of OV5 during CASGreg Kurz2019-05-291-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | If a machine is started with ic-mode=xive but the guest only knows about XICS, eg. an RHEL 7.6 guest, the kernel panics. This is expected but a bit unfortunate since the crash doesn't provide much information for the end user to guess what's happening. Detect that during CAS and exit QEMU with a proper error message instead, like it is already done for the MMU. Even if this is less likely to happen, the opposite case of a guest that only knows about XIVE would certainly fail all the same if the machine is started with ic-mode=xics. Also, the only valid values a guest can pass in byte 23 of OV5 during CAS are 0b00 (XIVE legacy mode) and 0b01 (XIVE exploitation mode). Any other value is a bug, at least with the current spec. Again, it does not seem right to let the guest go on without a precise idea of the interrupt mode it asked for. Handle these cases as well. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155793986451.464434.12887933000007255549.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/hash64: Rework R and C bit updatesBenjamin Herrenschmidt2019-04-261-6/+7
| | | | | | | | | | | | | | With MT-TCG, we are now running translation in a racy way, thus we need to mimic hardware when it comes to updating the R and C bits, by doing byte stores. The current "store_hpte" abstraction is ill suited for this, we replace it with two separate callbacks for setting R and C. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/spapr: Use proper HPTE accessors for H_READBenjamin Herrenschmidt2019-04-261-6/+5Star
| | | | | | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: Correctly set LPCR[GTSE] in H_REGISTER_PROCESS_TABLEDavid Gibson2019-03-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | 176dccee "target/ppc/spapr: Clear partition table entry when allocating hash table" reworked the H_REGISTER_PROCESS_TABLE hypercall, but unfortunately due to a small error no longer correctly sets the LPCR[GTSE] bit which allows the guest to directly execute (some types of) tlbie (TLB flush) instructions without involving the hypervisor. We got away with this, initially, because POWER9 did not have hypervisor mode enabled in its msr_mask, which meant we didn't actually run hypervisor privilege checks in TCG at all. However, da874d90 "target/ppc: add HV support for POWER9" turned on HV support on POWER9 for the benefit of the powernv machine type. This exposed the earlier bug in H_REGISTER_PROCESS_TABLE, and causes guests which rely on LPCR[GTSE] (i.e. basically all of them) to crash during early boot when their first tlbie instruction causes an unexpected trap. Fixes: 176dccee target/ppc/spapr: Clear partition table entry when allocating hash table Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Cleber Rosa <crosa@redhat.com>
* spapr: Use CamelCase properlyDavid Gibson2019-03-121-49/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc/spapr: Clear partition table entry when allocating hash tableSuraj Jitindar Singh2019-03-121-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we allocate a hash page table then we know that the guest won't be using process tables, so set the partition table entry maintained for the guest to zero. If this isn't done, then the guest radix bit will remain set in the entry. This means that when the guest calls H_REGISTER_PROCESS_TABLE there will be a mismatch between then flags and the value in spapr->patb_entry, and the call will fail. The guest will then panic: Failed to register process table (rc=-4) kernel BUG at arch/powerpc/platforms/pseries/lpar.c:959 The result being that it isn't possible to boot a hash guest on a P9 system. Also fix a bug in the flags parsing in h_register_process_table() which was introduced by the same patch, and simplify the handling to make it less likely that errors will be introduced in the future. The effect would have been setting the host radix bit LPCR_HR for a hash guest using process tables, which currently isn't supported and so couldn't have been triggered. Fixes: 00fd075e18 "target/ppc/spapr: Set LPCR:HR when using Radix mode" Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190305022102.17610-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc/spapr: Add SPAPR_CAP_CCF_ASSISTSuraj Jitindar Singh2019-03-121-0/+5
| | | | | | | | | | | | | | | | | | | | | | Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate the requirement for a hw-assisted version of the count cache flush workaround. The count cache flush workaround is a software workaround which can be used to flush the count cache on context switch. Some revisions of hardware may have a hardware accelerated flush, in which case the software flush can be shortened. This cap is used to set the availability of such hardware acceleration for the count cache flush routine. The availability of such hardware acceleration is indicated by the H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com> [dwg: Small style fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc/spapr: Add workaround option to SPAPR_CAP_IBSSuraj Jitindar Singh2019-03-121-0/+5
| | | | | | | | | | | | | | | | | The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability for mitigations for indirect branch speculation. Currently the available values are broken (default), fixed-ibs (fixed by serialising indirect branches) and fixed-ccd (fixed by diabling the count cache). Introduce a new value for this capability denoted workaround, meaning that software can work around the issue by flushing the count cache on context switch. This option is available if the hypervisor sets the H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc: Rename PATB/PATBE -> PATEBenjamin Herrenschmidt2019-02-251-10/+12
| | | | | | | | | | | | | | That "b" means "base address" and thus shouldn't be in the name of actual entries and related constants. This patch keeps the synthetic patb_entry field of the spapr virtual hypervisor unchanged until I figure out if that has an impact on the migration stream. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* target/ppc/spapr: Set LPCR:HR when using Radix modeBenjamin Herrenschmidt2019-02-251-38/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICSCédric Le Goater2019-01-081-0/+11
| | | | | | | | | | | | | | | The 'dual' sPAPR IRQ backend supports both interrupt mode, XIVE exploitation mode and the legacy compatibility mode (XICS). both modes are not supported at the same time. The machine starts with the legacy mode and a new interrupt mode can then be negotiated by the CAS process. In this case, the new mode is activated after a reset to take into account the required changes in the machine. These impact the device tree layout, the interrupt presenter object and the exposed MMIO regions in the case of XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* ppc/spapr: Receive and store device tree blob from SLOFAlexey Kardashevskiy2019-01-081-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLOF receives a device tree and updates it with various properties before switching to the guest kernel and QEMU is not aware of any changes made by SLOF. Since there is no real RTAS (QEMU implements it), it makes sense to pass the SLOF final device tree to QEMU to let it implement RTAS related tasks better, such as PCI host bus adapter hotplug. Specifially, now QEMU can find out the actual XICS phandle (for PHB hotplug) and the RTAS linux,rtas-entry/base properties (for firmware assisted NMI - FWNMI). This stores the initial DT blob in the sPAPR machine and replaces it in the KVMPPC_H_UPDATE_DT (new private hypercall) handler. This adds an @update_dt_enabled machine property to allow backward migration. SLOF already has a hypercall since https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183 This makes use of the new fdt_check_full() helper. In order to allow the configure script to pick the correct DTC version, this adjusts the DTC presense test. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>