summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | spapr-pci: Stop providing assigned-addressesAlexey Kardashevskiy2019-10-041-35/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU does not allocate PCI resources (BARs) in any case - coldplug devices are configured by the firmware and hotplug devices rely on the guest system to do the assignment via the PCI rescan mechanism. Also in order to create non empty "assigned-addresses", the device has to be enabled (i.e. PCI_COMMAND needs the MMIO bit set) first as otherwise io_regions[i].addr are -1, and devices are not enabled at this point. This removes "assigned-addresses" and leaves it to those who actually do resource allocation. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190927022651.71642-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: remove unnecessary if() around calls to set_dfp{64,128}() in DFP ↵Mark Cave-Ayland2019-10-041-50/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | macros Now that the parameters to both set_dfp64() and set_dfp128() are exactly the same, there is no need for an explicit if() statement to determine which function should be called based upon size. Instead we can simply use the preprocessor to generate the call to set_dfp##size() directly. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-8-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: use existing VsrD() macro to eliminate HI_IDX and LO_IDX from ↵Mark Cave-Ayland2019-10-041-39/+31Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dfp_helper.c Switch over all accesses to the decimal numbers held in struct PPC_DFP from using HI_IDX and LO_IDX to using the VsrD() macro instead. Not only does this allow the compiler to ensure that the various dfp_* functions are being passed a ppc_vsr_t rather than an arbitrary uint64_t pointer, but also allows the host endian-specific HI_IDX and LO_IDX to be completely removed from dfp_helper.c. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-7-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: change struct PPC_DFP decimal storage from uint64[2] to ppc_vsr_tMark Cave-Ayland2019-10-041-102/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several places in dfp_helper.c that access the decimal number representations in struct PPC_DFP via HI_IDX and LO_IDX defines which are set at the top of dfp_helper.c according to the host endian. However we can instead switch to using ppc_vsr_t for decimal numbers and then make subsequent use of the existing VsrD() macros to access the correct element regardless of host endian. Note that 64-bit decimals are stored in the LSB of ppc_vsr_t (equivalent to VsrD(1)). Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-6-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: introduce dfp_finalize_decimal{64,128}() helper functionsMark Cave-Ayland2019-10-041-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the DFP helper functions call decimal{64,128}FromNumber() just before returning in order to convert the decNumber stored in dfp.t64 back to a Decimal{64,128} to write back to the FP registers. Introduce new dfp_finalize_decimal{64,128}() helper functions which both enable the parameter list to be reduced considerably, and also help minimise the changes required in the next patch. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-5-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: update {get,set}_dfp{64,128}() helper functions to read/write ↵Mark Cave-Ayland2019-10-043-39/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DFP numbers correctly Since commit ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr register array" FP registers are no longer stored consecutively in memory and so the current method of combining FP register pairs into DFP numbers is incorrect. Firstly update the definition of the dh_*_fprp defines in helper.h to reflect that FP registers are now stored as part of an array of ppc_vsr_t elements rather than plain uint64_t elements, and then introduce a new ppc_fprp_t type which conceptually represents a DFP even-odd register pair to be consumed by the DFP helper functions. Finally update the new DFP {get,set}_dfp{64,128}() helper functions to convert between DFP numbers and DFP even-odd register pairs correctly, making use of the existing VsrD() macro to access the correct elements regardless of host endian. Fixes: ef96e3ae96 "target/ppc: move FP and VMX registers into aligned vsr register array" Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-4-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: introduce set_dfp{64,128}() helper functionsMark Cave-Ayland2019-10-041-42/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing functions (now incorrectly) assume that the MSB and LSB of DFP numbers are stored as consecutive 64-bit words in memory. Instead of accessing the DFP numbers directly, introduce set_dfp{64,128}() helper functions to ease the switch to the correct representation. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-3-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | target/ppc: introduce get_dfp{64,128}() helper functionsMark Cave-Ayland2019-10-041-13/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing functions (now incorrectly) assume that the MSB and LSB of DFP numbers are stored as consecutive 64-bit words in memory. Instead of accessing the DFP numbers directly, introduce get_dfp{64,128}() helper functions to ease the switch to the correct representation. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190926185801.11176-2-mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | pseries: Update SLOF firmware imageAlexey Kardashevskiy2019-10-043-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes USB host bus adapter name in the device tree to match QEMU's one. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr: Stop providing RTAS blobAlexey Kardashevskiy2019-10-049-145/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLOF implements one itself so let's remove it from QEMU. It is one less image and simpler setup as the RTAS blob never stays in its initial place anyway as the guest OS always decides where to put it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr: Do not put empty properties for -kernel/-initrd/-appendAlexey Kardashevskiy2019-10-041-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are going to use spapr_build_fdt() for the boot time FDT and as an update for SLOF during handling of H_CAS. SLOF will apply all properties from the QEMU's FDT which is usually ok unless there are properties changed by grub or guest kernel. The properties are: bootargs, linux,initrd-start, linux,initrd-end, linux,stdout-path, linux,rtas-base, linux,rtas-entry. Resetting those during CAS will most likely cause grub failure. Don't create such properties if we're booting without "-kernel" and "-initrd" so they won't get included into the DT update blob and therefore the guest is more likely to boot successfully. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Tweaked commit message based on Greg Kurz's input] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr: Skip leading zeroes from memory@ DT node namesAlexey Kardashevskiy2019-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The device tree build by QEMU at the machine reset time is used by SLOF to build its internal device tree but the node names are not preserved exactly so when QEMU provides a device tree update in response to H_CAS, it might become tricky to match a node from the update blob to the actual node in SLOF. This removed leading zeroes from "memory@" nodes and makes the DTC checker happy. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
| * | | spapr: Fixes a leak in CASAlexey Kardashevskiy2019-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a missing g_free(fdt) if the resulting tree is bigger than the space allocated by SLOF. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
| * | | spapr: Move handling of special NVLink numa node from reset to initDavid Gibson2019-10-041-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The number of NUMA nodes in the system is fixed from the command line. Therefore, there's no need to recalculate it at reset time, and we can determine the special gpu_numa_id value used for NVLink2 devices at init time. This simplifies the reset path a bit which will make further improvements easier. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
| * | | spapr: Simplify handling of pre ISA 3.0 guest workaround handlingDavid Gibson2019-10-043-9/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain old guest versions don't understand the radix MMU introduced with POWER ISA 3.0, but incorrectly select it if presented with the option at CAS time. We workaround this in qemu by explicitly excluding the radix (and other ISA 3.0 linked) options if the guest doesn't explicitly note support for ISA 3.0. This is handled by the 'cas_legacy_guest_workaround' flag, which is pretty vague. Rename it to 'cas_pre_isa3_guest' to be clearer about what it's for. In addition, we unnecessarily call spapr_populate_pa_features() with different options when initially constructing the device tree and when adjusting it at CAS time. At the initial construct time cas_pre_isa3_guest is already false, so we can still use the flag, rather than explicitly overriding it to be false at the callsite. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
| * | | ppc/kvm: Skip writing DPDES back when in run time stateAlexey Kardashevskiy2019-10-042-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On POWER8 systems the Directed Privileged Door-bell Exception State register (DPDES) stores doorbell pending status, one bit per a thread of a core, set by "msgsndp" instruction. The register is shared among threads of the same core and KVM on POWER9 emulates it in a similar way (POWER9 does not have DPDES). DPDES is shared but QEMU assumes all SPRs are per thread so the only safe way to write DPDES back to VCPU before running a guest is doing so while all threads are pulled out of the guest so DPDES cannot change. There is only one situation when this condition is met: incoming migration when all threads are stopped. Otherwise any QEMU HMP/QMP command causing kvm_arch_put_registers() (for example printing registers or dumping memory) can clobber DPDES in a race with other vcpu threads. This changes DPDES handling so it is not written to KVM at runtime. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190923084110.34643-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | ppc: Use FPSCR defines instead of constantsPaul A. Clarke2019-10-042-65/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are FPSCR-related defines in target/ppc/cpu.h which can be used in place of constants and explicit shifts which arguably improve the code a bit in places. Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Message-Id: <1568817169-1721-1-git-send-email-pc@us.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | ppc: Add support for 'mffsce' instructionPaul A. Clarke2019-10-042-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR) instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl. This patch adds support for 'mffsce' instruction. 'mffsce' is identical to 'mffs', except that it also clears the exception enable bits in the FPSCR. On CPUs without support for 'mffsce' (below ISA 3.0), the instruction will execute identically to 'mffs'. Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1568817082-1384-1-git-send-email-pc@us.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | ppc: Add support for 'mffscrn','mffscrni' instructionsPaul A. Clarke2019-10-045-3/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR) instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl. This patch adds support for 'mffscrn' and 'mffscrni' instructions. 'mffscrn' and 'mffscrni' are similar to 'mffsl', except they do not return the status bits (FI, FR, FPRF) and they also set the rounding mode in the FPSCR. On CPUs without support for 'mffscrn'/'mffscrni' (below ISA 3.0), the instructions will execute identically to 'mffs'. Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Message-Id: <1568817081-1345-1-git-send-email-pc@us.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr/irq: Only claim VALID interrupts at the KVM levelCédric Le Goater2019-10-042-3/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A typical pseries VM with 16 vCPUs, one disk, one network adapater uses less than 100 interrupts but the whole IRQ number space of the QEMU machine is allocated at reset time and it is 8K wide. This is wasting a considerable amount of interrupt numbers in the global IRQ space which has 1M interrupts per socket on a POWER9. To optimise the HW resources, only request at the KVM level interrupts which have been claimed by the guest. This will help to increase the maximum number of VMs per system and also help supporting nested guests using the XIVE interrupt mode. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190911133937.2716-3-clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156942766014.1274533.10792048853177121231.stgit@bahia.lan> [dwg: Folded in fix up from Greg Kurz] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr/irq: Introduce an ics_irq_free() helperCédric Le Goater2019-10-042-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It will help us to discard interrupt numbers which have not been claimed in the next patch. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190911133937.2716-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | hw/ppc/pnv_homer: add PowerNV homer device modelBalamuruhan S2019-10-045-0/+359
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | add PnvHomer device model to emulate homer memory access for pstate table, occ-sensors, slw, occ static and dynamic values for Power8 and Power9 chips. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Message-Id: <20190912093056.4516-4-bala24@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | hw/ppc/pnv_occ: add sram device model for occ common areaBalamuruhan S2019-10-043-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | emulate occ common area region with occ sram device model which occ and skiboot uses it to communicate regarding sensors, slw and HWMON in PowerNV emulated host. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Message-Id: <20190912093056.4516-3-bala24@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | hw/ppc/pnv_xscom: retrieve homer/occ base address from PBA BARsBalamuruhan S2019-10-042-4/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During PowerNV boot skiboot populates the device tree by retrieving base address of homer/occ common area from PBA BARs and prd ipoll mask by accessing xscom read/write accesses. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Message-Id: <20190912093056.4516-2-bala24@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr: Report kvm_irqchip_in_kernel() in 'info pic'Greg Kurz2019-10-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unless the machine was started with kernel-irqchip=on, we cannot easily tell if we're actually using an in-kernel or an emulated irqchip. This information is important enough that it is worth printing it in 'info pic'. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156829860985.2073005.5893493824873412773.stgit@bahia.tls.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | hw/ppc/pnv: fix checkpatch.pl coding style warningsBalamuruhan S2019-10-041-17/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were few trailing comments after `/*` instead in new line and line more than 80 character, these fixes are trivial and doesn't change any logic in code. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Message-Id: <20190911142925.19197-5-bala24@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | spapr-tpm-proxy: Drop misleading checkGreg Kurz2019-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity is reporting in CID 1405304 that tpm_execute() may pass a NULL tpm_proxy->host_path pointer to open(). This is based on the fact that h_tpm_comm() does a NULL check on tpm_proxy->host_path and then passes tpm_proxy to tpm_execute(). The check in h_tpm_comm() is abusive actually since a spapr-proxy-tpm requires a non NULL host_path property, as checked during realize. Fixes: 0fb6bd073230 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156805260916.1779401.11054185183758185247.stgit@bahia.lan> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | ppc/pnv: fix "bmc" node name in DTCédric Le Goater2019-10-041-4/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes the dtc output : ERROR (node_name_chars): //bmc: Bad character '/' in node name Warning (avoid_unnecessary_addr_size): /bmc: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190902092932.20200-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
| * | | pseries: do not allow memory-less/cpu-less NUMA nodeLaurent Vivier2019-10-041-0/+33
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we hotplug a CPU on memory-less/cpu-less node, the linux kernel crashes. This happens because linux kernel needs to know the NUMA topology at start to be able to initialize the distance lookup table. On pseries, the topology is provided by the firmware via the existing CPUs and memory information. Thus a node without memory and CPU cannot be discovered by the kernel. To avoid the kernel crash, do not allow to start pseries with empty nodes. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20190830161345.22436-1-lvivier@redhat.com> [dwg: Rework to cope with movement of numa state from globals to MachineState] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* | | Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell2019-10-0438-244/+1038
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Compilation fix for KVM (Alex) * SMM fix (Dmitry) * VFIO error reporting (Eric) * win32 fixes and workarounds (Marc-André) * qemu-pr-helper crash bugfix (Maxim) * Memory leak fixes (myself) * VMX features (myself) * Record-replay deadlock (Pavel) * i386 CPUID bits (Sebastian) * kconfig tweak (Thomas) * Valgrind fix (Thomas) * Autoconverge test (Yury) # gpg: Signature made Fri 04 Oct 2019 17:57:48 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (29 commits) target/i386/kvm: Silence warning from Valgrind about uninitialized bytes target/i386: work around KVM_GET_MSRS bug for secondary execution controls target/i386: add VMX features vmxcap: correct the name of the variables target/i386: add VMX definitions target/i386: expand feature words to 64 bits target/i386: introduce generic feature dependency mechanism target/i386: handle filtered_features in a new function mark_unavailable_features tests/docker: only enable ubsan for test-clang win32: work around main-loop busy loop on socket/fd event tests: skip serial test on windows util: WSAEWOULDBLOCK on connect should map to EINPROGRESS Fix wrong behavior of cpu_memory_rw_debug() function in SMM memory: allow memory_region_register_iommu_notifier() to fail vfio: Turn the container error into an Error handle i386: Add CPUID bit for CLZERO and XSAVEERPTR docker: test-debug: disable LeakSanitizer lm32: do not leak memory on object_new/object_unref cris: do not leak struct cris_disasm_data mips: fix memory leaks in board initialization ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | target/i386/kvm: Silence warning from Valgrind about uninitialized bytesThomas Huth2019-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I run QEMU with KVM under Valgrind, I currently get this warning: Syscall param ioctl(generic) points to uninitialised byte(s) at 0x95BA45B: ioctl (in /usr/lib64/libc-2.28.so) by 0x429DC3: kvm_ioctl (kvm-all.c:2365) by 0x51B249: kvm_arch_get_supported_msr_feature (kvm.c:469) by 0x4C2A49: x86_cpu_get_supported_feature_word (cpu.c:3765) by 0x4C4116: x86_cpu_expand_features (cpu.c:5065) by 0x4C7F8D: x86_cpu_realizefn (cpu.c:5242) by 0x5961F3: device_set_realized (qdev.c:835) by 0x7038F6: property_set_bool (object.c:2080) by 0x707EFE: object_property_set_qobject (qom-qobject.c:26) by 0x705814: object_property_set_bool (object.c:1338) by 0x498435: pc_new_cpu (pc.c:1549) by 0x49C67D: pc_cpus_init (pc.c:1681) Address 0x1ffeffee74 is on thread 1's stack in frame #2, created by kvm_arch_get_supported_msr_feature (kvm.c:445) It's harmless, but a little bit annoying, so silence it by properly initializing the whole structure with zeroes. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | target/i386: work around KVM_GET_MSRS bug for secondary execution controlsPaolo Bonzini2019-10-041-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some secondary controls are automatically enabled/disabled based on the CPUID values that are set for the guest. However, they are still available at a global level and therefore should be present when KVM_GET_MSRS is sent to /dev/kvm. Unfortunately KVM forgot to include those, so fix that. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | target/i386: add VMX featuresPaolo Bonzini2019-10-043-2/+394
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add code to convert the VMX feature words back into MSR values, allowing the user to enable/disable VMX features as they wish. The same infrastructure enables support for limiting VMX features in named CPU models. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | vmxcap: correct the name of the variablesPaolo Bonzini2019-10-041-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | The low bits are 1 if the control must be one, the high bits are 1 if the control can be one. Correct the variable names as they are very confusing. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | target/i386: add VMX definitionsPaolo Bonzini2019-10-041-0/+130
| | | | | | | | | | | | | | | | | | | | | These will be used to compile the list of VMX features for named CPU models, and/or by the code that sets up the VMX MSRs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | target/i386: expand feature words to 64 bitsPaolo Bonzini2019-10-044-37/+40
| | | | | | | | | | | | | | | | | | | | | | | | VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but actually have only 32 bits of information). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | target/i386: introduce generic feature dependency mechanismPaolo Bonzini2019-10-041-24/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes a CPU feature does not make sense unless another is present. In the case of VMX features, KVM does not even allow setting the VMX controls to some invalid combinations. Therefore, this patch adds a generic mechanism that looks for bits that the user explicitly cleared, and uses them to remove other bits from the expanded CPU definition. If these dependent bits were also explicitly *set* by the user, this will be a warning for "-cpu check" and an error for "-cpu enforce". If not, then the dependent bits are cleared silently, for convenience. With VMX features, this will be used so that for example "-cpu host,-rdrand" will also hide support for RDRAND exiting. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | target/i386: handle filtered_features in a new function ↵Paolo Bonzini2019-10-041-39/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mark_unavailable_features The next patch will add a different reason for filtering features, unrelated to host feature support. Extract a new function that takes care of disabling the features and optionally reporting them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | tests/docker: only enable ubsan for test-clangPaolo Bonzini2019-10-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | -fsanitize=undefined is not the same thing as --enable-sanitizers. After commit 47c823e ("tests/docker: add sanitizers back to clang build", 2019-09-11) test-clang is almost duplicating the asan (test-debug) test, so partly revert commit 47c823e5b while leaving ubsan enabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | win32: work around main-loop busy loop on socket/fd eventMarc-André Lureau2019-10-041-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3 introduced an AIO context optimization to avoid calling event_notifier_test_and_clear() on ctx->notifier. On Windows, the same notifier is being used to wakeup the wait on socket events (see commit d3385eb448e38f828c78f8f68ec5d79c66a58b5d). The ctx->notifier event is added to the gpoll sources in aio_set_event_notifier(), aio_ctx_check() should clear the event regardless of ctx->notified, since Windows sets the event by itself, bypassing the aio->notified. This fixes qemu not clearing the event resulting in a busy loop. Paolo suggested to me on irc to call event_notifier_test_and_clear() after select() >0 from aio-win32.c's aio_prepare. Unfortunately, not all fds associated with ctx->notifiers are in AIO fd handlers set. (qemu_set_nonblock() in util/oslib-win32.c calls qemu_fd_register()). This is essentially a v2 of a patch that was sent earlier: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg00420.html that resurfaced when James investigated Spice performance issues on Windows: https://gitlab.freedesktop.org/spice/spice/issues/36 In order to test that patch, I simply tried running test-char on win32, and it hangs. Applying that patch solves it. QIO idle sources are not dispatched. I haven't investigated much further, I suspect source priorities and busy looping still come into play. This version keeps the "notified" field, so event_notifier_poll() should still work as expected. Cc: James Le Cuirot <chewi@gentoo.org> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | tests: skip serial test on windowsMarc-André Lureau2019-10-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Serial test is currently hard-coded to /dev/null. On Windows, serial chardev expect a COM: device, which may not be availble. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | util: WSAEWOULDBLOCK on connect should map to EINPROGRESSMarc-André Lureau2019-10-041-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In general, WSAEWOULDBLOCK can be mapped to EAGAIN as done by socket_error() (or EWOULDBLOCK). But for connect() with non-blocking sockets, it actually means the operation is in progress: https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-connect "The socket is marked as nonblocking and the connection cannot be completed immediately." (this is also the behaviour implemented by GLib GSocket) This fixes socket_can_bind_connect() test on win32. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Fix wrong behavior of cpu_memory_rw_debug() function in SMMDmitry Poletaev2019-10-043-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is a problem, that you don't have access to the data using cpu_memory_rw_debug() function when in SMM. You can't remotely debug SMM mode program because of that for example. Likely attrs version of get_phys_page_debug should be used to get correct asidx at the end to handle access properly. Here the patch to fix it. Signed-off-by: Dmitry Poletaev <poletaev@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | memory: allow memory_region_register_iommu_notifier() to failEric Auger2019-10-049-43/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when a notifier is attempted to be registered and its flags are not supported (especially the MAP one) by the IOMMU MR, we generally abruptly exit in the IOMMU code. The failure could be handled more nicely in the caller and especially in the VFIO code. So let's allow memory_region_register_iommu_notifier() to fail as well as notify_flag_changed() callback. All sites implementing the callback are updated. This patch does not yet remove the exit(1) in the amd_iommu code. in SMMUv3 we turn the warning message into an error message saying that the assigned device would not work properly. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | vfio: Turn the container error into an Error handleEric Auger2019-10-043-17/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The container error integer field is currently used to store the first error potentially encountered during any vfio_listener_region_add() call. However this fails to propagate detailed error messages up to the vfio_connect_container caller. Instead of using an integer, let's use an Error handle. Messages are slightly reworded to accomodate the propagation. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | i386: Add CPUID bit for CLZERO and XSAVEERPTRSebastian Andrzej Siewior2019-10-042-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | The CPUID bits CLZERO and XSAVEERPTR are availble on AMD's ZEN platform and could be passed to the guest. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | docker: test-debug: disable LeakSanitizerPaolo Bonzini2019-10-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are just too many leaks in device-introspect-test (especially for the plethora of arm and aarch64 boards) to make LeakSanitizer useful; disable it for now. Whoever is interested in debugging leaks can also use valgrind like this: QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 \ QTEST_QEMU_IMG=qemu-img \ valgrind --trace-children=yes --leak-check=full \ tests/device-introspect-test -p /aarch64/device/introspect/concrete/defaults/none Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | lm32: do not leak memory on object_new/object_unrefPaolo Bonzini2019-10-042-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bottom halves and ptimers are malloced, but nothing in these files is freeing memory allocated by instance_init. Since these are sysctl devices that are never unrealized, just moving the allocations to realize is enough to avoid the leak in practice (and also to avoid upsetting asan when running device-introspect-test). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
| * | cris: do not leak struct cris_disasm_dataPaolo Bonzini2019-10-041-30/+29Star
| | | | | | | | | | | | | | | | | | Use a stack-allocated struct to avoid a memory leak. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | mips: fix memory leaks in board initializationPaolo Bonzini2019-10-042-0/+3
| | | | | | | | | | | | | | | | | | | | | They are not a big deal, but they upset asan. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com>