summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kvm/book3s_hv_rmhandlers.S
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'powerpc-5.2-5' of ↵Linus Torvalds2019-06-221-2/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "This is a frustratingly large batch at rc5. Some of these were sent earlier but were missed by me due to being distracted by other things, and some took a while to track down due to needing manual bisection on old hardware. But still we clearly need to improve our testing of KVM, and of 32-bit, so that we catch these earlier. Summary: seven fixes, all for bugs introduced this cycle. - The commit to add KASAN support broke booting on 32-bit SMP machines, due to a refactoring that moved some setup out of the secondary CPU path. - A fix for another 32-bit SMP bug introduced by the fast syscall entry implementation for 32-bit BOOKE. And a build fix for the same commit. - Our change to allow the DAWR to be force enabled on Power9 introduced a bug in KVM, where we clobber r3 leading to a host crash. - The same commit also exposed a previously unreachable bug in the nested KVM handling of DAWR, which could lead to an oops in a nested host. - One of the DMA reworks broke the b43legacy WiFi driver on some people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit. - A fix for TLB flushing in KVM introduced a new bug, as it neglected to also flush the ERAT, this could lead to memory corruption in the guest. Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry Finger, Michael Neuling, Suraj Jitindar Singh" * tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr() powerpc/32: fix build failure on book3e with KVM powerpc/booke: fix fast syscall entry on SMP powerpc/32s: fix initial setup of segment registers on secondary CPU
| * KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real modeSuraj Jitindar Singh2019-06-181-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hcall H_SET_DAWR is used by a guest to set the data address watchpoint register (DAWR). This hcall is handled in the host in kvmppc_h_set_dawr() which can be called in either real mode on the guest exit path from hcall_try_real_mode() in book3s_hv_rmhandlers.S, or in virtual mode when called from kvmppc_pseries_do_hcall() in book3s_hv.c. The function kvmppc_h_set_dawr() updates the dawr and dawrx fields in the vcpu struct accordingly and then also writes the respective values into the DAWR and DAWRX registers directly. It is necessary to write the registers directly here when calling the function in real mode since the path to re-enter the guest won't do this. However when in virtual mode the host DAWR and DAWRX values have already been restored, and so writing the registers would overwrite these. Additionally there is no reason to write the guest values here as these will be read from the vcpu struct and written to the registers appropriately the next time the vcpu is run. This also avoids the case when handling h_set_dawr for a nested guest where the guest hypervisor isn't able to write the DAWR and DAWRX registers directly and must rely on the real hypervisor to do this for it when it calls H_ENTER_NESTED. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()Michael Neuling2019-06-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") screwed up some assembler and corrupted a pointer in r3. This resulted in crashes like the below: BUG: Kernel NULL pointer dereference at 0x000013bf Faulting instruction address: 0xc00000000010b044 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3 NIP: c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4 REGS: c00000179b397710 TRAP: 0300 Not tainted (5.2.0-rc4+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 42244842 XER: 00000000 CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0 GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0 ... NIP kvmppc_h_set_dabr+0x50/0x68 LR kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv] Call Trace: 0xc0000017f11c0000 (unreliable) kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] kvmppc_vcpu_run+0x34/0x48 [kvm] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] kvm_vcpu_ioctl+0x460/0x850 [kvm] do_vfs_ioctl+0xe4/0xb40 ksys_ioctl+0xc4/0x110 sys_ioctl+0x28/0x80 system_call+0x5c/0x70 Instruction dump: 4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff 4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6 Fix the bug by only changing r3 when we are returning immediately. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174Thomas Gleixner2019-05-301-8/+1Star
|/ | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 655 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-05-171-53/+33Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - support for SVE and Pointer Authentication in guests - PMU improvements POWER: - support for direct access to the POWER9 XIVE interrupt controller - memory and performance optimizations x86: - support for accessing memory not backed by struct page - fixes and refactoring Generic: - dirty page tracking improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (155 commits) kvm: fix compilation on aarch64 Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" kvm: x86: Fix L1TF mitigation for shadow MMU KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible KVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device KVM: PPC: Book3S HV: XIVE: Fix spelling mistake "acessing" -> "accessing" KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUs kvm: nVMX: Set nested_run_pending in vmx_set_nested_state after checks complete tests: kvm: Add tests for KVM_SET_NESTED_STATE KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state tests: kvm: Add tests for KVM_CAP_MAX_VCPUS and KVM_CAP_MAX_CPU_ID tests: kvm: Add tests to .gitignore KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 KVM: Fix kvm_clear_dirty_log_protect off-by-(minus-)one KVM: Fix the bitmap range to copy during clear dirty KVM: arm64: Fix ptrauth ID register masking logic KVM: x86: use direct accessors for RIP and RSP KVM: VMX: Use accessors for GPRs outside of dedicated caching logic KVM: x86: Omit caching logic for always-available GPRs kvm, x86: Properly check whether a pfn is an MMIO or not ...
| * KVM: PPC: Book3S HV: Make sure to load LPID for radix VCPUsPaul Mackerras2019-05-141-6/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 70ea13f6e609 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads", 2019-04-29) aimed to make radix guests that are using the real-mode entry path load the LPID register and flush the TLB in the same place where those things are done for HPT guests. However, it omitted to remove a branch which branches around that code for radix guests. The result is that with indep_thread_mode = N, radix guests don't run correctly. (With indep_threads_mode = Y, which is the default, radix guests use a different entry path.) This removes the offending branch, and also the load and compare that the branch depends on, since the cr7 setting is now unused. Reported-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Fixes: 70ea13f6e609 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2019-04-301-10/+13
| |\ | | | | | | | | | | | | | | | | | | | | | This merges in the ppc-kvm topic branch from the powerpc tree to get patches which touch both general powerpc code and KVM code, one of which is a prerequisite for following patches. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Flush TLB on secondary radix threadsPaul Mackerras2019-04-301-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running on POWER9 with kvm_hv.indep_threads_mode = N and the host in SMT1 mode, KVM will run guest VCPUs on offline secondary threads. If those guests are in radix mode, we fail to load the LPID and flush the TLB if necessary, leading to the guest crashing with an unsupported MMU fault. This arises from commit 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on", 2018-05-17), which didn't consider the case where indep_threads_mode = N. For simplicity, this makes the real-mode guest entry path flush the TLB in the same place for both radix and hash guests, as we did before 9a4506e11b97, though the code is now C code rather than assembly code. We also have the radix TLB flush open-coded rather than calling radix__local_flush_tlb_lpid_guest(), because the TLB flush can be called in real mode, and in real mode we don't want to invoke the tracepoint code. Fixes: 9a4506e11b97 ("KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Move HPT guest TLB flushing to C codePaul Mackerras2019-04-301-34/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces assembler code in book3s_hv_rmhandlers.S that checks the kvm->arch.need_tlb_flush cpumask and optionally does a TLB flush with C code in book3s_hv_builtin.c. Note that unlike the radix version, the hash version doesn't do an explicit ERAT invalidation because we will invalidate and load up the SLB before entering the guest, and that will invalidate the ERAT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Handle virtual mode in XIVE VCPU push codeSuraj Jitindar Singh2019-04-301-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in book3s_hv_rmhandlers.S that pushes the XIVE virtual CPU context to the hardware currently assumes it is being called in real mode, which is usually true. There is however a path by which it can be executed in virtual mode, in the case where indep_threads_mode = N. A virtual CPU executing on an offline secondary thread can take a hypervisor interrupt in virtual mode and return from the kvmppc_hv_entry() call after the kvm_secondary_got_guest label. It is possible for it to be given another vcpu to execute before it gets to execute the stop instruction. In that case it will call kvmppc_hv_entry() for the second VCPU in virtual mode, and the XIVE vCPU push code will be executed in virtual mode. The result in that case will be a host crash due to an unexpected data storage interrupt caused by executing the stdcix instruction in virtual mode. This fixes it by adding a code path for virtual mode, which uses the virtual TIMA pointer and normal load/store instructions. [paulus@ozlabs.org - wrote patch description] Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Implement real mode H_PAGE_INIT handlerSuraj Jitindar Singh2019-04-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement a real mode handler for the H_CALL H_PAGE_INIT which can be used to zero or copy a guest page. The page is defined to be 4k and must be 4k aligned. The in-kernel real mode handler halves the time to handle this H_CALL compared to handling it in userspace for a hash guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | | powerpc/64s: Reimplement book3s idle code in CNicholas Piggin2019-04-301-46/+72
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reimplement Book3S idle code in C, moving POWER7/8/9 implementation speific HV idle code to the powernv platform code. Book3S assembly stubs are kept in common code and used only to save the stack frame and non-volatile GPRs before executing architected idle instructions, and restoring the stack and reloading GPRs then returning to C after waking from idle. The complex logic dealing with threads and subcores, locking, SPRs, HMIs, timebase resync, etc., is all done in C which makes it more maintainable. This is not a strict translation to C code, there are some significant differences: - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs, but saves and restores them itself. - The optimisation where EC=ESL=0 idle modes did not have to save GPRs or change MSR is restored, because it's now simple to do. ESL=1 sleeps that do not lose GPRs can use this optimization too. - KVM secondary entry and cede is now more of a call/return style rather than branchy. nap_state_lost is not required because KVM always returns via NVGPR restoring path. - KVM secondary wakeup from offline sequence is moved entirely into the offline wakeup, which avoids a hwsync in the normal idle wakeup path. Performance measured with context switch ping-pong on different threads or cores, is possibly improved a small amount, 1-3% depending on stop state and core vs thread test for shallow states. Deep states it's in the noise compared with other latencies. KVM improvements: - Idle sleepers now always return to caller rather than branch out to KVM first. - This allows optimisations like very fast return to caller when no state has been lost. - KVM no longer requires nap_state_lost because it controls NVGPR save/restore itself on the way in and out. - The heavy idle wakeup KVM request check can be moved out of the normal host idle code and into the not-performance-critical offline code. - KVM nap code now returns from where it is called, which makes the flow a bit easier to follow. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Squash the KVM changes in] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Add force enable of DAWR on P9 optionMichael Neuling2019-04-201-10/+13
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a flag so that the DAWR can be enabled on P9 via: echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous The DAWR was previously force disabled on POWER9 in: 9654153158 powerpc: Disable DAWR in the base POWER9 CPU features Also see Documentation/powerpc/DAWR-POWER9.txt This is a dangerous setting, USE AT YOUR OWN RISK. Some users may not care about a bad user crashing their box (ie. single user/desktop systems) and really want the DAWR. This allows them to force enable DAWR. This flag can also be used to disable DAWR access. Once this is cleared, all DAWR access should be cleared immediately and your machine once again safe from crashing. Userspace may get confused by toggling this. If DAWR is force enabled/disabled between getting the number of breakpoints (via PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an inconsistent view of what's available. Similarly for guests. For the DAWR to be enabled in a KVM guest, the DAWR needs to be force enabled in the host AND the guest. For this reason, this won't work on POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the dawr_enable_dangerous file will fail if the hypervisor doesn't support writing the DAWR. To double check the DAWR is working, run this kernel selftest: tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c Any errors/failures/skips mean something is wrong. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2019-02-221-46/+20Star
|\ | | | | | | | | | | | | | | | | This merges in the "ppc-kvm" topic branch of the powerpc tree to get a series of commits that touch both general arch/powerpc code and KVM code. These commits will be merged both via the KVM tree and the powerpc tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * powerpc/kvm: Save and restore host AMR/IAMR/UAMORMichael Ellerman2019-02-221-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the hash MMU is active the AMR, IAMR and UAMOR are used for pkeys. The AMR is directly writable by user space, and the UAMOR masks those writes, meaning both registers are effectively user register state. The IAMR is used to create an execute only key. Also we must maintain the value of at least the AMR when running in process context, so that any memory accesses done by the kernel on behalf of the process are correctly controlled by the AMR. Although we are correctly switching all registers when going into a guest, on returning to the host we just write 0 into all regs, except on Power9 where we restore the IAMR correctly. This could be observed by a user process if it writes the AMR, then runs a guest and we then return immediately to it without rescheduling. Because we have written 0 to the AMR that would have the effect of granting read/write permission to pages that the process was trying to protect. In addition, when using the Radix MMU, the AMR can prevent inadvertent kernel access to userspace data, writing 0 to the AMR disables that protection. So save and restore AMR, IAMR and UAMOR. Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Simplify machine check handlingPaul Mackerras2019-02-211-37/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the handling of machine check interrupts that occur inside a guest simpler and more robust, with less done in assembler code and in real mode. Now, when a machine check occurs inside a guest, we always get the machine check event struct and put a copy in the vcpu struct for the vcpu where the machine check occurred. We no longer call machine_check_queue_event() from kvmppc_realmode_mc_power7(), because on POWER8, when a vcpu is running on an offline secondary thread and we call machine_check_queue_event(), that calls irq_work_queue(), which doesn't work because the CPU is offline, but instead triggers the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which fires again and again because nothing clears the condition). All that machine_check_queue_event() actually does is to cause the event to be printed to the console. For a machine check occurring in the guest, we now print the event in kvmppc_handle_exit_hv() instead. The assembly code at label machine_check_realmode now just calls C code and then continues exiting the guest. We no longer either synthesize a machine check for the guest in assembly code or return to the guest without a machine check. The code in kvmppc_handle_exit_hv() is extended to handle the case where the guest is not FWNMI-capable. In that case we now always synthesize a machine check interrupt for the guest. Previously, if the host thinks it has recovered the machine check fully, it would return to the guest without any notification that the machine check had occurred. If the machine check was caused by some action of the guest (such as creating duplicate SLB entries), it is much better to tell the guest that it has caused a problem. Therefore we now always generate a machine check interrupt for guests that are not FWNMI-capable. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | KVM: PPC: Book3S HV: Fix build failure without IOMMU supportJordan Niethe2019-02-221-0/+10
|/ | | | | | | | | | | | | | | | | | | Currently trying to build without IOMMU support will fail: (.text+0x1380): undefined reference to `kvmppc_h_get_tce' (.text+0x1384): undefined reference to `kvmppc_rm_h_put_tce' (.text+0x149c): undefined reference to `kvmppc_rm_h_stuff_tce' (.text+0x14a0): undefined reference to `kvmppc_rm_h_put_tce_indirect' This happens because turning off IOMMU support will prevent book3s_64_vio_hv.c from being built because it is only built when SPAPR_TCE_IOMMU is set, which depends on IOMMU support. Fix it using ifdefs for the undefined references. Fixes: 76d837a4c0f9 ("KVM: PPC: Book3S PR: Don't include SPAPR TCE code on non-pseries platforms") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* KVM: PPC: Book3S HV: Handle hypercalls correctly when nestedPaul Mackerras2018-10-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we are running as a nested hypervisor, we use a hypercall to enter the guest rather than code in book3s_hv_rmhandlers.S. This means that the hypercall handlers listed in hcall_real_table never get called. There are some hypercalls that are handled there and not in kvmppc_pseries_do_hcall(), which therefore won't get processed for a nested guest. To fix this, we add cases to kvmppc_pseries_do_hcall() to handle those hypercalls, with the following exceptions: - The HPT hypercalls (H_ENTER, H_REMOVE, etc.) are not handled because we only support radix mode for nested guests. - H_CEDE has to be handled specially because the cede logic in kvmhv_run_single_vcpu assumes that it has been processed by the time that kvmhv_p9_guest_entry() returns. Therefore we put a special case for H_CEDE in kvmhv_p9_guest_entry(). For the XICS hypercalls, if real-mode processing is enabled, then the virtual-mode handlers assume that they are being called only to finish up the operation. Therefore we turn off the real-mode flag in the XICS code when running as a nested hypervisor. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S HV: Nested guest entry via hypercallPaul Mackerras2018-10-091-0/+8
| | | | | | | | | | | | | | | | | | | | | | | This adds a new hypercall, H_ENTER_NESTED, which is used by a nested hypervisor to enter one of its nested guests. The hypercall supplies register values in two structs. Those values are copied by the level 0 (L0) hypervisor (the one which is running in hypervisor mode) into the vcpu struct of the L1 guest, and then the guest is run until an interrupt or error occurs which needs to be reported to L1 via the hypercall return value. Currently this assumes that the L0 and L1 hypervisors are the same endianness, and the structs passed as arguments are in native endianness. If they are of different endianness, the version number check will fail and the hcall will be rejected. Nested hypervisors do not support indep_threads_mode=N, so this adds code to print a warning message if the administrator has set indep_threads_mode=N, and treat it as Y. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu structPaul Mackerras2018-10-091-2/+2
| | | | | | | | | | | | | | When the 'regs' field was added to struct kvm_vcpu_arch, the code was changed to use several of the fields inside regs (e.g., gpr, lr, etc.) but not the ccr field, because the ccr field in struct pt_regs is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is only 32 bits. This changes the code to use the regs.ccr field instead of cr, and changes the assembly code on 64-bit platforms to use 64-bit loads and stores instead of 32-bit ones. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guestsPaul Mackerras2018-10-091-3/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This creates an alternative guest entry/exit path which is used for radix guests on POWER9 systems when we have indep_threads_mode=Y. In these circumstances there is exactly one vcpu per vcore and there is no coordination required between vcpus or vcores; the vcpu can enter the guest without needing to synchronize with anything else. The new fast path is implemented almost entirely in C in book3s_hv.c and runs with the MMU on until the guest is entered. On guest exit we use the existing path until the point where we are committed to exiting the guest (as distinct from handling an interrupt in the low-level code and returning to the guest) and we have pulled the guest context from the XIVE. At that point we check a flag in the stack frame to see whether we came in via the old path and the new path; if we came in via the new path then we go back to C code to do the rest of the process of saving the guest context and restoring the host context. The C code is split into separate functions for handling the OS-accessible state and the hypervisor state, with the idea that the latter can be replaced by a hypercall when we implement nested virtualization. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Fix CONFIG_ALTIVEC=n build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S: Rework TM save/restore code and make it C-callablePaul Mackerras2018-10-091-26/+23Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a parameter to __kvmppc_save_tm and __kvmppc_restore_tm which allows the caller to indicate whether it wants the nonvolatile register state to be preserved across the call, as required by the C calling conventions. This parameter being non-zero also causes the MSR bits that enable TM, FP, VMX and VSX to be preserved. The condition register and DSCR are now always preserved. With this, kvmppc_save_tm_hv and kvmppc_restore_tm_hv can be called from C code provided the 3rd parameter is non-zero. So that these functions can be called from modules, they now include code to set the TOC pointer (r2) on entry, as they can call other built-in C functions which will assume the TOC to have been set. Also, the fake suspend code in kvmppc_save_tm_hv is modified here to assume that treclaim in fake-suspend state does not modify any registers, which is the case on POWER9. This enables the code to be simplified quite a bit. _kvmppc_save_tm_pr and _kvmppc_restore_tm_pr become much simpler with this change, since they now only need to save and restore TAR and pass 1 for the 3rd argument to __kvmppc_{save,restore}_tm. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S HV: Simplify real-mode interrupt handlingPaul Mackerras2018-10-091-109/+111
| | | | | | | | | | | | | | | | This streamlines the first part of the code that handles a hypervisor interrupt that occurred in the guest. With this, all of the real-mode handling that occurs is done before the "guest_exit_cont" label; once we get to that label we are committed to exiting to host virtual mode. Thus the machine check and HMI real-mode handling is moved before that label. Also, the code to handle external interrupts is moved out of line, as is the code that calls kvmppc_realmode_hmi_handler(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S HV: Extract PMU save/restore operations as C-callable functionsPaul Mackerras2018-10-091-164/+199
| | | | | | | | | | | | This pulls out the assembler code that is responsible for saving and restoring the PMU state for the host and guest into separate functions so they can be used from an alternate entry path. The calling convention is made compatible with C. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C codePaul Mackerras2018-10-091-53/+17Star
| | | | | | | | | | | | | | | This is based on a patch by Suraj Jitindar Singh. This moves the code in book3s_hv_rmhandlers.S that generates an external, decrementer or privileged doorbell interrupt just before entering the guest to C code in book3s_hv_builtin.c. This is to make future maintenance and modification easier. The algorithm expressed in the C code is almost identical to the previous algorithm. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* KVM: PPC: Book3S: Simplify external interrupt handlingPaul Mackerras2018-10-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we use two bits in the vcpu pending_exceptions bitmap to indicate that an external interrupt is pending for the guest, one for "one-shot" interrupts that are cleared when delivered, and one for interrupts that persist until cleared by an explicit action of the OS (e.g. an acknowledge to an interrupt controller). The BOOK3S_IRQPRIO_EXTERNAL bit is used for one-shot interrupt requests and BOOK3S_IRQPRIO_EXTERNAL_LEVEL is used for persisting interrupts. In practice BOOK3S_IRQPRIO_EXTERNAL never gets used, because our Book3S platforms generally, and pseries in particular, expect external interrupt requests to persist until they are acknowledged at the interrupt controller. That combined with the confusion introduced by having two bits for what is essentially the same thing makes it attractive to simplify things by only using one bit. This patch does that. With this patch there is only BOOK3S_IRQPRIO_EXTERNAL, and by default it has the semantics of a persisting interrupt. In order to avoid breaking the ABI, we introduce a new "external_oneshot" flag which preserves the behaviour of the KVM_INTERRUPT ioctl with the KVM_INTERRUPT_SET argument. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: clean inclusions of asm/feature-fixups.hChristophe Leroy2018-07-301-0/+1
| | | | | | | | files not using feature fixup don't need asm/feature-fixups.h files using feature fixup need asm/feature-fixups.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: move ASM_CONST and stringify_in_c() into asm-const.hChristophe Leroy2018-07-301-0/+1
| | | | | | | | | This patch moves ASM_CONST() and stringify_in_c() into dedicated asm-const.h, then cleans all related inclusions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: asm-compat.h should include asm-const.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Remove POWER9 DD1 supportNicholas Piggin2018-07-161-14/+2Star
| | | | | | | | | | | POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge tag 'kvm-ppc-next-4.18-2' of ↵Paolo Bonzini2018-06-141-241/+77Star
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
| * KVM: PPC: Book3S PR: Add C function wrapper for _kvmppc_save/restore_tm()Simon Guo2018-06-011-3/+3
| | | | | | | | | | | | | | | | | | Currently __kvmppc_save/restore_tm() APIs can only be invoked from assembly function. This patch adds C function wrappers for them so that they can be safely called from C function. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S PR: Add guest MSR parameter for ↵Simon Guo2018-06-011-12/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvmppc_save_tm()/kvmppc_restore_tm() HV KVM and PR KVM need different MSR source to indicate whether treclaim. or trecheckpoint. is necessary. This patch add new parameter (guest MSR) for these kvmppc_save_tm/ kvmppc_restore_tm() APIs: - For HV KVM, it is VCPU_MSR - For PR KVM, it is current host MSR or VCPU_SHADOW_SRR1 This enhancement enables these 2 APIs to be reused by PR KVM later. And the patch keeps HV KVM logic unchanged. This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to have a clean ABI: r3 for vcpu and r4 for guest_msr. During kvmppc_save_tm/kvmppc_restore_tm(), the R1 need to be saved or restored. Currently the R1 is saved into HSTATE_HOST_R1. In PR KVM, we are going to add a C function wrapper for kvmppc_save_tm/kvmppc_restore_tm() where the R1 will be incremented with added stackframe and save into HSTATE_HOST_R1. There are several places in HV KVM to load HSTATE_HOST_R1 as R1, and we don't want to bring risk or confusion by TM code. This patch will use HSTATE_SCRATCH2 to save/restore R1 in kvmppc_save_tm/kvmppc_restore_tm() to avoid future confusion, since the r1 is actually a temporary/scratch value to be saved/stored. [paulus@ozlabs.org - rebased on top of 7b0e827c6970 ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S PR: Move kvmppc_save_tm/kvmppc_restore_tm to separate fileSimon Guo2018-05-311-231/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is a simple patch just for moving kvmppc_save_tm/kvmppc_restore_tm() functionalities to tm.S. There is no logic change. The reconstruct of those APIs will be done in later patches to improve readability. It is for preparation of reusing those APIs on both HV/PR PPC KVM. Some slight change during move the functions includes: - surrounds some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE for compilation. - use _GLOBAL() to define kvmppc_save_tm/kvmppc_restore_tm() [paulus@ozlabs.org - rebased on top of 7b0e827c6970 ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tmPaul Mackerras2018-05-311-69/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This splits out the handling of "fake suspend" mode, part of the hypervisor TM assist code for POWER9, and puts almost all of it in new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions. The new functions branch to kvmppc_save/restore_tm if the CPU does not require hypervisor TM assistance. With this, it will be more straightforward to move kvmppc_save_tm and kvmppc_restore_tm to another file and use them for transactional memory support in PR KVM. Additionally, it also makes the code a bit clearer and reduces the number of feature sections. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Fix kvmppc_bad_host_intr for real mode interruptsNicholas Piggin2018-05-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_RELOCATABLE=n, the Linux real mode interrupt handlers call into KVM using real address. This needs to be translated to the kernel linear effective address before the MMU is switched on. kvmppc_bad_host_intr misses adding these bits, so when it is used to handle a system reset interrupt (that always gets delivered in real mode), it results in an instruction access fault immediately after the MMU is turned on. Fix this by ensuring the top 2 address bits are set when the MMU is turned on. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with ↵Nicholas Piggin2018-05-181-7/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | relocation on The radix guest code can has fewer restrictions about what context it can run in, so move this flushing out of assembly and have it use the Linux TLB flush implementations introduced previously. This allows powerpc:tlbie trace events to be used. This changes the tlbiel sequence to only execute RIC=2 flush once on the first set flushed, then RIC=0 for the rest of the sets. The end result of the flush should be unchanged. This matches the local PID flush pattern that was introduced in a5998fcb92 ("powerpc/mm/radix: Optimise tlbiel flush all case"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | Merge tag 'powerpc-4.18-1' of ↵Linus Torvalds2018-06-071-0/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for split PMD page table lock on 64-bit Book3S (Power8/9). - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live patching again. - Add support for patching barrier_nospec in copy_from_user() and syscall entry. - A couple of fixes for our data breakpoints on Book3S. - A series from Nick optimising TLB/mm handling with the Radix MMU. - Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre. - Several series optimising various parts of the 32-bit code from Christophe Leroy. - Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"), which is why the diffstat has so many deletions. And many other small improvements & fixes. There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details around pkey support. It was ack'ed/reviewed by Ingo & Dave and has been in next for several weeks. Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing" * tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits) powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap cpuidle: powernv: Fix promotion from snooze if next state disabled powerpc: fix build failure by disabling attribute-alias warning in pci_32 ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait() powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted" powerpc: fix spelling mistake: "Usupported" -> "Unsupported" powerpc/pkeys: Detach execute_only key on !PROT_EXEC powerpc/powernv: copy/paste - Mask SO bit in CR powerpc: Remove core support for Marvell mv64x60 hostbridges powerpc/boot: Remove core support for Marvell mv64x60 hostbridges powerpc/boot: Remove support for Marvell mv64x60 i2c controller powerpc/boot: Remove support for Marvell MPSC serial controller powerpc/embedded6xx: Remove C2K board support powerpc/lib: optimise PPC32 memcmp powerpc/lib: optimise 32 bits __clear_user() powerpc/time: inline arch_vtime_task_switch() powerpc/Makefile: set -mcpu=860 flag for the 8xx powerpc: Implement csum_ipv6_magic in assembly powerpc/32: Optimise __csum_partial() powerpc/lib: Adjust .balign inside string functions for PPC32 ...
| * powerpc64/ftrace: Disable ftrace during kvm entry/exitNaveen N. Rao2018-05-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During guest entry/exit, we switch over to/from the guest MMU context and we cannot take exceptions in the hypervisor code. Since ftrace may be enabled and since it can result in us taking a trap, disable ftrace by setting paca->ftrace_enabled to zero. There are two paths through which we enter/exit a guest: 1. If we are the vcore runner, then we enter the guest via __kvmppc_vcore_entry() and we disable ftrace around this. This is always the case for Power9, and for the primary thread on Power8. 2. If we are a secondary thread in Power8, then we would be in nap due to SMT being disabled. We are woken up by an IPI to enter the guest. In this scenario, we enter the guest through kvm_start_guest(). We disable ftrace at this point. In this scenario, ftrace would only get re-enabled on the secondary thread when SMT is re-enabled (via start_secondary()). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | KVM: PPC: Book 3S HV: Do ptesync in radix guest exit pathPaul Mackerras2018-05-171-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A radix guest can execute tlbie instructions to invalidate TLB entries. After a tlbie or a group of tlbies, it must then do the architected sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation has been processed by all CPUs in the system before it can rely on no CPU using any translation that it just invalidated. In fact it is the ptesync which does the actual synchronization in this sequence, and hardware has a requirement that the ptesync must be executed on the same CPU thread as the tlbies which it is expected to order. Thus, if a vCPU gets moved from one physical CPU to another after it has done some tlbies but before it can get to do the ptesync, the ptesync will not have the desired effect when it is executed on the second physical CPU. To fix this, we do a ptesync in the exit path for radix guests. If there are any pending tlbies, this will wait for them to complete. If there aren't, then ptesync will just do the same as sync. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Snapshot timebase offset on guest entryPaul Mackerras2018-05-171-45/+44Star
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. Which is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple VCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from that which it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts. To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time. In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC saving code to just before we call kvmhv_commence_exit, and the DEC restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* Merge tag 'powerpc-4.17-1' of ↵Linus Torvalds2018-04-071-10/+177
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for 4PB user address space on 64-bit, opt-in via mmap(). - Removal of POWER4 support, which was accidentally broken in 2016 and no one noticed, and blocked use of some modern instructions. - Workarounds so that the hypervisor can enable Transactional Memory on Power9. - A series to disable the DAWR (Data Address Watchpoint Register) on Power9. - More information displayed in the meltdown/spectre_v1/v2 sysfs files. - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome. - A big series to make the allocation of our pacas (per cpu area), kernel page tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9. And as usual many fixes, reworks and cleanups. Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun" * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits) powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep powerpc/64s: Fix POWER9 DD2.2 and above in cputable features powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo} powerpc: io.h: move iomap.h include so that it can use readq/writeq defs cxl: Fix possible deadlock when processing page faults from cxllib powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it powerpc/mm/radix: Update command line parsing for disable_radix powerpc/mm/radix: Parse disable_radix commandline correctly. powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix powerpc/mm/keys: Update documentation and remove unnecessary check powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() powerpc/powernv: Always stop secondaries before reboot/shutdown powerpc: hard disable irqs in smp_send_stop loop powerpc: use NMI IPI for smp_send_stop powerpc/powernv: Fix SMT4 forcing idle code ...
| * powerpc/kvm: Fix guest boot failure on Power9 since DAWR changesAneesh Kumar K.V2018-03-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with H_SET_DABR. Since the recent commit e8ebedbf3131 ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to return H_UNSUPPORTED on Power9, we see guest boot failures, the symptom is the boot seems to just stop in SLOF, eg: SLOF *************************************************************** QEMU Starting Build Date = Sep 24 2017 12:23:07 FW Version = buildd@ release 20170724 <no further output> SLOF can cope if H_SET_DABR returns H_HARDWARE. So wwitch the return value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break the guest boot. That does mean we return a different error to PowerVM in this case, but that's probably not a big concern. Fixes: e8ebedbf3131 ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * Merge branch 'topic/paca' into nextMichael Ellerman2018-03-311-2/+1Star
| |\ | | | | | | | | | | | | | | | | | | | | | Bring in yet another series that touches KVM code, and might need to be merged into the kvm-ppc branch to resolve conflicts. This required some changes in pnv_power9_force_smt4_catch/release() due to the paca array becomming an array of pointers.
| | * powerpc/64s: Do not allocate lppaca if we are not virtualizedNicholas Piggin2018-03-301-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "lppaca" is a structure registered with the hypervisor. This is unnecessary when running on non-virtualised platforms. One field from the lppaca (pmcregs_in_use) is also used by the host, so move the host part out into the paca (lppaca field is still updated in guest mode). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix non-pseries build with some #ifdefs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | KVM: PPC: Book3S HV: Handle migration with POWER9 disabled DAWRMichael Neuling2018-03-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER9 with the DAWR disabled causes problems for partition migration. Either we have to fail the migration (since we lose the DAWR) or we silently drop the DAWR and allow the migration to pass. This patch does the latter and allows the migration to pass (at the cost of silently losing the DAWR). This is not ideal but hopefully the best overall solution. This approach has been acked by Paulus. With this patch kvmppc_set_one_reg() will store the DAWR in the vcpu but won't actually set it on POWER9 hardware. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9Michael Neuling2018-03-271-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should use the DAWR but since it's disabled there we can't. This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR is disabled. Current Linux guests ignore this error, so they will silently not get the DAWR (sigh). The same error code is being used by POWERVM in this case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend statePaul Mackerras2018-03-231-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where the contents of the TEXASR can get corrupted while a thread is in fake suspend state. The workaround is for the instruction emulation code to use the value saved at the most recent guest exit in real suspend mode. We achieve this by simply not saving the TEXASR into the vcpu struct on an exit in fake suspend state. We also have to take care to set the orig_texasr field only on guest exit in real suspend state. This also means that on guest entry in fake suspend state, TEXASR will be restored to the value it had on the last exit in real suspend state, effectively counteracting any hardware-caused corruption. This works because TEXASR may not be written in suspend state. With this, the guest might see the wrong values in TEXASR if it reads it while in suspend state, but will see the correct value in non-transactional state (e.g. after a treclaim), and treclaim will work correctly. With this workaround, the code will actually run slightly faster, and will operate correctly on systems without the TEXASR bug (since TEXASR may not be written in suspend state, and is only changed by failure recording, which will have already been done before we get into fake suspend state). Therefore these changes are not made subject to a CPU feature bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend modeSuraj Jitindar Singh2018-03-231-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where a treclaim performed in fake suspend mode can cause subsequent reads from the XER register to return inconsistent values for the SO (summary overflow) bit. The inconsistent SO bit state can potentially be observed on any thread in the core. We have to do the treclaim because that is the only way to get the thread out of suspend state (fake or real) and into non-transactional state. The workaround for the bug is to force the core into SMT4 mode before doing the treclaim. This patch adds the code to do that, conditional on the CPU_FTR_P9_TM_XER_SO_BUG feature bit. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9Paul Mackerras2018-03-231-5/+138
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | KVM: PPC: Book3S HV: Fix duplication of host SLB entriesPaul Mackerras2018-03-231-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 6964e6a4e489 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded", 2018-01-11), we have been seeing occasional machine check interrupts on POWER8 systems when running KVM guests, due to SLB multihit errors. This turns out to be due to the guest exit code reloading the host SLB entries from the SLB shadow buffer when the SLB was not previously cleared in the guest entry path. This can happen because the path which skips from the guest entry code to the guest exit code without entering the guest now does the skip before the SLB is cleared and loaded with guest values, but the host values are loaded after the point in the guest exit path that we skip to. To fix this, we move the code that reloads the host SLB values up so that it occurs just before the point in the guest exit code (the label guest_bypass:) where we skip to from the guest entry path. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Fixes: 6964e6a4e489 ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded") Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>