path: root/arch/powerpc/kernel
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2018-05-26, 1 file, -0/+1)

    Pull KVM fixes from Radim Krčmář:

     "PPC:
       - Close a hole which could possibly lead to the host timebase getting
         out of sync.
       - Three fixes relating to PTEs and TLB entries for radix guests.
       - Fix a bug which could lead to an interrupt never getting delivered
         to the guest, if it is pending for a guest vCPU when the vCPU gets
         offlined.

      s390:
       - Fix false negatives in VSIE validity check (Cc stable)

      x86:
       - Fix time drift of VMX preemption timer when a guest uses LAPIC
         timer in periodic mode (Cc stable)
       - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
         migration from hosts that don't need retpoline mitigation (Cc stable)
       - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
         CPUID.OSXSAVE (Cc stable)
       - Report correct RIP after Hyper-V hypercall #UD (introduced in -rc6)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
      KVM: x86: fix #UD address of failed Hyper-V hypercalls
      kvm: x86: IA32_ARCH_CAPABILITIES is always supported
      KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
      x86/kvm: fix LAPIC timer drift when guest uses periodic mode
      KVM: s390: vsie: fix < 8k check for the itdba
      KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
      KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change
      KVM: PPC: Book3S HV: Make radix clear pte when unmapping
      KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
      KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry

| * KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry (Paul Mackerras, 2018-05-17, 1 file, -0/+1)

    Currently, the HV KVM guest entry/exit code adds the timebase offset
    from the vcore struct to the timebase on guest entry, and subtracts it
    on guest exit. Which is fine, except that it is possible for userspace
    to change the offset using the SET_ONE_REG interface while the vcore
    is running, as there is only one timebase offset per vcore but
    potentially multiple VCPUs in the vcore. If that were to happen, KVM
    would subtract a different offset on guest exit from that which it had
    added on guest entry, leading to the timebase being out of sync
    between cores in the host, which then leads to bad things happening
    such as hangs and spurious watchdog timeouts.

    To fix this, we add a new field 'tb_offset_applied' to the vcore
    struct which stores the offset that is currently applied to the
    timebase. This value is set from the vcore tb_offset field on guest
    entry, and is what is subtracted from the timebase on guest exit.
    Since it is zero when the timebase offset is not applied, we can
    simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time.

    In addition, we had secondary threads reading the timebase while
    running concurrently with code on the primary thread which would
    eventually add or subtract the timebase offset from the timebase.
    This occurred while saving or restoring the DEC register value on the
    secondary threads. Although no specific incorrect behaviour has been
    observed, this is a race which should be fixed. To fix it, we move the
    DEC saving code to just before we call kvmhv_commence_exit, and the
    DEC restoring code to after the point where we have waited for the
    primary thread to switch the MMU context and add the timebase offset.
    That way we are sure that the timebase contains the guest timebase
    value in both cases.

    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

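    The entry/exit pattern can be sketched in C (a minimal sketch only;
    the real logic lives in the HV KVM entry/exit assembly, and
    read_timebase()/write_timebase() plus the struct are hypothetical
    stand-ins for the actual accessors and vcore layout):

        #include <linux/types.h>

        /* hypothetical helpers standing in for the real timebase accessors */
        extern u64 read_timebase(void);
        extern void write_timebase(u64 tb);

        struct vcore_sketch {
                u64 tb_offset;          /* offset requested via SET_ONE_REG */
                u64 tb_offset_applied;  /* offset currently applied, 0 if none */
        };

        static void guest_entry_tb(struct vcore_sketch *vc)
        {
                /* Snapshot the offset actually applied; userspace may change
                 * vc->tb_offset at any time via SET_ONE_REG. */
                vc->tb_offset_applied = vc->tb_offset;
                write_timebase(read_timebase() + vc->tb_offset_applied);
        }

        static void guest_exit_tb(struct vcore_sketch *vc)
        {
                /* Subtract exactly what was added on entry, not whatever
                 * vc->tb_offset happens to contain by now. */
                write_timebase(read_timebase() - vc->tb_offset_applied);
                vc->tb_offset_applied = 0;
        }
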
* | Merge tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2018-05-25, 2 files, -0/+7)

    Pull powerpc fix from Michael Ellerman:

     "Just one fix, to make sure the PCR (Processor Compatibility
      Register) is reset on boot. Otherwise if we're running in compat
      mode in a guest (eg. pretending a Power9 is a Power8) and the host
      kernel oopses and kdumps then the kdump kernel's userspace will be
      running in Power8 mode, and will SIGILL if it uses Power9-only
      instructions.

      Thanks to Michael Neuling"

    * tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/64s: Clear PCR on boot

| * | powerpc/64s: Clear PCR on boot (Michael Neuling, 2018-05-18, 2 files, -0/+7)

    Clear the PCR (Processor Compatibility Register) on boot to ensure we
    are not running in a compatibility mode.

    We've seen this cause problems when a crash (and kdump) occurs while
    running compat mode guests. The kdump kernel then runs with the PCR
    set and causes problems. The symptom in the kdump kernel (also seen in
    petitboot after fast-reboot) is early userspace programs taking
    sigills on newer instructions (seen in libc).

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

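    As a rough illustration only (the actual change is in the per-CPU-family
    assembly setup; this C sketch uses the standard mtspr()/SPRN_PCR
    accessors, and the function name is hypothetical):

        #include <linux/init.h>
        #include <asm/reg.h>
        #include <asm/cpu_has_feature.h>

        static void __init reset_processor_compat_sketch(void)
        {
                /* Only the hypervisor can write the PCR. Clearing it stops a
                 * kdump or fast-reboot kernel inheriting e.g. Power8-compat
                 * mode left behind by a host that ran compat-mode guests. */
                if (early_cpu_has_feature(CPU_FTR_HVMODE))
                        mtspr(SPRN_PCR, 0);
        }
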
* / powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit (Nicholas Piggin, 2018-05-22, 3 files, -2/+180)

    On some CPUs we can prevent a vulnerability related to store-to-load
    forwarding by preventing store forwarding between privilege domains,
    by inserting a barrier in kernel entry and exit paths. This is known
    to be the case on at least Power7, Power8 and Power9 powerpc CPUs.

    Barriers must be inserted generally before the first load after moving
    to a higher privilege, and after the last store before moving to a
    lower privilege; both HV and PR privilege transitions must be
    protected.

    Barriers are added as patch sections, with all kernel/hypervisor entry
    points patched, and the exit points to lower privilege levels patched
    similarly to the RFI flush patching.

    Firmware advertisement is not implemented yet, so CPU flush types are
    hard coded.

    Thanks to Michal Suchánek for bug fixes and review.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Michal Suchánek <msuchanek@suse.de>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* powerpc: Fix deadlock with multiple calls to smp_send_stop (Nicholas Piggin, 2018-04-27, 1 file, -16/+39)

    smp_send_stop can lock up the IPI path for any subsequent calls,
    because the receiving CPUs spin in their handler function. This
    started becoming a problem with the addition of an smp_send_stop call
    in the reboot path, because panics can reboot after doing their own
    smp_send_stop.

    The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix
    smp_send_stop NMI IPI handling"), which leaves the smp_call_function
    variant.

    This is fixed by having smp_send_stop only ever do the
    smp_call_function once. This is a bit less robust than the NMI IPI
    fix, because any other call to smp_call_function after smp_send_stop
    could deadlock, but that has always been the case, and it has not
    been a problem before.

    Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown")
    Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

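    A minimal sketch of the described guard (simplified; stop_this_cpu()
    is the handler the stopped CPUs spin in):

        #include <linux/smp.h>

        static void stop_this_cpu(void *dummy);

        void smp_send_stop(void)
        {
                static bool stopped;

                /* A panic can call smp_send_stop() and then reboot, which
                 * calls it again; a second cross-call would deadlock since
                 * the target CPUs never leave stop_this_cpu(). Only ever
                 * issue the cross-call once. */
                if (stopped)
                        return;
                stopped = true;

                smp_call_function(stop_this_cpu, NULL, 0);
        }
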
* powerpc: Fix smp_send_stop NMI IPI handling (Nicholas Piggin, 2018-04-25, 1 file, -5/+17)

    The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count
    over the handler function call, which causes later smp_send_nmi_ipi()
    callers to spin until the call is finished.

    The stop_this_cpu() function never returns, so the busy count is never
    decremented, which can cause the system to hang in some cases. For
    example panic() will call smp_send_stop() early on, which calls
    stop_this_cpu() on other CPUs; then later in the reboot path,
    pnv_restart() will call smp_send_stop() again, which hangs.

    Fix this by adding a special case to the stop_this_cpu() handler to
    decrement the busy count, because it will never return.

    Now that the NMI/non-NMI versions of stop_this_cpu() are different,
    split them out into separate functions rather than doing #ifdef tricks
    to share the body between the two functions.

    Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop")
    Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Split out the functions, tweak change log a bit]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

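    A sketch of the special case, as it might appear inside
    arch/powerpc/kernel/smp.c (nmi_ipi_busy_count and the lock helpers are
    internal to that file; this is an illustration, not the exact patch):

        static void nmi_stop_this_cpu(struct pt_regs *regs)
        {
                /* This handler never returns, so the generic NMI IPI code
                 * can never mark it done; drop the busy count here or later
                 * smp_send_nmi_ipi() callers would spin forever. */
                nmi_ipi_lock();
                nmi_ipi_busy_count--;
                nmi_ipi_unlock();

                spin_begin();
                while (1)
                        spin_cpu_relax();
        }
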
* powerpc/mce: Fix a bug where mce loops on memory UE. (Mahesh Salgaonkar, 2018-04-24, 1 file, -5/+2)

    The current code extracts the physical address for UE errors and then
    hooks it up into the memory failure infrastructure. On successful
    extraction of the physical address it wrongly sets "handled = 1",
    which means this UE error has been recovered. Since the MCE handler
    gets the return value handled = 1, it assumes the error has been
    recovered and goes back to the same NIP. This causes the MCE interrupt
    again and again in a loop, leading to a hard lockup.

    Also, initialize phys_addr to ULONG_MAX so that we don't end up
    queuing an undesired page to hwpoison.

    Without this patch we see:

      Severe Machine check interrupt [Recovered]
        NIP: [000000001002588c] PID: 7109 Comm: find
        Initiator: CPU
        Error type: UE [Load/Store]
          Effective address: 00007fffd2755940
          Physical address:  000020181a080000
      ...
      Severe Machine check interrupt [Recovered]
        NIP: [000000001002588c] PID: 7109 Comm: find
        Initiator: CPU
        Error type: UE [Load/Store]
          Effective address: 00007fffd2755940
          Physical address:  000020181a080000
      Severe Machine check interrupt [Recovered]
        NIP: [000000001002588c] PID: 7109 Comm: find
        Initiator: CPU
        Error type: UE [Load/Store]
          Effective address: 00007fffd2755940
          Physical address:  000020181a080000
      Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
      Memory failure: 0x20181a08: already hardware poisoned
      Memory failure: 0x20181a08: already hardware poisoned
      Memory failure: 0x20181a08: already hardware poisoned
      Memory failure: 0x20181a08: already hardware poisoned
      Memory failure: 0x20181a08: already hardware poisoned
      Memory failure: 0x20181a08: already hardware poisoned
      ...
      Watchdog CPU:38 Hard LOCKUP

    After this patch we see:

      Severe Machine check interrupt [Not recovered]
        NIP: [00007fffaae585f4] PID: 7168 Comm: find
        Initiator: CPU
        Error type: UE [Load/Store]
          Effective address: 00007fffaafe28ac
          Physical address:  00002017c0bd0000
      find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
      Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered

    Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors")
    Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
    Cc: stable@vger.kernel.org # v4.15+
    Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
    Signed-off-by: Balbir Singh <bsingharora@gmail.com>
    Reviewed-by: Balbir Singh <bsingharora@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

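    In outline (the helper names here are hypothetical; the point is the
    ULONG_MAX poison value and leaving "handled" clear):

        #include <linux/kernel.h>

        struct machine_check_event;                        /* from asm/mce.h */
        extern bool extract_ue_phys_addr(struct machine_check_event *evt,
                                         unsigned long *phys_addr);   /* hypothetical */
        extern void queue_page_for_memory_failure(unsigned long addr); /* hypothetical */

        static int handle_ue_error_sketch(struct machine_check_event *evt)
        {
                unsigned long phys_addr = ULONG_MAX; /* never queue a stale 0 */
                int handled = 0;

                if (extract_ue_phys_addr(evt, &phys_addr))
                        queue_page_for_memory_failure(phys_addr);

                /* Returning handled = 1 here would resume at the same NIP and
                 * re-take the UE forever; leave recovery to memory_failure()
                 * and signal delivery instead. */
                return handled;
        }
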
* powerpc/kvm: Fix lockups when running KVM guests on Power8 (Michael Ellerman, 2018-04-19, 1 file, -2/+2)

    When running KVM guests on Power8 we can see a lockup where one CPU
    stops responding. This often leads to a message such as:

      watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
      Task dump for CPU 72:
      qemu-system-ppc R  running task    10560 20917  20908 0x00040004

    And then backtraces on other CPUs, such as:

      Task dump for CPU 48:
      ksmd            R  running task    10032  1519      2 0x00000804
      Call Trace:
      ...
      --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
          LR = smp_call_function_many+0x37c/0x460
      pmdp_invalidate+0x100/0x1b0
      __split_huge_pmd+0x52c/0xdb0
      try_to_unmap_one+0x764/0x8b0
      rmap_walk_anon+0x15c/0x370
      try_to_unmap+0xb4/0x170
      split_huge_page_to_list+0x148/0xa30
      try_to_merge_one_page+0xc8/0x990
      try_to_merge_with_ksm_page+0x74/0xf0
      ksm_scan_thread+0x10ec/0x1ac0
      kthread+0x160/0x1a0
      ret_from_kernel_thread+0x5c/0x78

    This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync
    for KVM state when waking from idle"), which added a check in
    pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
    already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
    test of kvm_hstate.hwthread_req.

    The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
    entering the guest, so it can then come out to cede with
    KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
    setting hwthread_req to 1, but because hwthread_state is still
    KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
    wake up from idle and won't go to kvm_start_guest. From there the
    thread will return to a garbage address and crash.

    Fix it by skipping the store of hwthread_state, but not the test of
    hwthread_req, when coming out of idle. It's OK to skip the sync in
    that case because hwthread_req will have been set on the same thread,
    so there is no synchronisation required.

    Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

* powerpc/eeh: Fix enabling bridge MMIO windows (Michael Neuling, 2018-04-19, 1 file, -1/+2)

    On boot we save the configuration space of PCIe bridges. We do this so
    that when we get an EEH event and everything gets reset, we can
    restore them.

    Unfortunately we save this state before we've enabled the MMIO space
    on the bridges. Hence if we have to reset the bridge, when we come
    back MMIO is not enabled and we end up taking a PE freeze when the
    driver starts accessing again.

    This patch forces the memory/MMIO and bus mastering on when restoring
    bridges on EEH. Ideally we'd do this correctly by saving the
    configuration space writes later, but that will have to come later in
    a larger EEH rewrite. For now we have this simple fix.

    The original bug can be triggered on a boston machine by doing:

      echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound

    On boston, this PHB has a PCIe switch on it. Without this patch,
    you'll see two EEH events, one expected and one the failure we are
    fixing here. The second EEH event causes anything under the PHB to
    disappear (i.e. the i40e eth). With this patch, only one EEH event
    occurs and devices properly recover.

    Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset")
    Cc: stable@vger.kernel.org # v3.11+
    Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Acked-by: Russell Currey <ruscur@russell.cc>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

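    The forced enable amounts to something like this (a sketch using the
    generic PCI config accessors; the EEH code has its own config-space
    helpers, and the function name here is illustrative):

        #include <linux/pci.h>

        static void eeh_force_bridge_enable_sketch(struct pci_dev *bridge)
        {
                u16 cmd;

                /* The config space was saved before MMIO was enabled, so
                 * after restoring it, force memory decoding and bus
                 * mastering back on. */
                pci_read_config_word(bridge, PCI_COMMAND, &cmd);
                cmd |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
                pci_write_config_word(bridge, PCI_COMMAND, cmd);
        }
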
* powerpc/64s: Default l1d_size to 64K in RFI fallback flush (Madhavan Srinivasan, 2018-04-17, 1 file, -0/+11)

    If there is no d-cache-size property in the device tree, l1d_size
    could be zero. We don't actually expect that to happen, it's only been
    seen on mambo (simulator) in some configurations.

    A zero-size l1d_size leads to the loop in the asm wrapping around to
    2^64-1, and then walking off the end of the fallback area and
    eventually causing a page fault which is fatal.

    Just default to 64K which is correct on some CPUs, and sane enough to
    not cause a crash on others.

    Fixes: aa8a5e0062ac9 ('powerpc/64s: Add support for RFI flush of L1-D cache')
    Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
    [mpe: Rewrite comment and change log]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

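    The guard is essentially the following (a sketch; the function name is
    hypothetical, and l1d_size stands for the value read from the
    d-cache-size device tree property):

        #include <linux/types.h>

        static u64 sanitize_l1d_size(u64 l1d_size)
        {
                /* No d-cache-size property in the device tree: fall back to
                 * 64K rather than letting the asm flush loop wrap to 2^64-1
                 * and walk off the end of the fallback area. */
                if (!l1d_size)
                        l1d_size = 64 * 1024;
                return l1d_size;
        }
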
* Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2018-04-15, 3 files, -29/+19)

    Pull powerpc fixes from Michael Ellerman:

     - Fix crashes when loading modules built with a different
       CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

     - Fix busy loops in the OPAL NVRAM driver if we get certain error
       conditions from firmware.

     - Remove tlbie trace points from KVM code that's called in real mode,
       because it causes crashes.

     - Fix checkstops caused by invalid tlbiel on Power9 Radix.

     - Ensure the set of CPU features we "know" are always enabled is
       actually the minimal set when we build with support for firmware
       supplied CPU features.

    Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

    * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
      powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
      KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
      powerpc/8xx: Fix build with hugetlbfs enabled
      powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
      powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
      powerpc/fscr: Enable interrupts earlier before calling get_user()
      powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
      powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic

| * powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features (Michael Ellerman, 2018-04-13, 1 file, -13/+1)

    The cpu_has_feature() mechanism has an optimisation where at build
    time we construct a mask of the CPU feature bits that will always be
    true for the given .config, based on the platform/bitness/etc. that we
    are building for.

    That is incompatible with DT CPU features, where the set of CPU
    features is dependent on feature flags that are given to us by
    firmware.

    The result is that some feature bits can not be *disabled* by DT CPU
    features. Or more accurately, they can be disabled but they will still
    appear in the ALWAYS mask, meaning cpu_has_feature() will always
    return true for them.

    In the past this hasn't really been a problem because on Book3S 64
    (where we support DT CPU features), the set of ALWAYS bits has been
    very small. That was because we always built for POWER4 and later,
    meaning the set of common bits was small.

    The only bit that could be cleared by DT CPU features that was also in
    the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
    the alignment handler to create a fake DSISR. That code was itself
    deleted in 31bfdb036f12 ("powerpc: Use instruction emulation
    infrastructure to handle alignment faults") (Sep 2017).

    However the set of ALWAYS features changed with the recent commit
    db5ae1c155af ("powerpc/64s: Refine feature sets for little endian
    builds") which restricted the set of feature flags when building
    little endian to Power7 or later. That caused the ALWAYS mask to
    become much larger for little endian builds.

    The result is that the following feature bits can currently not be
    *disabled* by DT CPU features: CPU_FTR_REAL_LE, CPU_FTR_MMCRA,
    CPU_FTR_CTRL, CPU_FTR_SMT, CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR,
    CPU_FTR_PKEY, CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.

    To fix it we need to mask the set of ALWAYS features with the base set
    of DT CPU features, ie. the features that are always enabled by DT CPU
    features. That way there are no bits in the ALWAYS mask that are not
    also always set by DT CPU features.

    Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds")
    Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

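    Schematically (the macro names and mask values below are placeholders,
    not the real feature bits):

        /* intersection of the per-CPU feature masks for this .config */
        #define CPU_FTRS_COMMON_SKETCH      0x0000f000UL
        /* features that DT CPU features itself always enables */
        #define CPU_FTRS_DT_CPU_BASE_SKETCH 0x00003000UL

        /* A bit may only be "always on" if DT CPU features can never clear
         * it, so AND the two sets together: */
        #define CPU_FTRS_ALWAYS_SKETCH \
                (CPU_FTRS_COMMON_SKETCH & CPU_FTRS_DT_CPU_BASE_SKETCH)
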
| * powerpc/fscr: Enable interrupts earlier before calling get_user() (Anshuman Khandual, 2018-04-10, 1 file, -15/+17)

    The function get_user() can sleep while trying to fetch an instruction
    from user address space, which causes the following warning from the
    scheduler:

      BUG: sleeping function called from invalid context

    Interrupts do get enabled again, but only a bit later, after
    get_user() is called. This change moves the enabling of these
    interrupts earlier, so that they cover the get_user() call.

    While at it, check for kernel mode and crash, as this interrupt should
    not have been triggered from kernel context.

    Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

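    The reordering looks roughly like this (a sketch; the handler and
    variable names are simplified, not the real function):

        #include <linux/irqflags.h>
        #include <linux/uaccess.h>
        #include <asm/ptrace.h>

        void fscr_handler_sketch(struct pt_regs *regs)
        {
                u32 instword;

                /* get_user() may fault and sleep, which is only legal if we
                 * came from userspace. */
                if (!user_mode(regs))
                        die("facility unavailable exception in kernel", regs, SIGILL);

                /* Enable interrupts *before* touching user memory. */
                local_irq_enable();

                if (get_user(instword, (u32 __user *)regs->nip))
                        return;

                /* ... inspect/emulate the faulting instruction ... */
        }
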
| * powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() (Michael Ellerman, 2018-04-10, 1 file, -1/+1)

    The recent LPM changes to setup_rfi_flush() are causing some section
    mismatch warnings because we removed the __init annotation on
    setup_rfi_flush():

      The function setup_rfi_flush() references
      the function __init ppc64_bolted_size().
      the function __init memblock_alloc_base().

    The references are actually in init_fallback_flush(), but that is
    inlined into setup_rfi_flush().

    These references are safe because:
     - only pseries calls setup_rfi_flush() at runtime
     - pseries always passes L1D_FLUSH_FALLBACK at boot
     - so the fallback flush area will always be allocated
     - so the check in init_fallback_flush() will always return early:
         /* Only allocate the fallback flush area once (at boot time). */
         if (l1d_flush_fallback_area)
             return;
     - and therefore we won't actually call the freed init routines.

    We should rework the code to make it safer by default rather than
    relying on the above, but for now as a quick-fix just add a __ref
    annotation to squash the warning.

    Fixes: abf110f3e1ce ("powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

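    The quick fix is just the annotation; a sketch of the shape (the body
    here is illustrative only):

        #include <linux/init.h>
        #include <asm/setup.h>

        static void init_fallback_flush(void);  /* returns early after boot */

        void __ref setup_rfi_flush(enum l1d_flush_type types, bool enable)
        {
                /* __ref: tells modpost the __init references reachable from
                 * here are intentional; they only execute at boot time. */
                init_fallback_flush();
                /* ... select and patch in the flush type ... */
        }
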
* | kernel/kexec_file.c: allow archs to set purgatory load address (Philipp Rudo, 2018-04-14, 1 file, -4/+5)

    For s390, new kernels are loaded to fixed addresses in memory before
    they are booted. With the current code this is a problem, as it
    assumes the kernel will be loaded to an 'arbitrary' address. In
    particular, kexec_locate_mem_hole searches for a large enough memory
    region and sets the load address (kexec_buffer->mem) to it.

    Luckily there is a simple workaround for this problem. By returning 1
    in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This
    allows the architecture to set kbuf->mem by hand. While the trick
    works fine for the kernel, it does not for the purgatory, as here the
    architectures don't have access to its kexec_buffer.

    Give architectures access to the purgatory's kexec_buffer by changing
    kexec_load_purgatory to take a pointer to it. With this change
    architectures have access to the buffer and can edit it as they need.

    A nice side effect of this change is that we can get rid of the
    purgatory_info->purgatory_load_address field, as the information
    stored there can now be accessed directly from kbuf->mem.

    Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
    Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Acked-by: Dave Young <dyoung@redhat.com>
    Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
    Cc: Eric Biederman <ebiederm@xmission.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* | kexec_file,x86,powerpc: factor out kexec_file_ops functions (AKASHI Takahiro, 2018-04-14, 2 files, -37/+4)

    As arch_kexec_kernel_image_{probe,load}(),
    arch_kimage_file_post_load_cleanup() and
    arch_kexec_kernel_verify_sig() are almost duplicated among
    architectures, they can be commonalized with an architecture-defined
    kexec_file_ops array. So let's factor them out.

    Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
    Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
    Acked-by: Dave Young <dyoung@redhat.com>
    Tested-by: Dave Young <dyoung@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds, 2018-04-07, 40 files, -521/+828)

    Pull powerpc updates from Michael Ellerman:

     "Notable changes:

       - Support for 4PB user address space on 64-bit, opt-in via mmap().

       - Removal of POWER4 support, which was accidentally broken in 2016
         and no one noticed, and blocked use of some modern instructions.

       - Workarounds so that the hypervisor can enable Transactional
         Memory on Power9.

       - A series to disable the DAWR (Data Address Watchpoint Register)
         on Power9.

       - More information displayed in the meltdown/spectre_v1/v2 sysfs
         files.

       - A vpermxor (Power8 Altivec) implementation for the raid6 Q
         Syndrome.

       - A big series to make the allocation of our pacas (per cpu area),
         kernel page tables, and per-cpu stacks NUMA aware when using the
         Radix MMU on Power9.

      And as usual many fixes, reworks and cleanups.

      Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy,
      Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman
      Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy,
      Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn
      Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan
      Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent
      Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark
      Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt
      Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao,
      Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob
      Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman,
      Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago
      Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant
      Hegde, Wei Yongjun"

    * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits)
      powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep
      powerpc/64s: Fix POWER9 DD2.2 and above in cputable features
      powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit
      powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits
      Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"
      powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo}
      powerpc: io.h: move iomap.h include so that it can use readq/writeq defs
      cxl: Fix possible deadlock when processing page faults from cxllib
      powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it
      powerpc/mm/radix: Update command line parsing for disable_radix
      powerpc/mm/radix: Parse disable_radix commandline correctly.
      powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb
      powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix
      powerpc/mm/keys: Update documentation and remove unnecessary check
      powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead
      powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop()
      powerpc/powernv: Always stop secondaries before reboot/shutdown
      powerpc: hard disable irqs in smp_send_stop loop
      powerpc: use NMI IPI for smp_send_stop
      powerpc/powernv: Fix SMT4 forcing idle code
      ...

| * powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep (Nicholas Piggin, 2018-04-05, 1 file, -0/+2)

    POWER8 restores AMOR when waking from deep sleep, but POWER9 does not,
    because it does not go through the subcore restore. Have POWER9
    restore it in core restore.

    Fixes: ee97b6b99f42 ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0")
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit (Nicholas Piggin, 2018-04-05, 1 file, -0/+7)

    The pkey code added a CPU_FTR_PKEY bit, but did not add it to the
    dt_cpu_ftrs feature set. Although the capability is supported by all
    processors in the base dt_cpu_ftrs set for 64s, it's a significant and
    sufficiently well defined feature to make it optional. So add it as a
    quirk for now, which can be versioned out then controlled by the
    firmware (once dt_cpu_ftrs gains versioning support).

    Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
    Cc: stable@vger.kernel.org # v4.16+
    Cc: Ram Pai <linuxram@us.ibm.com>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits (Nicholas Piggin, 2018-04-05, 1 file, -3/+9)

    Presently the dt_cpu_ftrs restore_cpu will only add bits to the LPCR
    for secondaries, but some bits must be removed (e.g., UPRT for HPT).
    Not clearing these bits on secondaries causes checkstops when booting
    with disable_radix.

    restore_cpu can not just set LPCR, because it is also called by the
    idle wakeup code which relies on opal_slw_set_reg to restore the value
    of LPCR, at least on P8 which does not save LPCR to stack in the idle
    code.

    Fix this by including a mask of bits to clear from LPCR as well, which
    is used by restore_cpu.

    This is a little messy now, but it's a minimal fix that can be
    backported. Longer term, the idle SPR save/restore code can be
    reworked to completely avoid calls to restore_cpu, then restore_cpu
    would be able to unconditionally set LPCR to match boot processor
    environment.

    Fixes: 5a61ef74f269f ("powerpc/64s: Support new device tree binding for discovering CPU features")
    Cc: stable@vger.kernel.org # v4.12+
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

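    In outline (a sketch; the variable and function names are illustrative,
    with the masks filled in from the device tree feature parse):

        #include <linux/types.h>
        #include <asm/reg.h>

        static u64 lpcr_set_bits, lpcr_clear_bits;

        static void restore_cpu_lpcr_sketch(void)
        {
                u64 lpcr = mfspr(SPRN_LPCR);

                lpcr |= lpcr_set_bits;
                lpcr &= ~lpcr_clear_bits;  /* e.g. drop LPCR_UPRT for HPT */
                mtspr(SPRN_LPCR, lpcr);
        }
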
| * Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead"Michael Ellerman2018-04-051-16/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As described in that commit: When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. This is true, except in the case of an NMI interrupt (sreset or machine check) interrupting the instruction. In that case, the NMI gets an "interrupt occurred while the processor was in power-saving mode" indication. The power-save wakeup code uses that bit to decide whether to restore some registers (e.g., LR). Because these are no longer saved, this causes random register corruption. It may be possible to restore this optimisation by detecting the case of no register loss on the wakeup side, and avoid restoring in that case, but that's not a minor fix because the wakeup code itself uses some registers that would be live (e.g., LR). Fixes: b9ee31e100e7 ("powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo} (Logan Gunthorpe, 2018-04-05, 1 file, -0/+40)

    These functions will be introduced into the generic iomap.c so they
    can deal with PIO accesses in hi-lo/lo-hi variants. Thus, the powerpc
    version of iomap.c will need to provide the same functions even
    though, in this arch, they are identical to the regular
    io{read|write}64 functions.

    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
    Tested-by: Horia Geantă <horia.geanta@nxp.com>
    Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/mm/radix: Update command line parsing for disable_radix (Aneesh Kumar K.V, 2018-04-04, 2 files, -4/+14)

    The kernel parameter disable_radix takes several options:
    disable_radix=yes|no|1|0, or just disable_radix. The prom_init parsing
    does not support these options.

    Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default")
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead (Nicholas Piggin, 2018-04-04, 1 file, -29/+16)

    When stop is executed with EC=ESL=0, it appears to execute like a
    normal instruction (resuming from NIP when woken by interrupt). So all
    the save/restore handling can be avoided completely. In particular NV
    GPRs do not have to be saved, and MSR does not have to be switched
    back to kernel MSR.

    So move the test for EC=ESL=0 sleep states out to power9_idle_stop,
    and return directly to the caller after stop in that case.

    This improves performance for the ping-pong benchmark with the
    stop0_lite idle state by 2.54% for 2 threads in the same core, and
    2.57% for different cores. Performance increase with HV_POSSIBLE
    defined will be improved further by avoiding the hwsync.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() (Michael Ellerman, 2018-04-04, 1 file, -12/+11)

    Commit 3d4fbffdd703 ("powerpc/64s/idle: POWER9 implement a separate
    idle stop function for hotplug") that added power9_offline_stop() was
    written before commit 7672691a08c8 ("powerpc/powernv: Provide a way to
    force a core into SMT4 mode").

    When merging the former I failed to notice that it caused us to skip
    the force-SMT4 logic for offline CPUs. The result is that offlined
    CPUs will not correctly participate in the force-SMT4 logic, which
    presumably will result in badness (not tested).

    Reconcile the two commits by making power9_offline_stop() a pre-cursor
    to power9_idle_stop(), so that they share the force-SMT4 logic.

    This is based on an original commit from Nick, all breakage is my own.

    Fixes: 3d4fbffdd703 ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

| * powerpc: hard disable irqs in smp_send_stop loop (Nicholas Piggin, 2018-04-03, 1 file, -2/+3)

    The hard lockup watchdog can fire under local_irq_disable on platforms
    with irq soft masking.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc: use NMI IPI for smp_send_stop (Nicholas Piggin, 2018-04-03, 1 file, -0/+8)

    Use the NMI IPI rather than smp_call_function for smp_send_stop. Have
    stopped CPUs hard disable interrupts rather than just soft disable.

    This function is used in crash/panic/shutdown paths to bring other
    CPUs down as quickly and reliably as possible, and minimizing their
    potential to cause trouble. Avoiding the Linux smp_call_function
    infrastructure and (if supported) using true NMI IPIs makes this more
    robust.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/powernv: Fix SMT4 forcing idle code (Nicholas Piggin, 2018-04-03, 1 file, -4/+5)

    The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not
    have the XER[SO] bug. Fix this by storing up-front, outside the
    workaround code. The initial test is not required because it is a
    slow path.

    The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to
    match pnv_power9_force_smt4_catch() where it is used. Drop the comment
    on pnv_power9_force_smt4_catch() as it's no longer true.

    Fixes: 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode")
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc: Move default security feature flags (Mauricio Faria de Oliveira, 2018-04-03, 1 file, -6/+1)

    This moves the definition of the default security feature flags (i.e.,
    enabled by default) closer to the security feature flags. This can be
    used to restore current flags to the default flags.

    Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc: Don't write to DABR on >= Power8 if DAWR is disabled (Nicholas Piggin, 2018-04-03, 1 file, -2/+8)

    flush_thread() calls __set_breakpoint() via set_debug_reg_defaults()
    without checking ppc_breakpoint_available(). On Power8 or later CPUs
    which have the DAWR feature disabled that will cause a write to the
    DABR which is incorrect as those CPUs don't have a DABR.

    Fix it two ways, by checking ppc_breakpoint_available() in
    set_debug_reg_defaults(), and also by reworking __set_breakpoint() to
    only write to DABR on Power7 or earlier.

    Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features")
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Rework the logic in __set_breakpoint()]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

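    The reworked logic is roughly the following (a sketch; set_dawr() and
    set_dabr() stand in for the internal helpers that write the respective
    registers):

        struct arch_hw_breakpoint;
        extern void set_dawr(struct arch_hw_breakpoint *brk);  /* internal helper */
        extern void set_dabr(struct arch_hw_breakpoint *brk);  /* internal helper */

        void set_breakpoint_sketch(struct arch_hw_breakpoint *brk)
        {
                if (cpu_has_feature(CPU_FTR_DAWR))
                        set_dawr(brk);   /* POWER8 or later with DAWR usable */
                else if (!cpu_has_feature(CPU_FTR_ARCH_207S))
                        set_dabr(brk);   /* only pre-POWER8 CPUs have a DABR */
                /* else: >= POWER8 with DAWR disabled; nothing to write */
        }
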
| * powerpc: Fix oops due to bad access of lppaca on bare metal (Aneesh Kumar K.V, 2018-04-03, 1 file, -0/+3)

    Commit 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are
    not virtualized") removed allocation of the lppaca on bare metal
    platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the
    lppaca on bare metal in some code paths.

    Fix this by adding runtime checks for SPLPAR (shared processor LPAR).

    Fixes: 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized")
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

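    The runtime guard looks like this (a sketch; the function name is
    illustrative):

        #include <asm/firmware.h>
        #include <asm/lppaca.h>

        static bool shared_processor_sketch(void)
        {
                /* CONFIG_PPC_SPLPAR is a compile-time check only; on bare
                 * metal no lppaca is allocated, so also check the firmware
                 * feature before dereferencing it. */
                if (!firmware_has_feature(FW_FEATURE_SPLPAR))
                        return false;

                return lppaca_shared_proc(get_lppaca());
        }
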
| * powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP (Matt Evans, 2018-04-01, 1 file, -1/+3)

    When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the user
    context when single_step_exception() prepares the SIGTRAP delivery.
    The resulting branch-trap-within-the-SIGTRAP-handler isn't healthy.

    Commit 2538c2d08f46141550a1e68819efa8fe31c6e3dc broke this, by
    replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call to
    clear_single_step() which only clears MSR_SE.

    This patch adds a new helper, clear_br_trace(), which clears the debug
    trap before invoking the signal handler. This helper is a NOP for
    BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE.

    Signed-off-by: Matt Evans <matt@ozlabs.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

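    The helper amounts to the following (a sketch; the exact config guard
    the kernel uses for the BookE case may differ):

        #include <asm/ptrace.h>
        #include <asm/reg.h>

        static void clear_br_trace(struct pt_regs *regs)
        {
        #ifndef CONFIG_BOOKE
                /* Stop branch tracing before the SIGTRAP handler runs, or
                 * the handler itself immediately branch-traps. */
                regs->msr &= ~MSR_BE;
        #endif
        }
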
| * powerpc/64s: Remove POWER4 support (Nicholas Piggin, 2018-03-31, 2 files, -42/+4)

    POWER4 has been broken since at least the change 49d09bf2a6
    ("powerpc/64s: Optimise MSR handling in exception handling"), which
    requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
    POWER4 supports ISA v2.00.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features (Nicholas Piggin, 2018-03-31, 1 file, -3/+8)

    The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and
    above (which is what the cputable setup does). Fix DT CPU features
    quirk setup to match.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Merge with upstream changes]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s: Set assembler machine type to POWER4 (Nicholas Piggin, 2018-03-31, 2 files, -9/+3)

    Rather than override the machine type in .S code (which can hide wrong
    or ambiguous code generation for the target), set the type to power4
    for all assembly.

    This also means we need to be careful not to build power4-only code
    when we're not building for Book3S, such as the "power7" versions of
    copyuser/page/memcpy.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s/idle: avoid sync for KVM state when waking from idle (Nicholas Piggin, 2018-03-31, 1 file, -0/+3)

    When waking from a CPU idle instruction (e.g., nap or stop), the sync
    for ordering the KVM secondary thread state can be avoided if the
    wakeup is coming from a kernel context rather than a KVM context.

    This improves performance for the ping-pong benchmark with the stop0
    idle state by 0.46% for 2 threads in the same core, and 1.02% for
    different cores.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug (Nicholas Piggin, 2018-03-31, 1 file, -6/+18)

    Implement a new function to invoke stop, power9_offline_stop, which is
    like power9_idle_stop but used by the cpu hotplug code.

    Move KVM secondary state manipulation code to the offline case.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s: sreset panic if there is no debugger or crash dump handlers (Nicholas Piggin, 2018-03-31, 1 file, -2/+13)

    system_reset_exception does most of its own crash handling now,
    invoking the debugger or crash dumps if they are registered. If not,
    then it goes through to die() to print stack traces, and then is
    supposed to panic (according to comments).

    However after die() prints oopses, it does its own handling which
    doesn't allow system_reset_exception to panic (e.g., it may just kill
    the current process). This patch causes sreset exceptions to return
    from die after it prints messages but before acting.

    This also stops die from invoking the debugger on 0x100 crashes.
    system_reset_exception similarly calls the debugger. It had been
    thought this was harmless (because if the debugger was disabled,
    neither call would fire, and if it was enabled, the first call would
    return). However in some cases, like the xmon 'X' command, the
    debugger returns 0, which currently causes it to be entered again
    (first in system_reset_exception, then in die), which is confusing.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/64s: return more carefully from sreset NMI (Nicholas Piggin, 2018-03-31, 1 file, -3/+58)

    System Reset, being an NMI, must return more carefully than other
    interrupts. It has traditionally returned via the normal return from
    exception path, but that has a number of problems.

    - r13 does not get restored if returning to kernel. This is for
      interrupts which may cause a context switch, which sreset will never
      do. Interrupting OPAL (which uses a different r13) is one place
      where this causes breakage.

    - It may cause several other problems returning to kernel with preempt
      or TIF_EMULATE_STACK_STORE if it hits at the wrong time.

    It's safer just to have a simple restore and return, like machine
    check which is the other NMI.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/eeh: Fix race with driver un/bind (Michael Neuling, 2018-03-31, 1 file, -26/+42)

    The current EEH callbacks can race with a driver unbind. This can
    result in a backtrace like this:

      EEH: Frozen PHB#0-PE#1fc detected
      EEH: PE location: S000009, PHB location: N/A
      CPU: 2 PID: 2312 Comm: kworker/u258:3 Not tainted 4.15.6-openpower1 #2
      Workqueue: nvme-wq nvme_reset_work [nvme]
      Call Trace:
        dump_stack+0x9c/0xd0 (unreliable)
        eeh_dev_check_failure+0x420/0x470
        eeh_check_failure+0xa0/0xa4
        nvme_reset_work+0x138/0x1414 [nvme]
        process_one_work+0x1ec/0x328
        worker_thread+0x2e4/0x3a8
        kthread+0x14c/0x154
        ret_from_kernel_thread+0x5c/0xc8
      nvme nvme1: Removing after probe failure status: -19
      <snip>
      cpu 0x23: Vector: 300 (Data Access) at [c000000ff50f3800]
          pc: c0080000089a0eb0: nvme_error_detected+0x4c/0x90 [nvme]
          lr: c000000000026564: eeh_report_error+0xe0/0x110
          sp: c000000ff50f3a80
         msr: 9000000000009033
         dar: 400
       dsisr: 40000000
        current = 0xc000000ff507c000
        paca    = 0xc00000000fdc9d80   softe: 0   irq_happened: 0x01
          pid   = 782, comm = eehd
      Linux version 4.15.6-openpower1 (smc@smc-desktop) (gcc version 6.4.0
      (Buildroot 2017.11.2-00008-g4b6188e)) #2 SMP Tue Feb 27 12:33:27 PST 2018
      enter ? for help
        eeh_report_error+0xe0/0x110
        eeh_pe_dev_traverse+0xc0/0xdc
        eeh_handle_normal_event+0x184/0x4c4
        eeh_handle_event+0x30/0x288
        eeh_event_handler+0x124/0x170
        kthread+0x14c/0x154
        ret_from_kernel_thread+0x5c/0xc8

    The first part is an EEH (on boot); the second half is the resulting
    crash.

    nvme probe starts the nvme_reset_work() worker thread. This worker
    thread starts touching the device, which sees a device error (EEH) and
    hence queues up an event in the powerpc EEH worker thread.
    nvme_reset_work() then continues and runs nvme_remove_dead_ctrl_work(),
    which results in unbinding the driver from the device and hence
    releases all resources. At the same time, the EEH worker thread starts
    doing the EEH .error_detected() driver callback, which no longer works
    since the resources have been freed.

    This fixes the problem in the same way the generic PCIe AER code (in
    drivers/pci/pcie/aer/aerdrv_core.c) does. It makes the EEH code hold
    the device_lock() while performing the driver EEH callbacks and
    associated code. This ensures either the callbacks are no longer
    registered, or if they are registered the driver will not be removed
    from underneath us.

    This has been broken forever. The EEH callbacks were first introduced
    in 2005 (in 77bd7415610) but it's not clear if a lock was needed back
    then.

    Fixes: 77bd74156101 ("[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines")
    Cc: stable@vger.kernel.org # v2.6.16+
    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * powerpc/kexec_file: Fix error code when trying to load kdump kernel (Thiago Jung Bauermann, 2018-03-31, 1 file, -1/+1)

    kexec_file_load() on powerpc doesn't support kdump kernels yet, so it
    returns -ENOTSUPP in that case.

    I've recently learned that this errno is internal to the kernel and
    isn't supposed to be exposed to userspace. Therefore, change to
    -EOPNOTSUPP, which is defined in a uapi header.

    This does indeed make kexec-tools happier. Before the patch, on
    ppc64le:

      # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz
      kexec_file_load failed: Unknown error 524

    After the patch:

      # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz
      kexec_file_load failed: Operation not supported

    Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()")
    Cc: stable@vger.kernel.org # v4.10+
    Reported-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
    Reviewed-by: Simon Horman <horms@verge.net.au>
    Reviewed-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

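    The change itself is a one-liner of this shape (a sketch; the
    surrounding function is illustrative):

        #include <linux/errno.h>
        #include <linux/kexec.h>

        int image_probe_sketch(struct kimage *image, void *buf,
                               unsigned long buf_len)
        {
                /* kdump via kexec_file_load() is not implemented yet */
                if (image->type == KEXEC_TYPE_CRASH)
                        return -EOPNOTSUPP; /* uapi errno; -ENOTSUPP (524) is
                                             * kernel-internal */

                /* ... hand off to the image loaders ... */
                return 0;
        }
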
| * powerpc/64e: Fix oops due to deferral of paca allocation (Michael Ellerman, 2018-03-31, 1 file, -3/+4)

    On 64-bit Book3E systems, in setup_tlb_core_data() we reference other
    CPUs' pacas. But in commit 59f577743d71 ("powerpc/64: Defer paca
    allocation until memory topology is discovered") the allocation of
    non-boot-CPU pacas was deferred until later in boot.

    This leads to an oops:

      CPU maps initialized for 1 thread per core
      Unable to handle kernel paging request for data at address 0x8888888888888918
      Faulting instruction address: 0xc000000000e2f0d0
      Oops: Kernel access of bad area, sig: 11 [#1]
      NIP .setup_tlb_core_data+0xdc/0x160
      Call Trace:
        .setup_tlb_core_data+0x5c/0x160 (unreliable)
        .setup_arch+0x80/0x348
        .start_kernel+0x7c/0x598
        start_here_common+0x1c/0x40

    Luckily setup_tlb_core_data() is called immediately prior to
    smp_setup_pacas(). So simply switching their order is sufficient to
    fix the oops and seems unlikely to have any other unwanted side
    effects.

    Fixes: 59f577743d71 ("powerpc/64: Defer paca allocation until memory topology is discovered")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| * Merge branch 'topic/paca' into next (Michael Ellerman, 2018-03-31, 11 files, -195/+271)

    Bring in yet another series that touches KVM code, and might need to
    be merged into the kvm-ppc branch to resolve conflicts.

    This required some changes in pnv_power9_force_smt4_catch/release()
    due to the paca array becoming an array of pointers.

| | * powerpc/64: Allocate per-cpu stacks node-local if possible (Nicholas Piggin, 2018-03-30, 1 file, -19/+32)

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| | * powerpc/64: Allocate pacas per node (Nicholas Piggin, 2018-03-30, 2 files, -6/+43)

    Per-node allocations are possible on 64s with radix that does not have
    the bolted SLB limitation.

    Hash would be able to do the same if all CPUs had the bottom of their
    node-local memory bolted as well. This is left as an exercise for the
    reader.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Add dummy definition of boot_cpuid for !SMP]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| | * powerpc/64: Defer paca allocation until memory topology is discovered (Nicholas Piggin, 2018-03-30, 3 files, -70/+49)

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Rename the dummy allocate_pacas() to fix 32-bit build]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| | * powerpc/setup: Add cpu_to_phys_id array (Nicholas Piggin, 2018-03-30, 2 files, -1/+24)

    Build an array that finds the hardware CPU number from the logical CPU
    number in firmware CPU discovery. Use that rather than setting the
    paca of other CPUs directly, to begin with. A subsequent patch will
    not have pacas allocated at this point.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Fix SMP=n build by adding #ifdef in arch_match_cpu_phys_id()]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| | * powerpc/64: move default SPR recording (Nicholas Piggin, 2018-03-30, 3 files, -17/+20)

    Move this into the early setup code, and don't iterate over CPU masks.
    We don't want to call into sysfs so early from setup, and a future
    patch won't initialize CPU masks by the time this is called.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Fold in incremental fix from Nick for DSCR handling]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

| | * powerpc/mm/numa: move numa topology discovery earlier (Nicholas Piggin, 2018-03-30, 1 file, -0/+3)

    Split sparsemem initialisation from basic numa topology discovery.
    Move the parsing earlier in boot, before pacas are allocated.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>