openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message	Author	Age	Files	Lines
*	powerpc/powernv/ioda1: Improve DMA32 segment track	Gavin Shan	2016-05-11	2	-56/+66
	In the current implementation, the DMA32 segments required by one specific PE aren't calculated from the information held in the PE independently. This conflicts with the PCI hotplug design: PE centralized, meaning the PE's DMA32 segments should be calculated from the information held in the PE independently. This introduces an array (@dma32_segmap) for every PHB to track the DMA32 segment usage. Besides, this moves the logic calculating PE's consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's DMA32 segments are calculated/allocated from the information held in the PE (DMA32 weight). Also the logic is improved: we try to allocate as many DMA32 segments as we can. It's acceptable if fewer DMA32 segments than expected are allocated. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
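	A minimal user-space sketch of the "allocate as many segments as we can" idea described above. It is not the kernel code: `dma32_alloc_sketch()`, its parameters and the value used for `IODA_INVALID_PE` are assumptions made for illustration. It searches the per-PHB segment map for a contiguous free run of the wanted length and retries with a shorter run when none is found.

```c
#include <assert.h>
#include <stddef.h>

#define IODA_INVALID_PE 0xFFFFFFFFu /* assumed sentinel for "segment not owned" */

/*
 * Try to give PE `pe` up to `wanted` contiguous DMA32 segments out of
 * `segmap[0..nsegs)`. Returns the number of segments actually claimed
 * (possibly fewer than wanted, 0 if none) and stores the first segment
 * index in *base_out when the result is non-zero.
 */
static unsigned int dma32_alloc_sketch(unsigned int *segmap, unsigned int nsegs,
                                       unsigned int pe, unsigned int wanted,
                                       unsigned int *base_out)
{
    unsigned int segs, base, i;

    assert(segmap != NULL && base_out != NULL);
    if (wanted > nsegs)
        wanted = nsegs;

    /* Shrink the request until a free contiguous run of that size exists. */
    for (segs = wanted; segs > 0; segs--) {
        for (base = 0; base + segs <= nsegs; base++) {
            for (i = base; i < base + segs; i++)
                if (segmap[i] != IODA_INVALID_PE)
                    break;
            if (i == base + segs) {
                for (i = base; i < base + segs; i++)
                    segmap[i] = pe; /* record ownership in the map */
                *base_out = base;
                return segs;
            }
        }
    }
    return 0;
}
```

	With segment 2 already taken in an 8-segment map, a request for 6 segments would settle for the 5-segment run starting at segment 3.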
*	powerpc/powernv: Remove DMA32 PE list	Gavin Shan	2016-05-11	2	-112/+78
	PEs are put into the PHB DMA32 list (phb->ioda.pe_dma_list) according to their DMA32 weight. The PEs on the list are iterated to set up their TCE32 tables at system boot time. The list is used only once at boot time, so there is no need to keep it. This moves the logic calculating the DMA32 weight of PHB and PE to pnv_ioda_setup_dma() to drop the PHB's DMA32 list. Also, the fields @tce32_seg and @tce32_segcount, with which every PE tracks its consumed DMA32 segments, are useless and are removed. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE	Gavin Shan	2016-05-11	1	-13/+18
	Currently, there is one macro (TCE32_TABLE_SIZE) representing the TCE table size for one DMA32 segment. The constant representing the DMA32 segment size (1 << 28) is still used in the code. This defines PNV_IODA1_DMA32_SEGSIZE representing one DMA32 segment size. The TCE table size can be calculated when the page has a fixed 4KB size. So all the related calculation depends on one macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
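	The arithmetic behind "the TCE table size can be calculated" can be sketched as follows. The segment size (1 << 28) and the 4KB page come from the message above; the 8-byte TCE entry size and the helper name `tce32_table_size()` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PNV_IODA1_DMA32_SEGSIZE 0x10000000ULL /* 1 << 28, one DMA32 segment */
#define DEMO_PAGE_SHIFT_4K      12            /* fixed 4KB IOMMU page */
#define DEMO_TCE_ENTRY_SIZE     8             /* assumption: one 64-bit TCE per page */

/* Bytes of TCE table needed to map one segment of `segsize` bytes. */
static uint64_t tce32_table_size(uint64_t segsize)
{
    assert(segsize % (1ULL << DEMO_PAGE_SHIFT_4K) == 0);
    return (segsize >> DEMO_PAGE_SHIFT_4K) * DEMO_TCE_ENTRY_SIZE;
}
```

	Under these assumptions one 256MB segment needs 65536 entries, i.e. a 512KB table, and every derived size follows from the single segment-size macro.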
*	powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()	Gavin Shan	2016-05-11	1	-4/+5
	This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe() as it's the counterpart of IODA2's pnv_pci_ioda2_setup_dma_pe(). No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv/ioda1: M64 support on P7IOC	Gavin Shan	2016-05-11	1	-3/+86
	This enables the M64 window on P7IOC, which has already been enabled on PHB3. Different from PHB3, where 16 M64 BARs are supported and each of them can be owned by one particular PE# exclusively or divided evenly into 256 segments, every P7IOC PHB has 16 M64 BARs and each of them is divided into 8 segments. So every P7IOC PHB supports 128 M64 segments in total. P7IOC has M64DT, which helps mapping one particular M64 segment# to an arbitrary PE#. PHB3 doesn't have M64DT, indicating that one M64 segment can only be pinned to the fixed PE#. In order to unify M64 support on P7IOC and PHB3, we just provide 128 M64 segments on every P7IOC PHB and each of them is pinned to the fixed PE# by bypassing the function of M64DT. In turn, we just need a different phb->init_m64() for P7IOC and PHB3 and map M64 segments in pnv_ioda_reserve_m64_pe() for P7IOC; most of the code is shared by them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
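	A small sketch of the segment arithmetic in the message above: 16 BARs of 8 segments each give 128 M64 segments per P7IOC PHB. The identity mapping from segment number to PE# shown here is an assumption about what "pinned to the fixed PE#" means, and the helper names are hypothetical.

```c
#include <assert.h>

#define P7IOC_M64_BARS         16 /* from the commit message */
#define P7IOC_M64_SEGS_PER_BAR 8  /* from the commit message */

/* Total number of M64 segments one P7IOC PHB provides. */
static unsigned int p7ioc_m64_total_segments(void)
{
    return P7IOC_M64_BARS * P7IOC_M64_SEGS_PER_BAR;
}

/*
 * Assumed fixed pinning: the Nth segment overall belongs to PE#N,
 * so the M64DT indirection is never used.
 */
static unsigned int p7ioc_m64_segment_to_pe(unsigned int bar, unsigned int seg)
{
    assert(bar < P7IOC_M64_BARS && seg < P7IOC_M64_SEGS_PER_BAR);
    return bar * P7IOC_M64_SEGS_PER_BAR + seg;
}
```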
*	powerpc/powernv: Rename M64 related functions	Gavin Shan	2016-05-11	1	-11/+11
	This renames the functions that pick the PE number based on consumed M64 segments and that map M64 segments to PEs, as those functions are going to be shared by IODA1/IODA2 in the next patch. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Track M64 segment consumption	Gavin Shan	2016-05-11	2	-2/+9
	When unplugging PCI devices, their parent PEs might be offline. The M64 resources consumed by the PEs should be released at that time. As we track M32 segment consumption, this introduces an array to the PHB to track the mapping between M64 segment and PE number. Note: M64 mapping isn't covered by pnv_ioda_setup_pe_seg() as IODA2 doesn't support the mapping explicitly while it's supported on IODA1. Until now, no M64 is supported on IODA1 in software. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: IO and M32 mapping based on PCI device resources	Gavin Shan	2016-05-11	1	-3/+16
	Currently, the IO and M32 segments are mapped to the corresponding PE based on the windows of the parent bridge of the PE's primary bus. It's not going to work when the windows of the root port or the upstream port of the PCIe switch behind the root port are extended to the PHB's apertures in order to support hotplug in a subsequent patch. This fixes the issue by mapping IO and M32 segments based on the resources of the PCI devices included in the PE, instead of the windows of the parent bridge of the PE's primary bus. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()	Gavin Shan	2016-05-11	1	-59/+62
	pnv_ioda_setup_pe_seg() associates the IO and M32 segments with the owner PE. The code mapping segments should be fixed and immune from logic changes introduced to pnv_ioda_setup_pe_seg(). This moves the code mapping segments to helper pnv_ioda_setup_pe_res(). The data type for @rc is changed to "int64_t". Also, argument @hose is removed from pnv_ioda_setup_pe() as it can be got from @pe. No functional changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Fix initial IO and M32 segmap	Gavin Shan	2016-05-11	1	-1/+7
	There are two arrays for IO and M32 segment maps on every PHB. The index of the arrays is the segment number and the value stored in the corresponding element is the PE number, indicating the segment is assigned to the PE. Initially, all elements in those two arrays are zeroes, meaning all segments are assigned to PE#0. It's wrong. This fixes the initial values in the elements of those two arrays to IODA_INVALID_PE, meaning all segments aren't assigned to any PE. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
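	The problem and the fix above can be illustrated with a short user-space sketch. The sentinel value chosen for `IODA_INVALID_PE` and the helper names are assumptions; only the idea (zero-filled maps look like "owned by PE#0", so fill them with an invalid PE number instead) comes from the message.

```c
#include <assert.h>
#include <stddef.h>

#define IODA_INVALID_PE 0xFFFFFFFFu /* assumed value: no real PE has this number */

/* Mark every segment in the map as not assigned to any PE. */
static void segmap_init(unsigned int *segmap, size_t nsegs)
{
    assert(segmap != NULL);
    for (size_t i = 0; i < nsegs; i++)
        segmap[i] = IODA_INVALID_PE;
}

/* A segment is free only if its entry holds the invalid PE number. */
static int segment_is_free(const unsigned int *segmap, size_t seg)
{
    return segmap[seg] == IODA_INVALID_PE;
}
```

	A map that was merely zeroed reports every segment as belonging to PE#0; after `segmap_init()` every segment reads as free.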
*	powerpc/powernv: Data type unsigned int for PE number	Gavin Shan	2016-05-11	4	-9/+9
	This changes the data type of PE number from "int" to "unsigned int" in order to match the fact PE number is never negative:
	* The number of PE to which the specified PCI device is attached.
	* The PE number map for SRIOV VFs.
	* The returned PE number from pnv_ioda_alloc_pe().
	* The returned PE number from pnv_ioda2_pick_m64_pe().
	Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Rename PE# fields in struct pnv_phb	Gavin Shan	2016-05-11	4	-33/+33
	This renames the fields related to PE number in "struct pnv_phb" to better reflect their usage, as Alexey suggested. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Reorder fields in struct pnv_phb	Gavin Shan	2016-05-11	1	-4/+3
	This moves those fields in struct pnv_phb that are related to PE allocation around. No logical change. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Drop phb->bdfn_to_pe()	Gavin Shan	2016-05-11	2	-10/+0
	The last usage of pnv_phb::bdfn_to_pe() was removed in ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn"), so drop it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Cleanup on pci_controller_ops instances	Gavin Shan	2016-05-11	1	-18/+18
	This cleans up the data struct instances listed below to use tab instead of space indentation, to avoid complaints from scripts/checkpatch.pl. No logical changes introduced. @pnv_pci_ioda_controller_ops @pnv_npu_ioda_controller_ops Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Cleanup on struct pci_controller_ops	Gavin Shan	2016-05-11	1	-12/+13
	Each PHB has one instance of "struct pci_controller_ops" that includes various callbacks called by the PCI subsystem. In the definition of this struct, some callbacks have explicit names for their arguments, but the rest don't. This adds explicit names for all arguments of the callbacks in "struct pci_controller_ops" so that the code looks consistent. Also, argument name @dev is replaced by @pdev as the latter is the preferred name for a PCI device. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Add test to check if TM SPRs are corrupted	Rashmica Gupta	2016-05-11	3	-1/+146
	Testing that the TM SPRs are behaving the way they should. Uses more threads than cpus to see if the following register values persist with context switching: the FS (failure summary) flag in TEXASR, and TFIAR and TFHAR. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Add TM test to check if TAR is corrupted	Rashmica Gupta	2016-05-11	3	-1/+92
	If the transaction is aborted, the TAR should be rolled back to the checkpointed value before the transaction began. The value written to the TAR when the transaction is suspended should only remain there if the transaction completes successfully. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Add test for forking inside transaction	Rashmica Gupta	2016-05-11	3	-1/+44
	This test does a fork syscall inside a transaction. Basic sniff test to see if we can enter the kernel during a transaction. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Standardise TM calls	Rashmica Gupta	2016-05-11	2	-13/+7
	Currently tbegin, tend etc are written as opcodes or asm instructions. So standardise these to asm instructions. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Make reg.h common to all powerpc selftests	Rashmica Gupta	2016-05-11	4	-8/+13
	Currently there is a reg.h in pmu/ebb that has defines that are useful in other powerpc selftests, so move this up into the selftests/powerpc folder. Also include it in utils.h, as this is often used in selftests. Add in some other useful register defines. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Rename machine_check_pSeries_early() to powernv	Mahesh Salgaonkar	2016-05-11	1	-2/+2
	The routine machine_check_pSeries_early() is only used on powernv, not pseries. Hence rename machine_check_pSeries_early() to machine_check_powernv_early(). Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Check periodically the coherent platform function's state	Christophe Lombard	2016-05-11	2	-28/+33
	In the PowerVM environment, the PHYP CoherentAccel component manages the state of the Coherent Accelerator Processor Interface adapter and virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors (and interrupts) and provides a new set of hcalls for the OS APIs to utilize the Accelerator Function Unit (AFU). During the course of operation, a coherent platform function can encounter errors. Some possible reasons for errors are: hardware recoverable and unrecoverable errors; transient and over-threshold correctable errors. PHYP implements its own state model for the coherent platform function. The state of the AFU is available through a hcall. The current implementation of the cxl driver, for the PowerVM environment, checks this state of the AFU only when an action is requested (open a device, ioctl command, memory map, attach/detach a process) from an external driver (cxlflash, libcxl). If an error is detected, the cxl driver handles the error according to the content of the Power Architecture Platform Requirements document. But in case of low-level troubles (or error injection), the PHYP component may reset the card and change the AFU state. The PHYP interface doesn't provide any way to be notified when that happens, which implies that the cxl driver cannot handle the state change of the AFU immediately and cannot notify other drivers (cxlflash, ...). The purpose of this patch is to wake up the cpu periodically to check the current state of each AFU and to see if we need to enter an error recovery path. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Add kernel API to allow a context to operate with relocate disabled	Ian Munsie	2016-05-11	5	-2/+34
	cxl devices typically access memory using an MMU in much the same way as the CPU, and each context includes a state register much like the MSR in the CPU. Like the CPU, the state register includes a bit to enable relocation, which we currently always enable. In some cases, it may be desirable to allow a device to access memory using real addresses instead of effective addresses, so this adds a new API, cxl_set_translation_mode, that can be used to disable relocation on a given kernel context. This can allow for the creation of a special privileged context that the device can use if it needs relocation disabled, and can use regular contexts at times when it needs relocation enabled. This interface is only available to users of the kernel API for obvious reasons, and will never be supported in a virtualised environment. This will be used by the upcoming cxl support in the mlx5 driver. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Ensure PSL interrupt is configured for contexts with no AFU IRQs	Ian Munsie	2016-05-11	2	-0/+21
	In the cxl kernel API, it is possible to create a context and start it without allocating any interrupts. Since we assign or allocate the PSL interrupt when allocating AFU interrupts this will lead to a situation where we start the context with no means to take any faults. The user API is not affected as it always goes through the cxl interrupt allocation code paths and will have the PSL interrupt allocated or assigned, even if no AFU interrupts were requested. This checks that at least one interrupt is configured at the time of attach, and if not it will assign the multiplexed PSL interrupt for powernv, or allocate a single interrupt for PowerVM. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Remove duplicate #defines	Ian Munsie	2016-05-11	1	-9/+0
	These defines are not used, but other equivalent definitions (CXL_SPA_SW_CMD_*) are used. Remove the unused defines. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Handle num_of_processes larger than can fit in the SPA	Ian Munsie	2016-05-11	1	-3/+12
	num_of_process is a 16 bit field, theoretically allowing an AFU to support 16K processes, however the scheduled process area currently has a maximum size of 1MB, which limits the maximum number of processes to 7704. Some AFUs may not necessarily care what the limit is and just want to be able to use the maximum by setting the field to 16K. To allow these to work, detect this situation and use the maximum size for the SPA. Downgrade the WARN_ON to a dev_warn. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
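	A hedged sketch of the clamping described above. The 1MB limit and the 7704 figure come from the message; the sizing formula in `spa_max_procs()` is an assumption, used here because it reproduces that figure, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SPA_MAX_SIZE (1u << 20) /* 1MB scheduled process area limit */

/* Assumed sizing rule: how many process entries fit in an SPA of this size. */
static unsigned int spa_max_procs(uint32_t spa_size)
{
    assert(spa_size >= 8u * 96u);
    return ((spa_size / 8u) - 96u) / 17u;
}

/* Use the AFU's requested process count unless it exceeds what the SPA can hold. */
static unsigned int clamp_num_processes(unsigned int requested)
{
    unsigned int max = spa_max_procs(DEMO_SPA_MAX_SIZE);

    return requested > max ? max : requested;
}
```

	An AFU that asks for 16K processes would be given the 7704 that fit, while a smaller request passes through unchanged.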
*	powerpc/mm: Improve readability of update_mmu_cache()	Gavin Shan	2016-05-11	1	-7/+14
	The function is used to update the MMU with a software PTE. It can be called by the data access exception handler (0x300) or the instruction access exception handler (0x400). If the function is called by the 0x400 handler, the local variable @access is set to _PAGE_EXEC to indicate the software PTE should have that flag set. When the function is called by the 0x300 handler, @access is set to zero. This improves the readability of the function by replacing if statements with a switch. No logical changes introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
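	The switch described above can be sketched in isolation. This is not the kernel function: `access_for_trap()` is a hypothetical helper, `DEMO_PAGE_EXEC` stands in for `_PAGE_EXEC` with a placeholder value, and rejecting other trap numbers is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_EXEC 0x1UL /* placeholder for _PAGE_EXEC */

/*
 * Map the exception vector that triggered the update to the access
 * flags the software PTE should carry. Returns 0 on success and -1
 * for any other trap, where no access value is produced.
 */
static int access_for_trap(unsigned long trap, unsigned long *access)
{
    assert(access != NULL);
    switch (trap) {
    case 0x300: /* data access exception */
        *access = 0UL;
        return 0;
    case 0x400: /* instruction access exception */
        *access = DEMO_PAGE_EXEC;
        return 0;
    default:
        return -1;
    }
}
```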
*	powerpc/mm: define TOP_ZONE as a constant	Oliver O'Halloran	2016-05-11	1	-12/+5
	The zone that contains the top of memory will be either ZONE_NORMAL or ZONE_HIGHMEM depending on the kernel config. There are two functions that require this information and both of them use an #ifdef to set a local variable (top_zone). This is a little silly so let's just make it a constant. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Cc: linux-mm@kvack.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
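	The pattern described above, one compile-time constant instead of a repeated #ifdef around a local variable, might look like this in a stand-alone sketch. The enum is a reduced stand-in for the kernel's zone types, and `is_top_zone()` is a hypothetical consumer.

```c
#include <assert.h>

/* Reduced stand-in for the kernel's zone enumeration. */
enum demo_zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Decide once, at compile time, which zone holds the top of memory. */
#ifdef CONFIG_HIGHMEM
#define TOP_ZONE ZONE_HIGHMEM
#else
#define TOP_ZONE ZONE_NORMAL
#endif

/* Callers compare against the constant; no per-function #ifdef is needed. */
static int is_top_zone(enum demo_zone_type z)
{
    assert(z <= ZONE_HIGHMEM);
    return z == TOP_ZONE;
}
```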
*	powerpc/sstep: Fix emulation fall-through	Oliver O'Halloran	2016-05-11	1	-0/+1
	There is a switch fallthrough in instr_analyze() which can cause an invalid instruction to be emulated as a different, valid, instruction. The rld* (opcode 30) case extracts a sub-opcode from bits 3:1 of the instruction word. However, the only valid values of this field are 001 and 000. These cases are correctly handled, but the others are not, which causes execution to fall through into case 31. Breaking out of the switch causes the instruction to be marked as unknown and allows the caller to deal with the invalid instruction in a manner consistent with other invalid instructions. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
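	A reduced illustration of the fall-through and its one-line fix. The enum values and `classify()` are hypothetical; only the shape (an inner switch on the sub-opcode inside case 30, followed directly by case 31) mirrors the description above.

```c
#include <assert.h>

enum insn_kind { INSN_UNKNOWN, INSN_RLD_SUBOP0, INSN_RLD_SUBOP1, INSN_OP31 };

/* Classify an instruction by primary opcode and, for opcode 30, sub-opcode. */
static enum insn_kind classify(unsigned int opcode, unsigned int subop)
{
    assert(opcode < 64); /* primary opcodes are 6 bits wide */
    switch (opcode) {
    case 30:
        switch (subop) {
        case 0:
            return INSN_RLD_SUBOP0;
        case 1:
            return INSN_RLD_SUBOP1;
        }
        /* Without this break, invalid sub-opcodes would run the case 31 code. */
        break;
    case 31:
        return INSN_OP31;
    }
    return INSN_UNKNOWN;
}
```

	With the break in place, an opcode 30 instruction with an unhandled sub-opcode is reported as unknown instead of being treated as an opcode 31 instruction.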
*	powerpc/sstep: Fix sstep.c compile on powerpcspe	Lennart Sorensen	2016-05-11	1	-0/+4
	Commit be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") introduced ldarx and stdcx into the instructions in sstep.c, which are not accepted by the assembler on powerpcspe, but do seem to be accepted by the normal powerpc assembler even in 32 bit mode. Wrap these two instructions in a __powerpc64__ check like it is everywhere else in the file. Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs	Paul Mackerras	2016-05-11	3	-61/+122
	xmon has commands for reading and writing SPRs, but they don't work currently for several reasons. They attempt to synthesize a small function containing an mfspr or mtspr instruction and call it. However, the instructions are on the stack, which is usually not executable. Also, for 64-bit we set up a procedure descriptor, which is fine for the big-endian ABIv1, but not correct for ABIv2. Finally, the code uses the infrastructure for catching memory errors, but that only catches data storage interrupts and machine check interrupts, but a failed mfspr/mtspr can generate a program interrupt or a hypervisor emulation assist interrupt, or be a no-op. Instead of trying to synthesize a function on the fly, this adds two new functions, xmon_mfspr() and xmon_mtspr(), which take an SPR number as an argument and read or write the SPR. Because there is no Power ISA instruction which takes an SPR number in a register, we have to generate one of each possible mfspr and mtspr instruction, for all 1024 possible SPRs. Thus we get just over 8k bytes of code for each of xmon_mfspr() and xmon_mtspr(). However, this 16kB of code pales in comparison to the > 130kB of PPC opcode tables used by the xmon disassembler. To catch interrupts caused by the mfspr/mtspr instructions, we add a new 'catch_spr_faults' flag. If an interrupt occurs while it is set, we come back into xmon() via program_check_interrupt(), _exception() and die(), see that catch_spr_faults is set and do a longjmp to bus_error_jmp, back into read_spr() or write_spr(). This adds a couple of other nice features: first, a "Sa" command that attempts to read and print out the value of all 1024 SPRs. If any mfspr instruction acts as a no-op, then the SPR is not implemented and not printed. Secondly, the Sr and Sw commands detect when an SPR is not implemented (i.e. mfspr is a no-op) and print a message to that effect rather than printing a bogus value. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
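	The recovery path described above (set a flag, attempt the access, longjmp back if it faults) can be mimicked in user space with standard setjmp/longjmp. Everything here is a simulation under stated assumptions: `fake_mfspr()` pretends that odd-numbered SPRs fault, and `read_spr_sketch()` is a hypothetical stand-in for the real read_spr().

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf bus_error_jmp;
static volatile int catch_spr_faults;

/* Simulated interrupt path: bail out to the saved context if armed. */
static void fake_fault(void)
{
    if (catch_spr_faults)
        longjmp(bus_error_jmp, 1);
}

/* Simulated mfspr: assume odd SPR numbers are not accessible. */
static unsigned long fake_mfspr(int n)
{
    if (n % 2)
        fake_fault();
    return 0x1000UL + (unsigned long)n;
}

/* Returns 1 and stores the value on success, 0 if the access faulted. */
static int read_spr_sketch(int n, unsigned long *val)
{
    assert(val != NULL);
    if (setjmp(bus_error_jmp) != 0) {
        /* Arrived here via longjmp from the fault path. */
        catch_spr_faults = 0;
        return 0;
    }
    catch_spr_faults = 1;
    *val = fake_mfspr(n);
    catch_spr_faults = 0;
    return 1;
}
```

	The flag is cleared on both exits so that a later, unrelated fault is not mistaken for an SPR access failure.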
*	perf tools: Fix perf regs mask generation	Naveen N. Rao	2016-05-11	1	-4/+4
	On some architectures (powerpc in particular), the number of registers exceeds what can be represented in an integer bitmask. Ensure we generate the proper bitmask on such platforms. Fixes: 71ad0f5e4 ("perf tools: Support for DWARF CFI unwinding on post processing") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
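	To illustrate why an int-sized mask is not enough: shifting a plain `1` by 32 or more bits is undefined in C when int is 32 bits wide, so a mask for more than 31 registers has to be built from a 64-bit constant. The sketch below is an assumption about the general technique, not the perf build code, and `regs_mask()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with the low `nregs` bits set, using 64-bit arithmetic. */
static uint64_t regs_mask(unsigned int nregs)
{
    assert(nregs <= 64);
    if (nregs == 64)
        return ~0ULL; /* 1ULL << 64 would itself be undefined */
    return (1ULL << nregs) - 1;
}
```

	For an illustrative register count of 44, the mask is 0xFFFFFFFFFFF, which would not fit in a 32-bit integer.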
*	perf/powerpc: Add support for unwinding perf-stackdump	Chandan Kumar	2016-05-11	3	-0/+98
	Adds support for unwinding user stack dump by linking with libunwind. Signed-off-by: Chandan Kumar <chandan.kumar@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Add HAVE_PERF_USER_STACK_DUMP support	Chandan Kumar	2016-05-11	3	-2/+3
	With perf regs support enabled for powerpc, in commit ed4a4ef85cf5 ("powerpc/perf: Add support for sampling interrupt register state"), the support for obtaining perf user stack dump is already enabled. This patch declares that support and also updates the documentation to mark the support for perf-regs and perf-stackdump. Signed-off-by: Chandan Kumar <chandan.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/hash64: Fix subpage protection with 4K HPTE config	Michael Ellerman	2016-05-11	1	-1/+9
	With Linux page size of 64K and hardware only supporting 4K HPTE, if we use subpage protection, we always fail for the subpage 0 as shown below (using the selftest subpage_prot test): 520175565: (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass ! 4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass ! 4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass ! 4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass ! This is because hash preload wrongly inserts the HPTE entry for subpage 0 without looking at the subpage protection information. Fix it by teaching should_hash_preload() not to preload if we have subpage protection configured for that range. It appears this has been broken since it was introduced in 2008. Fixes: fa28237cfcc5 ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/hash64: Factor out hash preload psize check	Michael Ellerman	2016-05-11	1	-4/+17
	Currently we have a check in hash_preload() against the psize, which is only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand this check in a subsequent patch, so factor it out to allow that. As a bonus it removes the #ifdef in the C code. Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES block because it would require a forward declaration. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Update of_remove_property() call sites to remove null checking	Suraj Jitindar Singh	2016-05-11	4	-26/+10
	After obtaining a property from of_find_property() and before calling of_remove_property() most code checks to ensure that the property returned from of_find_property() is not null. The previous patch moved this check to the start of the function of_remove_property() in order to avoid the case where this check isn't done and a null value is passed. This ensures the check is always conducted before taking locks and attempting to remove the property. Thus it is no longer necessary to perform a check for null values before invoking of_remove_property(). Update of_remove_property() call sites in order to remove redundant checking for a null property value, as the check is now performed within the of_remove_property() function. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [mpe: Unbreak some lines which are just >80 chars for readability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	drivers/of: Add check for null property in of_remove_property()	Suraj Jitindar Singh	2016-05-11	1	-0/+3
	The validity of the property input argument to of_remove_property() is never checked within the function and thus it is possible to pass a null value. It happens that this will be picked up in __of_remove_property() as no matching property of the device node will be found and thus an error will be returned, however once again there is no explicit check for a null value. By the time this is detected 2 locks have already been acquired, which is completely unnecessary if the property to remove is null. Add an explicit check in the function of_remove_property() for a null property value and return -ENODEV in this case; this is consistent with what the previous return value would have been when the null value was not detected and passed to __of_remove_property(). By moving an explicit check for the property parameter into the of_remove_property() function, this will remove the need to perform this check in calling code before invocation of the of_remove_property() function. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
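	The early-return pattern described above, sketched outside the kernel. The struct layouts, the `demo_` names and the lock counter are all hypothetical; the only points taken from the message are that the null check happens before any lock is taken and that the result is -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_property { const char *name; };
struct demo_device_node { int nprops; };

static int demo_locks_taken; /* counts how often the "locks" were acquired */

/* Remove `prop` from `np`; a null property is rejected before locking. */
static int demo_remove_property(struct demo_device_node *np,
                                struct demo_property *prop)
{
    assert(np != NULL);
    if (!prop)
        return -ENODEV; /* nothing to remove, so no lock is needed */

    demo_locks_taken++; /* stands in for taking the two locks */
    if (np->nprops > 0)
        np->nprops--;
    return 0;
}
```

	Callers can then pass the result of a lookup straight through without checking it for null first.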
*	powerpc/pseries: Add null property check to pseries_discover_pic()	Suraj Jitindar Singh	2016-05-11	1	-0/+2
	The return value of of_get_property() isn't checked before it is passed to the strstr() function. If the return value is null, this will result in a null pointer being dereferenced. Add a check to see if the return value of of_get_property() is null and, if it is, continue straight on to the next node. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
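	A stand-alone sketch of the guard described above. The array of strings stands in for the property values returned per node, and the substrings "open-pic" and "ppc-xicp" as well as the enum and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_pic_kind { PIC_NONE, PIC_MPIC, PIC_XICS };

/*
 * Walk the per-node property values and report the first recognised
 * interrupt controller type. A missing (null) property is skipped
 * instead of being handed to strstr().
 */
static enum demo_pic_kind discover_pic_sketch(const char *const *compat, size_t n)
{
    assert(compat != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *typep = compat[i];

        if (!typep)
            continue; /* the added check: move on to the next node */
        if (strstr(typep, "open-pic"))
            return PIC_MPIC;
        if (strstr(typep, "ppc-xicp"))
            return PIC_XICS;
    }
    return PIC_NONE;
}
```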
*	powerpc/powernv/pci: Fix cfg_dbg() & replace with pr_devel()	Alexey Kardashevskiy	2016-05-11	1	-9/+6
	When cfg_dbg() is enabled (i.e. mapped to printk()), gcc produces errors as the __func__ parameter is missing (pnv_pci_cfg_read() has one); this adds the missing parameter. cfg_dbg() is just an inferior version of pr_devel() so use the latter instead. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Fix subpage_prot test to return !0 on failure	Michael Ellerman	2016-05-11	1	-8/+10
	It's helpful for automated testing if the test returns error codes back to the calling program. While we're here fix all the usages of %p to remove the double 0x, ie. %p already includes 0x. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
*	selftests/powerpc: Test cp_abort during context switch	Chris Smart	2016-05-11	5	-0/+129
	Test that performing a copy paste sequence in userspace on P9 does not result in a leak of the copy into the paste of another process. This is based on Anton Blanchard's context_switch benchmarking code. It sets up two processes tied to the same CPU, one which copies and one which pastes. The paste should never succeed and the test fails if it does. This is a test for commit, "8a64904 powerpc: Add support for userspace P9 copy paste." Patch created with much assistance from Michael Neuling <mikey@neuling.org> Signed-off-by: Chris Smart <chris@distroguy.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Remove unnecessary CONFIG_SMP #ifdefs	Chris Smart	2016-05-11	1	-6/+0
	The code in machine_restart/power_off/halt() includes #ifdefs around calls to smp_send_stop(), however these are not required as include/linux/smp.h includes an empty version of this function for CONFIG_SMP=n builds. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Remove unused remnants from A2 cpu	Rashmica Gupta	2016-05-11	1	-2/+0
	Support for the A2 cpu was removed in commit fb5a515704d7 ("powerpc: Remove platforms/wsp and associated pieces"), but the externs __setup_cpu_a2 and __restore_cpu_a2 are still around and unused, so remove them. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/slice: Remove slice_mm_new_context()	Aneesh Kumar K.V	2016-05-11	2	-5/+1
	The usage in mm mmu_context_nohash.c is bogus, because we set the context.id value to MMU_NO_CONTEXT 4 lines previously in the same function, meaning slice_mm_new_context() will always be true. The book3s 64 usage was removed in the previous commit. So remove it as unused. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/subpage: Initialise user psize correctly	Aneesh Kumar K.V	2016-05-11	1	-1/+10
	As part of the radix support we switched Book3s64 to use a value of ~0 for MMU_NO_CONTEXT. That is because id 0 is special on radix. However that broke the logic in init_new_context(). The code there needs to differentiate between a newly allocated context and one inherited via fork. Previously it worked because a newly allocated context has an id of zero (because it was just memset() to zero), which used to match MMU_NO_CONTEXT, and therefore slice_mm_new_context() did the right thing. Check against a context.id value of zero instead of using slice_mm_new_context(). Without this patch we never call slice_set_user_psize(), and end up with a slice psize value of zero and we always end up using 4K HPTE. Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/radix: Fix CONFIG_PPC_MMU_STD_64 typo	Valentin Rothberg	2016-05-11	1	-5/+5
	It's CONFIG_PPC_STD_MMU_64 not ... CONFIG_PPC_MMU_STD_64. Fixes: 11ffc1cfa4c2 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/radix: Document software bits for radix	Aneesh Kumar K.V	2016-05-11	1	-2/+6
	Add #defines for Power ISA 3.0 software defined bits. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm/radix: Use firmware feature to enable Radix MMU	Aneesh Kumar K.V	2016-05-11	1	-0/+1
	We use the existing "ibm,pa-features" device-tree property to enable Radix MMU mode. This means we default to hash mode unless firmware tells us it's OK to start using Radix mode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>