openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message (Collapse)	Author	Age	Files	Lines
*	powerpc/iommu: Add it_page_shift field to determine iommu page size	Alistair Popple	2013-12-30	7	-13/+24
\| \| \| \| \| \| \| \|	This patch adds a it_page_shift field to struct iommu_table and initiliases it to 4K for all platforms. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/iommu: Update constant names to reflect their hardcoded page size	Alistair Popple	2013-12-30	11	-92/+94
\| \| \| \| \| \| \| \| \| \| \|	The powerpc iommu uses a hardcoded page size of 4K. This patch changes the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A future patch will use the existing names to support dynamic page sizes. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Remove unused REDBOOT Kconfig parameter	Michael Opdenacker	2013-12-30	3	-5/+0
\| \| \| \| \| \| \| \| \|	This removes the REDBOOT Kconfig parameter, which was no longer used anywhere in the source code and Makefiles. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix "attempt to move .org backwards" error	Mahesh Salgaonkar	2013-12-30	1	-140/+138
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With recent machine check patch series changes, The exception vectors starting from 0x4300 are now overflowing with allyesconfig. Fix that by moving machine_check_common and machine_check_handle_early code out of that region to make enough room for exception vector area. Fixes this build error reportes by Stephen: arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:958: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:959: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:983: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:984: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1003: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1013: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1014: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1015: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1016: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1017: Error: attempt to move .org backwards arch/powerpc/kernel/exceptions-64s.S:1018: Error: attempt to move .org backwards [Moved the code further down as it introduced link errors due to too long relative branches to the masked interrupts handlers from the exception prologs. Also removed the useless feature section --BenH ] Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Get FSP memory errors and plumb into memory poison ↵	Mahesh Salgaonkar	2013-12-09	3	-0/+199
\| \| \| \| \| \| \| \| \| \| \|	infrastructure. Get the memory errors reported by opal and plumb it into memory poison infrastructure. This patch uses new messaging channel infrastructure to pull the fsp memory errors to linux. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Add config option for hwpoisoning.	Mahesh Salgaonkar	2013-12-09	1	-0/+6
\| \| \| \| \| \| \| \|	Add config option to enable generic memory hwpoisoning infrastructure for ppc64. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Enable _PAGE_NUMA for book3s	Aneesh Kumar K.V	2013-12-09	3	-1/+72
\| \| \| \| \| \| \| \|	We steal the _PAGE_COHERENCE bit and use that for indicating NUMA ptes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Only check for _PAGE_PRESENT in set_pte/pmd functions	Aneesh Kumar K.V	2013-12-09	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	We want to make sure we don't use these function when updating a pte or pmd entry that have a valid hpte entry, because these functions don't invalidate them. So limit the check to _PAGE_PRESENT bit. Numafault core changes use these functions for updating _PAGE_NUMA bits. That should be ok because when _PAGE_NUMA is set we can be sure that hpte entries are not present. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Free up _PAGE_COHERENCE for numa fault use later	Aneesh Kumar K.V	2013-12-09	5	-8/+26
\| \| \| \| \| \| \| \| \| \| \| \|	Set memory coherence always on hash64 config. If a platform cannot have memory coherence always set they can infer that from _PAGE_NO_CACHE and _PAGE_WRITETHRU like in lpar. So we dont' really need a separate bit for tracking _PAGE_COHERENCE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Use HPTE constants when updating hpte bits	Aneesh Kumar K.V	2013-12-09	2	-3/+4
\| \| \| \| \| \| \| \| \|	Even though we have same value for linux PTE bits and hash PTE pits use the hash pte bits wen updating hash pte Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Dynamically allocate slb_shadow from memblock	Jeremy Kerr	2013-12-09	1	-7/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the slb_shadow buffer is our largest symbol: [jk@pablo linux]$ nm --size-sort -r -S obj/vmlinux \| head -1 c000000000da0000 0000000000040000 d slb_shadow - we allocate 128 bytes per cpu; so 256k with NR_CPUS=2048. As we have constant initialisers, it's allocated in .text, causing a larger vmlinux image. We may also allocate unecessary slb_shadow buffers (> no. pacas), since we use the build-time NR_CPUS rather than the run-time nr_cpu_ids. We could move this to the bss, but then we still have the NR_CPUS vs nr_cpu_ids potential for overallocation. This change dynamically allocates the slb_shadow array, during initialise_pacas(). At a cost of 104 bytes of text, we save 256k of data: [jk@pablo linux]$ size obj/vmlinux{.orig,} text data bss dec hex filename 9202795 5244676 1169576 15617047 ee4c17 obj/vmlinux.orig 9202899 4982532 1169576 15355007 ea4c7f obj/vmlinux Tested on pseries. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make slb_shadow a local	Jeremy Kerr	2013-12-09	3	-4/+2
\| \| \| \| \| \| \| \|	The only external user of slb_shadow is the pseries lpar code, and it can access through the paca array instead. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pci: Use dev_is_pci() to check whether it is pci device	Yijing Wang	2013-12-09	1	-1/+1
\| \| \| \| \| \| \| \|	Use PCI standard marco dev_is_pci() instead of directly compare pci_bus_type to check whether it is pci device. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE	Aneesh Kumar K.V	2013-12-09	2	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On archs like ppc64 that don't use _PAGE_PROTNONE and also have a separate page table outside linux pagetable, we just need to make sure that when calling change_prot_numa we flush the hardware page table entry so that next page access result in a numa fault. We still need to make sure we use the numa faulting logic only when CONFIG_NUMA_BALANCING is set. This implies the migrate-on-fault (Lazy migration) via mbind will only work if CONFIG_NUMA_BALANCING is set. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powernv: Remove get/set_rtc_time when they are not present	Michael Neuling	2013-12-05	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Currently we continue to poll get/set_rtc_time even when we know they are not working. This changes it so that if it fails at boot time we remove the ppc_md get/set_rtc_time hooks so that we don't end up polling known broken calls. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add real mode cache inhibited IO accessors	Michael Ellerman	2013-12-05	1	-0/+16
\| \| \| \| \| \| \| \|	These accessors allow us to do cache inhibited accesses when in real mode. They should only be used in real mode. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Increase EEH recovery timeout for SR-IOV	Brian King	2013-12-05	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	In order to support concurrent adapter firmware download to SR-IOV adapters on pSeries, each VF will see an EEH event where the slot will remain in the unavailable state for the duration of the adapter firmware update, which can take as long as 5 minutes. Extend the EEH recovery timeout to account for this. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Output PHB diag-data	Gavin Shan	2013-12-05	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \|	When hitting frozen PE or fenced PHB, it's always indicative to have dumped PHB diag-data for further analysis and diagnosis. However, we never dump that for the cases. The patch intends to dump PHB diag-data at the backend of eeh_ops::get_log() for PowerNV platform. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Move PHB-diag dump functions around	Gavin Shan	2013-12-05	3	-205/+144
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to the completion of PCI enumeration, we actively detects EEH errors on PCI config cycles and dump PHB diag-data if necessary. The EEH backend also dumps PHB diag-data in case of frozen PE or fenced PHB. However, we are using different functions to dump the PHB diag-data for those 2 cases. The patch merges the functions for dumping PHB diag-data to one so that we can avoid duplicate code. Also, we never dump PHB3 diag-data during PCI config cycles with frozen PE. The patch fixes it as well. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	PPC: POWERNV: move iommu_add_device earlier	Alexey Kardashevskiy	2013-12-05	6	-53/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation of IOMMU on sPAPR does not use iommu_ops and therefore does not call IOMMU API's bus_set_iommu() which 1) sets iommu_ops for a bus 2) registers a bus notifier Instead, PCI devices are added to IOMMU groups from subsys_initcall_sync(tce_iommu_init) which does basically the same thing without using iommu_ops callbacks. However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158) implements iommu_ops and when tce_iommu_init is called, every PCI device is already added to some group so there is a conflict. This patch does 2 things: 1. removes the loop in which PCI devices were added to groups and adds explicit iommu_add_device() calls to add devices as soon as they get the iommu_table pointer assigned to them. 2. moves a bus notifier to powernv code in order to avoid conflict with the notifier from Freescale driver. iommu_add_device() and iommu_del_device() are public now. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Move SG list structure to header file	Vasant Hegde	2013-12-05	2	-25/+28
\| \| \| \| \| \| \| \|	Move SG list and entry structure to header file so that it can be used in other places as well. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Infrastructure to read opal messages in generic format.	Mahesh Salgaonkar	2013-12-05	3	-1/+115
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Opal now has a new messaging infrastructure to push the messages to linux in a generic format for different type of messages using only one event bit. The format of the opal message is as below: struct opal_msg { uint32_t msg_type; uint32_t reserved; uint64_t params[8]; }; This patch allows clients to subscribe for notification for specific message type. It is upto the subscriber to decipher the messages who showed interested in receiving specific message type. The interface to subscribe for notification is: int opal_message_notifier_register(enum OpalMessageType msg_type, struct notifier_block *nb) The notifier will fetch the opal message when available and notify the subscriber with message type and the opal message. It is subscribers responsibility to copy the message data before returning from notifier callback. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/windfarm: Remove superfluous name casts	Geert Uytterhoeven	2013-12-05	2	-2/+2
\| \| \| \| \| \| \| \| \|	wf_sensor.name is "const char *" Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Machine check exception handling.	Mahesh Salgaonkar	2013-12-05	3	-1/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add basic error handling in machine check exception handler. - If MSR_RI isn't set, we can not recover. - Check if disposition set to OpalMCE_DISPOSITION_RECOVERED. - Check if address at fault is inside kernel address space, if not then send SIGBUS to process if we hit exception when in userspace. - If address at fault is not provided then and if we get a synchronous machine check while in userspace then kill the task. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Remove machine check handling in OPAL.	Mahesh Salgaonkar	2013-12-05	1	-6/+2
\| \| \| \| \| \| \| \|	Now that we are ready to handle machine check directly in linux, do not register with firmware to handle machine check exception. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Queue up and process delayed MCE events.	Mahesh Salgaonkar	2013-12-05	5	-98/+168
\| \| \| \| \| \| \| \| \| \| \|	When machine check real mode handler can not continue into host kernel in V mode, it returns from the interrupt and we loose MCE event which never gets logged. In such a situation queue up the MCE event so that we can log it later when we get back into host kernel with r1 pointing to kernel stack e.g. during syscall exit. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Decode and save machine check event.	Mahesh Salgaonkar	2013-12-05	6	-39/+434
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we handle machine check in linux, the MCE decoding should also take place in linux host. This info is crucial to log before we go down in case we can not handle the machine check errors. This patch decodes and populates a machine check event which contain high level meaning full MCE information. We do this in real mode C code with ME bit on. The MCE information is still available on emergency stack (in pt_regs structure format). Even if we take another exception at this point the MCE early handler will allocate a new stack frame on top of current one. So when we return back here we still have our MCE information safe on current stack. We use per cpu buffer to save high level MCE information. Each per cpu buffer is an array of machine check event structure indexed by per cpu counter mce_nest_count. The mce_nest_count is incremented every time we enter machine check early handler in real mode to get the current free slot (index = mce_nest_count - 1). The mce_nest_count is decremented once the MCE info is consumed by virtual mode machine exception handler. This patch provides save_mce_event(), get_mce_event() and release_mce_event() generic routines that can be used by machine check handlers to populate and retrieve the event. The routine release_mce_event() will free the event slot so that it can be reused. Caller can invoke get_mce_event() with a release flag either to release the event slot immediately OR keep it so that it can be fetched again. The event slot can be also released anytime by invoking release_mce_event(). This patch also updates kvm code to invoke get_mce_event to retrieve generic mce event rather than paca->opal_mce_evt. The KVM code always calls get_mce_event() with release flags set to false so that event is available for linus host machine If machine check occurs while we are in guest, KVM tries to handle the error. If KVM is able to handle MC error successfully, it enters the guest and delivers the machine check to guest. If KVM is not able to handle MC error, it exists the guest and passes the control to linux host machine check handler which then logs MC event and decides how to handle it in linux host. In failure case, KVM needs to make sure that the MC event is available for linux host to consume. Hence KVM always calls get_mce_event() with release flags set to false and later it invokes release_mce_event() only if it succeeds to handle error. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power8.	Mahesh Salgaonkar	2013-12-05	3	-0/+41
\| \| \| \| \| \| \| \| \| \|	This patch handles the memory errors on power8. If we get a machine check exception due to SLB or TLB errors, then flush SLBs/TLBs and reload SLBs to recover. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Flush SLB/TLBs if we get SLB/TLB machine check errors on power7.	Mahesh Salgaonkar	2013-12-05	5	-0/+227
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we get a machine check exception due to SLB or TLB errors, then flush SLBs/TLBs and reload SLBs to recover. We do this in real mode before turning on MMU. Otherwise we would run into nested machine checks. If we get a machine check when we are in guest, then just flush the SLBs and continue. This patch handles errors for power7. The next patch will handle errors for power8 Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Add flush_tlb operation in cpu_spec.	Mahesh Salgaonkar	2013-12-05	4	-25/+44
\| \| \| \| \| \| \| \| \| \| \|	This patch introduces flush_tlb operation in cpu_spec structure. This will help us to invoke appropriate CPU-side flush tlb routine. This patch adds the foundation to invoke CPU specific flush routine for respective architectures. Currently this patch introduce flush_tlb for p7 and p8. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Introduce a early machine check hook in cpu_spec.	Mahesh Salgaonkar	2013-12-05	2	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the early machine check function pointer in cputable for CPU specific early machine check handling. The early machine handle routine will be called in real mode to handle SLB and TLB errors. We can not reuse the existing machine_check hook because it is always invoked in kernel virtual mode and we would already be in trouble if we get SLB or TLB errors. This patch just sets up a mechanism to invoke CPU specific handler. The subsequent patches will populate the function pointer. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Return from interrupt if coming from evil context.	Mahesh Salgaonkar	2013-12-05	2	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can get machine checks from any context. We need to make sure that we handle all of them correctly. If we are coming from hypervisor user-space, we can continue in host kernel in virtual mode to deliver the MC event. If we got woken up from power-saving mode then we may come in with one of the following state: a. No state loss b. Supervisor state loss c. Hypervisor state loss For (a) and (b), we go back to nap again. State (c) is fatal, keep spinning. For all other context which we not sure of queue up the MCE event and return from the interrupt. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: handle machine check in Linux host.	Mahesh Salgaonkar	2013-12-05	3	-0/+127
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move machine check entry point into Linux. So far we were dependent on firmware to decode MCE error details and handover the high level info to OS. This patch introduces early machine check routine that saves the MCE information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate stack frame on emergency stack and set the r1 accordingly. This allows us to be prepared to take another exception without loosing context. One thing to note here that, if we get another machine check while ME bit is off then we risk a checkstop. Hence we restrict ourselves to save only MCE information and register saved on PACA_EXMC save are before we turn the ME bit on. We use paca->in_mce flag to differentiate between first entry and nested machine check entry which helps proper use of emergency stack. We increment paca->in_mce every time we enter in early machine check handler and decrement it while leaving. When we enter machine check early handler first time (paca->in_mce == 0), we are sure nobody is using MC emergency stack and allocate a stack frame at the start of the emergency stack. During subsequent entry (paca->in_mce > 0), we know that r1 points inside emergency stack and we allocate separate stack frame accordingly. This prevents us from clobbering MCE information during nested machine checks. The early machine check handler changes are placed under CPU_FTR_HVMODE section. This makes sure that the early machine check handler will get executed only in hypervisor kernel. This is the code flow: Machine Check Interrupt \| V 0x200 vector ME=0, IR=0, DR=0 \| V +-----------------------------------------------+ \|machine_check_pSeries_early: \| ME=0, IR=0, DR=0 \| Alloc frame on emergency stack \| \| Save srr1, srr0, dar and dsisr on stack \| +-----------------------------------------------+ \| (ME=1, IR=0, DR=0, RFID) \| V machine_check_handle_early ME=1, IR=0, DR=0 \| V +-----------------------------------------------+ \| machine_check_early (r3=pt_regs) \| ME=1, IR=0, DR=0 \| Things to do: (in next patches) \| \| Flush SLB for SLB errors \| \| Flush TLB for TLB errors \| \| Decode and save MCE info \| +-----------------------------------------------+ \| (Fall through existing exception handler routine.) \| V machine_check_pSerie ME=1, IR=0, DR=0 \| (ME=1, IR=1, DR=1, RFID) \| V machine_check_common ME=1, IR=1, DR=1 . . . Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Introduce exclusive emergency stack for machine check exception.	Mahesh Salgaonkar	2013-12-05	3	-1/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces exclusive emergency stack for machine check exception. We use emergency stack to handle machine check exception so that we can save MCE information (srr1, srr0, dar and dsisr) before turning on ME bit and be ready for re-entrancy. This helps us to prevent clobbering of MCE information in case of nested machine checks. The reason for using emergency stack over normal kernel stack is that the machine check might occur in the middle of setting up a stack frame which may result into improper use of kernel stack. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/book3s: Split the common exception prolog logic into two section.	Mahesh Salgaonkar	2013-12-05	1	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch splits the common exception prolog logic into three parts to facilitate reuse of existing code in the next patch. This patch also re-arranges few instructions in such a way that the second part now deals with saving register values from paca save area to stack frame, and the third part deals with saving current register values to stack frame. The second and third part will be reused in the machine check exception routine in the subsequent patch. Please note that this patch does not introduce or change existing code logic. Instead it is just a code movement and instruction re-ordering. Patch Acked-by Paul. But made some minor modification (explained above) to address Paul's comment in the later patch(3). Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Replace CONFIG_POWERNV_MSI with just CONFIG_PPC_POWERNV	Michael Ellerman	2013-12-02	2	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently have a user visible CONFIG_POWERNV_MSI option, but it doesn't actually disable MSI for powernv. The MSI code is always built, what it does disable is the inclusion of the MSI bitmap code, which leads to a build error. eg, with PPC_POWERNV=y and POWERNV_MSI=n we get: arch/powerpc/platforms/built-in.o: In function `.pnv_teardown_msi_irqs': pci.c:(.text+0x3558): undefined reference to `.msi_bitmap_free_hwirqs' We don't really need a POWERNV_MSI symbol, just have the MSI bitmap code depend directly on PPC_POWERNV. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: CONFIG_PSERIES_MSI should depend on PPC_PSERIES	Michael Ellerman	2013-12-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Previously PSERIES_MSI depended on PPC_PSERIES via EEH. However in commit 317f06d "powerpc/eeh: Move common part to kernel directory" we made CONFIG_EEH selectable on POWERNV. That leaves us with PSERIES_MSI being live even when PSERIES=n. Fix it by making PSERIES_MSI depend directly on PPC_PSERIES. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/kernel/sysfs: Cleanup set up macros for PMC/non-PMC SPRs	Madhavan Srinivasan	2013-12-02	1	-34/+38
\| \| \| \| \| \| \| \| \| \| \| \|	Currently PMC (Performance Monitor Counter) setup macros are used for other SPRs. Since not all SPRs are PMC related, this patch modifies the exisiting macro and uses it to setup both PMC and non PMC SPRs accordingly. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make irq_stat.timers_irqs counting more specific	fan.du	2013-12-02	3	-5/+13
\| \| \| \| \| \| \| \| \| \|	Current irq_stat.timers_irqs counting doesn't discriminate timer event handler and other timer interrupt(like arch_irq_work_raise). Sometimes we need to know exactly how much interrupts timer event handler fired, so let's be more specific on this. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: purge all the prefetched instructions for the coherent icache flush	Kevin Hao	2013-12-02	3	-2/+22
\| \| \| \| \| \| \| \| \| \| \|	As Benjamin Herrenschmidt has indicated, we still need a dummy icbi to purge all the prefetched instructions from the ifetch buffers for the snooping icache. We also need a sync before the icbi to order the actual stores to memory that might have modified instructions with the icbi. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: kernel: remove useless code which related with 'max_cpus'	Chen Gang	2013-12-02	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since not need 'max_cpus' after the related commit, the related code are useless too, need be removed. The related commit: c1aa687 powerpc: Clean up obsolete code relating to decrementer and timebase The related warning: arch/powerpc/kernel/smp.c:323:43: warning: parameter ‘max_cpus’ set but not used [-Wunused-but-set-parameter] Signed-off-by: Chen Gang <gang.chen@asianux.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/boot: Ignore .dtb files.	Ian Campbell	2013-12-02	1	-0/+1
\| \| \| \| \| \| \| \| \|	Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/dts/virtex440: Declare address/size-cells for phy device	Ian Campbell	2013-12-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a warning: DTC arch/powerpc/boot/virtex440-ml507.dtb Warning (reg_format): "reg" property in /plb@0/xps-ll-temac@81c00000/ethernet@81c00000/phy@7 has invalid length (4 bytes) (#address-cells == 2, #size-cells == 1) Warning (avoid_default_addr_size): Relying on default #address-cells value for /plb@0/xps-ll-temac@81c00000/ethernet@81c00000/phy@7 Warning (avoid_default_addr_size): Relying on default #size-cells value for /plb@0/xps-ll-temac@81c00000/ethernet@81c00000/phy@7 Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Gernot Vormayr <gvormayr@gmail.com> Cc: devicetree@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/4xx: Fix warning in kilauea.dtb	Ian Campbell	2013-12-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently I see: DTC arch/powerpc/boot/kilauea.dtb Warning (reg_format): "reg" property in /plb/ppc4xx-msi@C10000000 has invalid length (12 bytes) (#address-cells == 1, #size-cells == 1) It appears that unlike the other platforms handled by 3fb7933850fa "powerpc/4xx: Adding PCIe MSI support" this platform does not use address-cells=2. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Josh Boyer <jwboyer@gmail.com> Cc: Rupjyoti Sarmah <rsarmah@apm.com> Cc: Tirumala R Marri <tmarri@apm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: devicetree@vger.kernel.org (open list:OPEN FIRMWARE AND...) Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Use patch_exception to update the debug exception handler	Kevin Hao	2013-12-02	1	-5/+1
\| \| \| \| \|	Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Move the patch_exception to a common place	Kevin Hao	2013-12-02	3	-19/+22
\| \| \| \| \| \| \|	So that it can be used by other codes. No function change. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/ps3: Remove inline marking of EXPORT_SYMBOL functions	Denis Efremov	2013-12-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	EXPORT_SYMBOL and inline directives are contradictory to each other. The patch fixes this inconsistency. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Denis Efremov <yefremov.denis@gmail.com> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/windfarm: Fix XServe G5 fan control Makefile issue	Benjamin Herrenschmidt	2013-11-27	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	We are missing building windfarm_max6690_sensor.o when building CONFIG_WINDFARM_RM31. Usually all the windfarm drivers are built and thus this isn't a problem but some more "tailored" setups (Gentoo ?) building only that driver are not working because the require sensor module is missing. Reported-by: Stanislav Ponomarev <devhexorg@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	arch/powerpc/kernel: Use %12.12s instead of %12s to avoid memory overflow	Chen Gang	2013-11-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for tmp_part->header.name: it is "Terminating null required only for names < 12 chars". so need to limit the %.12s for it in printk additional info: %12s limit the width, not for the original string output length if name length is more than 12, it still can be fully displayed. if name length is less than 12, the ' ' will be filled before name. %.12s truly limit the original string output length (precision) Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/signals: Improved mark VSX not saved with small contexts fix	Michael Neuling	2013-11-25	2	-9/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a recent patch: commit c13f20ac48328b05cd3b8c19e31ed6c132b44b42 Author: Michael Neuling <mikey@neuling.org> powerpc/signals: Mark VSX not saved with small contexts We fixed an issue but an improved solution was later discussed after the patch was merged. Firstly, this patch doesn't handle the 64bit signals case, which could also hit this issue (but has never been reported). Secondly, the original patch isn't clear what MSR VSX should be set to. The new approach below always clears the MSR VSX bit (to indicate no VSX is in the context) and sets it only in the specific case where VSX is available (ie. when VSX has been used and the signal context passed has space to provide the state). This reverts the original patch and replaces it with the improved solution. It also adds a 64 bit version. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>