openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	powerpc/mm: Fix build of Book3E/64 with 64K pages	Benjamin Herrenschmidt	2016-07-07	1	-0/+1
\| \| \| \| \|	Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Use "Delta" rather than "Error" in normal output	Michael Ellerman	2016-07-07	1	-1/+1
\| \| \| \| \| \| \| \|	Use "Delta" to refer to the difference between measurements, rather than "Error", so scripts that look for "Error" aren't confused into thinking there was a failure. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/boot: Add OPAL console to epapr wrappers	Oliver O'Halloran	2016-07-05	7	-2/+175
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds an OPAL console backend to the powerpc boot wrapper so that decompression failures inside the wrapper can be reported to the user. This is important since it typically indicates data corruption in the firmware and other nasty things. Currently this only works when building a little endian kernel. When compiling a 64 bit BE kernel the wrapper is always build 32 bit to be compatible with some 32 bit firmwares. BE support will be added at a later date. Another limitation of this is that only the "raw" type of OPAL console is supported, however machines that provide a hvsi console also provide a raw console so this is not an issue in practice. Actually-written-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Move #ifdef __powerpc64__ to avoid warnings on 32-bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm: Add a parameter to disable 1TB segs	Oliver O'Halloran	2016-07-05	2	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the kernel command line parameter "no_tb_segs" which forces the kernel to use 256MB rather than 1TB segments. Forcing the use of 256MB segments makes it considerably easier to test code that depends on an SLB miss occurring. Suggested-by: Michael Neuling <mikey@neuling.org> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/timer: Large Decrementer support	Oliver O'Halloran	2016-07-05	3	-11/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Power ISAv3 adds a large decrementer (LD) mode which increases the size of the decrementer register. The size of the enlarged decrementer register is between 32 and 64 bits with the exact size being dependent on the implementation. When in LD mode, reads are sign extended to 64 bits and a decrementer exception is raised when the high bit is set (i.e the value goes below zero). Writes however are truncated to the physical register width so some care needs to be taken to ensure that the high bit is not set when reloading the decrementer. This patch adds support for using the LD inside the host kernel on processors that support it. When LD mode is supported firmware will supply the ibm,dec-bits property for CPU nodes to allow the kernel to determine the maximum decrementer value. Enabling LD mode is a hypervisor privileged operation so the kernel can only enable it manually when running in hypervisor mode. Guests that support LD mode can request it using the "ibm,client-architecture-support" firmware call (not implemented in this patch) or some other platform specific method. If this property is not supplied then the traditional decrementer width of 32 bit is assumed and LD mode will not be enabled. This patch was based on initial work by Jack Miller. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Avoid -maltivec when using clang integrated assembler	Anton Blanchard	2016-07-05	1	-1/+1
\| \| \| \| \| \| \| \|	Check the assembler supports -maltivec by wrapping it with call as-option. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Fix error return value in cmm_mem_going_offline()	Rasmus Villemoes	2016-07-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	cmm_mem_going_offline() is (only) called from cmm_memory_cb(), which sends the return value through notifier_from_errno(). The latter expects 0 or -errno (notifier_to_errno(notifier_from_errno(x)) is 0 for any x >= 0, so passing a positive value cannot make sense). Hence negate ENOMEM. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/rtas: Fix array overrun in ppc_rtas() syscall	Andrew Donnellan	2016-07-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If ppc_rtas() is called with args.nargs == 16 and args.nret == 0, args.rets is set to point to &args.args[16], which is beyond the end of the args.args array. This results in a minor read overrun of the array when we check the first return code (which, per PAPR, is a required output of all RTAS calls) to see if there's been a hardware error. Change the nargs/nret check to ensure nargs is <= 15, allowing room for the status code. Users shouldn't be calling with nret == 0, but there's no real harm if they do, so we don't stop them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Test unaligned copy and paste	Chris Smart	2016-07-05	10	-1/+332
\| \| \| \| \| \| \| \|	Test that an ISA 3.0 compliant machine performing an unaligned copy, copy_first, paste or paste_last is sent a SIGBUS. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Send SIGBUS on unaligned copy and paste	Chris Smart	2016-07-05	2	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calling ISA 3.0 instructions copy, copy_first, paste and paste_last generates an alignment fault when copying or pasting unaligned data (128 byte). We catch this and send SIGBUS to the userspace process that caused it. We do not emulate these because paste may contain additional metadata when pasting to a co-processor and paste_last is the synchronisation point for preceding copy/paste sequences. Thanks to Michael Neuling <mikey@neuling.org> for his help. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Import Anton's mmap & futex micro benchmarks	Michael Ellerman	2016-07-05	4	-1/+86
\| \| \| \| \| \| \|	These are useful little loops for smoke testing performance. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Fix generation of vector instructions/types in context_switch	Cyril Bur	2016-07-05	2	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently it doesn't appear the resulting binary actually uses any Altivec or VSX instructions the solution is to explicitly tell GCC to use vector instructions and use vector types in the code. Part of this this issue can be GCC version specific: GCC 4.9.x is happy to use Altivec and VSX instructions if altivec.h is includedi (and possibly if vector types are used), this also means that 4.9.x will use VSX instructions even if only -maltivec is passed. It is also possible that Altivec instructions will be used even without -maltivec or -mabi=altivec. GCC 5.2.x complains about the lack of -maltivec parameter if altivec.h is included and will not use VSX unless -mvsx is present on commandline. GCC 5.3.0 has a regression that means __attribute__((__target__("no-vsx")) fails to build. A fix is targeted for 5.4. Furthermore LTO (Link Time Optimisation) doesn't play well with __attribute__((__target__("no-vsx")), LTO can cause GCC to forget about the attribute and compile with VSX instructions regardless. Be wary when enabling -flfo for this test. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Fix usage message in context_switch	Cyril Bur	2016-07-05	1	-3/+3
\| \| \| \| \| \| \| \| \|	When we inverted the behaviour of the flags we forgot to update the usage message. Fixes: 51c21e72eb99 ("selftests/powerpc: Make context_switch touch FP/altivec/vector by default") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc/pmu: Use signed long to read perf_event_paranoid	Cyril Bur	2016-07-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Excerpt from man 2 perf_event_open: /proc/sys/kernel/perf_event_paranoid The perf_event_paranoid file can be set to restrict access to the performance counters. 2 allow only user-space measurements. 1 allow both kernel and user measurements (default). 0 allow access to CPU-specific data but not raw tracepoint samples. -1 no restrictions. require_paranoia_below() should return 0 if perf_event_paranoid is below a specified level, the value from perf_event_paranoid is read into an unsigned long so the incorrect value is returned when perf_event_paranoid is set to -1. Without this patch applied there is the same number of selftests/powerpc which skip when /proc/sys/kernel/perf_event_paranoid is set to 1 or -1 but no skips when set to zero. With this patch applied there are no skipped selftests/powerpc test when /proc/sys/kernel/perf_event_paranoid is set to 0 or -1. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/perf: Export Power9 generic and cache events to sysfs	Madhavan Srinivasan	2016-07-05	1	-0/+59
\| \| \| \| \| \| \| \|	Export the generic hardware and cache perf events for Power9 to sysfs, so users can determine the PMU event monitored. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/perf: Power9 PMU support	Madhavan Srinivasan	2016-07-05	2	-1/+272
\| \| \| \| \| \| \|	This patch adds base enablement for the power9 PMU. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/perf: Add power9 event list macros for generic and cache events	Madhavan Srinivasan	2016-07-05	1	-0/+55
\| \| \| \| \| \| \|	Add macros for the generic and cache events on Power9 Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/perf: factor out power8 __init_pmu code	Madhavan Srinivasan	2016-07-05	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \|	Factor out the power8 pmu init functions to share with power9. Monitor Mode Control Register S(MMCRS) and Monitor Mode Control Register H(MMCRH) registers are dropped in Power9. These registers are added to new function which are included for power8 init. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/perf: factor out power8 pmu functions	Madhavan Srinivasan	2016-07-05	4	-254/+273
\| \| \| \| \| \| \| \| \| \|	Factor out some of the power8 pmu functions to new file "isa207-common.c" to share with power9 pmu code. Only code movement and no logic change Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/perf: factor out power8 pmu macros and defines	Madhavan Srinivasan	2016-07-05	2	-217/+234
\| \| \| \| \| \| \| \| \|	Factor out some of the power8 pmu macros to new a header file to share with power9 pmu code. Just code movement and no logic change. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/fadump: Fix build error introduced by recent cleanup	Michael Ellerman	2016-07-05	1	-1/+1
\| \| \| \| \| \| \| \| \|	We spent so much time bike-shedding the printk() we missed that the next line was missing a semi-colon. And it seems none of our defconfigs turn on CONFIG_FA_DUMP. Fixes: 4a03749f140c ("powerpc/fadump: Trivial fix of spelling mistake, clean up message") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Add driver for operator panel on FSP machines	Suraj Jitindar Singh	2016-06-29	8	-0/+253
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement new character device driver to allow access from user space to the operator panel display present on IBM Power Systems machines with FSPs. This will allow status information to be presented on the display which is visible to a user. The driver implements a character buffer which a user can read/write by accessing the device (/dev/op_panel). This buffer is then displayed on the operator panel display. Any attempt to write past the last character position will have no effect and attempts to write more characters than the size of the display will be truncated. The device may only be accessed by a single process at a time. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg	Suraj Jitindar Singh	2016-06-29	7	-8/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the params[1] struct member. However this isn't intuitive or obvious when reading the code and requires that a user look at the skiboot documentation or opal-api.h to verify this. Add an inline function to get the return code from an opal_msg and update call sites accordingly. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	devicetree/bindings: Add binding for operator panel on FSP machines	Suraj Jitindar Singh	2016-06-29	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \|	Add a binding to Documentation/devicetree/bindings/powerpc/opal (oppanel-opal.txt) for the operator panel which is present on IBM Power Systems machines with FSPs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Add set and get private data to context struct	Michael Neuling	2016-06-28	3	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This provides AFU drivers a means to associate private data with a cxl context. This is particularly intended for make the new callbacks for driver specific events easier for AFU drivers to use, as they can easily get back to any private data structures they may use. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Add mechanism for delivering AFU driver specific events	Philippe Bergheaud	2016-06-28	6	-9/+149
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds an afu_driver_ops structure with fetch_event() and event_delivered() callbacks. An AFU driver such as cxlflash can fill this out and associate it with a context to enable passing custom AFU specific events to userspace. This also adds a new kernel API function cxl_context_pending_events(), that the AFU driver can use to notify the cxl driver that new specific events are ready to be delivered, and wake up anyone waiting on the context wait queue. The current count of AFU driver specific events is stored in the field afu_driver_events of the context structure. The cxl driver checks the afu_driver_events count during poll, select, read, etc. calls to check if an AFU driver specific event is pending, and calls fetch_event() to obtain and deliver that event. This way, the cxl driver takes care of all the usual locking semantics around these calls and handles all the generic cxl events, so that the AFU driver only needs to worry about it's own events. fetch_event() return a struct cxl_event_afu_driver_reserved, allocated by the AFU driver, and filled in with the specific event information and size. Total event size (header + data) should not be greater than CXL_READ_MIN_SIZE (4K). Th cxl driver prepends an appropriate cxl event header, copies the event to userspace, and finally calls event_delivered() to return the status of the operation to the AFU driver. The event is identified by the context and cxl_event_afu_driver_reserved pointers. Since AFU drivers provide their own means for userspace to obtain the AFU file descriptor (i.e. cxlflash uses an ioctl on their scsi file descriptor to obtain the AFU file descriptor) and the generic cxl driver will never use this event, the ABI of the event is up to each individual AFU driver. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Fix spelling mistake "Retrived" -> "Retrieved"	Colin Ian King	2016-06-28	1	-1/+1
\| \| \| \| \| \| \|	Trivial fix to spelling mistake in pr_debug() message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/fadump: Trivial fix of spelling mistake, clean up message	Colin Ian King	2016-06-28	1	-3/+2
\| \| \| \| \| \| \| \| \| \|	Fix trivial spelling mistake "rgistration". Also use pr_err() instead of printk() and unsplit the string to keep it all on one line. Signed-off-by: Colin Ian King <colin.king@canonical.com> [mpe: Keep rc on the same line, splitting it doesn't help] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Reduce log level of PCI I/O space warning	Benjamin Herrenschmidt	2016-06-24	1	-3/+3
\| \| \| \| \| \| \| \| \|	If a PHB has no I/O space, there's no need to make it look like something bad happened, a pr_debug() is plenty enough since this is the case of all our modern POWER chips. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/ebpf/jit: Implement JIT compiler for extended BPF	Naveen N. Rao	2016-06-24	8	-3/+1315
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PPC64 eBPF JIT compiler. Enable with: echo 1 > /proc/sys/net/core/bpf_jit_enable or echo 2 > /proc/sys/net/core/bpf_jit_enable ... to see the generated JIT code. This can further be processed with tools/net/bpf_jit_disasm. With CONFIG_TEST_BPF=m and 'modprobe test_bpf': test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed] ... on both ppc64 BE and LE. The details of the approach are documented through various comments in the code. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/bpf/jit: Isolate classic BPF JIT specifics into a separate header	Naveen N. Rao	2016-06-24	4	-121/+143
\| \| \| \| \| \| \| \| \| \|	Break out classic BPF JIT specifics into a separate header in preparation for eBPF JIT implementation. Note that ppc32 will still need the classic BPF JIT. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/bpf/jit: A few cleanups	Naveen N. Rao	2016-06-24	2	-10/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Per the ISA, ADDIS actually uses RT, rather than RS. Though the result is the same, make the usage clear. 2. The multiply instruction used is a 32-bit multiply. Rename PPC_MUL() to PPC_MULW() to make the same clear. 3. PPC_STW[U] take the entire 16-bit immediate value and do not require word-alignment, per the ISA. Change the macros to use IMM_L(). 4. A few white-space cleanups to satisfy checkpatch.pl. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/bpf/jit: Introduce rotate immediate instructions	Naveen N. Rao	2016-06-24	2	-9/+13
\| \| \| \| \| \| \| \| \| \| \|	Since we will be using the rotate immediate instructions for extended BPF JIT, let's introduce macros for the same. And since the shift immediate operations use the rotate immediate instructions, let's redo those macros to use the newly introduced instructions. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/bpf/jit: Optimize 64-bit Immediate loads	Naveen N. Rao	2016-06-24	1	-6/+11
\| \| \| \| \| \| \| \| \| \|	Similar to the LI32() optimization, if the value can be represented in 32-bits, use LI32(). Also handle loading a few specific forms of immediate values in an optimum manner. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/bpf/jit: Fix/enhance 32-bit Load Immediate implementation	Naveen N. Rao	2016-06-24	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing LI32() macro can sometimes result in a sign-extended 32-bit load that does not clear the top 32-bits properly. As an example, loading 0x7fffffff results in the register containing 0xffffffff7fffffff. While this does not impact classic BPF JIT implementation (since that only uses the lower word for all operations), we would like to share this macro between classic BPF JIT and extended BPF JIT, wherein the entire 64-bit value in the register matters. Fix this by first doing a shifted LI followed by ORI. An additional optimization is with loading values between -32768 to -1, where we now only need a single LI. The new implementation now generates the same or less number of instructions. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: set power_save func after the idle states are initialized	Shreyas B. Prabhu	2016-06-23	2	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pnv_init_idle_states() discovers supported idle states from the device tree and does the required initialization. Set power_save function pointer only after this initialization is done Otherwise on machines which don't support nap, eg. Power9, the kernel will crash when it tries to nap. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Print correct PHB type names	Gavin Shan	2016-06-21	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're initializing "IODA1" and "IODA2" PHBs though they are IODA2 and NPU PHBs as below kernel log indicates. Initializing IODA1 OPAL PHB /pciex@3fffe40700000 Initializing IODA2 OPAL PHB /pciex@3fff000400000 This fixes the PHB names. After it's applied, we get: Initializing IODA2 PHB (/pciex@3fffe40700000) Initializing NPU PHB (/pciex@3fff000400000) Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	PCI/hotplug: PowerPC PowerNV PCI hotplug driver	Gavin Shan	2016-06-21	4	-0/+750
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds standalone driver to support PCI hotplug for PowerPC PowerNV platform that runs on top of skiboot firmware. The firmware identifies hotpluggable slots and marked their device tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans device tree nodes to create/register PCI hotplug slot accordingly. The PCI slots are organized in fashion of tree, which means one PCI slot might have parent PCI slot and parent PCI slot possibly contains multiple child PCI slots. At the plugging time, the parent PCI slot is populated before its children. The child PCI slots are removed before their parent PCI slot can be removed from the system. If the skiboot firmware doesn't support slot status retrieval, the PCI slot device node shouldn't have property "ibm,reset-by-firmware". In that case, none of valid PCI slots will be detected from device tree. The skiboot firmware doesn't export the capability to access attention LEDs yet and it's something for TBD. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Functions to get/set PCI slot state	Gavin Shan	2016-06-21	5	-1/+115
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This exports 4 functions, which base on the corresponding OPAL APIs to get/set PCI slot status. Those functions are going to be used by PowerNV PCI hotplug driver: pnv_pci_get_device_tree() opal_get_device_tree() pnv_pci_get_presence_state() opal_pci_get_presence_state() pnv_pci_get_power_state() opal_pci_get_power_state() pnv_pci_set_power_state() opal_pci_set_power_state() Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Introduce pnv_pci_get_slot_id()	Gavin Shan	2016-06-21	2	-0/+40
\| \| \| \| \| \| \| \| \| \|	This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI slot ID from the corresponding device node. It will be used by hotplug driver. Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Use PCI slot reset infrastructure	Gavin Shan	2016-06-21	1	-1/+40
\| \| \| \| \| \| \| \| \| \| \| \| \|	The (OPAL) firmware might provide the PCI slot reset capability which is identified by property "ibm,reset-by-firmware" on the PCI slot associated device node. This routes the reset request to firmware if "ibm,reset-by-firmware" exists in the PCI slot device node. Otherwise, the reset is done inside kernel as before. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Support PCI slot ID	Gavin Shan	2016-06-21	3	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reset and poll functionality from (OPAL) firmware supports PHB and PCI slot at same time. They are identified by ID. This supports PCI slot ID by: * Rename the argument name for opal_pci_reset() and opal_pci_poll() accordingly * Rename pnv_eeh_phb_poll() to pnv_eeh_poll() and adjust its argument name. * One macro is added to produce PCI slot ID. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Delay populating pdn	Gavin Shan	2016-06-21	9	-59/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The pdn (struct pci_dn) instances are allocated from memblock or bootmem when creating PCI controller (hoses) in setup_arch(). PCI hotplug, which will be supported by proceeding patches, releases PCI device nodes and their corresponding pdn on unplugging event. The memory chunks for pdn instances allocated from memblock or bootmem are hard to reused after being released. This delays creating pdn by pci_devs_phb_init() from setup_arch() to core_initcall() so that they are allocated from slab. The memory consumed by pdn can be released to system without problem during PCI unplugging time. It indicates that pci_dn is unavailable in setup_arch() and the the fixup on pdn (like AGP's) can't be carried out that time. We have to do that in pcibios_root_bridge_prepare() on maple/pasemi/powermac platforms where/when the pdn is available. pcibios_root_bridge_prepare is called from subsys_initcall() which is executed after core_initcall() so the code flow does not change. At the mean while, the EEH device is created when pdn is populated, meaning pdn and EEH device have same life cycle. In turn, we needn't call eeh_dev_init() to create EEH device explicitly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Update bridge windows on PCI plug	Gavin Shan	2016-06-21	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On the PCI plugging event, PCI slot's subordinate devices are scanned and their (IO and MMIO) resources are assigned. Platform dependent resources (PE#, IO/MMIO/DMA windows) are allocated or created on updating windows of the slot's upstream bridge. This updates the windows of the hot plugged slot's upstream bridge in pcibios_finish_adding_to_bus() so that the platform resources (PE#, IO/MMIO/DMA segments) are allocated or created accordingly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Dynamically release PE	Gavin Shan	2016-06-21	2	-0/+175
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This supports releasing PEs dynamically. A reference count is introduced to PE representing number of PCI devices associated with the PE. The reference count is increased when PCI device joins the PE and decreased when PCI device leaves the PE in pnv_pci_release_device(). When the count becomes zero, the PE and its consumed resources are released. Note that the count is accessed concurrently. So a counter with "int" type is enough here. In order to release the sources consumed by the PE, couple of helper functions are introduced as below: * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Make pnv_ioda_deconfigure_pe() visible	Gavin Shan	2016-06-21	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is enabled. The function will be used to tear down PE's associated mapping in PCI hotplug path that doesn't depend on CONFIG_PCI_IOV. This makes pnv_ioda_deconfigure_pe() visible and not depend on CONFIG_PCI_IOV. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Extend PCI bridge resources	Gavin Shan	2016-06-21	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PCI slots are associated with root port or downstream ports of the PCIe switch connected to root port. When adapter is hot added to the PCI slot, it usually requests more IO or memory resource from the directly connected parent bridge (port) and update the bridge's windows accordingly. The resource windows of upstream bridges can't be updated automatically. It possibly leads to unbalanced resource across the bridges: The window of downstream bridge is overruning that of upstream bridge. The IO or MMIO path won't work. This resolves the above issue by extending bridge windows of root port and upstream port of the PCIe switch connected to the root port to PHB's windows. The windows of root port and bridge behind that are extended to the PHB's windows to accomodate the PCI hotplug happening in future. The PHB's 64KB 32-bits MSI region is included in bridge's M32 windows (in hardware) though it's excluded in the corresponding resource, as the bridge's M32 windows have 1MB as their minimal alignment. We observed EEH error during system boot when the MSI region is included in bridge's M32 window. This excludes top 1MB (including 64KB 32-bits MSI region) region from bridge's M32 windows when extending them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Setup PE for root bus	Gavin Shan	2016-06-21	2	-10/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no parent bridge for root bus, meaning pcibios_setup_bridge() isn't invoked for root bus. The PE for root bus is the ancestor of other PEs in PELTV. It means we need PE for root bus populated before all others. This populates the PE for root bus in pcibios_setup_bridge() path if it's not populated yet. The PE number next to the reserved one is used as the PE# to avoid holes in continuous M64 space. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Create PEs in pcibios_setup_bridge()	Gavin Shan	2016-06-21	1	-115/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the PEs and their associated resources are assigned in ppc_md.pcibios_fixup() except those used by SRIOV VFs. The function is called for once after PCI probing and resources assignment is completed. So it's obviously not hotplug friendly. This creates PEs dynamically in pcibios_setup_bridge() that is called for the event during system bootup and PCI hotplug: updating PCI bridge's windows after resource assignment/reassignment are done. In partial hotplug case, not all PCI devices included to one particular PE are unplugged and plugged again, we just need unbinding/binding the hot added PCI devices with the corresponding PE without creating new one. The change is applied to IODA1 and IODA2 PHBs only. The behaviour on NPU PHBs aren't changed. There are no PCI bridges on NPU PHBs, meaning pcibios_setup_bridge() won't be invoked there. We have to use old path (pnv_pci_ioda_fixup()) to setup PEs on NPU PHBs. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Allocate PE# in reverse order	Gavin Shan	2016-06-21	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PE number for one particular PE can be allocated dynamically or reserved according to the consumed M64 (64-bits prefetchable) segments of the PE. The M64 segment can't be remapped to arbitrary PE, meaning the PE number is determined according to the index of the consumed M64 segment. As below figure shows, M64 resource grows from low to high end, meaning the PE (number) reserved according to M64 segment grows from low to high end as well, so does the dynamically allocated PE number. It will lead to conflict: PE number (M64 segment) reserved by dynamic allocation is required by hot added PCI adapter at later point. It fails the PCI hotplug because of the PE number can't be reserved based on the index of the consumed M64 segment. +---+---+---+---+---+--------------------------------+-----+ \| 0 \| 1 \| 2 \| 3 \| 4 \| ....... \| 255 \| +---+---+---+---+---+--------------------------------+-----+ PE number for dynamic allocation -----------------> PE number reserved for M64 segment -----------------> To resolve above conflicts, this forces the PE number to be allocated dynamically in reverse order. With this patch applied, the PE numbers are reserved in ascending order, but allocated dynamically in reverse order. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>