summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cardsAndrew Donnellan2016-07-143-18/+251
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new API, cxl_check_and_switch_mode() to allow for switching of bi-modal CAPI cards, such as the Mellanox CX-4 network card. When a driver requests to switch a card to CAPI mode, use PCI hotplug infrastructure to remove all PCI devices underneath the slot. We then write an updated mode control register to the CAPI VSEC, hot reset the card, and reprobe the card. As the card may present a different set of PCI devices after the mode switch, use the infrastructure provided by the pnv_php driver and the OPAL PCI slot management facilities to ensure that: * the old devices are removed from both the OPAL and Linux device trees * the new devices are probed by OPAL and added to the OPAL device tree * the new devices are added to the Linux device tree and probed through the regular PCI device probe path As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on the pnv_php driver. Refactor existing code that touches the mode control register in the regular single mode case into a new function, setup_cxl_protocol_area(). Co-authored-by: Ian Munsie <imunsie@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power stateAndrew Donnellan2016-07-141-1/+1
| | | | | | | | | | | | | | When calling pnv_php_set_slot_power_state() with state == OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're dealing with OPAL_PCI_SLOT_POWER_OFF. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* PCI/hotplug: pnv_php: export symbols and move struct types needed by cxlAndrew Donnellan2016-07-142-27/+33
| | | | | | | | | | | | | | | | | The cxl driver will use infrastructure from pnv_php to handle device tree updates when switching bi-modal CAPI cards into CAPI mode. To enable this, export pnv_php_find_slot() and pnv_php_set_slot_power_state(), and add corresponding declarations, as well as the definition of struct pnv_php_slot, to asm/pnv-pci.h. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Workaround PE=0 hardware limitation in Mellanox CX4Ian Munsie2016-07-143-1/+4
| | | | | | | | | | | | | | | | | | | | | | | The CX4 card cannot cope with a context with PE=0 due to a hardware limitation, resulting in: [ 34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939 [ 34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting Since the kernel API allocates a default context very early during device init that will almost certainly get Process Element ID 0 there is no easy way for us to extend the API to allow the Mellanox to inform us of this limitation ahead of time. Instead, work around the issue by extending the XSL structure to include a minimum PE to allocate. Although the bug is not in the XSL, it is the easiest place to work around this limitation given that the CX4 is currently the only card that uses an XSL. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add support for interrupts on the Mellanox CX4Ian Munsie2016-07-148-0/+202
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where interrupts are routed from the networking hardware to the XSL using the MSIX table, and from there will be transformed back into an MSIX interrupt using the cxl style interrupts (i.e. using IVTE entries and ranges to map a PE and AFU interrupt number to an MSIX address). We want to hide the implementation details of cxl interrupts as much as possible. To this end, we use a special version of the MSI setup & teardown routines in the PHB while in cxl mode to allocate the cxl interrupts and configure the IVTE entries in the process element. This function does not configure the MSIX table - the CX4 card uses a custom format in that table and it would not be appropriate to fill that out in generic code. The rest of the functionality is similar to the "Full MSI-X mode" described in the CAIA, and this could be easily extended to support other adapters that use that mode in the future. The interrupts will be associated with the default context. If the maximum number of interrupts per context has been limited (e.g. by the mlx5 driver), it will automatically allocate additional kernel contexts to associate extra interrupts as required. These contexts will be started using the same WED that was used to start the default context. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add preliminary workaround for CX4 interrupt limitationIan Munsie2016-07-146-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | The Mellanox CX4 has a hardware limitation where only 4 bits of the AFU interrupt number can be passed to the XSL when sending an interrupt, limiting it to only 15 interrupts per context (AFU interrupt number 0 is invalid). In order to overcome this, we will allocate additional contexts linked to the default context as extra address space for the extra interrupts - this will be implemented in the next patch. This patch adds the preliminary support to allow this, by way of adding a linked list in the context structure that we use to keep track of the contexts dedicated to interrupts, and an API to simultaneously iterate over the related context structures, AFU interrupt numbers and hardware interrupt numbers. The point of using a single API to iterate these is to hide some of the details of the iteration from external code, and to reduce the number of APIs that need to be exported via base.c to allow built in code to call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add kernel APIs to get & set the max irqs per contextIan Munsie2016-07-142-0/+37
| | | | | | | | | | | | These APIs will be used by the Mellanox CX4 support. While they function standalone to configure existing behaviour, their primary purpose is to allow the Mellanox driver to inform the cxl driver of a hardware limitation, which will be used in a future patch. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add support for using the kernel API with a real PHBIan Munsie2016-07-143-2/+22
| | | | | | | | | | | | | | | | | This hooks up support for using the kernel API with a real PHB. After the AFU initialisation has completed it calls into the PHB code to pass it the AFU that will be used by other peer physical functions on the adapter. The cxl_pci_to_afu API is extended to work with peer PCI devices, retrieving the peer AFU from the PHB. This API may also now return an error if it is called on a PCI device that is not associated with either a cxl vPHB or a peer PCI device to an AFU, and this error is propagated down. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Add support for the cxl kernel api on the real phbIan Munsie2016-07-144-1/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the peer model of the cxl kernel api to the PowerNV PHB, in which physical function 0 represents the cxl function on the card (an XSL in the case of the CX4), which other physical functions will use for memory access and interrupt services. It is referred to as the peer model as these functions are peers of one another, as opposed to the Virtual PHB model which forms a hierarchy. This patch exports APIs to enable the peer mode, check if a PCI device is attached to a PHB in this mode, and to set and get the peer AFU for this mode. The cxl driver will enable this mode for supported cards by calling pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note that this mode is enabled, and switch out it's controller_ops for the cxl version. The cxl version of the controller_ops struct implements it's own versions of the enable_device_hook and release_device to handle refcounting on the peer AFU and to allocate a default context for the device. Once enabled, the cxl kernel API may not be disabled on a PHB. Currently there is no safe way to disable cxl mode short of a reboot, so until that changes there is no reason to support the disable path. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Do not create vPHB if there are no AFU configuration recordsIan Munsie2016-07-142-0/+14
| | | | | | | | | | | | | | | | | | | | | | The vPHB model of the cxl kernel API is a hierarchy where the AFU is represented by the vPHB, and it's AFU configuration records are exposed as functions under that vPHB. If there are no AFU configuration records we will create a vPHB with nothing under it, which is a waste of resources and will opt us into EEH handling despite not having anything special to handle. This also does not make sense for cards using the peer model of the cxl kernel API, where the other functions of the device are exposed via additional peer physical functions rather than AFU configuration records. This model will also not work with the existing EEH handling in the cxl driver, as that is designed around the vPHB model. Skip creating the vPHB for AFUs without any AFU configuration records, and opt out of EEH handling for them. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Allow a default context to be associated with an external pci_devIan Munsie2016-07-147-28/+97
| | | | | | | | | | | | | | | | | The cxl kernel API has a concept of a default context associated with each PCI device under the virtual PHB. The Mellanox CX4 will also use the cxl kernel API, but it does not use a virtual PHB - rather, the AFU appears as a physical function as a peer to the networking functions. In order to allow the kernel API to work with those networking functions, we will need to associate a default context with them as well. To this end, refactor the corresponding code to do this in vphb.c and export it so that it can be called from the PHB code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Move cxl_afu_get / cxl_afu_put to baseIan Munsie2016-07-143-12/+17
| | | | | | | | | | | | | | | | The Mellanox CX4 uses a model where the AFU is one physical function of the device, and is used by other peer physical functions of the same device. This will require those other devices to grab a reference on the AFU when they are initialised to make sure that it does not go away during their lifetime. Move the AFU refcount functions to base.c so they can be called from the PHB code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Enable bus mastering for devices using CAPP DMA modeIan Munsie2016-07-141-0/+3
| | | | | | | | | | | | Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus master to be enabled in order for the CAPI traffic to flow. This should be harmless to enable for other cxl devices, so unconditionally enable it in the adapter init flow. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Add cxl_slot_is_supported APIIan Munsie2016-07-142-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | This extends the check that the adapter is in a CAPI capable slot so that it may be called by external users in the kernel API. This will be used by the upcoming Mellanox CX4 support, which needs to know ahead of time if the card can be switched to cxl mode so that it can leave it in PCI mode if it is not. This API takes a parameter to check if CAPP DMA mode is supported, which it currently only allows on P8NVL systems, since that mode currently has issues accessing memory < 4GB on P8, and we cannot realistically avoid that. This API does not currently check if a CAPP unit is available (i.e. not already assigned to another PHB) on P8. Doing so would be racy since it is assigned on a first come first serve basis, and so long as CAPP DMA mode is not supported on P8 we don't need this, since the only anticipated user of this API requires CAPP DMA mode. Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Split cxl code out into a separate fileIan Munsie2016-07-144-156/+173
| | | | | | | | | | | | The support for using the Mellanox CX4 in cxl mode will require additions to the PHB code. In preparation for this, move the existing cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things more organised. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Use for_each_compatible_node() macroWei Yongjun2016-07-141-3/+2Star
| | | | | | | | | | | Use for_each_compatible_node() macro instead of open coding it. Generated by Coccinelle. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add a test for PROT_SAOMichael Ellerman2016-07-144-1/+51
| | | | | | | | | | | | PROT_SAO is a powerpc-specific flag to mmap(), and we rely on arch specific logic to allow it to be passed to mmap(). Add a small test to ensure mmap() accepts PROT_SAO. We don't have a good way to test that it actually causes the mapping to be created with the right flags, so for now we just touch the mapping so it's faulted in. In future we might be able to do something better. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Dump ISA 2.07 SPRsMichael Ellerman2016-07-142-0/+45
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Dump ISA 2.06 SPRsMichael Ellerman2016-07-141-0/+31
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Adjust spacing of existing SPRs to make room for moreMichael Ellerman2016-07-141-5/+6
| | | | | | Purely to make it pleasing to the eye. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Move static regno into its only userMichael Ellerman2016-07-141-1/+1
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Remove unused externsMichael Ellerman2016-07-141-5/+0Star
| | | | | | None of these are used, or have been since we merged ppc & ppc64. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/crash: Rearrange loop condition to avoid out of bounds array accessSuraj Jitindar Singh2016-07-141-1/+1
| | | | | | | | | | | | | | | The array crash_shutdown_handles[] has size CRASH_HANDLER_MAX, thus when we loop over the elements of the list we check crash_shutdown_handles[i] && i < CRASH_HANDLER_MAX. However this means that when we increment i to CRASH_HANDLER_MAX we will perform an out of bound array access checking the first condition before exiting on the second condition. To avoid the out of bounds access, simply reorder the loop conditions. Fixes: 1d1451655bad ("powerpc: Add array bounds checking to crash_shutdown_handlers") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Don't test for machine type in smp_setup_cpu_maps()Benjamin Herrenschmidt2016-07-131-1/+1
| | | | | | | The subsequent test for RTAS along with the LPAR test are sufficient Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/rtas: Don't test for machine type in rtas_initialize()Benjamin Herrenschmidt2016-07-131-1/+1
| | | | | | | | The test is unnecessary, the FW_FEATURE_LPAR is sufficient as there exist no other LPAR type that has RTAS. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/85xx/mpc85xx_rdb: Don't use the flat device-tree after bootBenjamin Herrenschmidt2016-07-131-2/+1Star
| | | | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/85xx/mpc85xx_ds: Don't use the flat device-tree after bootBenjamin Herrenschmidt2016-07-131-3/+1Star
| | | | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/85xx/ge_imp3a: Don't use the flat device-tree after bootBenjamin Herrenschmidt2016-07-131-2/+1Star
| | | | | | | | ge_imp3a_pic_init() is called way beyond the unflattening of the tree, it shouldn't be using of_flat_dt_* Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/cell: Don't use flat device-tree after bootBenjamin Herrenschmidt2016-07-131-2/+1Star
| | | | | | | | Some bit of SPU code was using the FDT rather than the expanded device-tree. Fix it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Move epapr_paravirt_early_init() to early_init_devtree()Benjamin Herrenschmidt2016-07-113-6/+2Star
| | | | | | | | | | The function is called by both 32-bit and 64-bit early setup right after early_init_devtree(). All it does is run yet another early DT parser which is precisely what early_init_devtree() is about, so move it in there. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add comment explaining the purpose of setup_kdump_trampoline()Benjamin Herrenschmidt2016-07-111-2/+4
| | | | | | | | | | | | Anything in early_setup() needs to be justified to be there, in this case, we need the trampolines before we can take exceptions and thus before we turn on the MMU. Also remove a pretty meaningless and misplaced debug message Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fix comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Update obsolete comments in setup_32.c about entry conditionsBenjamin Herrenschmidt2016-07-111-3/+5
| | | | | | | | | | | | early_init() is called in-place before kernel relocation and using whatever MMU setup exists at the point the kernel is entered. machine_init() is called after relocation and after some initial mapping of PAGE_OFFSET has been established (typically using BATs on 6xx/7xx/7xxx processors or some form of bolted TLB on others). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Ignore CAPI adapters misplaced in switched slotsPhilippe Bergheaud2016-07-081-0/+29
| | | | | | | | | | | One should not attempt to switch a PHB into CAPI mode if there is a switch between the PHB and the adapter. This patch modifies the cxl driver to ignore CAPI adapters misplaced in switched slots. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: make base more explicitly non-modularPaul Gortmaker2016-07-081-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | The Kconfig/Makefile currently controlling compilation of this code is: drivers/misc/cxl/Kconfig:config CXL_BASE drivers/misc/cxl/Kconfig: bool drivers/misc/cxl/Makefile:obj-$(CONFIG_CXL_BASE) += base.o ...meaning that it currently is not being built as a module by anyone. Lets convert the one module_init into device_initcall so that when reading the driver it more clear that it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We don't replace module.h with init.h since the file is doing other modular stuff (module_get/put) even though it is built-in. Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Refine slice error debug messagesPhilippe Bergheaud2016-07-084-8/+57
| | | | | | | | | | | | | | | | | | | The PSL Slice Error Register (PSL_SERR_An) reports implementation dependent AFU errors, in the form of a bitmap. The PSL_SERR_An register content is printed in the form of hex dump debug message. This patch decodes the PSL_ERR_An register contents, and prints a specific error message for each possible error bit. It also dumps the secondary registers AFU_ERR_An and PSL_DSISR_An, that may contain extra debug information. This patch also removes the large WARN message that used to report the cxl slice error interrupt, and replaces it by a short informative message, that draws attention to AFU implementation errors. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Fix NULL pointer dereference on kernel contexts with no AFU interruptsIan Munsie2016-07-082-3/+2Star
| | | | | | | | | | | | | | If a kernel context is initialised and does not have any AFU interrupts allocated it will cause a NULL pointer dereference when the context is detached since the irq_names list will not have been initialised. Move the initialisation of the irq_names list into the cxl_context_init routine so that it will be valid for the entire lifetime of the context and will not cause a NULL pointer dereference. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Workaround XSL bug that does not clear the RA bit after a resetIan Munsie2016-07-081-0/+10
| | | | | | | | | | | | | An issue was noted in our debug logs where the XSL would leave the RA bit asserted after an AFU reset operation, which would effectively prevent further AFU reset operations from working. Workaround the issue by clearing the RA bit with an MMIO write if it is still asserted after any AFU control operation. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Fix bug where AFU disable operation had no effectIan Munsie2016-07-083-7/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AFU disable operation has a bug where it will not clear the enable bit and therefore will have no effect. To date this has likely been masked by fact that we perform an AFU reset before the disable, which also has the effect of clearing the enable bit, making the following disable operation effectively a noop on most hardware. This patch modifies the afu_control function to take a parameter to clear from the AFU control register so that the disable operation can clear the appropriate bit. This bug was uncovered on the Mellanox CX4, which uses an XSL rather than a PSL. On the XSL the reset operation will not complete while the AFU is enabled, meaning the enable bit was still set at the start of the disable and as a result this bug was hit and the disable also timed out. Because of this difference in behaviour between the PSL and XSL, this patch now makes the reset dependent on the card using a PSL to avoid waiting for a timeout on the XSL. It is entirely possible that we may be able to drop the reset altogether if it turns out we only ever needed it due to this bug - however I am not willing to drop it without further regression testing and have added comments to the code explaining the background. This also fixes a small issue where the AFU_Cntl register was read outside of the lock that protects it. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Fix allocating a minimum of 2 pages for the SPAIan Munsie2016-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | The Scheduled Process Area is allocated dynamically with enough pages to fit at least as many processes as the AFU descriptor indicated. Since the calculation is non-trivial, it does this by calculating how many processes could fit in an allocation of a given order, and increasing that order until it can fit enough processes or hits the maximum supported size. Currently, it will start this search using a SPA of 2 pages instead of 1. This can waste a page of memory if the AFU's maximum number of supported processes was small enough to fit in one page. Fix the algorithm to start the search at 1 page. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Fix allowing bogus AFU descriptors with 0 maximum processesIan Munsie2016-07-081-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the AFU descriptor of an AFU directed AFU indicates that it supports 0 maximum processes, we will accept that value and attempt to use it. The SPA will still be allocated (with 2 pages due to another minor bug and room for 958 processes), and when a context is allocated we will pass the value of 0 to idr_alloc as the maximum. However, idr_alloc will treat that as meaning no maximum and will allocate a context number and we return a valid context. Conceivably, this could lead to a buffer overflow of the SPA if more than 958 contexts were allocated, however this is mitigated by the fact that there are no known AFUs in the wild with a bogus AFU descriptor like this, and that only the root user is allowed to flash an AFU image to a card. Add a check when validating the AFU descriptor to reject any with 0 maximum processes. We do still allow a dedicated process only AFU to indicate that it supports 0 contexts even though that is forbidden in the architecture, as in that case we ignore the value and use 1 instead. This is just on the off-chance that such a dedicated process AFU may exist (not that I am aware of any), since their developers are less likely to have cared about this value at all. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/configs: Remove old symbols from defconfigsAndrew Donnellan2016-07-0896-253/+84Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Update defconfigs to remove old symbols and comments referencing old symbols. Dropped: * AVERAGE * INET_LRO * EXT3_DEFAULTS_TO_ORDERED * EXT3_FS_XATTR * I2O * INFINIBAND_AMSO1100 * INFINIBAND_EHCA * IP1000 Replaced: * BLK_DEV_XIP -> BLK_DEV_RAM_DAX * CLK_PPC_CORENET -> CLK_QORIQ * EXT2_FS_XIP -> FS_DAX * EXT3_FS* -> EXT4_FS* Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Fix typo in comment reference to CONFIG_TRACE_IRQFLAGSAndrew Donnellan2016-07-081-1/+1
| | | | | Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/ps3: Fix typo in comment reference to CONFIG_PS3_REPOSITORY_WRITEAndrew Donnellan2016-07-081-1/+1
| | | | | Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/eeh: Fix pr_debug()s in eeh_cache.cAndrew Donnellan2016-07-081-4/+4
| | | | | | | | | | | | | eeh_cache.c doesn't build cleanly with -DDEBUG when CONFIG_PHYS_ADDR_T_64BIT is set, as a couple of pr_debug()s use "%lx" for resource_size_t parameters. Use "%pap" instead, as it's the correct format specifier for types deriving from phys_addr_t. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/opal: Wake up kopald polling thread before waiting for eventsBenjamin Herrenschmidt2016-07-083-2/+17
| | | | | | | | | | | | | | | | On some environments (prototype machines, some simulators, etc...) there is no functional interrupt source to signal completion, so we rely on the fairly slow OPAL heartbeat. In a number of cases, the calls complete very quickly or even immediately. We've observed that it helps a lot to wakeup the OPAL heartbeat thread before waiting for event in those cases, it will call OPAL immediately to collect completions for anything that finished fast enough. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add MTD_BLOCK to powernv_defconfigMichael Neuling2016-07-081-0/+1
| | | | | | | | This is so we can use the powernv_flash mtd driver as an block device. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: start rtasd before PCI probingGreg Kurz2016-07-081-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A strange behaviour is observed when comparing PCI hotplug in QEMU, between x86 and pseries. If you consider the following steps: - start a VM - add a PCI device via the QEMU monitor before the rtasd has started (for example starting the VM in paused state, or hotplug during FW or boot loader) - resume the VM execution The x86 kernel detects the PCI device, but the pseries one does not. This happens because the rtasd kernel worker is currently started under device_initcall, while PCI probing happens earlier under subsys_initcall. As a consequence, if we have a pending RTAS event at boot time, a message is printed and the event is dropped. This patch moves all the initialization of rtasd to arch_initcall, which is run before subsys_call: this way, logging_enabled is true when the RTAS event pops up and it is not lost anymore. The proc fs bits stay at device_initcall because they cannot be run before fs_initcall. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pci: Assign fixed PHB number based on device-tree propertiesGuilherme G. Piccoli2016-07-071-3/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The domain/PHB field of PCI addresses has its value obtained from a global variable, incremented each time a new domain (represented by struct pci_controller) is added on the system. The domain addition process happens during boot or due to PHB hotplug add. As recent kernels are using predictable naming for network interfaces, the network stack is more tied to PCI naming. This can be a problem in hotplug scenarios, because PCI addresses will change if devices are removed and then re-added. This situation seems unusual, but it can happen if a user wants to replace a NIC without rebooting the machine, for example. This patch changes the way PCI domain values are generated: now, we use device-tree properties to assign fixed PHB numbers to PCI addresses when available (meaning pSeries and PowerNV cases). We also use a bitmap to allow dynamic PHB numbering when device-tree properties are not used. This bitmap keeps track of used PHB numbers and if a PHB is released (by hotplug operations for example), it allows the reuse of this PHB number, avoiding PCI address to change in case of device remove and re-add soon after. No functional changes were introduced. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> [mpe: Drop unnecessary machine_is(pseries) test] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Drop unused extern for current_setMichael Ellerman2016-07-071-3/+0Star
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pci: Fix build with PCI_IOV=y and EEH=nMichael Ellerman2016-07-071-4/+5
| | | | | | | | | | | | | Despite attempting to fix this in commit fb36e9073693 ("powerpc/pci: Fix SRIOV not building without EEH enabled"), the build is still broken when PCI_IOV=y and EEH=n (eg. g5_defconfig with PCI_IOV=y): arch/powerpc/kernel/pci_dn.c: In function ‘remove_dev_pci_data’: arch/powerpc/kernel/pci_dn.c:230:18: error: unused variable ‘edev’ Incorporate Ben's idea of using __maybe_unused to avoid so many #ifdefs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>