summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kvm
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'staging-4.12-rc1' of ↵Linus Torvalds2017-05-061-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO updates from Greg KH: "Here is the big staging tree update for 4.12-rc1. It's a big one, adding about 350k new lines of crap^Wcode, mostly all in a big dump of media drivers from Intel. But there's other new drivers in here as well, yet-another-wifi driver, new IIO drivers, and a new crypto accelerator. We also deleted a bunch of stuff, mostly in patch cleanups, but also the Android ION code has shrunk a lot, and the Android low memory killer driver was finally deleted, much to the celebration of the -mm developers. All of these have been in linux-next with a few build issues that will show up when you merge to your tree" Merge conflicts in the new rtl8723bs driver (due to the wifi changes this merge window) handled as per linux-next, courtesy of Stephen Rothwell. * tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1182 commits) staging: fsl-mc/dpio: add cpu <--> LE conversion for dpaa2_fd staging: ks7010: remove line continuations in quoted strings staging: vt6656: use tabs instead of spaces staging: android: ion: Fix unnecessary initialization of static variable staging: media: atomisp: fix range checking on clk_num staging: media: atomisp: fix misspelled word in comment staging: media: atomisp: kmap() can't fail staging: atomisp: remove #ifdef for runtime PM functions staging: atomisp: satm include directory is gone atomisp: remove some more unused files atomisp: remove hmm_load/store/clear indirections atomisp: kill off mmgr_free atomisp: clean up the hmm init/cleanup indirections atomisp: handle allocation calls before init in the hmm layer staging: fsl-dpaa2/eth: Add maintainer for Ethernet driver staging: fsl-dpaa2/eth: Add TODO file staging: fsl-dpaa2/eth: Add trace points staging: fsl-dpaa2/eth: Add driver specific stats staging: fsl-dpaa2/eth: Add ethtool support staging: fsl-dpaa2/eth: Add Freescale DPAA2 Ethernet driver ...
| * cma: Store a name in the cma structureLaura Abbott2017-04-181-1/+2
| | | | | | | | | | | | | | | | | | Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it useful to have an explicit name attached to each region. Store the name in each CMA structure. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds2017-05-056-34/+41
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
| * Merge branch 'topic/xive' (early part) into nextMichael Ellerman2017-04-125-29/+32
| |\ | | | | | | | | | | | | | | | This merges the arch part of the XIVE support, leaving the final commit with the KVM specific pieces dangling on the branch for Paul to merge via the kvm-ppc tree.
| | * powerpc: Consolidate variants of real-mode MMIOsBenjamin Herrenschmidt2017-04-102-15/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have all sort of variants of MMIO accessors for the real mode instructions. This creates a clean set of accessors based on Linux normal naming conventions, replacing all occurrences of the old ones in the tree. I have purposefully removed the "out/in" variants in favor of only including __raw variants. Any code using these is already pretty much hand tuned to operate in a very specific environment. I've fixed up the 2 users (only one of them actually needed a barrier in the first place). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * powerpc/kvm: Make kvmppc_xics_create_icp staticBenjamin Herrenschmidt2017-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | It's only used within the same file it's defined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * powerpc/kvm: Massage order of #includeBenjamin Herrenschmidt2017-04-102-13/+13
| | | | | | | | | | | | | | | | | | | | | We traditionally have linux/ before asm/ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * powerpc/xive: Native exploitation of the XIVE interrupt controllerBenjamin Herrenschmidt2017-04-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XIVE interrupt controller is the new interrupt controller found in POWER9. It supports advanced virtualization capabilities among other things. Currently we use a set of firmware calls that simulate the old "XICS" interrupt controller but this is fairly inefficient. This adds the framework for using XIVE along with a native backend which OPAL for configuration. Later, a backend allowing the use in a KVM or PowerVM guest will also be provided. This disables some fast path for interrupts in KVM when XIVE is enabled as these rely on the firmware emulation code which is no longer available when the XIVE is used natively by Linux. A latter patch will make KVM also directly exploit the XIVE, thus recovering the lost performance (and more). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings, tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV, fix build errors when SMP=n, fold in fixes from Ben: Don't call cpu_online() on an invalid CPU number Fix irq target selection returning out of bounds cpu# Extra sanity checks on cpu numbers ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc: Remove unnecessary includes of asm/debug.hMichael Ellerman2017-04-101-1/+0Star
| | | | | | | | | | | | | | | | | | | | | These files don't seem to have any need for asm/debug.h, now that all it includes are the debugger hooks and breakpoint definitions. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc: Create asm/debugfs.h and move powerpc_debugfs_root thereMichael Ellerman2017-04-101-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, a long with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm/hash: Support 68 bit VAAneesh Kumar K.V2017-03-311-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inorder to support large effective address range (512TB), we want to increase the virtual address bits to 68. But we do have platforms like p4 and p5 that can only do 65 bit VA. We support those platforms by limiting context bits on them to 16. The protovsid -> vsid conversion is verified to work with both 65 and 68 bit va values. I also documented the restrictions in a table format as part of code comments. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm/hash: Abstract context id allocation for KVMMichael Ellerman2017-03-311-1/+1
| |/ | | | | | | | | | | | | | | | | | | KVM wants to be able to allocate an MMU context id, which it does currently by calling __init_new_context(). We're about to rework that code, so provide a wrapper for KVM so it can not worry about the details. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* / KVM: PPC: Book3S HV: Check for kmalloc errors in ioctlDan Carpenter2017-04-061-0/+4
|/ | | | | | | | | kzalloc() won't actually fail because sizeof(*resize) is small, but static checkers complain. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* power/mm: update pte_write and pte_wrprotect to handle savedwriteAneesh Kumar K.V2017-03-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We use pte_write() to check whethwer the pte entry is writable. This is mostly used to later mark the pte read only if it is writable. The other use of pte_write() is to check whether the pte_entry is writable so that hardware page table entry can be marked accordingly. This is used in kvm where we look at qemu page table entry and update hardware hash page table for the guest with correct write enable bit. With the above, for the first usage we should also check the savedwrite bit so that we can correctly clear the savedwite bit. For the later, we add a new variant __pte_write(). With this we can revert write_protect_page part of 595cd8f256d2 ("mm/ksm: handle protnone saved writes when making page write protect"). But I left it as it is as an example code for savedwrite check. Fixes: c137a2757b886 ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write") Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-03-042-6/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more KVM updates from Radim Krčmář: "Second batch of KVM changes for the 4.11 merge window: PPC: - correct assumption about ASDR on POWER9 - fix MMIO emulation on POWER9 x86: - add a simple test for ioperm - cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) - fix nVMX interrupt delivery - fix some performance counters in the guest ... and two cleanup patches" * tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Fix pending events injection x86/kvm/vmx: remove unused variable in segment_base() selftests/x86: Add a basic selftest for ioperm x86/asm: Tidy up TSS limit code kvm: convert kvm.users_count from atomic_t to refcount_t KVM: x86: never specify a sample period for virtualized in_tx_cp counters KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9 KVM: PPC: Book3S HV: Fix software walk of guest process page tables
| * KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9Paul Mackerras2017-03-011-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In HPT mode on POWER9, the ASDR register is supposed to record segment information for hypervisor page faults. It turns out that POWER9 DD1 does not record the page size information in the ASDR for faults in guest real mode. We have the necessary information in memory already, so by moving the checks for real mode that already existed, we can use the in-memory copy. Since a load is likely to be faster than reading an SPR, we do this unconditionally (not just for POWER9 DD1). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Fix software walk of guest process page tablesPaul Mackerras2017-03-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes some bugs in the code that walks the guest's page tables. These bugs cause MMIO emulation to fail whenever the guest is in virtial mode (MMU on), leading to the guest hanging if it tried to access a virtio device. The first bug was that when reading the guest's process table, we were using the whole of arch->process_table, not just the field that contains the process table base address. The second bug was that the mask used when reading the process table entry to get the radix tree base address, RPDB_MASK, had the wrong value. Fixes: 9e04ba69beec ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests") Fixes: e99833448c5f ("powerpc/mm/radix: Add partition table format & callback") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | sched/headers: Prepare to use <linux/rcuupdate.h> instead of ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/rculist.h> in <linux/sched.h> We don't actually need the full rculist.h header in sched.h anymore, we will be able to include the smaller rcupdate.h header instead. But first update code that relied on the implicit header inclusion. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched/headers: Prepare to remove the <linux/mm_types.h> dependency from ↵Ingo Molnar2017-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/sched.h> Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/sched/stat.h> We are going to split <linux/sched/stat.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/stat.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar2017-03-022-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | mm: cma_alloc: allow to specify GFP maskLucas Stach2017-02-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most users of this interface just want to use it with the default GFP_KERNEL flags, but for cases where DMA memory is allocated it may be called from a different context. No functional change yet, just passing through the flag to the underlying alloc_contig_range function. Link: http://lkml.kernel.org/r/20170127172328.18574-2-l.stach@pengutronix.de Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alexander Graf <agraf@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmfDave Jiang2017-02-251-2/+2
|/ | | | | | | | | | | | | | | | | | | | | | | ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2017-02-204-5/+17
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: "Please do a pull from my kvm-ppc-next branch to get some fixes which I would like to have in 4.11. There are four small commits there; two are fixes for potential host crashes in the new HPT resizing code, and the other two are changes to printks to make KVM on PPC a little less noisy."
| * KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for nowPaul Mackerras2017-02-182-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new HPT resizing code added in commit b5baa6877315 ("KVM: PPC: Book3S HV: KVM-HV HPT resizing implementation", 2016-12-20) doesn't have code to handle the new HPTE format which POWER9 uses. Thus it would be best not to advertise it to userspace on POWER9 systems until it works properly. Also, since resize_hpt_rehash_hpte() contains BUG_ON() calls that could be hit on POWER9, let's prevent it from being called on POWER9 for now. Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug messageThomas Huth2017-02-171-2/+2
| | | | | | | | | | | | | | | | | | | | The average user likely does not know what a "htab" or "LPID" is, and it's annoying that these messages are quickly filling the dmesg log when you're doing boot cycle tests, so let's turn it into a debug message instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S PR: Ratelimit copy data failure error messagesVipin K Parashar2017-02-172-2/+4
| | | | | | | | | | | | | | | | | | | | kvm_ppc_mmu_book3s_32/64 xlat() logs "KVM can't copy data" error upon failing to copy user data to kernel space. This floods kernel log once such fails occur in short time period. Ratelimit this error to avoid flooding kernel logs upon copy data failures. Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Prevent double-free on HPT resize commit pathDavid Gibson2017-02-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | resize_hpt_release(), called once the HPT resize of a KVM guest is completed (successfully or unsuccessfully) frees the state structure for the resize. It is currently not safe to call with a NULL pointer. However, one of the error paths in kvm_vm_ioctl_resize_hpt_commit() can invoke it with a NULL pointer. This will occur if userspace improperly invokes KVM_PPC_RESIZE_HPT_COMMIT without previously calling KVM_PPC_RESIZE_HPT_PREPARE, or if it calls COMMIT twice without an intervening PREPARE. To fix this potential crash bug - and maybe others like it, make it safe (and a no-op) to call resize_hpt_release() with a NULL resize pointer. Found by Dan Carpenter with a static checker. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: race-free exit from KVM_RUN without POSIX signalsPaolo Bonzini2017-02-171-1/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick" a VCPU out of KVM_RUN through a POSIX signal. A signal is attached to a dummy signal handler; by blocking the signal outside KVM_RUN and unblocking it inside, this possible race is closed: VCPU thread service thread -------------------------------------------------------------- check flag set flag raise signal (signal handler does nothing) KVM_RUN However, one issue with KVM_SET_SIGNAL_MASK is that it has to take tsk->sighand->siglock on every KVM_RUN. This lock is often on a remote NUMA node, because it is on the node of a thread's creator. Taking this lock can be very expensive if there are many userspace exits (as is the case for SMP Windows VMs without Hyper-V reference time counter). As an alternative, we can put the flag directly in kvm_run so that KVM can see it: VCPU thread service thread -------------------------------------------------------------- raise signal signal handler set run->immediate_exit KVM_RUN check run->immediate_exit Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: PPC: Book 3S: Fix error return in kvm_vm_ioctl_create_spapr_tce()Wei Yongjun2017-02-091-0/+1
| | | | | | | | Fix to return error code -ENOMEM from the memory alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2017-02-082-30/+14Star
|\ | | | | | | | | | | | | | | This merges in a fix which touches both PPC and KVM code, which was therefore put into a topic branch in the powerpc tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * powerpc/powernv: Remove separate entry for OPAL real mode callsBenjamin Herrenschmidt2017-02-072-30/+14Star
| | | | | | | | | | | | | | | | | | All entry points already read the MSR so they can easily do the right thing. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | KVM: PPC: Book3S HV: Advertise availablity of HPT resizing on KVM HVDavid Gibson2017-01-311-0/+3
| | | | | | | | | | | | | | | | This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the presence of in-kernel HPT resizing on KVM HV. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: KVM-HV HPT resizing implementationDavid Gibson2017-01-311-1/+187
| | | | | | | | | | | | | | | | | | | | This adds the "guts" of the implementation for the HPT resizing PAPR extension. It has the code to allocate and clear a new HPT, rehash an existing HPT's entries into it, and accomplish the switchover for a KVM guest from the old HPT to the new one. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementationDavid Gibson2017-01-312-0/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a not yet working outline of the HPT resizing PAPR extension. Specifically it adds the necessary ioctl() functions, their basic steps, the work function which will handle preparation for the resize, and synchronization between these, the guest page fault path and guest HPT update path. The actual guts of the implementation isn't here yet, so for now the calls will always fail. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Create kvmppc_unmap_hpte_helper()David Gibson2017-01-311-33/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used to force all guest mappings of a particular host page to be set ABSENT, and removed from the reverse mappings. For HPT resizing, we will have some cases where we want to set just a single guest HPTE ABSENT and remove its reverse mappings. To prepare with this, we split out the logic from kvm_unmap_rmapp() to evict a single HPTE, moving it to a new helper function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT sizeDavid Gibson2017-01-312-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page table (HPT) that userspace expects a guest VM to have, and is also used to clear that HPT when necessary (e.g. guest reboot). At present, once the ioctl() is called for the first time, the HPT size can never be changed thereafter - it will be cleared but always sized as from the first call. With upcoming HPT resize implementation, we're going to need to allow userspace to resize the HPT at reset (to change it back to the default size if the guest changed it). So, we need to allow this ioctl() to change the HPT size. This patch also updates Documentation/virtual/kvm/api.txt to reflect the new behaviour. In fact the documentation was already slightly incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl" Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Split HPT allocation from activationDavid Gibson2017-01-312-53/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT) and sets it up as the active page table for a VM. For the upcoming HPT resize implementation we're going to want to allocate HPTs separately from activating them. So, split the allocation itself out into kvmppc_allocate_hpt() and perform the activation with a new kvmppc_set_hpt() function. Likewise we split kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt() which unsets it as an active HPT, then frees it. We also move the logic to fall back to smaller HPT sizes if the first try fails into the single caller which used that behaviour, kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change, in that previously if the initial attempt at CMA allocation failed, we would fall back to attempting smaller sizes with the page allocator. Now, we try first CMA, then the page allocator at each size. As far as I can tell this change should be harmless. To match, we make kvmppc_free_hpt() just free the actual HPT itself. The call to kvmppc_free_lpid() that was there, we move to the single caller. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Don't store values derivable from HPT orderDavid Gibson2017-01-312-24/+22Star
| | | | | | | | | | | | | | | | | | | | | | Currently the kvm_hpt_info structure stores the hashed page table's order, and also the number of HPTEs it contains and a mask for its size. The last two can be easily derived from the order, so remove them and just calculate them as necessary with a couple of helper inlines. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Gather HPT related variables into sub-structureDavid Gibson2017-01-313-78/+78
| | | | | | | | | | | | | | | | | | | | | | Currently, the powerpc kvm_arch structure contains a number of variables tracking the state of the guest's hashed page table (HPT) in KVM HV. This patch gathers them all together into a single kvm_hpt_info substructure. This makes life more convenient for the upcoming HPT resizing implementation. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarityDavid Gibson2017-01-312-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at all obvious from the name. In practice kvmppc_alloc_hpt() allocates an HPT by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate it with CMA only. To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma(). Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2017-01-3111-121/+1203
|\| | | | | | | | | | | | | | | This merges in the POWER9 radix MMU host and guest support, which was put into a topic branch because it touches both powerpc and KVM code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * KVM: PPC: Book3S HV: Enable radix guest supportPaul Mackerras2017-01-314-27/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a few last pieces of the support for radix guests: * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO ioctls for radix guests * On POWER9, allow secondary threads to be on/off-lined while guests are running. * Set up LPCR and the partition table entry for radix guests. * Don't allocate the rmap array in the kvm_memory_slot structure on radix. * Don't try to initialize the HPT for radix guests, since they don't have an HPT. * Take out the code that prevents the HV KVM module from initializing on radix hosts. At this stage, we only support radix guests if the host is running in radix mode, and only support HPT guests if the host is running in HPT mode. Thus a guest cannot switch from one mode to the other, which enables some simplifications. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1Paul Mackerras2017-01-311-0/+6
| | | | | | | | | | | | | | | | | | On POWER9 DD1, we need to invalidate the ERAT (effective to real address translation cache) when changing the PIDR register, which we do as part of guest entry and exit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: Allow guest exit path to have MMU onPaul Mackerras2017-01-313-17/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we allow LPCR[AIL] to be set for radix guests, then interrupts from the guest to the host can be delivered by the hardware with relocation on, and thus the code path starting at kvmppc_interrupt_hv can be executed in virtual mode (MMU on) for radix guests (previously it was only ever executed in real mode). Most of the code is indifferent to whether the MMU is on or off, but the calls to OPAL that use the real-mode OPAL entry code need to be switched to use the virtual-mode code instead. The affected calls are the calls to the OPAL XICS emulation functions in kvmppc_read_one_intr() and related functions. We test the MSR[IR] bit to detect whether we are in real or virtual mode, and call the opal_rm_* or opal_* function as appropriate. The other place that depends on the MMU being off is the optimization where the guest exit code jumps to the external interrupt vector or hypervisor doorbell interrupt vector, or returns to its caller (which is __kvmppc_vcore_entry). If the MMU is on and we are returning to the caller, then we don't need to use an rfid instruction since the MMU is already on; a simple blr suffices. If there is an external or hypervisor doorbell interrupt to handle, we branch to the relocation-on version of the interrupt vector. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movementPaul Mackerras2017-01-313-14/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With radix, the guest can do TLB invalidations itself using the tlbie (global) and tlbiel (local) TLB invalidation instructions. Linux guests use local TLB invalidations for translations that have only ever been accessed on one vcpu. However, that doesn't mean that the translations have only been accessed on one physical cpu (pcpu) since vcpus can move around from one pcpu to another. Thus a tlbiel might leave behind stale TLB entries on a pcpu where the vcpu previously ran, and if that task then moves back to that previous pcpu, it could see those stale TLB entries and thus access memory incorrectly. The usual symptom of this is random segfaults in userspace programs in the guest. To cope with this, we detect when a vcpu is about to start executing on a thread in a core that is a different core from the last time it executed. If that is the case, then we mark the core as needing a TLB flush and then send an interrupt to any thread in the core that is currently running a vcpu from the same guest. This will get those vcpus out of the guest, and the first one to re-enter the guest will do the TLB flush. The reason for interrupting the vcpus executing on the old core is to cope with the following scenario: CPU 0 CPU 1 CPU 4 (core 0) (core 0) (core 1) VCPU 0 runs task X VCPU 1 runs core 0 TLB gets entries from task X VCPU 0 moves to CPU 4 VCPU 0 runs task X Unmap pages of task X tlbiel (still VCPU 1) task X moves to VCPU 1 task X runs task X sees stale TLB entries That is, as soon as the VCPU starts executing on the new core, it could unmap and tlbiel some page table entries, and then the task could migrate to one of the VCPUs running on the old core and potentially see stale TLB entries. Since the TLB is shared between all the threads in a core, we only use the bit of kvm->arch.need_tlb_flush corresponding to the first thread in the core. To ensure that we don't have a window where we can miss a flush, this moves the clearing of the bit from before the actual flush to after it. This way, two threads might both do the flush, but we prevent the situation where one thread can enter the guest before the flush is finished. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix modePaul Mackerras2017-01-311-0/+14
| | | | | | | | | | | | | | | | | | | | If the guest is in radix mode, then it doesn't have a hashed page table (HPT), so all of the hypercalls that manipulate the HPT can't work and should return an error. This adds checks to make them return H_FUNCTION ("function not supported"). Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: Implement dirty page logging for radix guestsPaul Mackerras2017-01-313-32/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds code to keep track of dirty pages when requested (that is, when memslot->dirty_bitmap is non-NULL) for radix guests. We use the dirty bits in the PTEs in the second-level (partition-scoped) page tables, together with a bitmap of pages that were dirty when their PTE was invalidated (e.g., when the page was paged out). This bitmap is stored in the first half of the memslot->dirty_bitmap area, and kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the bitmap that gets returned to userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * KVM: PPC: Book3S HV: MMU notifier callbacks for radix guestsPaul Mackerras2017-01-312-21/+97
| | | | | | | | | | | | | | | | | | | | | | | | This adapts our implementations of the MMU notifier callbacks (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva) to call radix functions when the guest is using radix. These implementations are much simpler than for HPT guests because we have only one PTE to deal with, so we don't need to traverse rmap chains. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>