summaryrefslogtreecommitdiffstats
path: root/include/uapi/linux/kvm.h
Commit message (Collapse)AuthorAgeFilesLines
* Documentation: move Documentation/virtual to Documentation/virtChristoph Hellwig2019-07-241-2/+2
| | | | | | | | | | | | Renaming docs seems to be en vogue at the moment, so fix on of the grossly misnamed directories. We usually never use "virtual" as a shortcut for virtualization in the kernel, but always virt, as seen in the virt/ top-level directory. Fix up the documentation to match that. Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: PMU Event FilterEric Hankland2019-07-111-0/+3
| | | | | | | | | | | | | | Some events can provide a guest with information about other guests or the host (e.g. L3 cache stats); providing the capability to restrict access to a "safe" set of events would limit the potential for the PMU to be used in any side channel attacks. This change introduces a new VM ioctl that sets an event filter. If the guest attempts to program a counter for any blacklisted or non-whitelisted event, the kernel counter won't be created, so any RDPMC/RDMSR will show 0 instances of that event. Signed-off-by: Eric Hankland <ehankland@google.com> [Lots of changes. All remaining bugs are probably mine. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: X86: Provide a capability to disable cstate msr read interceptsWanpeng Li2019-06-041-1/+3
| | | | | | | | | | | | | Allow guest reads CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to avoid a guest to get the whole package information in multi-tenant scenario. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvmarm-for-v5.2' of ↵Paolo Bonzini2019-05-151-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 5.2 - guest SVE support - guest Pointer Authentication support - Better discrimination of perf counters between host and guests Conflicts: include/uapi/linux/kvm.h
| * KVM: arm64: Add capability to advertise ptrauth for guestAmit Daniel Kachhap2019-04-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch advertises the capability of two cpu feature called address pointer authentication and generic pointer authentication. These capabilities depend upon system support for pointer authentication and VHE mode. The current arm64 KVM partially implements pointer authentication and support of address/generic authentication are tied together. However, separate ABI requirements for both of them is added so that any future isolated implementation will not require any ABI changes. Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: Add a capability to advertise SVE supportDave Martin2019-03-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | To provide a uniform way to check for KVM SVE support amongst other features, this patch adds a suitable capability KVM_CAP_ARM_SVE, and reports it as present when SVE is available. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctlDave Martin2019-03-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some aspects of vcpu configuration may be too complex to be completed inside KVM_ARM_VCPU_INIT. Thus, there may be a requirement for userspace to do some additional configuration before various other ioctls will work in a consistent way. In particular this will be the case for SVE, where userspace will need to negotiate the set of vector lengths to be made available to the guest before the vcpu becomes fully usable. In order to provide an explicit way for userspace to confirm that it has finished setting up a particular vcpu feature, this patch adds a new ioctl KVM_ARM_VCPU_FINALIZE. When userspace has opted into a feature that requires finalization, typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a matching call to KVM_ARM_VCPU_FINALIZE is now required before KVM_RUN or KVM_GET_REG_LIST is allowed. Individual features may impose additional restrictions where appropriate. No existing vcpu features are affected by this, so current userspace implementations will continue to work exactly as before, with no need to issue KVM_ARM_VCPU_FINALIZE. As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a placeholder: no finalizable features exist yet, so ioctl is not required and will always yield EINVAL. Subsequent patches will add the finalization logic to make use of this ioctl for SVE. No functional change for existing userspace. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: Allow 2048-bit register access via ioctl interfaceDave Martin2019-03-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Arm SVE architecture defines registers that are up to 2048 bits in size (with some possibility of further future expansion). In order to avoid the need for an excessively large number of ioctls when saving and restoring a vcpu's registers, this patch adds a #define to make support for individual 2048-bit registers through the KVM_{GET,SET}_ONE_REG ioctl interface official. This will allow each SVE register to be accessed in a single call. There are sufficient spare bits in the register id size field for this change, so there is no ABI impact, providing that KVM_GET_REG_LIST does not enumerate any 2048-bit register unless userspace explicitly opts in to the relevant architecture-specific features. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | Merge tag 'kvm-ppc-next-5.2-2' of ↵Paolo Bonzini2019-05-151-0/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.2 * Support for guests to access the new POWER9 XIVE interrupt controller hardware directly, reducing interrupt latency and overhead for guests. * In-kernel implementation of the H_PAGE_INIT hypercall. * Reduce memory usage of sparsely-populated IOMMU tables. * Several bug fixes. Second PPC KVM update for 5.2 * Fix a bug, fix a spelling mistake, remove some useless code.
| * | KVM: PPC: Book3S HV: XIVE: Introduce a new capability KVM_CAP_PPC_IRQ_XIVECédric Le Goater2019-04-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as the full support for the XIVE native exploitation mode is not yet available. When this is case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation modeCédric Le Goater2019-04-301-0/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the basic framework for the new KVM device supporting the XIVE native exploitation mode. The user interface exposes a new KVM device to be created by QEMU, only available when running on a L0 hypervisor. Support for nested guests is not available yet. The XIVE device reuses the device structure of the XICS-on-XIVE device as they have a lot in common. That could possibly change in the future if the need arise. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* / KVM: Introduce KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2Peter Xu2019-05-081-2/+3
|/ | | | | | | | | | The previous KVM_CAP_MANUAL_DIRTY_LOG_PROTECT has some problem which blocks the correct usage from userspace. Obsolete the old one and introduce a new capability bit for it. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUIDVitaly Kuznetsov2018-12-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | With every new Hyper-V Enlightenment we implement we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works it is fairly inconvenient: the majority of the enlightenments we do have corresponding CPUID feature bit(s) and userspace has to know this anyways to be able to expose the feature to the guest. Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace in case we decide to return these twice. KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: introduce manual dirty log reprotectPaolo Bonzini2018-12-141-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this: 1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them. 2. The guest modifies the pages, causing them to be marked ditry. 3. Userspace actually copies the pages. 4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3). This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvmarm-for-v4.20' of ↵Paolo Bonzini2018-10-191-0/+10
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.20 - Improved guest IPA space support (32 to 52 bits) - RAS event delivery for 32bit - PMU fixes - Guest entry hardening - Various cleanups
| * kvm: arm64: Allow tuning the physical address size for VMSuzuki K Poulose2018-10-031-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow specifying the physical address size limit for a new VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in bits[7:0] of the type field. For backward compatibility the value 0 is reserved and implies 40bits. Also, lift the limit of the IPA to host limit and allow lower IPA sizes (e.g, 32). The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE for the availability of this feature. The cap check returns the maximum limit for the physical address shift supported by the host. Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <cdall@kernel.org> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOADJim Mattson2018-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a per-VM capability which can be enabled by userspace so that the faulting linear address will be included with the information about a pending #PF in L2, and the "new DR6 bits" will be included with the information about a pending #DB in L2. With this capability enabled, the L1 hypervisor can now intercept #PF before CR2 is modified. Under VMX, the L1 hypervisor can now intercept #DB before DR6 and DR7 are modified. When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should generally provide an appropriate payload when injecting a #PF or #DB exception via KVM_SET_VCPU_EVENTS. However, to support restoring old checkpoints, this payload is not required. Note that bit 16 of the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. This is the reverse of DR6.RTM, which is cleared in this scenario. This capability also enables exception.pending in struct kvm_vcpu_events, which allows userspace to distinguish between pending and injected exceptions. Reported-by: Jim Mattson <jmattson@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capabilityVitaly Kuznetsov2018-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enlightened VMCS is opt-in. The current version does not contain all fields supported by nested VMX so we must not advertise the corresponding VMX features if enlightened VMCS is enabled. Userspace is given the enlightened VMCS version supported by KVM as part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to be advertised to the nested hypervisor, currently done via a cpuid leaf for Hyper-V. Suggested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | kvm/x86 : add coalesced pio supportPeng Hao2018-10-171-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coalesced pio is based on coalesced mmio and can be used for some port like rtc port, pci-host config port and so on. Specially in case of rtc as coalesced pio, some versions of windows guest access rtc frequently because of rtc as system tick. guest access rtc like this: write register index to 0x70, then write or read data from 0x71. writing 0x70 port is just as index and do nothing else. So we can use coalesced pio to handle this scene to reduce VM-EXIT time. When starting and closing a virtual machine, it will access pci-host config port frequently. So setting these port as coalesced pio can reduce startup and shutdown time. without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8 using perf tools: (guest OS : windows 7 64bit) IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x70:POUT 86 30.99% 74.59% 9us 29us 10.75us (+- 3.41%) 0xcf8:POUT 1119 2.60% 2.12% 2.79us 56.83us 3.41us (+- 2.23%) with my patch IO Port Access Samples Samples% Time% Min Time Max Time Avg time 0x70:POUT 106 32.02% 29.47% 0us 10us 1.57us (+- 7.38%) 0xcf8:POUT 1065 1.67% 0.28% 0.41us 65.44us 0.66us (+- 10.55%) Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: hyperv: implement PV IPI send hypercallsVitaly Kuznetsov2018-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Using hypercall for sending IPIs is faster because this allows to specify any number of vCPUs (even > 64 with sparse CPU set), the whole procedure will take only one VMEXIT. Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi hypercall can't be 'fast' (passing parameters through registers) but apparently this is not true, Windows always uses it as 'fast' so we need to support that. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl resultPaul Mackerras2018-10-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a KVM_PPC_NO_HASH flag to the flags field of the kvm_ppc_smmu_info struct, and arranges for it to be set when running as a nested hypervisor, as an unambiguous indication to userspace that HPT guests are not supported. Reporting the KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as indicating only that the new HPT features in ISA V3.0 are not supported, leaving it ambiguous whether pre-V3.0 HPT features are supported. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualizationPaul Mackerras2018-10-091-0/+1
|/ | | | | | | | | | | | | | With this, userspace can enable a KVM-HV guest to run nested guests under it. The administrator can control whether any nested guests can be run; setting the "nested" module parameter to false prevents any guests becoming nested hypervisors (that is, any attempt to enable the nested capability on a guest will fail). Guests which are already nested hypervisors will continue to be so. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* KVM: x86: Control guest reads of MSR_PLATFORM_INFODrew Schmitt2018-09-201-0/+1
| | | | | | | | | | | | | | | | Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvmarm-for-v4.19' of ↵Paolo Bonzini2018-08-221-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.19 - Support for Group0 interrupts in guests - Cache management optimizations for ARMv8.4 systems - Userspace interface for RAS, allowing error retrival and injection - Fault path optimization - Emulated physical timer fixes - Random cleanups
| * arm64: KVM: export the capability to set guest SError syndromeDongjiu Geng2018-07-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the arm64 RAS Extension, user space can inject a virtual-SError with specified ESR. So user space needs to know whether KVM support to inject such SError, this interface adds this query for this capability. KVM will check whether system support RAS Extension, if supported, KVM returns true to user space, otherwise returns false. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> [expanded documentation wording] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | kvm: nVMX: Introduce KVM_CAP_NESTED_STATEJim Mattson2018-08-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For nested virtualization L0 KVM is managing a bit of state for L2 guests, this state can not be captured through the currently available IOCTLs. In fact the state captured through all of these IOCTLs is usually a mix of L1 and L2 state. It is also dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM that is in VMX operation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> [karahmed@ - rename structs and functions and make them ready for AMD and address previous comments. - handle nested.smm state. - rebase & a bit of refactoring. - Merge 7/8 and 8/8 into one patch. ] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: s390: Add huge page enablement controlJanosch Frank2018-07-301-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | General KVM huge page support on s390 has to be enabled via the kvm.hpage module parameter. Either nested or hpage can be enabled, as we currently do not support vSIE for huge backed guests. Once the vSIE support is added we will either drop the parameter or enable it as default. For a guest the feature has to be enabled through the new KVM_CAP_S390_HPAGE_1M capability and the hpage module parameter. Enabling it means that cmm can't be enabled for the vm and disables pfmf and storage key interpretation. This is due to the fact that in some cases, in upcoming patches, we have to split huge pages in the guest mapping to be able to set more granular memory protection on 4k pages. These split pages have fake page tables that are not visible to the Linux memory management which subsequently will not manage its PGSTEs, while the SIE will. Disabling these features lets us manage PGSTE data in a consistent matter and solve that problem. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
* kvm: fix typo in flag nameMichael S. Tsirkin2018-06-121-2/+2
| | | | | | | | | | KVM_X86_DISABLE_EXITS_HTL really refers to exit on halt. Obviously a typo: should be named KVM_X86_DISABLE_EXITS_HLT. Fixes: caa057a2cad ("KVM: X86: Provide a capability to disable HLT intercepts") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capabilityVitaly Kuznetsov2018-05-261-0/+1
| | | | | | | | | | We need a new capability to indicate support for the newly added HvFlushVirtualAddress{List,Space}{,Ex} hypercalls. Upon seeing this capability, userspace is supposed to announce PV TLB flush features by setting the appropriate CPUID bits (if needed). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPIKarimAllah Ahmed2018-04-271-0/+7
| | | | | | | | | | | | | | | | Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of capabilities. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: X86: Provide a capability to disable MWAIT interceptsWanpeng Li2018-03-161-1/+1
| | | | | | | | | | | | | | | | Allowing a guest to execute MWAIT without interception enables a guest to put a (physical) CPU into a power saving state, where it takes longer to return from than what may be desired by the host. Don't give a guest that power over a host by default. (Especially, since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: add SYNC_REGS_SIZE_BYTES #define.Ken Hofsass2018-03-061-1/+5
| | | | | | | | | | | | | Replace hardcoded padding size value for struct kvm_sync_regs with #define SYNC_REGS_SIZE_BYTES. Also update the value specified in api.txt from outdated hardcoded value to SYNC_REGS_SIZE_BYTES. Signed-off-by: Ken Hofsass <hofsass@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* kvm: x86: hyperv: guest->host event signaling via eventfdRoman Kagan2018-03-061-0/+15
| | | | | | | | | | | | | | | | | | | | | In Hyper-V, the fast guest->host notification mechanism is the SIGNAL_EVENT hypercall, with a single parameter of the connection ID to signal. Currently this hypercall incurs a user exit and requires the userspace to decode the parameters and trigger the notification of the potentially different I/O context. To avoid the costly user exit, process this hypercall and signal the corresponding eventfd in KVM, similar to ioeventfd. The association between the connection id and the eventfd is established via the newly introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an (srcu-protected) IDR. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> [asm/hyperv.h changes approved by KY Srinivasan. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: x86: Add a framework for supporting MSR-based featuresTom Lendacky2018-03-011-0/+2
| | | | | | | | | | | | Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Tweaked documentation. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* Merge branch 'x86/hyperv' of ↵Radim Krčmář2018-02-011-0/+4
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Topic branch for stable KVM clockource under Hyper-V. Thanks to Christoffer Dall for resolving the ARM conflict.
| * KVM: s390: wire up bpb featureChristian Borntraeger2018-01-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The new firmware interfaces for branch prediction behaviour changes are transparently available for the guest. Nevertheless, there is new state attached that should be migrated and properly resetted. Provide a mechanism for handling reset, migration and VSIE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [Changed capability number to 152. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * KVM: PPC: Book3S: Provide information about hardware/firmware CVE workaroundsPaul Mackerras2018-01-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace information about the underlying machine's level of vulnerability to the recently announced vulnerabilities CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754, and whether the machine provides instructions to assist software to work around the vulnerabilities. The ioctl returns two u64 words describing characteristics of the CPU and required software behaviour respectively, plus two mask words which indicate which bits have been filled in by the kernel, for extensibility. The bit definitions are the same as for the new H_GET_CPU_CHARACTERISTICS hypercall. There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which indicates whether the new ioctl is available. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | Merge branch 'sev-v9-p2' of https://github.com/codomania/kvmPaolo Bonzini2018-01-161-0/+90
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM changes required to create and manage SEV guests. SEV is an extension to the AMD-V architecture which supports running encrypted virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their pages (code and data) secured such that only the guest itself has access to unencrypted version. Each encrypted VM is associated with a unique encryption key; if its data is accessed to a different entity using a different key the encrypted guest's data will be incorrectly decrypted, leading to unintelligible data. This security model ensures that hypervisor will no longer able to inspect or alter any guest code or data. The key management of this feature is handled by a separate processor known as the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key Management Specification (see below) provides a set of commands which can be used by hypervisor to load virtual machine keys through the AMD-SP driver. The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The ioctl will be used by qemu to issue SEV guest-specific commands defined in Key Management Specification. The following links provide additional details: AMD Memory Encryption white paper: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf AMD64 Architecture Programmer's Manual: http://support.amd.com/TechDocs/24593.pdf SME is section 7.10 SEV is section 15.34 SEV Key Management: http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf KVM Forum Presentation: http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf SEV Guest BIOS support: SEV support has been add to EDKII/OVMF BIOS https://github.com/tianocore/edk2 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: Define SEV key management command idBrijesh Singh2017-12-041-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define Secure Encrypted Virtualization (SEV) key management command id and structure. The command definition is available in SEV KM spec 0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf) and Documentation/virtual/kvm/amd-memory-encryption.txt. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
| * KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctlBrijesh Singh2017-12-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to register/unregister the guest memory regions which may contain the encrypted data (e.g guest RAM, PCI BAR, SMRAM etc). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
| * KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctlBrijesh Singh2017-12-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the hardware supports memory encryption then the KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform specific memory encryption commands. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de>
* | KVM: s390: mark irq_state.flags as non-usableChristian Borntraeger2017-12-061-2/+2
|/ | | | | | | | | | | | | Old kernels did not check for zero in the irq_state.flags field and old QEMUs did not zero the flag/reserved fields when calling KVM_S390_*_IRQ_STATE. Let's add comments to prevent future uses of these fields. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-11-161-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.15 Common: - Python 3 support in kvm_stat - Accounting of slabs to kmemcg ARM: - Optimized arch timer handling for KVM/ARM - Improvements to the VGIC ITS code and introduction of an ITS reset ioctl - Unification of the 32-bit fault injection logic - More exact external abort matching logic PPC: - Support for running hashed page table (HPT) MMU mode on a host that is using the radix MMU mode; single threaded mode on POWER 9 is added as a pre-requisite - Resolution of merge conflicts with the last second 4.14 HPT fixes - Fixes and cleanups s390: - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes x86: - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and after-reset state - Refined dependencies for VMX features - Fixes for nested SMI injection - A lot of cleanups" * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits) KVM: s390: provide a capability for AIS state migration KVM: s390: clear_io_irq() requests are not expected for adapter interrupts KVM: s390: abstract conversion between isc and enum irq_types KVM: s390: vsie: use common code functions for pinning KVM: s390: SIE considerations for AP Queue virtualization KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup KVM: PPC: Book3S HV: Cosmetic post-merge cleanups KVM: arm/arm64: fix the incompatible matching for external abort KVM: arm/arm64: Unify 32bit fault injection KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared KVM: arm/arm64: vgic-its: New helper functions to free the caches KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device arm/arm64: KVM: Load the timer state when enabling the timer KVM: arm/arm64: Rework kvm_timer_should_fire KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit KVM: arm/arm64: Move phys_timer_emulate function KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps ...
| * KVM: s390: provide a capability for AIS state migrationChristian Borntraeger2017-11-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The AIS capability was introduced in 4.12, while the interface to migrate the state was added in 4.13. Unfortunately it is not possible for userspace to detect the migration capability without creating a flic kvm device. As in QEMU the cpu model detection runs on the "none" machine this will result in cpu model issues regarding the "ais" capability. To get the "ais" capability properly let's add a new KVM capability that tells userspace that AIS states can be migrated. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
* | License cleanup: add SPDX license identifier to uapi header files with no ↵Greg Kroah-Hartman2017-11-021-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | license Many user space API headers are missing licensing information, which makes it hard for compliance tools to determine the correct license. By default are files without license information under the default license of the kernel, which is GPLV2. Marking them GPLV2 would exclude them from being included in non GPLV2 code, which is obviously not intended. The user space API headers fall under the syscall exception which is in the kernels COPYING file: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". otherwise syscall usage would not be possible. Update the files which contain no license information with an SPDX license identifier. The chosen identifier is 'GPL-2.0 WITH Linux-syscall-note' which is the officially assigned identifier for the Linux syscall exception. SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Report storage key support to userspacePaul Mackerras2017-08-311-1/+2
| | | | | | | | | | | | | | | | This adds information about storage keys to the struct returned by the KVM_PPC_GET_SMMU_INFO ioctl. The new fields replace a pad field, which was zeroed by previous kernel versions. Thus userspace that knows about the new fields will see zeroes when running on an older kernel, indicating that storage keys are not supported. The size of the structure has not changed. The number of keys is hard-coded for the CPUs supported by HV KVM, which is just POWER7, POWER8 and POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* kvm: x86: hyperv: make VP_INDEX managed by userspaceRoman Kagan2017-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Hyper-V identifies vCPUs by Virtual Processor Index, which can be queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a sequential number which can't exceed the maximum number of vCPUs per VM. APIC ids can be sparse and thus aren't a valid replacement for VP indices. Current KVM uses its internal vcpu index as VP_INDEX. However, to make it predictable and persistent across VM migrations, the userspace has to control the value of VP_INDEX. This patch achieves that, by storing vp_index explicitly on vcpu, and allowing HV_X64_MSR_VP_INDEX to be set from the host side. For compatibility it's initialized to KVM vcpu index. Also a few variables are renamed to make clear distinction betweed this Hyper-V vp_index and KVM vcpu_id (== APIC id). Besides, a new capability, KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip attempting msr writes where unsupported, to avoid spamming error logs. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2Roman Kagan2017-07-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | There is a flaw in the Hyper-V SynIC implementation in KVM: when message page or event flags page is enabled by setting the corresponding msr, KVM zeroes it out. This is problematic because on migration the corresponding MSRs are loaded on the destination, so the content of those pages is lost. This went unnoticed so far because the only user of those pages was in-KVM hyperv synic timers, which could continue working despite that zeroing. Newer QEMU uses those pages for Hyper-V VMBus implementation, and zeroing them breaks the migration. Besides, in newer QEMU the content of those pages is fully managed by QEMU, so zeroing them is undesirable even when writing the MSRs from the guest side. To support this new scheme, introduce a new capability, KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic pages aren't zeroed out in KVM. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definitionGleb Fotengauer-Malinovskiy2017-07-121-1/+1
| | | | | | | | | | | | In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE), it also writes back a return value (which constitutes _IOC_READ) making this an _IOWR ioctl instead of _IOW. Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes") Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2017-07-031-0/+2
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD - Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending.