summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* kvm: Make init_rmode_identity_map() return 0 on success.Tang Chen2014-09-171-10/+8Star
| | | | | | | | | | | | In init_rmode_identity_map(), there two variables indicating the return value, r and ret, and it return 0 on error, 1 on success. The function is only called by vmx_create_vcpu(), and ret is redundant. This patch removes the redundant variable, and makes init_rmode_identity_map() return 0 on success, -errno on failure. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: Remove ept_identity_pagetable from struct kvm_arch.Tang Chen2014-09-173-28/+22Star
| | | | | | | | | | | | | | | | | | | | | kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But it is never used to refer to the page at all. In vcpu initialization, it indicates two things: 1. indicates if ept page is allocated 2. indicates if a memory slot for identity page is initialized Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept identity pagetable is initialized. So we can remove ept_identity_pagetable. NOTE: In the original code, ept identity pagetable page is pinned in memroy. As a result, it cannot be migrated/hot-removed. After this patch, since kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page is no longer pinned in memory. And it can be migrated/hot-removed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Use kvm_make_request when applicableGuo Hui Liu2014-09-161-8/+7Star
| | | | | | | | This patch replace the set_bit method by kvm_make_request to make code more readable and consistent. Signed-off-by: Guo Hui Liu <liuguohui@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: make apic_accept_irq tracepoint more genericPaolo Bonzini2014-09-112-9/+6Star
| | | | | | | | | | Initially the tracepoint was added only to the APIC_DM_FIXED case, also because it reported coalesced interrupts that only made sense for that case. However, the coalesced argument is not used anymore and tracing other delivery modes is useful, so hoist the call out of the switch statement. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.Tang Chen2014-09-112-4/+5
| | | | | | | | | We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the address of apic access page. So use this macro. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: propagate exception from permission checks on the nested page faultPaolo Bonzini2014-09-054-11/+16
| | | | | | | | | | | | | Currently, if a permission error happens during the translation of the final GPA to HPA, walk_addr_generic returns 0 but does not fill in walker->fault. To avoid this, add an x86_exception* argument to the translate_gpa function, and let it fill in walker->fault. The nested_page_fault field will be true, since the walk_mmu is the nested_mmu and translate_gpu instead operates on the "outer" (NPT) instance. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: skip writeback on injection of nested exceptionPaolo Bonzini2014-09-052-6/+10
| | | | | | | | | | | | If a nested page fault happens during emulation, we will inject a vmexit, not a page fault. However because writeback happens after the injection, we will write ctxt->eip from L2 into the L1 EIP. We do not write back if an instruction caused an interception vmexit---do the same for page faults. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nSVM: propagate the NPF EXITINFO to the guestPaolo Bonzini2014-09-032-4/+32
| | | | | | | This is similar to what the EPT code does with the exit qualification. This allows the guest to see a valid value for bits 33:32. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMDPaolo Bonzini2014-09-032-2/+19
| | | | | | | | Bit 8 would be the "global" bit, which does not quite make sense for non-leaf page table entries. Intel ignores it; AMD ignores it in PDEs, but reserves it in PDPEs and PML4Es. The SVM test is relying on this behavior, so enforce it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: mmio: cleanup kvm_set_mmio_spte_maskTiejun Chen2014-09-033-6/+6
| | | | | | | | Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: fix stale mmio cache bugDavid Matlack2014-09-033-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: fix potentially corrupt mmio cacheDavid Matlack2014-09-031-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vcpu exits and memslot mutations can run concurrently as long as the vcpu does not aquire the slots mutex. Thus it is theoretically possible for memslots to change underneath a vcpu that is handling an exit. If we increment the memslot generation number again after synchronize_srcu_expedited(), vcpus can safely cache memslot generation without maintaining a single rcu_dereference through an entire vm exit. And much of the x86/kvm code does not maintain a single rcu_dereference of the current memslots during each exit. We can prevent the following case: vcpu (CPU 0) | thread (CPU 1) --------------------------------------------+-------------------------- 1 vm exit | 2 srcu_read_unlock(&kvm->srcu) | 3 decide to cache something based on | old memslots | 4 | change memslots | (increments generation) 5 | synchronize_srcu(&kvm->srcu); 6 retrieve generation # from new memslots | 7 tag cache with new memslot generation | 8 srcu_read_unlock(&kvm->srcu) | ... | <action based on cache occurs even | though the caching decision was based | on the old memslots> | ... | <action *continues* to occur until next | memslot generation change, which may | be never> | | By incrementing the generation after synchronizing with kvm->srcu readers, we ensure that the generation retrieved in (6) will become invalid soon after (8). Keeping the existing increment is not strictly necessary, but we do keep it and just move it for consistency from update_memslots to install_new_memslots. It invalidates old cached MMIOs immediately, instead of having to wait for the end of synchronize_srcu_expedited, which makes the code more clearly correct in case CPU 1 is preempted right after synchronize_srcu() returns. To avoid halving the generation space in SPTEs, always presume that the low bit of the generation is zero when reconstructing a generation number out of an SPTE. This effectively disables MMIO caching in SPTEs during the call to synchronize_srcu_expedited. Using the low bit this way is somewhat like a seqcount---where the protected thing is a cache, and instead of retrying we can simply punt if we observe the low bit to be 1. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: do not bias the generation number in kvm_current_mmio_generationPaolo Bonzini2014-09-031-6/+1Star
| | | | | | | | | | | The next patch will give a meaning (a la seqcount) to the low bit of the generation number. Ensure that it matches between kvm->memslots->generation and kvm_current_mmio_generation(). Cc: stable@vger.kernel.org Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: use guest maxphyaddr to check MTRR valuesPaolo Bonzini2014-08-291-3/+2Star
| | | | | | | | | | | | | | The check introduced in commit d7a2a246a1b5 (KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs, 2014-08-19) will break if the guest maxphyaddr is higher than the host's (which sometimes happens depending on your hardware and how QEMU is configured). To fix this, use cpuid_maxphyaddr similar to how the APIC_BASE MSR does already. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: remove garbage arg to *hardware_{en,dis}ableRadim Krčmář2014-08-294-12/+12
| | | | | | | | | | In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: forward declare structs in kvm_types.hPaolo Bonzini2014-08-291-4/+0Star
| | | | | | | | | | Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: remove Aligned bit from movntps/movntpdPaolo Bonzini2014-08-291-3/+3
| | | | | | These are not explicitly aligned, and do not require alignment on AVX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86 emulator: emulate MOVNTDQAlex Williamson2014-08-291-1/+6
| | | | | | | | | | | | | | | | | | | | | | Windows 8.1 guest with NVIDIA driver and GPU fails to boot with an emulation failure. The KVM spew suggests the fault is with lack of movntdq emulation (courtesy of Paolo): Code=02 00 00 b8 08 00 00 00 f3 0f 6f 44 0a f0 f3 0f 6f 4c 0a e0 <66> 0f e7 41 f0 66 0f e7 49 e0 48 83 e9 40 f3 0f 6f 44 0a 10 f3 0f 6f 0c 0a 66 0f e7 41 10 $ as -o a.out .section .text .byte 0x66, 0x0f, 0xe7, 0x41, 0xf0 .byte 0x66, 0x0f, 0xe7, 0x49, 0xe0 $ objdump -d a.out 0: 66 0f e7 41 f0 movntdq %xmm0,-0x10(%rcx) 5: 66 0f e7 49 e0 movntdq %xmm1,-0x20(%rcx) Add the necessary emulation. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: vmx: VMXOFF emulation in vm86 should cause #UDNadav Amit2014-08-291-6/+8
| | | | | | | | | | | | | | | | | Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should cause a UD exception in real-mode or vm86. However, the emulator considers all these instructions the same for the matter of mode checks, and emulation upon exit due to #UD exception. As a result, the hypervisor behaves incorrectly on vm86 mode. VMXOFF, VMLAUNCH or VMRESUME cause on vm86 exit due to #UD. The hypervisor then emulates these instruction and inject #GP to the guest instead of #UD. This patch creates a new group for these instructions and mark only VMCALL as an instruction which can be emulated. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: fix some sparse warningsPaolo Bonzini2014-08-292-7/+7
| | | | | | | | | | | | | | | Sparse reports the following easily fixed warnings: arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static? arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static? arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static? Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: nested TPR shadow/threshold emulationWanpeng Li2014-08-291-3/+51
| | | | | | | | | | | | | | | | | This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411 TPR shadow/threshold feature is important to speed up the Windows guest. Besides, it is a must feature for certain VMM. We map virtual APIC page address and TPR threshold from L1 VMCS. If TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested in, we inject it into L1 VMM for handling. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [Add PAGE_ALIGNED check, do not write useless virtual APIC page address if TPR shadowing is disabled. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: introduce nested_get_vmcs12_pagesWanpeng Li2014-08-291-12/+25
| | | | | | | | Introduce function nested_get_vmcs12_pages() to check the valid of nested apic access page and virtual apic page earlier. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: fix tracing for 32-bitPaolo Bonzini2014-08-251-2/+2
| | | | | | | | Fix commit 7b46268d29543e313e731606d845e65c17f232e4, which mistakenly included the new tracepoint under #ifdef CONFIG_X86_64. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: trace kvm_ple_window grow/shrinkRadim Krčmář2014-08-213-0/+35
| | | | | | | Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: dynamise PLE windowRadim Krčmář2014-08-211-2/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Window is increased on every PLE exit and decreased on every sched_in. The idea is that we don't want to PLE exit if there is no preemption going on. We do this with sched_in() because it does not hold rq lock. There are two new kernel parameters for changing the window: ple_window_grow and ple_window_shrink ple_window_grow affects the window on PLE exit and ple_window_shrink does it on sched_in; depending on their value, the window is modifier like this: (ple_window is kvm_intel's global) ple_window_shrink/ | ple_window_grow | PLE exit | sched_in -------------------+--------------------+--------------------- < 1 | = ple_window | = ple_window < ple_window | *= ple_window_grow | /= ple_window_shrink otherwise | += ple_window_grow | -= ple_window_shrink A third new parameter, ple_window_max, controls the maximal ple_window; it is internally rounded down to a closest multiple of ple_window_grow. VCPU's PLE window is never allowed below ple_window. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: make PLE window per-VCPURadim Krčmář2014-08-211-1/+11
| | | | | | | | | | Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: introduce sched_in to kvm_x86_opsRadim Krčmář2014-08-214-0/+15
| | | | | | | | sched_in preempt notifier is available for x86, allow its use in specific virtualization technlogies as well. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: add kvm_arch_sched_inRadim Krčmář2014-08-211-0/+4
| | | | | | | | | Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Replace X86_FEATURE_NX offset with the definitionNadav Amit2014-08-211-2/+2
| | | | | | | | Replace reference to X86_FEATURE_NX using bit shift with the defined X86_FEATURE_NX. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: emulate: warn on invalid or uninitialized exception numbersPaolo Bonzini2014-08-202-1/+5
| | | | | | | | | | These were reported when running Jailhouse on AMD processors. Initialize ctxt->exception.vector with an invalid exception number, and warn if it remained invalid even though the emulator got an X86EMUL_PROPAGATE_FAULT return code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitlyPaolo Bonzini2014-08-201-5/+3Star
| | | | | | | Always get it through emulate_exception or emulate_ts. This ensures that the ctxt->exception fields have been populated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Clarify PMU related features bit manipulationNadav Amit2014-08-201-10/+14
| | | | | | | | | | kvm_pmu_cpuid_update makes a lot of bit manuiplation operations, when in fact there are already unions that can be used instead. Changing the bit manipulation to the union for clarity. This patch does not change the functionality. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: vmx: fix ept reserved bits for 1-GByte pageWanpeng Li2014-08-201-10/+12
| | | | | | | | | | | | | | | EPT misconfig handler in kvm will check which reason lead to EPT misconfiguration after vmexit. One of the reasons is that an EPT paging-structure entry is configured with settings reserved for future functionality. However, the handler can't identify if paging-structure entry of reserved bits for 1-GByte page are configured, since PDPTE which point to 1-GByte page will reserve bits 29:12 instead of bits 7:3 which are reserved for PDPTE that references an EPT Page Directory. This patch fix it by reserve bits 29:12 for 1-GByte page. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: recalculate_apic_map after enabling apicNadav Amit2014-08-191-11/+14
| | | | | | | | | | | Currently, recalculate_apic_map ignores vcpus whose lapic is software disabled through the spurious interrupt vector. However, once it is re-enabled, the map is not recalculated. Therefore, if the guest OS configured DFR while lapic is software-disabled, the map may be incorrect. This patch recalculates apic map after software enabling the lapic. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Clear apic tsc-deadline after deadlineNadav Amit2014-08-191-0/+5
| | | | | | | | | | | | | Intel SDM 10.5.4.1 says "When the timer generates an interrupt, it disarms itself and clears the IA32_TSC_DEADLINE MSR". This patch clears the MSR upon timer interrupt delivery which delivered on deadline mode. Since the MSR may be reconfigured while an interrupt is pending, causing the new value to be overriden, pending timer interrupts are checked before setting a new deadline. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRsWanpeng Li2014-08-191-3/+15
| | | | | | | | | | Section 11.11.2.3 of the SDM mentions "All other bits in the IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn registers are reserved; the processor generates a general-protection exception(#GP) if software attempts to write to them". This patch do it in kvm. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: fix check legal type of Variable Range MTRRsWanpeng Li2014-08-191-1/+7
| | | | | | | | | | | | The first entry in each pair(IA32_MTRR_PHYSBASEn) defines the base address and memory type for the range; the second entry(IA32_MTRR_PHYSMASKn) contains a mask used to determine the address range. The legal values for the type field of IA32_MTRR_PHYSBASEn are 0,1,4,5, and 6. However, IA32_MTRR_PHYSMASKn don't have type field. This patch avoid check if the type field is legal for IA32_MTRR_PHYSMASKn. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* arch/x86: Use RCU_INIT_POINTER(x, NULL) in kvm/vmx.cMonam Agarwal2014-08-191-1/+1
| | | | | | | | | | | Here rcu_assign_pointer() is ensuring that the initialization of a structure is carried out before storing a pointer to that structure. So, rcu_assign_pointer(p, NULL) can always safely be converted to RCU_INIT_POINTER(p, NULL). Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: raise invalid TSS exceptions during a task switchPaolo Bonzini2014-08-191-1/+1
| | | | | | | Conditions that would usually trigger a general protection fault should instead raise #TS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: drop fpu_activate hookWanpeng Li2014-08-193-3/+0Star
| | | | | | | | | | | | The only user of the fpu_activate hook was dropped in commit 2d04a05bd7e9 (KVM: x86 emulator: emulate CLTS internally, 2011-04-20). vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for Intel CLTS), but never from common code; hence, there's no need for a hook. Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: add rdmsr support for AMD event registersWei Huang2014-08-191-0/+6
| | | | | | | | | | | | Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_PERFCTR0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message "Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff)" on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_PERFCTRn registers and thus supresses the warning message. Signed-off-by: Wei Huang <wehuang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"Paolo Bonzini2014-08-191-1/+1
| | | | | | | | | | | | | This reverts commit 682367c494869008eb89ef733f196e99415ae862, which causes 32-bit SMP Windows 7 guests to panic. SeaBIOS has a limit on the number of MTRRs that it can handle, and this patch exceeded the limit. Better revert it. Thanks to Nadav Amit for debugging the cause. Cc: stable@nongnu.org Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: do not check CS.DPL against RPL during task switchPaolo Bonzini2014-08-191-3/+0Star
| | | | | | | | | | | | | | This reverts the check added by commit 5045b468037d (KVM: x86: check CS.DPL against RPL during task switch, 2014-05-15). Although the CS.DPL=CS.RPL check is mentioned in table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS conditions" which cause #TSS exceptions. In fact it causes some tests to fail, which pass on bare-metal. Keep the rest of the commit, since we will find new uses for it in 3.18. Reported-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Avoid emulating instructions on #UD mistakenlyNadav Amit2014-08-191-4/+4
| | | | | | | | | | | | | Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it would be evaluated. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Tweak operand order in &&, remove EmulateOnUD where it's now superfluous. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'release' of ↵Linus Torvalds2014-08-161-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull idle update from Len Brown: "Two Intel-platform-specific updates to intel_idle, and a cosmetic tweak to the turbostat utility" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: tweak whitespace in output format intel_idle: Broadwell support intel_idle: Disable Baytrail Core and Module C6 auto-demotion
| * intel_idle: Disable Baytrail Core and Module C6 auto-demotionLen Brown2014-08-151-0/+3
| | | | | | | | | | | | | | | | | | | | Power efficiency improves on Baytrail (Intel Atom Processor E3000) when Linux disables C6 auto-demotion. Based on work by Srinidhi Kasagar <srinidhi.kasagar@intel.com>. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
* | Merge tag 'pci-v3.17-changes-2' of ↵Linus Torvalds2014-08-152-6/+6
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas: "Part two of the PCI changes for v3.17: - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine) It's a mechanical change that removes uses of the DEFINE_PCI_DEVICE_TABLE macro. I waited until later in the merge window to reduce conflicts, but it's possible you'll still see a few" * tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
| * | PCI: Remove DEFINE_PCI_DEVICE_TABLE macro useBenoit Taine2014-08-122-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/): // <smpl> @@ identifier i; declarer name DEFINE_PCI_DEVICE_TABLE; initializer z; @@ - DEFINE_PCI_DEVICE_TABLE(i) + const struct pci_device_id i[] = z; // </smpl> [bhelgaas: add semantic patch] Signed-off-by: Benoit Taine <benoit.taine@lip6.fr> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | Merge tag 'stable/for-linus-3.17-b-rc0-tag' of ↵Linus Torvalds2014-08-142-6/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bugfixes from David Vrabel: - fix ARM build - fix boot crash with PVH guests - improve reliability of resume/migration * tag 'stable/for-linus-3.17-b-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: use vmap() to map grant table pages in PVH guests x86/xen: resume timer irqs early arm/xen: remove duplicate arch_gnttab_init() function
| * | | x86/xen: use vmap() to map grant table pages in PVH guestsDavid Vrabel2014-08-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b7dd0e350e0b (x86/xen: safely map and unmap grant frames when in atomic context) causes PVH guests to crash in arch_gnttab_map_shared() when they attempted to map the pages for the grant table. This use of a PV-specific function during the PVH grant table setup is non-obvious and not needed. The standard vmap() function does the right thing. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com> Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com> Cc: stable@vger.kernel.org