openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message	Author	Age	Files	Lines
*	KVM: Fix kvmclock on !constant_tsc boxes	Gerd Hoffmann	2009-03-24	1	-9/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kvmclock currently falls apart on machines without constant tsc. This patch fixes it. Changes:
	* keep tsc frequency in a per-cpu variable.
	* handle kvmclock update using a new request flag, thus checking whether we need an update each time we enter guest context.
	* use a cpufreq notifier to track frequency changes and force kvmclock updates.
	* send ipis to kick cpu out of guest context if needed to make sure the guest doesn't see stale values.
	Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
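The request-flag pattern described in this message can be sketched in plain C as follows. This is an illustration only, not the patch's code: `struct vcpu`, `REQ_CLOCK_UPDATE`, `freq_changed()` and `enter_guest()` are hypothetical names, and C11 atomics stand in for the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit number; not the kernel's constant. */
#define REQ_CLOCK_UPDATE 0u

/* Hypothetical, minimal stand-in for a virtual CPU. */
struct vcpu {
    atomic_ulong requests;    /* pending request bits */
    unsigned long tsc_khz;    /* latest known host TSC frequency */
    unsigned long clock_khz;  /* frequency last published to the guest */
};

/* Atomically test and clear one request bit. */
static bool test_and_clear_request(struct vcpu *v, unsigned bit)
{
    assert(bit < 32);
    unsigned long mask = 1ul << bit;
    return (atomic_fetch_and(&v->requests, ~mask) & mask) != 0;
}

/* What a frequency-change notifier would do: record the new
 * frequency and flag the vcpu so it refreshes its clock data. */
static void freq_changed(struct vcpu *v, unsigned long new_khz)
{
    v->tsc_khz = new_khz;
    atomic_fetch_or(&v->requests, 1ul << REQ_CLOCK_UPDATE);
}

/* Called on every guest entry: refresh the published clock only
 * when a request is pending. Returns true if an update happened. */
static bool enter_guest(struct vcpu *v)
{
    if (!test_and_clear_request(v, REQ_CLOCK_UPDATE))
        return false;
    v->clock_khz = v->tsc_khz;
    return true;
}
```

The check on guest entry is cheap when nothing is pending, which is why a flag is a reasonable fit for an event as rare as a frequency change.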
*	KVM: VMX: Use kvm_mmu_page_fault() handle EPT violation mmio	Sheng Yang	2009-03-24	1	-29/+1
\| \| \| \| \| \| \|	Removed duplicated code. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Drop unused evaluations from string pio handlers	Jan Kiszka	2009-03-24	2	-6/+2
\| \| \| \| \| \| \| \|	Looks like neither the direction nor the rep prefix is used anymore. Drop related evaluations from SVM's and VMX's I/O exit handlers. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Add FFXSR support	Alexander Graf	2009-03-24	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \|	AMD K10 CPUs implement the FFXSR feature that gets enabled using EFER. Let's check if the virtual CPU description includes that CPUID feature bit and allow enabling it then. This is required for Windows Server 2008 in Hyper-V mode. v2 adds CPUID capability exposure Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	x86: Add EFER descriptions for FFXSR	Alexander Graf	2009-03-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	AMD K10 includes support for the FFXSR feature, which leaves out XMM registers on FXSAVE/FXRSTOR when the EFER_FFXSR bit is set in EFER. The CPUID feature bit exists already, but the EFER bit is missing currently, so this patch adds it to the list of known EFER bits. Signed-off-by: Alexander Graf <agraf@suse.de> CC: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: make irq ack notifications aware of routing table	Marcelo Tosatti	2009-03-24	2	-2/+5
\| \| \| \| \| \| \| \| \| \|	IRQ ack notifications assume an identity mapping between pin->gsi, which might not be the case with, for example, HPET. Translate before acking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Gleb Natapov <gleb@redhat.com>
*	KVM: Avoid using CONFIG_ in userspace visible headers	Avi Kivity	2009-03-24	1	-0/+1
\| \| \| \| \| \| \| \|	Kconfig symbols are not available in userspace, and are not stripped by headers-install. Avoid their use by adding #defines in <asm/kvm.h> to suit each architecture. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Userspace controlled irq routing	Avi Kivity	2009-03-24	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently KVM has a static routing from GSI numbers to interrupts (namely, 0-15 are mapped 1:1 to both PIC and IOAPIC, and 16-23 are mapped 1:1 to the IOAPIC). This is insufficient for several reasons:
	- HPET requires non 1:1 mapping for the timer interrupt
	- MSIs need a new method to assign interrupt numbers and dispatch them
	- ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the ioapics
	This patch implements an interrupt routing table (as a linked list, but this can be easily changed) and a userspace interface to replace the table. The routing table is initialized according to the current hardwired mapping. Signed-off-by: Avi Kivity <avi@redhat.com>
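A routing table of this kind can be sketched as a small linked list in userspace C. The sketch assumes only what the message states (GSIs 0-15 on both PIC and IOAPIC, 16-23 on the IOAPIC alone, all 1:1 by default); `struct route`, `default_routes()`, `route_set()` and the other names are hypothetical and do not reflect the kernel's structures or the ioctl interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the kernel's data structures. */
enum chip { CHIP_PIC, CHIP_IOAPIC };

struct route {
    unsigned gsi;        /* global system interrupt number */
    enum chip chip;      /* which interrupt controller it reaches */
    unsigned pin;        /* input pin on that controller */
    struct route *next;
};

static struct route *route_add(struct route *head, unsigned gsi,
                               enum chip chip, unsigned pin)
{
    struct route *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->gsi = gsi;
    r->chip = chip;
    r->pin = pin;
    r->next = head;
    return r;
}

/* Build the default table: identity mapping as described above. */
static struct route *default_routes(void)
{
    struct route *head = NULL;
    for (unsigned gsi = 0; gsi < 24; gsi++) {
        if (gsi < 16)
            head = route_add(head, gsi, CHIP_PIC, gsi);
        head = route_add(head, gsi, CHIP_IOAPIC, gsi);
    }
    return head;
}

/* Return the pin a GSI is routed to on a chip, or -1 if unrouted. */
static int route_pin(const struct route *head, unsigned gsi, enum chip chip)
{
    for (const struct route *r = head; r; r = r->next)
        if (r->gsi == gsi && r->chip == chip)
            return (int)r->pin;
    return -1;
}

/* Re-route an existing entry to another pin; 0 on success, -1 if absent. */
static int route_set(struct route *head, unsigned gsi, enum chip chip,
                     unsigned pin)
{
    for (struct route *r = head; r; r = r->next)
        if (r->gsi == gsi && r->chip == chip) {
            r->pin = pin;
            return 0;
        }
    return -1;
}

static void routes_free(struct route *head)
{
    while (head) {
        struct route *next = head->next;
        free(head);
        head = next;
    }
}
```

With a table like this, a non-identity entry is just an edit to one node, for example `route_set(table, 0, CHIP_IOAPIC, 2)`.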
*	KVM: x86: Fix typos and whitespace errors	Amit Shah	2009-03-24	1	-17/+16
\| \| \| \| \| \| \|	Some typos, comments, whitespace errors corrected in the cpuid code Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Only enable cr4_pge role in shadow mode	Avi Kivity	2009-03-24	1	-1/+1
\| \| \| \| \| \|	Two dimensional paging is only confused by it. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Rename "metaphysical" attribute to "direct"	Avi Kivity	2009-03-24	3	-24/+25
\| \| \| \| \| \| \|	This actually describes what is going on, rather than alerting the reader that something strange is going on. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: drop zeroing on mmu_memory_cache_alloc	Marcelo Tosatti	2009-03-24	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Zeroing on mmu_memory_cache_alloc is unnecessary since:
	- Smaller areas are pre-allocated with kmem_cache_zalloc.
	- The page pointed to by ->spt is overwritten with prefetch_page, and entries in the page pointed to by ->gfns are initialized before reading.
	[avi: zeroing pages is unnecessary] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: SVM: Fix typo in has_svm()	Joe Perches	2009-03-24	1	-1/+1
\| \| \| \| \| \|	Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Reset PIT irq injection logic when the PIT IRQ is unmasked	Avi Kivity	2009-03-24	2	-0/+16
\| \| \| \| \| \| \| \| \| \| \|	While the PIT is masked the guest cannot ack the irq, so the reinject logic will never allow the interrupt to be injected. Fix by resetting the reinjection counters on unmask. Unbreaks Xen. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Add CONFIG_HAVE_KVM_IRQCHIP	Avi Kivity	2009-03-24	1	-0/+4
\| \| \| \| \| \| \|	Two KVM archs support irqchips and two don't. Add a Kconfig item to make selecting between the two models easier. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Optimize page unshadowing	Avi Kivity	2009-03-24	1	-3/+12
\| \| \| \| \| \| \|	Using kvm_mmu_lookup_page() will result in multiple scans of the hash chains; use hlist_for_each_entry_safe() to achieve a single scan instead. Signed-off-by: Avi Kivity <avi@redhat.com>
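The single-scan idea can be illustrated with a simplified singly linked chain in userspace C. This is a sketch, not the kernel code: it uses a pointer-to-pointer walk in place of `hlist_for_each_entry_safe()`, and `struct shadow_page`, `unshadow_gfn()` and the rule that direct pages are skipped are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a shadow page on a hash chain. */
struct shadow_page {
    unsigned long gfn;
    bool direct;
    struct shadow_page *next;
};

static struct shadow_page *chain_push(struct shadow_page *head,
                                      unsigned long gfn, bool direct)
{
    struct shadow_page *p = malloc(sizeof(*p));
    assert(p != NULL);
    p->gfn = gfn;
    p->direct = direct;
    p->next = head;
    return p;
}

/* Remove every non-direct page for gfn in ONE pass over the chain.
 * Looking up and removing one match at a time would rescan the chain
 * from its head after each removal; keeping a pointer to the current
 * link lets us unlink in place and continue. */
static unsigned unshadow_gfn(struct shadow_page **chain, unsigned long gfn)
{
    unsigned zapped = 0;
    struct shadow_page **link = chain;

    while (*link) {
        struct shadow_page *p = *link;
        if (p->gfn == gfn && !p->direct) {
            *link = p->next;   /* unlink; do not advance */
            free(p);
            zapped++;
        } else {
            link = &p->next;   /* keep the entry and advance */
        }
    }
    return zapped;
}

static unsigned chain_len(const struct shadow_page *head)
{
    unsigned n = 0;
    for (; head; head = head->next)
        n++;
    return n;
}
```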
*	KVM: SVM: Add microcode patch level dummy	Alexander Graf	2009-03-24	1	-0/+3
\| \| \| \| \| \| \| \| \|	VMware ESX checks if the microcode level is correct when using a Barcelona CPU, in order to see if it actually can use SVM. Let's tell it we're on the safe side... Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Properly lock PIT creation	Avi Kivity	2009-03-24	2	-2/+6
\| \| \| \| \| \|	Otherwise, two threads can create a PIT in parallel and cause a memory leak. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: x86 emulator: implement 'ret far' instruction (opcode 0xcb)	Avi Kivity	2009-03-24	1	-1/+25
\| \| \| \|	Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: When emulating on invalid vmx state, don't return to userspace unnecessarily	Avi Kivity	2009-03-24	1	-2/+6
\| \| \| \| \| \| \| \| \|	If we aren't doing mmio there's no need to exit to userspace (which will just be confused). Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: x86 emulator: Make emulate_pop() a little more generic	Avi Kivity	2009-03-24	1	-9/+6
\| \| \| \| \| \| \|	Allow emulate_pop() to read into arbitrary memory rather than just the source operand. Needed for complicated instructions like far returns. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: Prevent exit handler from running if emulating due to invalid state	Avi Kivity	2009-03-24	1	-7/+4
\| \| \| \| \| \| \| \| \| \| \|	If we've just emulated an instruction, we won't have any valid exit reason and associated information. Fix by moving the clearing of the emulation_required flag to the exit handler. This way the exit handler can notice that we've been emulating and abort early. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: don't clobber segment AR if emulating invalid state	Avi Kivity	2009-03-24	1	-1/+1
\| \| \| \| \| \| \|	The unusable bit is important for determining state validity; don't clobber it. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: Fix guest state validity checks	Avi Kivity	2009-03-24	1	-4/+14
\| \| \| \| \| \| \|	The vmx guest state validity checks are full of bugs. Make them conform to the manual. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Move struct kvm_pio_request into x86 kvm_host.h	Avi Kivity	2009-03-24	1	-0/+12
\| \| \| \| \| \|	This is an x86 specific structure and has no business living in common code. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: PIT: provide an option to disable interrupt reinjection	Marcelo Tosatti	2009-03-24	4	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Certain clocks (such as TSC) in older 2.6 guests overaccount for lost ticks, causing severe time drift. Interrupt reinjection magnifies the problem. Provide an option to disable it. [avi: allow room for expansion in case we want to disable reinjection of other timers] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: Fallback support for MSR_VM_HSAVE_PA	Avi Kivity	2009-03-24	1	-0/+2
\| \| \| \| \| \| \| \|	Since we advertise MSR_VM_HSAVE_PA, userspace will attempt to read it even on Intel. Implement fake support for this MSR to avoid the warnings. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: remove the vmap usage	Izik Eidus	2009-03-24	1	-49/+13
\| \| \| \| \| \| \| \|	vmap() on guest pages hides those pages from the Linux mm for an extended (userspace determined) amount of time. Get rid of it. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: introduce kvm_read_guest_virt, kvm_write_guest_virt	Izik Eidus	2009-03-24	2	-18/+42
\| \| \| \| \| \| \| \| \|	This commit renames emulator_read_std to kvm_read_guest_virt, and adds a new function, kvm_write_guest_virt, that allows writing to a guest virtual address. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: initialize TSC offset relative to vm creation time	Marcelo Tosatti	2009-03-24	3	-8/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VMX initializes the TSC offset for each vcpu at different times, and also reinitializes it for vcpus other than 0 on APIC SIPI message. This bug causes the TSCs to appear unsynchronized in the guest, even if the host is good. Older Linux kernels don't handle the situation very well, so gettimeofday is likely to go backwards in time: http://www.mail-archive.com/kvm@vger.kernel.org/msg02955.html http://sourceforge.net/tracker/index.php?func=detail&aid=2025534&group_id=180599&atid=893831 Fix it by initializing the offset of each vcpu relative to vm creation time, and moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the APIC MP init path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
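The arithmetic behind this fix can be sketched as follows, assuming the usual model in which the guest reads the host TSC plus a per-vcpu offset. `struct vm`, `vcpu_tsc_offset()` and `guest_tsc()` are hypothetical names for illustration, not the functions touched by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VM descriptor: records the host TSC at creation time. */
struct vm {
    uint64_t creation_tsc;
};

/* Every vcpu derives its offset from the same VM-wide base, no matter
 * when the vcpu itself is set up. Unsigned arithmetic wraps modulo
 * 2^64, so the offset acts as "minus creation_tsc". */
static uint64_t vcpu_tsc_offset(const struct vm *vm)
{
    assert(vm != NULL);
    return (uint64_t)0 - vm->creation_tsc;
}

/* Guest-visible TSC under the assumed model: host TSC plus offset. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
{
    return host_tsc + offset;
}
```

Because the offset depends only on the VM, two vcpus set up at different host times still read the same guest TSC at the same host instant. Computing the offset from each vcpu's own setup time would make them differ by the gap between the two setups.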
*	KVM: MMU: Drop walk_shadow()	Avi Kivity	2009-03-24	1	-20/+0
\| \| \| \| \| \|	No longer used. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Replace walk_shadow() by for_each_shadow_entry() in invlpg()	Avi Kivity	2009-03-24	1	-49/+32
\| \| \| \|	Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Replace walk_shadow() by for_each_shadow_entry() in fetch()	Avi Kivity	2009-03-24	1	-70/+58
\| \| \| \| \| \| \|	Effectively reverting to the pre walk_shadow() version -- but now with the reusable for_each(). Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Use for_each_shadow_entry() in __direct_map()	Avi Kivity	2009-03-24	1	-54/+29
\| \| \| \| \| \|	Eliminating a callback and a useless structure. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Add for_each_shadow_entry(), a simpler alternative to walk_shadow()	Avi Kivity	2009-03-24	1	-20/+49
\| \| \| \| \| \| \| \| \|	Using a for_each loop style removes the need to write callbacks and nasty casts. Implement walk_shadow() using for_each_shadow_entry(). Signed-off-by: Avi Kivity <avi@redhat.com>
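The for_each style can be illustrated with a self-contained C sketch that walks the levels of a four-level, 9-bits-per-level page table index. It is not the kernel's `for_each_shadow_entry()`: `struct walk_iter`, `for_each_walk_level()` and the constants are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration: 4 levels, 9 index bits per level,
 * 4 KiB pages. */
#define LEVELS         4
#define BITS_PER_LEVEL 9
#define PAGE_SHIFT     12

/* Hypothetical iterator state; all loop state lives here instead of
 * in a callback's private structure. */
struct walk_iter {
    uint64_t addr;
    int level;       /* counts down from LEVELS to 1 */
    unsigned index;  /* table index at the current level */
};

static void walk_init(struct walk_iter *it, uint64_t addr)
{
    assert(it != NULL);
    it->addr = addr;
    it->level = LEVELS;
    it->index = 0;
}

/* Loop condition: compute the index for the current level, or stop. */
static int walk_okay(struct walk_iter *it)
{
    if (it->level < 1)
        return 0;
    unsigned shift = PAGE_SHIFT + (unsigned)(it->level - 1) * BITS_PER_LEVEL;
    it->index = (unsigned)((it->addr >> shift) & ((1u << BITS_PER_LEVEL) - 1));
    return 1;
}

static void walk_next(struct walk_iter *it)
{
    it->level--;
}

/* The loop macro: callers write an ordinary loop body with direct
 * access to their local variables, with no callback or void * casts. */
#define for_each_walk_level(it, addr) \
    for (walk_init(&(it), (addr)); walk_okay(&(it)); walk_next(&(it)))

/* Example user: sum the per-level indices of an address. */
static unsigned sum_indices(uint64_t addr)
{
    struct walk_iter it;
    unsigned sum = 0;

    for_each_walk_level(it, addr)
        sum += it.index;
    return sum;
}
```

A callback-based walker would need `sum` passed through a context pointer and cast back inside the callback, which is the boilerplate the loop form avoids.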
*	KVM: Fix vmload and friends misinterpreted as lidt	Avi Kivity	2009-03-24	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \|	The AMD SVM instruction family all overload the 0f 01 /3 opcode, further multiplexing on the three r/m bits. But the code decided that anything that isn't a vmmcall must be an lidt (which shares the 0f 01 /3 opcode, for the case that mod = 3). Fix by aborting emulation if this isn't a vmmcall. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Initialize a shadow page's global attribute from cr4.pge	Avi Kivity	2009-03-24	1	-3/+1
\| \| \| \| \| \| \|	If cr4.pge is cleared, we ought to treat any ptes in the page as non-global. This allows us to remove the check from set_spte(). Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Segregate mmu pages created with different cr4.pge settings	Avi Kivity	2009-03-24	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Don't allow a vcpu with cr4.pge cleared to use a shadow page created with cr4.pge set; this might cause a cr3 switch not to sync ptes that have the global bit set (the global bit has no effect if !cr4.pge). This can only occur on smp with different cr4.pge settings for different vcpus (since a cr4 change will resync the shadow ptes), but there's no cost to being correct here. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: MMU: Inherit a shadow page's guest level count from vcpu setup	Avi Kivity	2009-03-24	2	-6/+12
\| \| \| \| \| \| \| \| \| \|	Instead of "calculating" it on every shadow page allocation, set it once when switching modes, and copy it when allocating pages. This doesn't buy us much, but sets up the stage for inheriting more information related to the mmu setup. Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: x86: Wire-up hardware breakpoints for guest debugging	Jan Kiszka	2009-03-24	3	-1/+23
\| \| \| \| \| \| \| \| \|	Add the remaining bits to make use of debug registers also for guest debugging, thus enabling the use of hardware breakpoints and watchpoints. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: x86: Virtualize debug registers	Jan Kiszka	2009-03-24	6	-96/+193
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So far KVM only had basic x86 debug register support, once introduced to realize guest debugging that way. The guest itself was not able to use those registers. This patch now adds (almost) full support for guest self-debugging via hardware registers. It refactors the code, moving generic parts out of SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and it ensures that the registers are properly switched between host and guest. This patch also prepares debug register usage by the host. The latter will (once wired-up by the following patch) allow for hardware breakpoints/watchpoints in guest code. If this is enabled, the guest will only see faked debug registers without functionality, but with content reflecting the guest's modifications. Tested on Intel only, but SVM /should/ work as well, but who knows... Known limitations: Trapping on tss switch won't work - most probably on Intel. Credits also go to Joerg Roedel - I used his once posted debugging series as platform for this patch. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: Allow single-stepping when uninterruptible	Jan Kiszka	2009-03-24	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When single-stepping over STI and MOV SS, we must clear the corresponding interruptibility bits in the guest state. Otherwise vmentry fails as it then expects bit 14 (BS) in pending debug exceptions being set, but that's not correct for the guest debugging case. Note that clearing those bits is safe as we check for interruptibility based on the original state and do not inject interrupts or NMIs if guest interruptibility was blocked. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: New guest debug interface	Jan Kiszka	2009-03-24	5	-73/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic part, controlling the "main switch" and the single-step feature. The arch specific part adds an x86 interface for intercepting both types of debug exceptions separately and re-injecting them when the host was not interested. Moreover, the foundation for guest debugging via debug registers is laid. To signal breakpoint events properly back to userland, an arch-specific data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block contains the PC, the debug exception, and relevant debug registers to tell debug events properly apart. The availability of this new interface is signaled by KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are provided. Note that both SVM and VTX are supported, but only the latter has been tested so far. Based on the experience with all those VTX corner cases, I would be fairly surprised if SVM will work out of the box. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: VMX: Support for injecting software exceptions	Jan Kiszka	2009-03-24	2	-16/+22
\| \| \| \| \| \| \| \| \| \|	VMX differentiates between processor and software generated exceptions when injecting them into the guest. Extend vmx_queue_exception accordingly (and refactor related constants) so that we can use this service reliably for the new guest debugging framework. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set	Alexander Graf	2009-03-24	1	-16/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace has to tell the kernel module somehow that nested SVM should be used. The easiest way that doesn't break anything I could think of is to implement
		if (cpuid & svm)
			allow write to efer
		else
			deny write to efer
	Old userspaces mask the SVM capability bit, so they don't break. In order to find out that the SVM capability is set, I had to split the kvm_emulate_cpuid into a finding and an emulating part. (introduced in v6) Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
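The gating rule can be sketched in C as below. The bit positions are the architectural ones (EFER.SVME is bit 12; the SVM feature flag is bit 2 of ECX in CPUID leaf 8000_0001h), but `set_efer()` and `efer_write_allowed()` are hypothetical helpers. The sketch also narrows the rule to writes that actually set SVME, which is an assumption rather than something the message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EFER_SVME     (1ULL << 12)  /* EFER.SVME */
#define CPUID_SVM_BIT (1u << 2)     /* CPUID 8000_0001h, ECX bit 2 */

/* Hypothetical check: a write that sets SVME is allowed only if the
 * guest's CPUID (as configured by userspace) advertises SVM. */
static bool efer_write_allowed(uint32_t guest_cpuid_ecx, uint64_t new_efer)
{
    if (!(new_efer & EFER_SVME))
        return true;  /* SVME not requested: nothing to gate here */
    return (guest_cpuid_ecx & CPUID_SVM_BIT) != 0;
}

/* Hypothetical setter: returns 0 on success, -1 if the write is denied
 * (a real hypervisor would inject a fault into the guest instead). */
static int set_efer(uint64_t *efer, uint32_t guest_cpuid_ecx, uint64_t value)
{
    assert(efer != NULL);
    if (!efer_write_allowed(guest_cpuid_ecx, value))
        return -1;
    *efer = value;
    return 0;
}
```

An old userspace that masks the SVM bit out of the guest CPUID never passes the check, so its guests keep the previous behaviour.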
*	KVM: SVM: Allow setting the SVME bit	Alexander Graf	2009-03-24	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally setting the SVME bit in EFER is not allowed, as we did not support SVM. Now since we do, we should also allow enabling SVM mode.
	v2 comes as last patch, so we don't enable half-ready code
	v4 introduces a module option to enable SVM
	v6 warns that nesting is enabled
	Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: SVM: Allow read access to MSR_VM_VR	Joerg Roedel	2009-03-24	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	KVM tries to read the VM_CR MSR to find out if SVM was disabled by the BIOS. So implement read support for this MSR to get nested SVM running. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: SVM: Add VMEXIT handler and intercepts	Alexander Graf	2009-03-24	1	-0/+293
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the #VMEXIT intercept, so we return to the level 1 guest when something happens in the level 2 guest that should return to the level 1 guest.
	v2 implements HIF handling and cleans up exception interception
	v3 adds support for V_INTR_MASKING_MASK
	v4 uses the host page hsave
	v5 removes IOPM merging code
	v6 moves mmu code out of the atomic section
	Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	KVM: SVM: Add VMRUN handler	Alexander Graf	2009-03-24	3	-2/+165
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements VMRUN. VMRUN enters a virtual CPU and runs that in the same context as the normal guest CPU would run. So basically it is implemented the same way a normal CPU would do it. We also prepare all intercepts that get OR'ed with the original intercepts, as we do not allow a level 2 guest to be intercepted less than the first level guest.
	v2 implements the following improvements:
	- fixes the CPL check
	- does not allocate iopm when not used
	- remembers the host's IF in the HIF bit in the hflags
	v3:
	- make use of the new permission checking
	- add support for V_INTR_MASKING_MASK
	v4:
	- use host page backed hsave
	v5:
	- remove IOPM merging code
	v6:
	- save cr4 so PAE l1 guests work
	v7:
	- return 0 on vmrun so we check the MSRs too
	- fix MSR check to use the correct variable
	Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
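The intercept merging mentioned in this message amounts to a bitwise OR per intercept field. The sketch below shows that idea under assumed names: `struct intercepts` and its fields are hypothetical and only loosely modelled on the kinds of intercept masks a control block carries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical set of intercept bitmasks. */
struct intercepts {
    uint16_t cr_read;
    uint16_t cr_write;
    uint32_t exceptions;
    uint64_t instructions;
};

/* Merge the intercepts the host already uses for the level 1 guest
 * with those the level 1 guest asks for on behalf of its level 2
 * guest. OR-ing guarantees the result is a superset of the host's
 * set, so the level 2 guest is never intercepted less. */
static struct intercepts merge_intercepts(struct intercepts host,
                                          struct intercepts nested)
{
    struct intercepts m = {
        .cr_read      = (uint16_t)(host.cr_read | nested.cr_read),
        .cr_write     = (uint16_t)(host.cr_write | nested.cr_write),
        .exceptions   = host.exceptions | nested.exceptions,
        .instructions = host.instructions | nested.instructions,
    };

    /* Superset invariant for the host's instruction intercepts. */
    assert((m.instructions & host.instructions) == host.instructions);
    return m;
}
```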
*	KVM: SVM: Add VMLOAD and VMSAVE handlers	Alexander Graf	2009-03-24	1	-2/+58
\| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the VMLOAD and VMSAVE instructions that usually surround the VMRUN instructions. Both instructions load / restore the same elements, so we only need to implement them once.
	v2 fixes CPL checking and replaces memcpy by assignments
	v3 makes use of the new permission checking
	Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>