summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* KVM: MMU: Discard reserved bits checking on PDE bit 7-8Sheng Yang2009-06-101-3/+4
| | | | | | | | | | | | | | | | 1. It's related to a Linux kernel bug which fixed by Ingo on 07a66d7c53a538e1a9759954a82bb6c07365eff9. The original code exists for quite a long time, and it would convert a PDE for large page into a normal PDE. But it fail to fit normal PDE well. With the code before Ingo's fix, the kernel would fall reserved bit checking with bit 8 - the remaining global bit of PTE. So the kernel would receive a double-fault. 2. After discussion, we decide to discard PDE bit 7-8 reserved checking for now. For this marked as reserved in SDM, but didn't checked by the processor in fact... Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix unneeded instruction skipping during task switching.Gleb Natapov2009-06-104-18/+51
| | | | | | | | | | There is no need to skip instruction if the reason for a task switch is a task gate in IDT and access to it is caused by an external even. The problem is currently solved only for VMX since there is no reliable way to skip an instruction in SVM. We should emulate it instead. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix task switch back link handling.Gleb Natapov2009-06-101-8/+32
| | | | | | | Back link is written to a wrong TSS now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Do not zero idt_vectoring_info in vmx_complete_interrupts().Gleb Natapov2009-06-101-7/+0Star
| | | | | | | | | | We will need it later in task_switch(). Code in handle_exception() is dead. is_external_interrupt(vect_info) will always be false since idt_vectoring_info is zeroed in vmx_complete_interrupts(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Rewrite vmx_complete_interrupt()'s twisted maze of if() statementsGleb Natapov2009-06-101-18/+25
| | | | | | | | | | | ...with a more straightforward switch(). Also fix a bug when NMI could be dropped on exit. Although this should never happen in practice, since NMIs can only be injected, never triggered internally by the guest like exceptions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fix handling of a fault during NMI unblocked due to IRETGleb Natapov2009-06-101-6/+11
| | | | | | | | | Bit 12 is undefined in any of the following cases: If the VM exit sets the valid bit in the IDT-vectoring information field. If the VM exit is due to a double fault. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use rsvd_bits_mask in load_pdptrs()Dong, Eddie2009-06-103-6/+10
| | | | | | | Also remove bit 5-6 from rsvd_bits_mask per latest SDM. Signed-off-by: Eddie Dong <Eddie.Dong@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fix feature testingSheng Yang2009-06-101-9/+9
| | | | | | | The testing of feature is too early now, before vmcs_config complete initialization. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Clean up Flex Priority relatedSheng Yang2009-06-101-17/+30
| | | | | | | And clean paranthes on returns. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: remove pointless conditional before kfree() in lapic initializationWei Yongjun2009-06-101-2/+1Star
| | | | | | | Remove pointless conditional before kfree(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Use different shadows when EFER.NXE changesAvi Kivity2009-06-102-0/+4
| | | | | | | | | | | | | A pte that is shadowed when the guest EFER.NXE=1 is not valid when EFER.NXE=0; if bit 63 is set, the pte should cause a fault, and since the shadow EFER always has NX enabled, this won't happen. Fix by using a different shadow page table for different EFER.NXE bits. This allows vcpus to run correctly with different values of EFER.NXE, and for transitions on this bit to be handled correctly without requiring a full flush. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Emulate #PF error code of reserved bits violationDong, Eddie2009-06-104-0/+88
| | | | | | | | | | | Detect, indicate, and propagate page faults where reserved bits are set. Take care to handle the different paging modes, each of which has different sets of reserved bits. [avi: fix pte reserved bits for efer.nxe=0] Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: enable external interrupt in vmmYang Zhang2009-06-103-10/+16
| | | | | | | | | Currently, the interrupt enable bit is cleared when in the vmm. This patch sets the bit and the external interrupts can be dealt with when in the vmm. This improves the I/O performance. Signed-off-by: Yang Zhang <yang.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Fix comment in page_fault()Eddie Dong2009-06-101-1/+1
| | | | | | | | The original one is for the code before refactoring. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Correct wrong vmcs field sizesSheng Yang2009-06-101-6/+6
| | | | | | | EXIT_QUALIFICATION and GUEST_LINEAR_ADDRESS are natural width, not 64-bit. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Make flexpriority module parameter reflect hardware capabilityAvi Kivity2009-06-101-3/+4
| | | | | | If the hardware does not support flexpriority, zero the module parameter. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix interrupt unhalting a vcpu when it shouldn'tGleb Natapov2009-06-107-2/+41
| | | | | | | | kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking if interrupt window is actually opened. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Timer event should not unconditionally unhalt vcpu.Gleb Natapov2009-06-102-26/+37
| | | | | | | | | | Currently timer events are processed before entering guest mode. Move it to main vcpu event loop since timer events should be processed even while vcpu is halted. Timer may cause interrupt/nmi to be injected and only then vcpu will be unhalted. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fold vm_need_ept() into callersAvi Kivity2009-06-101-19/+14Star
| | | | | | Trivial. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Zero ept module parameter if ept is not presentAvi Kivity2009-06-101-1/+4
| | | | | | Allows reading back hardware capability. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Zero the vpid module parameter if vpid is not supportedAvi Kivity2009-06-101-1/+4
| | | | | | This allows reading back how the hardware is configured. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Annotate module parameters as __read_mostlyAvi Kivity2009-06-101-5/+5
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Simplify module parameter namesAvi Kivity2009-06-101-3/+3
| | | | | | Instead of 'enable_vpid=1', use a simple 'vpid=1'. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Rename kvm_handle_exit() to vmx_handle_exit()Avi Kivity2009-06-101-2/+2
| | | | | | It is a static vmx-specific function. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Make module parameters readableAvi Kivity2009-06-101-5/+5
| | | | | | Useful to see how the module was loaded. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: reuse (pop|push)_irq from svm.c in vmx.cGleb Natapov2009-06-103-36/+24Star
| | | | | | | | The prioritized bit vector manipulation functions are useful in both vmx and svm. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Remove duplicate code in svm_do_inject_vector()Gleb Natapov2009-06-101-9/+1Star
| | | | | | | svm_do_inject_vector() reimplements pop_irq(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: Ignore reads to EVNTSEL MSRsAmit Shah2009-06-101-0/+2
| | | | | | | | | | | | We ignore writes to the performance counters and performance event selector registers already. Kaspersky antivirus reads the eventsel MSR causing it to crash with the current behaviour. Return 0 as data when the eventsel registers are read to stop the crash. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: do not free active mmu pages in free_mmu_pages()Gleb Natapov2009-06-101-8/+0Star
| | | | | | | | free_mmu_pages() should only undo what alloc_mmu_pages() does. Free mmu pages from the generic VM destruction function, kvm_destroy_vm(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Device assignment framework reworkSheng Yang2009-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | After discussion with Marcelo, we decided to rework device assignment framework together. The old problems are kernel logic is unnecessary complex. So Marcelo suggest to split it into a more elegant way: 1. Split host IRQ assign and guest IRQ assign. And userspace determine the combination. Also discard msi2intx parameter, userspace can specific KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI to INTx convertion. 2. Split assign IRQ and deassign IRQ. Import two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ. This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the old interface). [avi: replace homemade bitcount() by hweight_long()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: make 'lapic_timer_ops' and 'kpit_ops' staticHannes Eder2009-06-102-2/+2
| | | | | | | | | Fix this sparse warnings: arch/x86/kvm/lapic.c:916:22: warning: symbol 'lapic_timer_ops' was not declared. Should it be static? arch/x86/kvm/i8254.c:268:22: warning: symbol 'kpit_ops' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: Drop in SN2 replacement of fast path ITC emulation fault handlerJes Sorensen2009-06-104-5/+71
| | | | | | | | | | Copy in SN2 RTC based ITC emulation for fast exit. The two versions have the same size, so a dropin is simpler than patching the branch instruction to hit the SN2 version. Signed-off-by: Jes Sorensen <jes@sgi.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: SN2 adjust emulated ITC frequency to match RTC frequencyJes Sorensen2009-06-101-1/+27
| | | | | | | | | On SN2 do not pass down the real ITC frequency, but rather patch the values to match the SN2 RTC frequency. Signed-off-by: Jes Sorensen <jes@sgi.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: Create inline function kvm_get_itc() to centralize ITC reading.Jes Sorensen2009-06-102-6/+32
| | | | | | | | | | | Move all reading of special register 'AR_ITC' into two functions, one in the kernel and one in the VMM module. When running on SN2, base the result on the RTC rather the system ITC, as the ITC isn't synchronized. Signed-off-by: Jes Sorensen <jes@sgi.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: Map in SN2 RTC registers to the VMM moduleJes Sorensen2009-06-103-5/+42
| | | | | | | | | On SN2, map in the SN2 RTC registers to the VMM module, needed for ITC emulation. Signed-off-by: Jes Sorensen <jes@sgi.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: APIC: get rid of deliver_bitmaskGleb Natapov2009-06-104-61/+38Star
| | | | | | | | Deliver interrupt during destination matching loop. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: change the way how lowest priority vcpu is calculatedGleb Natapov2009-06-104-51/+10Star
| | | | | | | | | The new way does not require additional loop over vcpus to calculate the one with lowest priority as one is chosen during delivery bitmap construction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: consolidate ioapic/ipi interrupt delivery logicGleb Natapov2009-06-104-43/+39Star
| | | | | | | | | | Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in apic_send_ipi() to figure out ipi destination instead of reimplementing the logic. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: APIC: kvm_apic_set_irq deliver all kinds of interruptsGleb Natapov2009-06-105-25/+35
| | | | | | | Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: remove call to kvm_mmu_pte_write from walk_addrJoerg Roedel2009-06-101-1/+0Star
| | | | | | | | There is no reason to update the shadow pte here because the guest pte is only changed to dirty state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: ia64: fix compilation error in kvm_get_lowest_prio_vcpuYang Zhang2009-06-101-1/+1
| | | | | | | | Modify the arg of kvm_get_lowest_prio_vcpu(). Make it consistent with its declaration. Signed-off-by: Yang Zhang <yang.zhang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: unify part of generic timer handlingMarcelo Tosatti2009-06-107-108/+129
| | | | | | | | | Hide the internals of vcpu awakening / injection from the in-kernel emulated timers. This makes future changes in this logic easier and decreases the distance to more generic timer handling. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PIT: remove usage of count_load_time for channel 0Marcelo Tosatti2009-06-101-6/+31
| | | | | | | We can infer elapsed time from hrtimer_expires_remaining. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PIT: remove unused scheduled variableMarcelo Tosatti2009-06-102-2/+0Star
| | | | | | | Unused. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: paravirt skip pit-through-ioapic boot checkMarcelo Tosatti2009-06-101-0/+4
| | | | | | | | Skip the test which checks if the PIT is properly routed when using the IOAPIC, aimed at buggy hardware. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: silence preempt warning on kvm_write_guest_timeMatt T. Yourst2009-06-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64) with PREEMPT enabled. We're getting syslog warnings like this many (but not all) times qemu tells KVM to run the VCPU: BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-x86/28938 caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm] Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit Call Trace: debug_smp_processor_id+0xf7/0x100 kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm] ? __wake_up+0x4e/0x70 ? wake_futex+0x27/0x40 kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm] enqueue_hrtimer+0x8a/0x110 _spin_unlock_irqrestore+0x27/0x50 vfs_ioctl+0x31/0xa0 do_vfs_ioctl+0x74/0x480 sys_futex+0xb4/0x140 sys_ioctl+0x99/0xa0 system_call_fastpath+0x16/0x1b As it turns out, the call trace is messed up due to gcc's inlining, but I isolated the problem anyway: kvm_write_guest_time() is being used in a non-thread-safe manner on preemptable kernels. Basically kvm_write_guest_time()'s body needs to be surrounded by preempt_disable() and preempt_enable(), since the kernel won't let us query any per-CPU data (indirectly using smp_processor_id()) without preemption disabled. The attached patch fixes this issue by disabling preemption inside kvm_write_guest_time(). [marcelo: surround only __get_cpu_var calls since the warning is harmless] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Enable MSI-X for KVM assigned deviceSheng Yang2009-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch finally enable MSI-X. What we need for MSI-X: 1. Intercept one page in MMIO region of device. So that we can get guest desired MSI-X table and set up the real one. Now this have been done by guest, and transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY. 2. Information for incoming interrupt. Now one device can have more than one interrupt, and they are all handled by one workqueue structure. So we need to identify them. The previous patch enable gsi_msg_pending_bitmap get this done. 3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X message address/data. We used same entry number for the host and guest here, so that it's easy to find the correlated guest gsi. What we lack for now: 1. The PCI spec said nothing can existed with MSI-X table in the same page of MMIO region, except pending bits. The patch ignore pending bits as the first step (so they are always 0 - no pending). 2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch didn't support this, and Linux also don't work in this way. 3. The patch didn't implement MSI-X mask all and mask single entry. I would implement the former in driver/pci/msi.c later. And for single entry, userspace should have reposibility to handle it. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: bit ops for deliver_bitmapSheng Yang2009-06-101-3/+4
| | | | | | | It's also convenient when we extend KVM supported vcpu number in the future. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Update intr delivery func to accept unsigned long* bitmapSheng Yang2009-06-101-4/+4
| | | | | | | | Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is increased. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Don't intercept MSR_KERNEL_GS_BASEAvi Kivity2009-06-101-14/+43
| | | | | | | | | | | | | Windows 2008 accesses this MSR often on context switch intensive workloads; since we run in guest context with the guest MSR value loaded (so swapgs can work correctly), we can simply disable interception of rdmsr/wrmsr for this MSR. A complication occurs since in legacy mode, we run with the host MSR value loaded. In this case we enable interception. This means we need two MSR bitmaps, one for legacy mode and one for long mode. Signed-off-by: Avi Kivity <avi@redhat.com>