summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen ↵Konrad Rzeszutek Wilk2013-04-021-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | provided pagetables. Occassionaly on a DL380 G4 the guest would crash quite early with this: (XEN) d244:v0: unhandled page fault (ec=0003) (XEN) Pagetable walk from ffffffff84dc7000: (XEN) L4[0x1ff] = 00000000c3f18067 0000000000001789 (XEN) L3[0x1fe] = 00000000c3f14067 000000000000178d (XEN) L2[0x026] = 00000000dc8b2067 0000000000004def (XEN) L1[0x1c7] = 00100000dc8da067 0000000000004dc7 (XEN) domain_crash_sync called from entry.S (XEN) Domain 244 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.1.3OVM x86_64 debug=n Not tainted ]---- (XEN) CPU: 3 (XEN) RIP: e033:[<ffffffff81263f22>] (XEN) RFLAGS: 0000000000000216 EM: 1 CONTEXT: pv guest (XEN) rax: 0000000000000000 rbx: ffffffff81785f88 rcx: 000000000000003f (XEN) rdx: 0000000000000000 rsi: 00000000dc8da063 rdi: ffffffff84dc7000 The offending code shows it to be a loop writting the value zero (%rax) in the %rdi (the L4 provided by Xen) register: 0: 44 00 00 add %r8b,(%rax) 3: 31 c0 xor %eax,%eax 5: b9 40 00 00 00 mov $0x40,%ecx a: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 11: 00 00 13: ff c9 dec %ecx 15:* 48 89 07 mov %rax,(%rdi) <-- trapping instruction 18: 48 89 47 08 mov %rax,0x8(%rdi) 1c: 48 89 47 10 mov %rax,0x10(%rdi) which fails. xen_setup_kernel_pagetable recycles some of the Xen's page-table entries when it has switched over to its Linux page-tables. Right before try to clear the page, we make a hypercall to change it from _RO to _RW and that works (otherwise we would hit an BUG()). And the _RW flag is set for that page: (XEN) L1[0x1c7] = 001000004885f067 0000000000004dc7 The error code is 3, so PFEC_page_present and PFEC_write_access, so page is present (correct), and we tried to write to the page, but a violation occurred. The one theory is that the the page entries in hardware (which are cached) are not up to date with what we just set. Especially as we have just done an CR3 write and flushed the multicalls. This patch does solve the problem by flusing out the TLB page entry after changing it from _RO to _RW and we don't hit this issue anymore. Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO 'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4] Reported-and-Tested-by: Saar Maoz <Saar.Maoz@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/events: Handle VIRQ_TIMER before any other hardirq in event loop.Keir Fraser2013-04-021-4/+15
| | | | | | | | | | | | | | This avoids any other hardirq handler seeing a very stale jiffies value immediately after wakeup from a long idle period. The one observable symptom of this was a USB keyboard, with software keyboard repeat, which would always repeat a key immediately that it was pressed. This is due to the key press waking the guest, the key handler immediately runs, sees an old jiffies value, and then that jiffies value significantly updated, before the key is unpressed. Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/events: avoid race with raising an event in unmask_evtchn()David Vrabel2013-03-271-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | In unmask_evtchn(), when the mask bit is cleared after testing for pending and the event becomes pending between the test and clear, then the upcall will not become pending and the event may be lost or delayed. Avoid this by always clearing the mask bit before checking for pending. If a hypercall is needed, remask the event as EVTCHNOP_unmask will only retrigger pending events if they were masked. This fixes a regression introduced in 3.7 by b5e579232d635b79a3da052964cb357ccda8d9ea (xen/events: fix unmask_evtchn for PV on HVM guests) which reordered the clear mask and check pending operations. Changes in v2: - set mask before hypercall. Cc: stable@vger.kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup.Konrad Rzeszutek Wilk2013-03-271-2/+1Star
| | | | | | | | | | | | | We move the setting of write_cr3 from the early bootup variant (see git commit 0cc9129d75ef8993702d97ab0e49542c15ac6ab9 "x86-64, xen, mmu: Provide an early version of write_cr3.") to a more appropiate location. This new location sets all of the other non-early variants of pvops calls - and most importantly is before the alternative_asm mechanism kicks in. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/acpi-stub: Disable it b/c the acpi_processor_add is no longer called.Konrad Rzeszutek Wilk2013-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the Xen ACPI stub code (CONFIG_XEN_STUB=y) enabled, the power C and P states are no longer uploaded to the hypervisor. The reason is that the Xen CPU hotplug code: xen-acpi-cpuhotplug.c and the xen-acpi-stub.c register themselves as the "processor" type object. That means the generic processor (processor_driver.c) stops working and it does not call (acpi_processor_add) which populates the per_cpu(processors, pr->id) = pr; structure. The 'pr' is gathered from the acpi_processor_get_info function which does the job of finding the C-states and figuring out PBLK address. The 'processors->pr' is then later used by xen-acpi-processor.c (the one that uploads C and P states to the hypervisor). Since it is NULL, we end skip the gathering of _PSD, _PSS, _PCT, etc and never upload the power management data. The end result is that enabling the CONFIG_XEN_STUB in the build means that xen-acpi-processor is not working anymore. This temporary patch fixes it by marking the XEN_STUB driver as BROKEN until this can be properly fixed. CC: jinsong.liu@intel.com Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen-pciback: notify hypervisor about devices intended to be assigned to guestsJan Beulich2013-03-224-18/+54
| | | | | | | | | | | | | | | | For MSI-X capable devices the hypervisor wants to write protect the MSI-X table and PBA, yet it can't assume that resources have been assigned to their final values at device enumeration time. Thus have pciback do that notification, as having the device controlled by it is a prerequisite to assigning the device to guests anyway. This is the kernel part of hypervisor side commit 4245d33 ("x86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses") on the master branch of git://xenbits.xen.org/xen.git. CC: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/acpi-processor: Don't dereference struct acpi_processor on all CPUs.Konrad Rzeszutek Wilk2013-03-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With git commit c705c78c0d0835a4aa5d0d9a3422e3218462030c "acpi: Export the acpi_processor_get_performance_info" we are now using a different mechanism to access the P-states. The acpi_processor per-cpu structure is set and filtered by the core ACPI code which shrinks the per_cpu contents to only online CPUs. In the past we would call acpi_processor_register_performance() which would have not tried to dereference offline cpus. With the new patch and the fact that the loop we take is for for_all_possible_cpus we end up crashing on some machines. We could modify the loop to be for online_cpus - but all the other loops in the code use possible_cpus (for a good reason) - so lets leave it as so and just check if per_cpu(processor) is NULL. With this patch we will bypass the !online but possible CPUs. This fixes: IP: [<ffffffffa00d13b5>] xen_acpi_processor_init+0x1b6/0xe01 [xen_acpi_processor] PGD 4126e6067 PUD 4126e3067 PMD 0 Oops: 0002 [#1] SMP Pid: 432, comm: modprobe Not tainted 3.9.0-rc3+ #28 To be filled by O.E.M. To be filled by O.E.M./M5A97 RIP: e030:[<ffffffffa00d13b5>] [<ffffffffa00d13b5>] xen_acpi_processor_init+0x1b6/0xe01 [xen_acpi_processor] RSP: e02b:ffff88040c8a3ce8 EFLAGS: 00010282 .. snip.. Call Trace: [<ffffffffa00d11ff>] ? read_acpi_id+0x12b/0x12b [xen_acpi_processor] [<ffffffff8100215a>] do_one_initcall+0x12a/0x180 [<ffffffff810c42c3>] load_module+0x1cd3/0x2870 [<ffffffff81319b70>] ? ddebug_proc_open+0xc0/0xc0 [<ffffffff810c4f37>] sys_init_module+0xd7/0x120 [<ffffffff8166ce19>] system_call_fastpath+0x16/0x1b on some machines. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/acpi: remove redundant acpi/acpi_drivers.h includeLiu Jinsong2013-03-111-1/+0Star
| | | | | | | | | | | | | | | | | It's redundant since linux/acpi.h has include it when CONFIG_ACPI enabled, and when CONFIG_ACPI disabled it will trigger compiling warning In file included from drivers/xen/xen-stub.c:28:0: include/acpi/acpi_drivers.h:103:31: warning: 'struct acpi_device' declared inside parameter list [enabled by default] include/acpi/acpi_drivers.h:103:31: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default] include/acpi/acpi_drivers.h:107:43: warning: 'struct acpi_pci_root' declared inside parameter list [enabled by default] Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen: arm: mandate EABI and use generic atomic operations.Ian Campbell2013-03-112-22/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rob Herring has observed that c81611c4e96f "xen: event channel arrays are xen_ulong_t and not unsigned long" introduced a compile failure when building without CONFIG_AEABI: /tmp/ccJaIZOW.s: Assembler messages: /tmp/ccJaIZOW.s:831: Error: even register required -- `ldrexd r5,r6,[r4]' Will Deacon pointed out that this is because OABI does not require even base registers for 64-bit values. We can avoid this by simply using the existing atomic64_xchg operation and the same containerof trick as used by the cmpxchg macros. However since this code is used on memory which is shared with the hypervisor we require proper atomic instructions and cannot use the generic atomic64 callbacks (which are based on spinlocks), therefore add a dependency on !GENERIC_ATOMIC64. Since we already depend on !CPU_V6 there isn't much downside to this. While thinking about this we also observed that OABI has different struct alignment requirements to EABI, which is a problem for hypercall argument structs which are shared with the hypervisor and which must be in EABI layout. Since I don't expect people to want to run OABI kernels on Xen depend on CONFIG_AEABI explicitly too (although it also happens to be enforced by the !GENERIC_ATOMIC64 requirement too). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Rob Herring <robherring2@gmail.com> Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* acpi: Export the acpi_processor_get_performance_infoKonrad Rzeszutek Wilk2013-03-063-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The git commit d5aaffa9dd531c978c6f3fea06a2972653bd7fc8 (cpufreq: handle cpufreq being disabled for all exported function) tightens the cpufreq API by returning errors when disable_cpufreq() had been called. The problem we are hitting is that the module xen-acpi-processor which uses the ACPI's functions: acpi_processor_register_performance, acpi_processor_preregister_performance, and acpi_processor_notify_smm fails at acpi_processor_register_performance with -22. Note that earlier during bootup in arch/x86/xen/setup.c there is also an call to cpufreq's API: disable_cpufreq(). This is b/c we want the Linux kernel to parse the ACPI data, but leave the cpufreq decisions to the hypervisor. In v3.9 all the checks that d5aaffa9dd531c978c6f3fea06a2972653bd7fc8 added are now hit and the calls to cpufreq_register_notifier will now fail. This means that acpi_processor_ppc_init ends up printing: "Warning: Processor Platform Limit not supported" and the acpi_processor_ppc_status is not set. The repercussions of that is that the call to acpi_processor_register_performance fails right away at: if (!(acpi_processor_ppc_status & PPC_REGISTERED)) and we don't progress any further on parsing and extracting the _P* objects. The only reason the Xen code called that function was b/c it was exported and the only way to gather the P-states. But we can also just make acpi_processor_get_performance_info be exported and not use acpi_processor_register_performance. This patch does so. Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pciback: Don't disable a PCI device that is already disabled.Konrad Rzeszutek Wilk2013-03-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While shuting down a HVM guest with pci devices passed through we get this: pciback 0000:04:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100002) ------------[ cut here ]------------ WARNING: at drivers/pci/pci.c:1397 pci_disable_device+0x88/0xa0() Hardware name: MS-7640 Device pciback disabling already-disabled device Modules linked in: Pid: 53, comm: xenwatch Not tainted 3.9.0-rc1-20130304a+ #1 Call Trace: [<ffffffff8106994a>] warn_slowpath_common+0x7a/0xc0 [<ffffffff81069a31>] warn_slowpath_fmt+0x41/0x50 [<ffffffff813cf288>] pci_disable_device+0x88/0xa0 [<ffffffff814554a7>] xen_pcibk_reset_device+0x37/0xd0 [<ffffffff81454b6f>] ? pcistub_put_pci_dev+0x6f/0x120 [<ffffffff81454b8d>] pcistub_put_pci_dev+0x8d/0x120 [<ffffffff814582a9>] __xen_pcibk_release_devices+0x59/0xa0 This fixes the bug. CC: stable@vger.kernel.org Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xenbus: fix compile failure on ARM with Xen enabledSteven Noonan2013-03-011-0/+1
| | | | | | | | | | Adding an include of linux/mm.h resolves this: drivers/xen/xenbus/xenbus_client.c: In function ‘xenbus_map_ring_valloc_hvm’: drivers/xen/xenbus/xenbus_client.c:532:66: error: implicit declaration of function ‘page_to_section’ [-Werror=implicit-function-declaration] CC: stable@vger.kernel.org Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pci: We don't do multiple MSI's.Konrad Rzeszutek Wilk2013-03-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no hypercall to setup multiple MSI per PCI device. As such with these two new commits: - 08261d87f7d1b6253ab3223756625a5c74532293 PCI/MSI: Enable multiple MSIs with pci_enable_msi_block_auto() - 5ca72c4f7c412c2002363218901eba5516c476b1 AHCI: Support multiple MSIs we would call the PHYSDEVOP_map_pirq 'nvec' times with the same contents of the PCI device. Sander discovered that we would get the same PIRQ value 'nvec' times and return said values to the caller. That of course meant that the device was configured only with one MSI and AHCI would fail with: ahci 0000:00:11.0: version 3.0 xen: registering gsi 19 triggering 0 polarity 1 xen: --> pirq=19 -> irq=19 (gsi=19) (XEN) [2013-02-27 19:43:07] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x99 -> IRQ 19 Mode:1 Active:1) ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 0xf impl SATA mode ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ahci: probe of 0000:00:11.0 failed with error -22 That is b/c in ahci_host_activate the second call to devm_request_threaded_irq would return -EINVAL as we passed in (on the second run) an IRQ that was never initialized. CC: stable@vger.kernel.org Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pat: Disable PAT using pat_enabled value.Konrad Rzeszutek Wilk2013-02-281-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The git commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1 (xen/pat: Disable PAT support for now) explains in details why we want to disable PAT for right now. However that change was not enough and we should have also disabled the pat_enabled value. Otherwise we end up with: mmap-example:3481 map pfn expected mapping type write-back for [mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() ... Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu Call Trace: [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0 [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100 [<ffffffff81157459>] unmap_vmas+0x49/0x90 [<ffffffff8115f808>] exit_mmap+0x98/0x170 [<ffffffff810559a4>] mmput+0x64/0x100 [<ffffffff810560f5>] dup_mm+0x445/0x660 [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510 [<ffffffff81057931>] do_fork+0x91/0x350 [<ffffffff81057c76>] sys_clone+0x16/0x20 [<ffffffff816ccbf9>] stub_clone+0x69/0x90 [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f ---[ end trace 4918cdd0a4c9fea4 ]--- (a similar message shows up if you end up launching 'mcelog') The call chain is (as analyzed by Liu, Jinsong): do_fork --> copy_process --> dup_mm --> dup_mmap --> copy_page_range --> track_pfn_copy --> reserve_pfn_range --> line 624: flags != want_flags It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR (_PAGE_CACHE_UC_MINUS). Stefan Bader dug in this deep and found out that: "That makes it clearer as this will do reserve_memtype(...) --> pat_x_mtrr_type --> mtrr_type_lookup --> __mtrr_type_lookup And that can return -1/0xff in case of MTRR not being enabled/initialized. Which is not the case (given there are no messages for it in dmesg). This is not equal to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS. It looks like the problem starts early in reserve_memtype: if (!pat_enabled) { /* This is identical to page table setting without PAT */ if (new_type) { if (req_type == _PAGE_CACHE_WC) *new_type = _PAGE_CACHE_UC_MINUS; else *new_type = req_type & _PAGE_CACHE_MASK; } return 0; } This would be what we want, that is clearing the PWT and PCD flags from the supported flags - if pat_enabled is disabled." This patch does that - disabling PAT. CC: stable@vger.kernel.org # 3.3 and further Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/acpi: xen cpu hotplug minor updatesLiu Jinsong2013-02-251-22/+12Star
| | | | | | | | | | | | Recently at native Rafael did some cleanup for acpi, say, drop acpi_bus_add, remove unnecessary argument of acpi_bus_scan, and run acpi_bus_scan under acpi_scan_lock. This patch does similar cleanup for xen cpu hotplug, removing redundant logic, and adding lock. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/acpi: xen memory hotplug minor updatesLiu Jinsong2013-02-251-26/+26
| | | | | | | | | | | | | | | | Dan Carpenter found current xen memory hotplug logic has potential issue: at func acpi_memory_get_device() *mem_device = acpi_driver_data(device); while the device may be NULL and then dereference. At native side, Rafael recently updated acpi_memory_get_device(), dropping acpi_bus_add, adding lock, and avoiding above issue. This patch updates xen memory hotplug logic accordingly, removing redundant logic, adding lock, and avoiding dereference. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* Merge tag 'stable/for-linus-3.9-rc0-tag' of ↵Linus Torvalds2013-02-2523-85/+1331
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "This has two new ACPI drivers for Xen - a physical CPU offline/online and a memory hotplug. The way this works is that ACPI kicks the drivers and they make the appropiate hypercall to the hypervisor to tell it that there is a new CPU or memory. There also some changes to the Xen ARM ABIs and couple of fixes. One particularly nasty bug in the Xen PV spinlock code was fixed by Stefan Bader - and has been there since the 2.6.32! Features: - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor to be aware of new CPU and new DIMMs - Cleanups Bug-fixes: - Fixes a long-standing bug in the PV spinlock wherein we did not kick VCPUs that were in a tight loop. - Fixes in the error paths for the event channel machinery" Fix up a few semantic conflicts with the ACPI interface changes in drivers/xen/xen-acpi-{cpu,mem}hotplug.c. * tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: event channel arrays are xen_ulong_t and not unsigned long xen: Send spinlock IPI to all waiters xen: introduce xen_remap, use it instead of ioremap xen: close evtchn port if binding to irq fails xen-evtchn: correct comment and error output xen/tmem: Add missing %s in the printk statement. xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0 xen/acpi: ACPI cpu hotplug xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h xen/stub: driver for CPU hotplug xen/acpi: ACPI memory hotplug xen/stub: driver for memory hotplug xen: implement updated XENMEM_add_to_physmap_range ABI xen/smp: Move the common CPU init code a bit to prep for PVH patch.
| * xen: event channel arrays are xen_ulong_t and not unsigned longIan Campbell2013-02-204-52/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On ARM we want these to be the same size on 32- and 64-bit. This is an ABI change on ARM. X86 does not change. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Keir (Xen.org) <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: linux-arm-kernel@lists.infradead.org Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: Send spinlock IPI to all waitersStefan Bader2013-02-201-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: introduce xen_remap, use it instead of ioremapStefano Stabellini2013-02-205-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | ioremap can't be used to map ring pages on ARM because it uses device memory caching attributes (MT_DEVICE*). Introduce a Xen specific abstraction to map ring pages, called xen_remap, that is defined as ioremap on x86 (no behavioral changes). On ARM it explicitly calls __arm_ioremap with the right caching attributes: MT_MEMORY. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: close evtchn port if binding to irq failsWei Liu2013-02-201-0/+10
| | | | | | | | | | | | CC: stable@vger.kernel.org Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen-evtchn: correct comment and error outputWei Liu2013-02-201-2/+2
| | | | | | | | | | | | | | | | The evtchn device has been moved to /dev/xen. Also change log level to KERN_ERR as other xen drivers. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/tmem: Add missing %s in the printk statement.Konrad Rzeszutek Wilk2013-02-201-1/+1
| | | | | | | | | | | | Seems that it got lost. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0Liu Jinsong2013-02-201-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | To avoid compile issue and it's meanigfull only under CONFIG_XEN_DOM0. In file included from linux/arch/x86/xen/enlighten.c:47:0: linux/include/xen/acpi.h:75:76: error: unknown type name ‘acpi_handle’ make[3]: *** [arch/x86/xen/enlighten.o] Error 1 Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> [v1: Fixed spelling mistakes] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: ACPI cpu hotplugLiu Jinsong2013-02-206-0/+530
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implement real Xen ACPI cpu hotplug driver as module. When loaded, it replaces Xen stub driver. For booting existed cpus, the driver enumerates them. For hotadded cpus, which added at runtime and notify OS via device or container event, the driver is invoked to add them, parsing cpu information, hypercalling to Xen hypervisor to add them, and finally setting up new /sys interface for them. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.hLiu Jinsong2013-02-202-18/+18
| | | | | | | | | | | | | | So that it could be reused by Xen CPU hotplug logic. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/stub: driver for CPU hotplugLiu Jinsong2013-02-202-2/+44
| | | | | | | | | | | | | | | | Add Xen stub driver for CPU hotplug, early occupy to block native, will be replaced later by real Xen processor driver module. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/acpi: ACPI memory hotplugLiu Jinsong2013-02-204-4/+522
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements real Xen acpi memory hotplug driver as module. When loaded, it replaces Xen stub driver. When an acpi memory device hotadd event occurs, it notifies OS and invokes notification callback, adding related memory device and parsing memory information, finally hypercall to xen hypervisor to add memory. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/stub: driver for memory hotplugLiu Jinsong2013-02-204-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch create a file (xen-stub.c) for Xen stub drivers. Xen stub drivers are used to reserve space for Xen drivers, i.e. memory hotplug and cpu hotplug, and to block native drivers loaded, so that real Xen drivers can be modular and loaded on demand. This patch is specific for Xen memory hotplug (other Xen logic can add stub drivers on their own). The xen stub driver will occupied earlier via subsys_initcall (than native memory hotplug driver via module_init and so blocking native). Later real Xen memory hotplug logic will unregister the stub driver and register itself to take effect on demand. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: implement updated XENMEM_add_to_physmap_range ABIIan Campbell2013-02-202-3/+11
| | | | | | | | | | | | | | | | | | | | Allows for more fine grained error reporting. Only used by PVH and ARM both of which are marked EXPERIMENTAL precisely because the ABI is not yet stable Signed-off-by: Ian Campbell <ian.campbell@citrix.com> [v1: Rebased without PVH patches] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/smp: Move the common CPU init code a bit to prep for PVH patch.Konrad Rzeszutek Wilk2013-02-201-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | The PV and PVH code CPU init code share some functionality. The PVH code ("xen/pvh: Extend vcpu_guest_context, p2m, event, and XenBus") sets some of these up, but not all. To make it easier to read, this patch removes the PV specific out of the generic way. No functional change - just code movement. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v2: Fixed compile errors noticed by Fengguang Wu build system] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-02-2465-1671/+4359
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...
| * | Revert "KVM: MMU: lazily drop large spte"Marcelo Tosatti2013-02-201-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit caf6900f2d8aaebe404c976753f6813ccd31d95e. It is causing migration failures, reference https://bugzilla.kernel.org/show_bug.cgi?id=54061. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | x86: pvclock kvm: align allocation size to page sizeMarcelo Tosatti2013-02-191-5/+6
| | | | | | | | | | | | | | | | | | | | | To match whats mapped via vsyscalls to userspace. Reported-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | Merge commit 'origin/next' into kvm-ppc-nextAlexander Graf2013-02-158-87/+66Star
| |\ \
| | * | KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msrJan Kiszka2013-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We already pass vmcs12 as argument. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | x86 emulator: fix parity calculation for AAD instructionGleb Natapov2013-02-131-8/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Reported-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: Remove user_alloc from struct kvm_memory_slotTakuya Yoshikawa2013-02-113-23/+16Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This field was needed to differentiate memory slots created by the new API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent, KVM_SET_MEMORY_REGION, whose support was dropped long before: commit b74a07beed0e64bfba413dcb70dd6749c57f43dc KVM: Remove kernel-allocated memory regions Although we also have private memory slots to which KVM allocates memory with vm_mmap(), !user_alloc slots in other words, the slot id should be enough for differentiating them. Note: corresponding function parameters will be removed later. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: VMX: disable apicv by defaultYang Zhang2013-02-111-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without Posted Interrupt, current code is broken. Just disable by default until Posted Interrupt is ready. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: s390: Fix handling of iscs.Cornelia Huck2013-02-111-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two ways to express an interruption subclass: - As a bitmask, as used in cr6. - As a number, as used in the I/O interruption word. Unfortunately, we have treated the I/O interruption word as if it contained the bitmask as well, which went unnoticed so far as - (not-yet-released) qemu made the same mistake, and - Linux guest kernels don't check the isc value in the I/O interruption word for subchannel interrupts. Make sure that we treat the I/O interruption word correctly. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| | * | KVM: MMU: cleanup __direct_mapXiao Guangrong2013-02-071-8/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use link_shadow_page to link the sp to the spte in __direct_map Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: remove pt_access in mmu_set_spteXiao Guangrong2013-02-072-15/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is only used in debug code, so drop it Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: cleanup mapping-levelXiao Guangrong2013-02-071-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use min() to cleanup mapping_level Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: MMU: lazily drop large spteXiao Guangrong2013-02-071-16/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them not present, the page fault caused by read access can be avoided The idea is from Avi: | As I mentioned before, write-protecting a large spte is a good idea, | since it moves some work from protect-time to fault-time, so it reduces | jitter. This removes the need for the return value. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | KVM: VMX: cleanup vmx_set_cr0().Gleb Natapov2013-02-071-9/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calculating hw_cr0 teh current code masks bits that should be always on and re-adds them back immediately after. Cleanup the code by masking only those bits that should be dropped from hw_cr0. This allow us to get rid of some defines. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | KVM: PPC: BookE: Handle alignment interruptsAlexander Graf2013-02-132-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the guest triggers an alignment interrupt, we don't handle it properly today and instead BUG_ON(). This really shouldn't happen. Instead, we should just pass the interrupt back into the guest so it can deal with it. Reported-by: Gao Guanhua-B22826 <B22826@freescale.com> Tested-by: Gao Guanhua-B22826 <B22826@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | booke: Added DBCR4 SPR numberBharat Bhushan2013-02-131-0/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | KVM: PPC: booke: Allow multiple exception typesBharat Bhushan2013-02-135-16/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current kvmppc_booke_handlers uses the same macro (KVM_HANDLER) and all handlers are considered to be the same size. This will not be the case if we want to use different macros for different handlers. This patch improves the kvmppc_booke_handler so that it can support different macros for different handlers. Signed-off-by: Liu Yu <yu.liu@freescale.com> [bharat.bhushan@freescale.com: Substantial changes] Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | KVM: PPC: booke: use vcpu reference from thread_structBharat Bhushan2013-02-133-7/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Like other places, use thread_struct to get vcpu reference. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * | | Merge commit 'origin/next' into kvm-ppc-nextAlexander Graf2013-02-1323-166/+803
| |\| |