summaryrefslogtreecommitdiffstats
path: root/arch/x86/xen
Commit message (Collapse)AuthorAgeFilesLines
* xen/mmu: On early bootup, flush the TLB when changing RO->RW bits Xen ↵Konrad Rzeszutek Wilk2013-04-021-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | provided pagetables. Occassionaly on a DL380 G4 the guest would crash quite early with this: (XEN) d244:v0: unhandled page fault (ec=0003) (XEN) Pagetable walk from ffffffff84dc7000: (XEN) L4[0x1ff] = 00000000c3f18067 0000000000001789 (XEN) L3[0x1fe] = 00000000c3f14067 000000000000178d (XEN) L2[0x026] = 00000000dc8b2067 0000000000004def (XEN) L1[0x1c7] = 00100000dc8da067 0000000000004dc7 (XEN) domain_crash_sync called from entry.S (XEN) Domain 244 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.1.3OVM x86_64 debug=n Not tainted ]---- (XEN) CPU: 3 (XEN) RIP: e033:[<ffffffff81263f22>] (XEN) RFLAGS: 0000000000000216 EM: 1 CONTEXT: pv guest (XEN) rax: 0000000000000000 rbx: ffffffff81785f88 rcx: 000000000000003f (XEN) rdx: 0000000000000000 rsi: 00000000dc8da063 rdi: ffffffff84dc7000 The offending code shows it to be a loop writting the value zero (%rax) in the %rdi (the L4 provided by Xen) register: 0: 44 00 00 add %r8b,(%rax) 3: 31 c0 xor %eax,%eax 5: b9 40 00 00 00 mov $0x40,%ecx a: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 11: 00 00 13: ff c9 dec %ecx 15:* 48 89 07 mov %rax,(%rdi) <-- trapping instruction 18: 48 89 47 08 mov %rax,0x8(%rdi) 1c: 48 89 47 10 mov %rax,0x10(%rdi) which fails. xen_setup_kernel_pagetable recycles some of the Xen's page-table entries when it has switched over to its Linux page-tables. Right before try to clear the page, we make a hypercall to change it from _RO to _RW and that works (otherwise we would hit an BUG()). And the _RW flag is set for that page: (XEN) L1[0x1c7] = 001000004885f067 0000000000004dc7 The error code is 3, so PFEC_page_present and PFEC_write_access, so page is present (correct), and we tried to write to the page, but a violation occurred. The one theory is that the the page entries in hardware (which are cached) are not up to date with what we just set. Especially as we have just done an CR3 write and flushed the multicalls. This patch does solve the problem by flusing out the TLB page entry after changing it from _RO to _RW and we don't hit this issue anymore. Fixed-Oracle-Bug: 16243091 [ON OCCASIONS VM START GOES INTO 'CRASH' STATE: CLEAR_PAGE+0X12 ON HP DL380 G4] Reported-and-Tested-by: Saar Maoz <Saar.Maoz@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/mmu: Move the setting of pvops.write_cr3 to later phase in bootup.Konrad Rzeszutek Wilk2013-03-271-2/+1Star
| | | | | | | | | | | | | We move the setting of write_cr3 from the early bootup variant (see git commit 0cc9129d75ef8993702d97ab0e49542c15ac6ab9 "x86-64, xen, mmu: Provide an early version of write_cr3.") to a more appropiate location. This new location sets all of the other non-early variants of pvops calls - and most importantly is before the alternative_asm mechanism kicks in. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pat: Disable PAT using pat_enabled value.Konrad Rzeszutek Wilk2013-02-281-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The git commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1 (xen/pat: Disable PAT support for now) explains in details why we want to disable PAT for right now. However that change was not enough and we should have also disabled the pat_enabled value. Otherwise we end up with: mmap-example:3481 map pfn expected mapping type write-back for [mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() ... Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu Call Trace: [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0 [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100 [<ffffffff81157459>] unmap_vmas+0x49/0x90 [<ffffffff8115f808>] exit_mmap+0x98/0x170 [<ffffffff810559a4>] mmput+0x64/0x100 [<ffffffff810560f5>] dup_mm+0x445/0x660 [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510 [<ffffffff81057931>] do_fork+0x91/0x350 [<ffffffff81057c76>] sys_clone+0x16/0x20 [<ffffffff816ccbf9>] stub_clone+0x69/0x90 [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f ---[ end trace 4918cdd0a4c9fea4 ]--- (a similar message shows up if you end up launching 'mcelog') The call chain is (as analyzed by Liu, Jinsong): do_fork --> copy_process --> dup_mm --> dup_mmap --> copy_page_range --> track_pfn_copy --> reserve_pfn_range --> line 624: flags != want_flags It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR (_PAGE_CACHE_UC_MINUS). Stefan Bader dug in this deep and found out that: "That makes it clearer as this will do reserve_memtype(...) --> pat_x_mtrr_type --> mtrr_type_lookup --> __mtrr_type_lookup And that can return -1/0xff in case of MTRR not being enabled/initialized. Which is not the case (given there are no messages for it in dmesg). This is not equal to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS. It looks like the problem starts early in reserve_memtype: if (!pat_enabled) { /* This is identical to page table setting without PAT */ if (new_type) { if (req_type == _PAGE_CACHE_WC) *new_type = _PAGE_CACHE_UC_MINUS; else *new_type = req_type & _PAGE_CACHE_MASK; } return 0; } This would be what we want, that is clearing the PWT and PCD flags from the supported flags - if pat_enabled is disabled." This patch does that - disabling PAT. CC: stable@vger.kernel.org # 3.3 and further Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* Merge tag 'stable/for-linus-3.9-rc0-tag' of ↵Linus Torvalds2013-02-252-20/+23
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "This has two new ACPI drivers for Xen - a physical CPU offline/online and a memory hotplug. The way this works is that ACPI kicks the drivers and they make the appropiate hypercall to the hypervisor to tell it that there is a new CPU or memory. There also some changes to the Xen ARM ABIs and couple of fixes. One particularly nasty bug in the Xen PV spinlock code was fixed by Stefan Bader - and has been there since the 2.6.32! Features: - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor to be aware of new CPU and new DIMMs - Cleanups Bug-fixes: - Fixes a long-standing bug in the PV spinlock wherein we did not kick VCPUs that were in a tight loop. - Fixes in the error paths for the event channel machinery" Fix up a few semantic conflicts with the ACPI interface changes in drivers/xen/xen-acpi-{cpu,mem}hotplug.c. * tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: event channel arrays are xen_ulong_t and not unsigned long xen: Send spinlock IPI to all waiters xen: introduce xen_remap, use it instead of ioremap xen: close evtchn port if binding to irq fails xen-evtchn: correct comment and error output xen/tmem: Add missing %s in the printk statement. xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0 xen/acpi: ACPI cpu hotplug xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h xen/stub: driver for CPU hotplug xen/acpi: ACPI memory hotplug xen/stub: driver for memory hotplug xen: implement updated XENMEM_add_to_physmap_range ABI xen/smp: Move the common CPU init code a bit to prep for PVH patch.
| * xen: Send spinlock IPI to all waitersStefan Bader2013-02-201-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hyper- call (those not interrupted at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/smp: Move the common CPU init code a bit to prep for PVH patch.Konrad Rzeszutek Wilk2013-02-201-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | The PV and PVH code CPU init code share some functionality. The PVH code ("xen/pvh: Extend vcpu_guest_context, p2m, event, and XenBus") sets some of these up, but not all. To make it easier to read, this patch removes the PV specific out of the generic way. No functional change - just code movement. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v2: Fixed compile errors noticed by Fengguang Wu build system] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | x86-64, xen, mmu: Provide an early version of write_cr3.Konrad Rzeszutek Wilk2013-02-231-5/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit 8170e6bed465 ("x86, 64bit: Use a #PF handler to materialize early mappings on demand") we started hitting an early bootup crash where the Xen hypervisor would inform us that: (XEN) d7:v0: unhandled page fault (ec=0000) (XEN) Pagetable walk from ffffea000005b2d0: (XEN) L4[0x1d4] = 0000000000000000 ffffffffffffffff (XEN) domain_crash_sync called from entry.S (XEN) Domain 7 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.2.0 x86_64 debug=n Not tainted ]---- .. that Xen was unable to context switch back to dom0. Looking at the calling stack we find: [<ffffffff8103feba>] xen_get_user_pgd+0x5a <-- [<ffffffff8103feba>] xen_get_user_pgd+0x5a [<ffffffff81042d27>] xen_write_cr3+0x77 [<ffffffff81ad2d21>] init_mem_mapping+0x1f9 [<ffffffff81ac293f>] setup_arch+0x742 [<ffffffff81666d71>] printk+0x48 We are trying to figure out whether we need to up-date the user PGD as well. Please keep in mind that under 64-bit PV guests we have a limited amount of rings: 0 for the Hypervisor, and 1 for both the Linux kernel and user-space. As such the Linux pvops'fied version of write_cr3 checks if it has to update the user-space cr3 as well. That clearly is not needed during early bootup. The recent changes (see above git commit) streamline the x86 page table allocation to be much simpler (And also incidentally the #PF handler ends up in spirit being similar to how the Xen toolstack sets up the initial page-tables). The fix is to have an early-bootup version of cr3 that just loads the kernel %cr3. The later version - which also handles user-page modifications will be used after the initial page tables have been setup. [ hpa: removed a redundant #ifdef and made the new function __init. Also note that x86-32 already has such an early xen_write_cr3. ] Tested-by: "H. Peter Anvin" <hpa@zytor.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361579812-23709-1-git-send-email-konrad.wilk@oracle.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2013-02-221-28/+0Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The *really* big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...
| * | Merge remote-tracking branch 'origin/x86/boot' into x86/mm2H. Peter Anvin2013-01-306-41/+101
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | x86, mm, Xen: Remove mapping_pagetable_reserve()Yinghai Lu2012-11-171-28/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Page table area are pre-mapped now after x86, mm: setup page table in top-down x86, mm: Remove early_memremap workaround for page table accessing on 64bit mapping_pagetable_reserve is not used anymore, so remove it. Also remove operation in mask_rw_pte(), as modified allow_low_page always return pages that are already mapped, moreover xen_alloc_pte_init, xen_alloc_pmd_init, etc, will mark the page RO before hooking it into the pagetable automatically. -v2: add changelog about mask_rw_pte() from Stefano. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | | Merge tag 'pm+acpi-3.9-rc1' of ↵Linus Torvalds2013-02-201-4/+1Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. * tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle cpufreq / intel_pstate: Add kernel command line option disable intel_pstate. cpufreq / intel_pstate: Change to disallow module build tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E ACPI / hotplug: Fix concurrency issues and memory leaks ...
| * | | x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flagLen Brown2013-02-101-3/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove 32-bit x86 a cmdline param "no-hlt", and the cpuinfo_x86.hlt_works_ok that it sets. If a user wants to avoid HLT, then "idle=poll" is much more useful, as it avoids invocation of HLT in idle, while "no-hlt" failed to do so. Indeed, hlt_works_ok was consulted in only 3 places. First, in /proc/cpuinfo where "hlt_bug yes" would be printed if and only if the user booted the system with "no-hlt" -- as there was no other code to set that flag. Second, check_hlt() would not invoke halt() if "no-hlt" were on the cmdline. Third, it was consulted in stop_this_cpu(), which is invoked by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() -- all cases where the machine is being shutdown/reset. The flag was not consulted in the more frequently invoked play_dead()/hlt_play_dead() used in processor offline and suspend. Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations indicating that it would be removed in 2012. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
| * | | xen idle: make xen-specific macro xen-specificLen Brown2013-02-101-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | This macro is only invoked by Xen, so make its definition specific to Xen. > set_pm_idle_to_default() < xen_set_default_idle() Signed-off-by: Len Brown <len.brown@intel.com> Cc: xen-devel@lists.xensource.com
* | | Merge branch 'x86-apic-for-linus' of ↵Linus Torvalds2013-02-201-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Main changes: - Multiple MSI support added to the APIC, PCI and AHCI code - acked by all relevant maintainers, by Alexander Gordeev. The advantage is that multiple AHCI ports can have multiple MSI irqs assigned, and can thus spread to multiple CPUs. [ Drivers can make use of this new facility via the pci_enable_msi_block_auto() method ] - x86 IOAPIC code from interrupt remapping cleanups from Joerg Roedel: These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. - Various smaller improvements, fixes and cleanups." * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess x86, kvm: Fix intialization warnings in kvm.c x86, irq: Move irq_remapped out of x86 core code x86, io_apic: Introduce eoi_ioapic_pin call-back x86, msi: Introduce x86_msi.compose_msi_msg call-back x86, irq: Introduce setup_remapped_irq() x86, irq: Move irq_remapped() check into free_remapped_irq x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq() x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core x86, irq: Add data structure to keep AMD specific irq remapping information x86, irq: Move irq_remapping_enabled declaration to iommu code x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin x86, io_apic: Move irq_remapping_enabled checks out of check_timer() x86, io_apic: Convert setup_ioapic_entry to function pointer x86, io_apic: Introduce set_affinity function pointer x86, msi: Use IRQ remapping specific setup_msi_irqs routine x86, hpet: Introduce x86_msi_ops.setup_hpet_msi x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging x86, io_apic: Introduce x86_io_apic_ops.disable() x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume ...
| * | | x86/apic: Allow x2apic without IR on VMware platformAlok N Kataria2013-01-241-0/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates x2apic initializaition code to allow x2apic on VMware platform even without interrupt remapping support. The hypervisor_x2apic_available hook was added in x2apic initialization code and used by KVM and XEN, before this. I have also cleaned up that code to export this hook through the hypervisor_x86 structure. Compile tested for KVM and XEN configs, this patch doesn't have any functional effect on those two platforms. On VMware platform, verified that x2apic is used in physical mode on products that support this. Signed-off-by: Alok N Kataria <akataria@vmware.com> Reviewed-by: Doug Covelli <dcovelli@vmware.com> Reviewed-by: Dan Hecht <dhecht@vmware.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Avi Kivity <avi@redhat.com> Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge tag 'stable/for-linus-3.8-rc7-tag-two' of ↵Linus Torvalds2013-02-154-63/+32Star
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes: - A simple bug-fix for redundant NULL check. - CVE-2013-0228/XSA-42: x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS and two reverts: - Revert the PVonHVM kexec. The patch introduces a regression with older hypervisor stacks, such as Xen 4.1." * tag 'stable/for-linus-3.8-rc7-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: Revert "xen PVonHVM: use E820_Reserved area for shared_info" Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info" xen: remove redundant NULL check before unregister_and_remove_pcpu(). x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.
| * | Revert "xen PVonHVM: use E820_Reserved area for shared_info"Konrad Rzeszutek Wilk2013-02-153-55/+24Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f. We are doing this b/c on 32-bit PVonHVM with older hypervisors (Xen 4.1) it ends up bothing up the start_info. This is bad b/c we use it for the time keeping, and the timekeeping code loops forever - as the version field never changes. Olaf says to revert it, so lets do that. Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | Revert "xen/PVonHVM: fix compile warning in init_hvm_pv_info"Konrad Rzeszutek Wilk2013-02-151-1/+1
| | | | | | | | | | | | | | | | | | This reverts commit a7be94ac8d69c037d08f0fd94b45a593f1d45176. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS.Jan Beulich2013-02-131-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes CVE-2013-0228 / XSA-42 Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user in 32bit PV guest can use to crash the > guest with the panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [<c08476df>] ? panic+0x6e/0x122 [<c084b63c>] ? oops_end+0xbc/0xd0 [<c084b260>] ? do_general_protection+0x0/0x210 [<c084a9b7>] ? error_code+0x73/ ------------- Petr says: " I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer. " Jan took a look at the preliminary patch and came up a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that the we can only relay on the registers that IRET restores. The two that are guaranteed are the %cs and %ss as they are always fixed GDT selectors. Also they are inaccessible from user mode - so they cannot be altered. This is the approach taken in this patch. Another alternative option suggested by Jan would be to relay on the subtle realization that using the %ebp or %esp relative references uses the %ss segment. In which case we could switch from using %eax to %ebp and would not need the %ss over-rides. That would also require one extra instruction to compensate for the one place where the register is used as scaled index. However Andrew pointed out that is too subtle and if further work was to be done in this code-path it could escape folks attention and lead to accidents. Reviewed-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Petr Matousek <pmatouse@redhat.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | Merge tag 'stable/for-linus-3.8-rc3-tag' of ↵Linus Torvalds2013-01-181-7/+0Star
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels) - Fix racy vma access spotted by Al Viro - Fix mmap batch ioctl potentially resulting in large O(n) page allcations. - Fix vcpu online/offline BUG:scheduling while atomic.. - Fix unbound buffer scanning for more than 32 vCPUs. - Fix grant table being incorrectly initialized - Fix incorrect check in pciback - Allow privcmd in backend domains. Fix up whitespace conflict due to ugly merge resolution in Xen tree in arch/arm/xen/enlighten.c * tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests. Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic." xen/gntdev: remove erronous use of copy_to_user xen/gntdev: correctly unmap unlinked maps in mmu notifier xen/gntdev: fix unsafe vma access xen/privcmd: Fix mmap batch ioctl. Xen: properly bound buffer access when parsing cpu/*/availability xen/grant-table: correctly initialize grant table version 1 x86/xen : Fix the wrong check in pciback xen/privcmd: Relax access control in privcmd_ioctl_mmap
| * | Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling ↵Konrad Rzeszutek Wilk2013-01-161-7/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while atomic." This reverts commit 41bd956de3dfdc3a43708fe2e0c8096c69064a1e. The fix is incorrect and not appropiate for the latest kernels. In fact it _causes_ the BUG: scheduling while atomic while doing vCPU hotplug. Suggested-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | Merge tag 'v3.7' into stable/for-linus-3.8Konrad Rzeszutek Wilk2013-01-151-1/+20
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux 3.7 * tag 'v3.7': (833 commits) Linux 3.7 Input: matrix-keymap - provide proper module license Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage ipv4: ip_check_defrag must not modify skb before unsharing Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended" inet_diag: validate port comparison byte code to prevent unsafe reads inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() inet_diag: validate byte code to prevent oops in inet_diag_bc_run() inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state mm: vmscan: fix inappropriate zone congestion clearing vfs: fix O_DIRECT read past end of block device net: gro: fix possible panic in skb_gro_receive() tcp: bug fix Fast Open client retransmission tmpfs: fix shared mempolicy leak mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones mm: compaction: validate pfn range passed to isolate_freepages_block mmc: sh-mmcif: avoid oops on spurious interrupts (second try) Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts" mmc: sdhci-s3c: fix missing clock for gpio card-detect lib/Makefile: Fix oid_registry build dependency ... Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Conflicts: arch/arm/xen/enlighten.c drivers/xen/Makefile [We need to have the v3.7 base as the 'for-3.8' was based off v3.7-rc3 and there are some patches in v3.7-rc6 that we to have in our branch]
* | | Merge tag 'stable/for-linus-3.8-rc0-bugfix-tag' of ↵Linus Torvalds2012-12-182-4/+5
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bugfixes from Konrad Rzeszutek Wilk: "Two fixes. One of them is caused by the recent change introduced by the 'x86-bsp-hotplug-for-linus' tip tree that inhibited bootup (old function does not do what it used to do). The other one is just a vanilla bug. - Fix to bootup regression introduced by 'x86-bsp-hotplug-for-linus' tip branch. - Fix to vcpu hotplug code." * tag 'stable/for-linus-3.8-rc0-bugfix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/vcpu: Fix vcpu restore path. xen: Add EVTCHNOP_reset in Xen interface header files. xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
| * | xen/vcpu: Fix vcpu restore path.Wei Liu2012-12-181-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | The runstate of vcpu should be restored for all possible cpus, as well as the vcpu info placement. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot ↵Konrad Rzeszutek Wilk2012-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | time. Git commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da ("x86, hotplug: Support functions for CPU0 online/offline") alters what the call to smp_store_cpu_info() does. For BSP we should use the smp_store_boot_cpu_info() and for secondary CPU's the old variant of smp_store_cpu_info() should be used. This fixes the regression introduced by said commit. Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | Merge tag 'stable/for-linus-3.8-rc0-tag' of ↵Linus Torvalds2012-12-135-29/+95
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: - Add necessary infrastructure to make balloon driver work under ARM. - Add /dev/xen/privcmd interfaces to work with ARM and PVH. - Improve Xen PCIBack wild-card parsing. - Add Xen ACPI PAD (Processor Aggregator) support - so can offline/ online sockets depending on the power consumption. - PVHVM + kexec = use an E820_RESV region for the shared region so we don't overwrite said region during kexec reboot. - Cleanups, compile fixes. Fix up some trivial conflicts due to the balloon driver now working on ARM, and there were changes next to the previous work-arounds that are now gone. * tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/PVonHVM: fix compile warning in init_hvm_pv_info xen: arm: implement remap interfaces needed for privcmd mappings. xen: correctly use xen_pfn_t in remap_domain_mfn_range. xen: arm: enable balloon driver xen: balloon: allow PVMMU interfaces to be compiled out xen: privcmd: support autotranslated physmap guests. xen: add pages parameter to xen_remap_domain_mfn_range xen/acpi: Move the xen_running_on_version_or_later function. xen/xenbus: Remove duplicate inclusion of asm/xen/hypervisor.h xen/acpi: Fix compile error by missing decleration for xen_domain. xen/acpi: revert pad config check in xen_check_mwait xen/acpi: ACPI PAD driver xen-pciback: reject out of range inputs xen-pciback: simplify and tighten parsing of device IDs xen PVonHVM: use E820_Reserved area for shared_info
| * | Merge branch 'arm-privcmd-for-3.8' of ↵Konrad Rzeszutek Wilk2012-11-302-2/+16
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://xenbits.xen.org/people/ianc/linux into stable/for-linus-3.8 * 'arm-privcmd-for-3.8' of git://xenbits.xen.org/people/ianc/linux: xen: arm: implement remap interfaces needed for privcmd mappings. xen: correctly use xen_pfn_t in remap_domain_mfn_range. xen: arm: enable balloon driver xen: balloon: allow PVMMU interfaces to be compiled out xen: privcmd: support autotranslated physmap guests. xen: add pages parameter to xen_remap_domain_mfn_range Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen: correctly use xen_pfn_t in remap_domain_mfn_range.Ian Campbell2012-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For Xen on ARM a PFN is 64 bits so we need to use the appropriate type here. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: include the necessary header, Reported-by: Fengguang Wu <fengguang.wu@intel.com> ]
| | * | xen: balloon: allow PVMMU interfaces to be compiled outIan Campbell2012-11-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARM platform has no concept of PVMMU and therefor no HYPERVISOR_update_va_mapping et al. Allow this code to be compiled out when not required. In some similar situations (e.g. P2M) we have defined dummy functions to avoid this, however I think we can/should draw the line at dummying out actual hypercalls. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | xen: add pages parameter to xen_remap_domain_mfn_rangeIan Campbell2012-11-291-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also introduce xen_unmap_domain_mfn_range. These are the parts of Mukesh's "xen/pvh: Implement MMU changes for PVH" which are also needed as a baseline for ARM privcmd support. The original patch was: Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> This derivative is also: Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
| * | | xen/PVonHVM: fix compile warning in init_hvm_pv_infoOlaf Hering2012-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After merging the xen-two tree, today's linux-next build (x86_64 allmodconfig) produced this warning: arch/x86/xen/enlighten.c: In function 'init_hvm_pv_info': arch/x86/xen/enlighten.c:1617:16: warning: unused variable 'ebx' [-Wunused-variable] arch/x86/xen/enlighten.c:1617:11: warning: unused variable 'eax' [-Wunused-variable] Introduced by commit 9d02b43dee0d ("xen PVonHVM: use E820_Reserved area for shared_info"). Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/acpi: Move the xen_running_on_version_or_later function.Konrad Rzeszutek Wilk2012-11-281-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As on ia64 builds we get: include/xen/interface/version.h: In function 'xen_running_on_version_or_later': include/xen/interface/version.h:76: error: implicit declaration of function 'HYPERVISOR_xen_version' We can later on make this function exportable if there are modules using part of it. For right now the only two users are built-in. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/acpi: revert pad config check in xen_check_mwaitLiu, Jinsong2012-11-261-2/+8
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With Xen acpi pad logic added into kernel, we can now revert xen mwait related patch df88b2d96e36d9a9e325bfcd12eb45671cbbc937 ("xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded. "). The reason is, when running under newer Xen platform, Xen pad driver would be early loaded, so native pad driver would fail to be loaded, and hence no mwait/monitor #UD risk again. Another point is, only Xen4.2 or later support Xen acpi pad, so we won't expose mwait cpuid capability when running under older Xen platform. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen PVonHVM: use E820_Reserved area for shared_infoOlaf Hering2012-11-023-24/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3 ("xen PVonHVM: move shared_info to MMIO before kexec"). Currently kexec in a PVonHVM guest fails with a triple fault because the new kernel overwrites the shared info page. The exact failure depends on the size of the kernel image. This patch moves the pfn from RAM into an E820 reserved memory area. The pfn containing the shared_info is located somewhere in RAM. This will cause trouble if the current kernel is doing a kexec boot into a new kernel. The new kernel (and its startup code) can not know where the pfn is, so it can not reserve the page. The hypervisor will continue to update the pfn, and as a result memory corruption occours in the new kernel. The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the E820 map. Within that range newer toolstacks (4.3+) will keep 1MB starting from FE700000 as reserved for guest use. Older Xen4 toolstacks will usually not allocate areas up to FE700000, so FE700000 is expected to work also with older toolstacks. In Xen3 there is no reserved area at a fixed location. If the guest is started on such old hosts the shared_info page will be placed in RAM. As a result kexec can not be used. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | x86, 386 removal: Remove CONFIG_CMPXCHGH. Peter Anvin2012-11-291-1/+1
| |/ |/| | | | | | | | | | | | | All 486+ CPUs support CMPXCHG, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-3-git-send-email-hpa@linux.intel.com
* | xen/mmu: Use Xen specific TLB flush instead of the generic one.Konrad Rzeszutek Wilk2012-10-311-1/+20
|/ | | | | | | | | | | | | | | | | | | | | | As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the hypervisor to do a TLB flush on all active vCPUs. If instead we were using the generic one (which ends up being xen_flush_tlb) we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But before we make that hypercall the kernel will IPI all of the vCPUs (even those that were asleep from the hypervisor perspective). The end result is that we needlessly wake them up and do a TLB flush when we can just let the hypervisor do it correctly. This patch gives around 50% speed improvement when migrating idle guest's from one host to another. Oracle-bug: 14630170 CC: stable@vger.kernel.org Tested-by: Jingjie Jiang <jingjie.jiang@oracle.com> Suggested-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* Merge commit 'v3.7-rc1' into stable/for-linus-3.7Konrad Rzeszutek Wilk2012-10-195-22/+22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * commit 'v3.7-rc1': (10892 commits) Linux 3.7-rc1 x86, boot: Explicitly include autoconf.h for hostprogs perf: Fix UAPI fallout ARM: config: make sure that platforms are ordered by option string ARM: config: sort select statements alphanumerically UAPI: (Scripted) Disintegrate include/linux/byteorder UAPI: (Scripted) Disintegrate include/linux UAPI: Unexport linux/blk_types.h UAPI: Unexport part of linux/ppp-comp.h perf: Handle new rbtree implementation procfs: don't need a PATH_MAX allocation to hold a string representation of an int vfs: embed struct filename inside of names_cache allocation if possible audit: make audit_inode take struct filename vfs: make path_openat take a struct filename pointer vfs: turn do_path_lookup into wrapper around struct filename variant audit: allow audit code to satisfy getname requests from its names_list vfs: define struct filename and have getname() return it btrfs: Fix compilation with user namespace support enabled userns: Fix posix_acl_file_xattr_userns gid conversion userns: Properly print bluetooth socket uids ...
| * Merge tag 'stable/for-linus-3.7-rc0-tag' of ↵Linus Torvalds2012-10-122-1/+58
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: "This has four bug-fixes and one tiny feature that I forgot to put initially in my tree due to oversight. The feature is for kdump kernels to speed up the /proc/vmcore reading. There is a ram_is_pfn helper function that the different platforms can register for. We are now doing that. The bug-fixes cover some embarrassing struct pv_cpu_ops variables being set to NULL on Xen (but not baremetal). We had a similar issue in the past with {write|read}_msr_safe and this fills the three missing ones. The other bug-fix is to make the console output (hvc) be capable of dealing with misbehaving backends and not fall flat on its face. Lastly, a quirk for older XenBus implementations that came with an ancient v3.4 hypervisor (so RHEL5 based) - reading of certain non-existent attributes just hangs the guest during bootup - so we take precaution of not doing that on such older installations. Feature: - Register a pfn_is_ram helper to speed up reading of /proc/vmcore. Bug-fixes: - Three pvops call for Xen were undefined causing BUG_ONs. - Add a quirk so that the shutdown watches (used by kdump) are not used with older Xen (3.4). - Fix ungraceful state transition for the HVC console." * tag 'stable/for-linus-3.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches. xen/bootup: allow {read|write}_cr8 pvops call. xen/bootup: allow read_tscp call for Xen PV guests. xen pv-on-hvm: add pfn_is_ram helper for kdump xen/hvc: handle backend CLOSED without CLOSING
| * | mm: kill vma flag VM_RESERVED and mm->reserved_vm counterKonstantin Khlebnikov2012-10-091-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Merge tag 'stable/for-linus-3.7-arm-tag' of ↵Linus Torvalds2012-10-073-1/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...
| * \ \ Merge tag 'stable/for-linus-3.7-x86-tag' of ↵Linus Torvalds2012-10-0310-59/+378
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: - When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. - Cleanup Xen SWIOTLB. - Support pages out grants from HVM domains in the backends. - Support wild cards in xen-pciback.hide=(BDF) arguments. - Update grant status updates with upstream hypervisor. - Boot PV guests with more than 128GB. - Cleanup Xen MMU code/add comments. - Obtain XENVERS using a preferred method. - Lay out generic changes to support Xen ARM. - Allow privcmd ioctl for HVM (used to do only PV). - Do v2 of mmap_batch for privcmd ioctls. - If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: - More fixes to Xen PCI backend for various calls/FLR/etc. - With more than 4GB in a 64-bit PV guest disable native SWIOTLB. - Fix up smatch warnings. - Fix up various return values in privmcmd and mm." * tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (48 commits) xen/pciback: Restore the PCI config space after an FLR. xen-pciback: properly clean up after calling pcistub_device_find() xen/vga: add the xen EFI video mode support xen/x86: retrieve keyboard shift status flags from hypervisor. xen/gndev: Xen backend support for paged out grant targets V4. xen-pciback: support wild cards in slot specifications xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. xen/arm: compile and run xenbus xen: resynchronise grant table status codes with upstream xen/privcmd: return -EFAULT on error xen/privcmd: Fix mmap batch ioctl error status copy back. xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl xen/mm: return more precise error from xen_remap_domain_range() xen/mmu: If the revector fails, don't attempt to revector anything else. ...
| * \ \ \ Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds2012-10-011-11/+7Star
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/platform changes from Ingo Molnar: "This cleans up some Xen-induced pagetable init code uglies, by generalizing new platform callbacks and state: x86_init.paging.*" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Document x86_init.paging.pagetable_init() x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() x86: Move paging_init() call to x86_init.paging.pagetable_init() x86: Rename pagetable_setup_start() to pagetable_init() x86: Remove base argument from x86_init.paging.pagetable_setup_start
| * \ \ \ \ Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds2012-10-011-4/+2Star
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "The one change that stands out is the alternatives patching change that prevents us from ever patching back instructions from SMP to UP: this simplifies things and speeds up CPU hotplug. Other than that it's smaller fixes, cleanups and improvements." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Unspaghettize do_trap() x86_64: Work around old GAS bug x86: Use REP BSF unconditionally x86: Prefer TZCNT over BFS x86/64: Adjust types of temporaries used by ffs()/fls()/fls64() x86: Drop unnecessary kernel_eflags variable on 64-bit x86/smp: Don't ever patch back to UP if we unplug cpus
| | * | | | | x86/smp: Don't ever patch back to UP if we unplug cpusRusty Russell2012-08-231-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still patch SMP instructions to UP variants if we boot with a single CPU, but not at any other time. In particular, not if we unplug CPUs to return to a single cpu. Paul McKenney points out: mean offline overhead is 6251/48=130.2 milliseconds. If I remove the alternatives_smp_switch() from the offline path [...] the mean offline overhead is 550/42=13.1 milliseconds Basically, we're never going to get those 120ms back, and the code is pretty messy. We get rid of: 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the documentation is wrong. It's now the default. 2) The skip_smp_alternatives flag used by suspend. 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end() which were only used to set this one flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paul.mckenney@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | Merge tag 'stable/for-linus-3.6-rc1-tag' of ↵Linus Torvalds2012-08-161-0/+5
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fix from Konrad Rzeszutek Wilk: "Way back in v3.5 we added a mechanism to populate back pages that were released (they overlapped with MMIO regions), but neglected to reserve the proper amount of virtual space for extend_brk to work properly. Coincidentally some other commit aligned the _brk space to larger area so I didn't trigger this until it was run on a machine with more than 2GB of MMIO space." * On machines with large MMIO/PCI E820 spaces we fail to boot b/c we failed to pre-allocate large enough virtual space for extend_brk. * tag 'stable/for-linus-3.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.
| * | | | | | | xen/boot: Disable NUMA for PV guests.Konrad Rzeszutek Wilk2012-09-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hypervisor is in charge of allocating the proper "NUMA" memory and dealing with the CPU scheduler to keep them bound to the proper NUMA node. The PV guests (and PVHVM) have no inkling of where they run and do not need to know that right now. In the future we will need to inject NUMA configuration data (if a guest spans two or more NUMA nodes) so that the kernel can make the right choices. But those patches are not yet present. In the meantime, disable the NUMA capability in the PV guest, which also fixes a bootup issue. Andre says: "we see Dom0 crashes due to the kernel detecting the NUMA topology not by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA). This will detect the actual NUMA config of the physical machine, but will crash about the mismatch with Dom0's virtual memory. Variation of the theme: Dom0 sees what it's not supposed to see. This happens with the said config option enabled and on a machine where this scanning is still enabled (K8 and Fam10h, not Bulldozer class) We have this dump then: NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10 Scanning NUMA topology in Northbridge 24 Number of physical nodes 4 Node 0 MemBase 0000000000000000 Limit 0000000040000000 Node 1 MemBase 0000000040000000 Limit 0000000138000000 Node 2 MemBase 0000000138000000 Limit 00000001f8000000 Node 3 MemBase 00000001f8000000 Limit 0000000238000000 Initmem setup node 0 0000000000000000-0000000040000000 NODE_DATA [000000003ffd9000 - 000000003fffffff] Initmem setup node 1 0000000040000000-0000000138000000 NODE_DATA [0000000137fd9000 - 0000000137ffffff] Initmem setup node 2 0000000138000000-00000001f8000000 NODE_DATA [00000001f095e000 - 00000001f0984fff] Initmem setup node 3 00000001f8000000-0000000238000000 Cannot find 159744 bytes in node 3 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96 Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96 .. snip.. [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178 [<ffffffff81d23348>] sparse_init+0xe4/0x25a [<ffffffff81d16840>] paging_init+0x13/0x22 [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b [<ffffffff81683954>] ? printk+0x3c/0x3e [<ffffffff81d01a38>] start_kernel+0xe5/0x468 [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1 [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36 [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c " so we just disable NUMA scanning by setting numa_off=1. CC: stable@vger.kernel.org Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | | | xen/boot: Disable BIOS SMP MP table search.Konrad Rzeszutek Wilk2012-09-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the initial domain we are able to search/map certain regions of memory to harvest configuration data. For all low-level we use ACPI tables - for interrupts we use exclusively ACPI _PRT (so DSDT) and MADT for INT_SRC_OVR. The SMP MP table is not used at all. As a matter of fact we do not even support machines that only have SMP MP but no ACPI tables. Lets follow how Moorestown does it and just disable searching for BIOS SMP tables. This also fixes an issue on HP Proliant BL680c G5 and DL380 G6: 9f->100 for 1:1 PTE Freeing 9f-100 pfn range: 97 pages freed 1-1 mapping on 9f->100 .. snip.. e820: BIOS-provided physical RAM map: Xen: [mem 0x0000000000000000-0x000000000009efff] usable Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable .. snip.. Scan for SMP in [mem 0x00000000-0x000003ff] Scan for SMP in [mem 0x0009fc00-0x0009ffff] Scan for SMP in [mem 0x000f0000-0x000fffff] found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0] (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0 (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e() BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71 . snip.. Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5 .. snip.. Call Trace: [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248 [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81ad32ac>] early_ioremap+0x13/0x15 [<ffffffff81acc140>] get_mpc_size+0x2f/0x67 [<ffffffff81acc284>] smp_scan_config+0x10c/0x136 [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81abca7f>] start_kernel+0x90/0x390 [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136 [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. which is that ioremap would end up mapping 0xff using _PAGE_IOMAP (which is what early_ioremap sticks as a flag) - which meant we would get MFN 0xFF (pte ff461, which is OK), and then it would also map 0x100 (b/c ioremap tries to get page aligned request, and it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page) as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP bypasses the P2M lookup we would happily set the PTE to 1000461. Xen would deny the request since we do not have access to the Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example 0x80140. CC: stable@vger.kernel.org Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665 Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | | | xen/m2p: do not reuse kmap_op->dev_bus_addrStefano Stabellini2012-09-121-16/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | | | | xen/x86: remove duplicated include from enlighten.cWei Yongjun2012-10-191-2/+0Star
| |_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated include. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) CC: stable@vger.kernel.org Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | | | xen/bootup: allow {read|write}_cr8 pvops call.Konrad Rzeszutek Wilk2012-10-121-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when an user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270 .. snip.. Call Trace: [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>