summaryrefslogtreecommitdiffstats
path: root/arch/x86/xen/p2m.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'stable/for-linus-3.7-x86-tag' of ↵Linus Torvalds2012-10-031-7/+85
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: - When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. - Cleanup Xen SWIOTLB. - Support pages out grants from HVM domains in the backends. - Support wild cards in xen-pciback.hide=(BDF) arguments. - Update grant status updates with upstream hypervisor. - Boot PV guests with more than 128GB. - Cleanup Xen MMU code/add comments. - Obtain XENVERS using a preferred method. - Lay out generic changes to support Xen ARM. - Allow privcmd ioctl for HVM (used to do only PV). - Do v2 of mmap_batch for privcmd ioctls. - If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: - More fixes to Xen PCI backend for various calls/FLR/etc. - With more than 4GB in a 64-bit PV guest disable native SWIOTLB. - Fix up smatch warnings. - Fix up various return values in privmcmd and mm." * tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (48 commits) xen/pciback: Restore the PCI config space after an FLR. xen-pciback: properly clean up after calling pcistub_device_find() xen/vga: add the xen EFI video mode support xen/x86: retrieve keyboard shift status flags from hypervisor. xen/gndev: Xen backend support for paged out grant targets V4. xen-pciback: support wild cards in slot specifications xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. xen/arm: compile and run xenbus xen: resynchronise grant table status codes with upstream xen/privcmd: return -EFAULT on error xen/privcmd: Fix mmap batch ioctl error status copy back. xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl xen/mm: return more precise error from xen_remap_domain_range() xen/mmu: If the revector fails, don't attempt to revector anything else. ...
| * Merge branch 'stable/128gb.v5.1' into stable/for-linus-3.7Konrad Rzeszutek Wilk2012-09-121-7/+85
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * stable/128gb.v5.1: xen/mmu: If the revector fails, don't attempt to revector anything else. xen/p2m: When revectoring deal with holes in the P2M array. xen/mmu: Release just the MFN list, not MFN list and part of pagetables. xen/mmu: Remove from __ka space PMD entries for pagetables. xen/mmu: Copy and revector the P2M tree. xen/p2m: Add logic to revector a P2M tree to use __va leafs. xen/mmu: Recycle the Xen provided L4, L3, and L2 pages xen/mmu: For 64-bit do not call xen_map_identity_early xen/mmu: use copy_page instead of memcpy. xen/mmu: Provide comments describing the _ka and _va aliasing issue xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas." xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain. xen/x86: Use memblock_reserve for sensitive areas. xen/p2m: Fix the comment describing the P2M tree. Conflicts: arch/x86/xen/mmu.c The pagetable_init is the old xen_pagetable_setup_done and xen_pagetable_setup_start rolled in one. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/p2m: When revectoring deal with holes in the P2M array.Konrad Rzeszutek Wilk2012-08-231-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we free the PFNs and then subsequently populate them back during bootup: Freeing 20000-20200 pfn range: 512 pages freed 1-1 mapping on 20000->20200 Freeing 40000-40200 pfn range: 512 pages freed 1-1 mapping on 40000->40200 Freeing bad80-badf4 pfn range: 116 pages freed 1-1 mapping on bad80->badf4 Freeing badf6-bae7f pfn range: 137 pages freed 1-1 mapping on badf6->bae7f Freeing bb000-100000 pfn range: 282624 pages freed 1-1 mapping on bb000->100000 Released 283999 pages of unused memory Set 283999 page(s) to 1-1 mapping Populating 1acb8a-1f20e9 pfn range: 283999 pages added We end up having the P2M array (that is the one that was grafted on the P2M tree) filled with IDENTITY_FRAME or INVALID_P2M_ENTRY) entries. The patch titled "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID." recycles said slots and replaces the P2M tree leaf's with &mfn_list[xx] with p2m_identity or p2m_missing. And re-uses the P2M array sections for other P2M tree leaf's. For the above mentioned bootup excerpt, the PFNs at 0x20000->0x20200 are going to be IDENTITY based: P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME. We can re-use that and replace P2M[0][256] to point to p2m_identity. The "old" page (the grafted P2M array provided by Xen) that was at P2M[0][256] gets put somewhere else. Specifically at P2M[6][358], b/c when we populate back: Populating 1acb8a-1f20e9 pfn range: 283999 pages added we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with the new MFNs. That is all OK, except when we revector we assume that the PFN count would be the same in the grafted P2M array and in the newly allocated. Since that is no longer the case, as we have holes in the P2M that point to p2m_missing or p2m_identity we have to take that into account. [v2: Check for overflow] [v3: Move within the __va check] [v4: Fix the computation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/p2m: Add logic to revector a P2M tree to use __va leafs.Konrad Rzeszutek Wilk2012-08-231-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During bootup Xen supplies us with a P2M array. It sticks it right after the ramdisk, as can be seen with a 128GB PV guest: (certain parts removed for clarity): xc_dom_build_image: called xc_dom_alloc_segment: kernel : 0xffffffff81000000 -> 0xffffffff81e43000 (pfn 0x1000 + 0xe43 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000 xc_dom_alloc_segment: ramdisk : 0xffffffff81e43000 -> 0xffffffff925c7000 (pfn 0x1e43 + 0x10784 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000 xc_dom_alloc_segment: phys2mach : 0xffffffff925c7000 -> 0xffffffffa25c7000 (pfn 0x125c7 + 0x10000 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000 xc_dom_alloc_page : start info : 0xffffffffa25c7000 (pfn 0x225c7) xc_dom_alloc_page : xenstore : 0xffffffffa25c8000 (pfn 0x225c8) xc_dom_alloc_page : console : 0xffffffffa25c9000 (pfn 0x225c9) nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s) nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s) nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s) nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s) xc_dom_alloc_segment: page tables : 0xffffffffa25ca000 -> 0xffffffffa26e1000 (pfn 0x225ca + 0x117 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000 xc_dom_alloc_page : boot stack : 0xffffffffa26e1000 (pfn 0x226e1) xc_dom_build_image : virt_alloc_end : 0xffffffffa26e2000 xc_dom_build_image : virt_pgtab_end : 0xffffffffa2800000 So the physical memory and virtual (using __START_KERNEL_map addresses) layout looks as so: phys __ka /------------\ /-------------------\ | 0 | empty | 0xffffffff80000000| | .. | | .. | | 16MB | <= kernel starts | 0xffffffff81000000| | .. | | | | 30MB | <= kernel ends => | 0xffffffff81e43000| | .. | & ramdisk starts | .. | | 293MB | <= ramdisk ends=> | 0xffffffff925c7000| | .. | & P2M starts | .. | | .. | | .. | | 549MB | <= P2M ends => | 0xffffffffa25c7000| | .. | start_info | 0xffffffffa25c7000| | .. | xenstore | 0xffffffffa25c8000| | .. | cosole | 0xffffffffa25c9000| | 549MB | <= page tables => | 0xffffffffa25ca000| | .. | | | | 550MB | <= PGT end => | 0xffffffffa26e1000| | .. | boot stack | | \------------/ \-------------------/ As can be seen, the ramdisk, P2M and pagetables are taking a bit of __ka addresses space. Which is a problem since the MODULES_VADDR starts at 0xffffffffa0000000 - and P2M sits right in there! This results during bootup with the inability to load modules, with this error: ------------[ cut here ]------------ WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370() Call Trace: [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0 [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20 [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370 [<ffffffff81130c4d>] map_vm_area+0x2d/0x50 [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80 [<ffffffff810c6186>] ? load_module+0x66/0x19c0 [<ffffffff8105cadc>] module_alloc+0x5c/0x60 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80 [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80 [<ffffffff810c70c3>] load_module+0xfa3/0x19c0 [<ffffffff812491f6>] ? security_file_permission+0x86/0x90 [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220 [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b ---[ end trace fd8f7704fdea0291 ]--- vmalloc: allocation failure, allocated 16384 of 20480 bytes modprobe: page allocation failure: order:0, mode:0xd2 Since the __va and __ka are 1:1 up to MODULES_VADDR and cleanup_highmap rids __ka of the ramdisk mapping, what we want to do is similar - get rid of the P2M in the __ka address space. There are two ways of fixing this: 1) All P2M lookups instead of using the __ka address would use the __va address. This means we can safely erase from __ka space the PMD pointers that point to the PFNs for P2M array and be OK. 2). Allocate a new array, copy the existing P2M into it, revector the P2M tree to use that, and return the old P2M to the memory allocate. This has the advantage that it sets the stage for using XEN_ELF_NOTE_INIT_P2M feature. That feature allows us to set the exact virtual address space we want for the P2M - and allows us to boot as initial domain on large machines. So we pick option 2). This patch only lays the groundwork in the P2M code. The patch that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree." Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." ↵Konrad Rzeszutek Wilk2012-08-231-5/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and "xen/x86: Use memblock_reserve for sensitive areas." This reverts commit 806c312e50f122c47913145cf884f53dd09d9199 and commit 59b294403e9814e7c1154043567f0d71bac7a511. And also documents setup.c and why we want to do it that way, which is that we tried to make the the memblock_reserve more selective so that it would be clear what region is reserved. Sadly we ran in the problem wherein on a 64-bit hypervisor with a 32-bit initial domain, the pt_base has the cr3 value which is not neccessarily where the pagetable starts! As Jan put it: " Actually, the adjustment turns out to be correct: The page tables for a 32-on-64 dom0 get allocated in the order "first L1", "first L2", "first L3", so the offset to the page table base is indeed 2. When reading xen/include/public/xen.h's comment very strictly, this is not a violation (since there nothing is said that the first thing in the page table space is pointed to by pt_base; I admit that this seems to be implied though, namely do I think that it is implied that the page table space is the range [pt_base, pt_base + nt_pt_frames), whereas that range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames), which - without a priori knowledge - the kernel would have difficulty to figure out)." - so lets just fall back to the easy way and reserve the whole region. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/x86: Use memblock_reserve for sensitive areas.Konrad Rzeszutek Wilk2012-08-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | instead of a big memblock_reserve. This way we can be more selective in freeing regions (and it also makes it easier to understand where is what). [v1: Move the auto_translate_physmap to proper line] [v2: Per Stefano suggestion add more comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/p2m: Fix the comment describing the P2M tree.Konrad Rzeszutek Wilk2012-08-211-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | It mixed up the p2m_mid_missing with p2m_missing. Also remove some extra spaces. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | xen/m2p: do not reuse kmap_op->dev_bus_addrStefano Stabellini2012-09-121-16/+11Star
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen/p2m: Fix one-off error in checking the P2M tree directory.Konrad Rzeszutek Wilk2012-09-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES inclusive) when trying to figure out whether we can re-use some of the P2M middle leafs. Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512 we would try to use the 512th entry. Fortunately for us the p2m_top_index has a check for this: BUG_ON(pfn >= MAX_P2M_PFN); which we hit and saw this: (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1.2-OVM x86_64 debug=n Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff819cadeb>] (XEN) RFLAGS: 0000000000000212 EM: 1 CONTEXT: pv guest (XEN) rax: ffffffff81db5000 rbx: ffffffff81db4000 rcx: 0000000000000000 (XEN) rdx: 0000000000480211 rsi: 0000000000000000 rdi: ffffffff81db4000 (XEN) rbp: ffffffff81793db8 rsp: ffffffff81793d38 r8: 0000000008000000 (XEN) r9: 4000000000000000 r10: 0000000000000000 r11: ffffffff81db7000 (XEN) r12: 0000000000000ff8 r13: ffffffff81df1ff8 r14: ffffffff81db6000 (XEN) r15: 0000000000000ff8 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 0000000661795000 cr2: 0000000000000000 Fixes-Oracle-Bug: 14570662 CC: stable@vger.kernel.org # only for v3.5 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID.Konrad Rzeszutek Wilk2012-08-171-3/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If P2M leaf is completly packed with INVALID_P2M_ENTRY or with 1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf with either a p2m_missing or p2m_identity respectively. The old page (which was created via extend_brk or was grafted on from the mfn_list) can be re-used for setting new PFNs. This also means we can remove git commit: 5bc6f9888db5739abfa0cae279b4b442e4db8049 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back which tried to fix this. and make the amount that is required to be reserved much smaller. CC: stable@vger.kernel.org # for 3.5 only. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back.Konrad Rzeszutek Wilk2012-08-021-0/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we release pages back during bootup: Freeing 9d-100 pfn range: 99 pages freed Freeing 9cf36-9d0d2 pfn range: 412 pages freed Freeing 9f6bd-9f6bf pfn range: 2 pages freed Freeing 9f714-9f7bf pfn range: 171 pages freed Freeing 9f7e0-9f7ff pfn range: 31 pages freed Freeing 9f800-100000 pfn range: 395264 pages freed Released 395979 pages of unused memory We then try to populate those pages back. In the P2M tree however the space for those leafs must be reserved - as such we use extend_brk. We reserve 8MB of _brk space, which means we can fit over 1048576 PFNs - which is more than we should ever need. Without this, on certain compilation of the kernel we would hit: (XEN) domain_crash_sync called from entry.S (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff818aad3b>] (XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (XEN) rax: ffffffff81a7c000 rbx: 000000000000003d rcx: 0000000000001000 (XEN) rdx: ffffffff81a7b000 rsi: 0000000000001000 rdi: 0000000000001000 (XEN) rbp: ffffffff81801cd8 rsp: ffffffff81801c98 r8: 0000000000100000 (XEN) r9: ffffffff81a7a000 r10: 0000000000000001 r11: 0000000000000003 (XEN) r12: 0000000000000004 r13: 0000000000000004 r14: 000000000000003d (XEN) r15: 00000000000001e8 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000125803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801c98: .. which is extend_brk hitting a BUG_ON. Interestingly enough, most of the time we are not going to hit this b/c the _brk space is quite large (v3.5): ffffffff81a25000 B __brk_base ffffffff81e43000 B __brk_limit = ~4MB. vs earlier kernels (with this back-ported), the space is smaller: ffffffff81a25000 B __brk_base ffffffff81a7b000 B __brk_limit = 344 kBytes. where we would certainly hit this and hit extend_brk. Note that git commit c3d93f880197953f86ab90d9da4744e926b38e33 (xen: populate correct number of pages when across mem boundary (v2)) exposed this bug). [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion] CC: stable@vger.kernel.org #only for 3.5 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen: mark local pages as FOREIGN in the m2p_overrideStefano Stabellini2012-06-141-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the frontend and the backend reside on the same domain, even if we add pages to the m2p_override, these pages will never be returned by mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will always fail, so the pfn of the frontend will be returned instead (resulting in a deadlock because the frontend pages are already locked). INFO: task qemu-system-i38:1085 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-system-i38 D ffff8800cfc137c0 0 1085 1 0x00000000 ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0 ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0 ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0 Call Trace: [<ffffffff81101ee0>] ? __lock_page+0x70/0x70 [<ffffffff81a0fdd9>] schedule+0x29/0x70 [<ffffffff81a0fe80>] io_schedule+0x60/0x80 [<ffffffff81101eee>] sleep_on_page+0xe/0x20 [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81101ed7>] __lock_page+0x67/0x70 [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811867e6>] ? bio_add_page+0x36/0x40 [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60 [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70 [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00 [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00 [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00 [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0 [<ffffffff81104027>] generic_file_aio_read+0x737/0x780 [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0 [<ffffffff811038f0>] ? find_get_pages+0x150/0x150 [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0 [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90 [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0 [<ffffffff811998b8>] do_io_submit+0x708/0xb90 [<ffffffff81199d50>] sys_io_submit+0x10/0x20 [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b The explanation is in the comment within the code: We need to do this because the pages shared by the frontend (xen-blkfront) can be already locked (lock_page, called by do_read_cache_page); when the userspace backend tries to use them with direct_IO, mfn_to_pfn returns the pfn of the frontend, so do_blockdev_direct_IO is going to try to lock the same pages again resulting in a deadlock. A simplified call graph looks like this: pygrub QEMU ----------------------------------------------- do_read_cache_page io_submit | | lock_page ext3_direct_IO | bio_add_page | lock_page Internally the xen-blkback uses m2p_add_override to swizzle (temporarily) a 'struct page' to have a different MFN (so that it can point to another guest). It also can easily find out whether another pfn corresponding to the mfn exists in the m2p, and can set the FOREIGN bit in the p2m, making sure that mfn_to_pfn returns the pfn of the backend. This allows the backend to perform direct_IO on these pages, but as a side effect prevents the frontend from using get_user_pages_fast on them while they are being shared with the backend. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/p2m: An early bootup variant of set_phys_to_machineKonrad Rzeszutek Wilk2012-04-061-0/+15
| | | | | | | | | | During early bootup we can't use alloc_page, so to allocate leaf pages in the P2M we need to use extend_brk. For that we are utilizing the early_alloc_p2m and early_alloc_p2m_middle functions to do the job for us. This function follows the same logic as set_phys_to_machine. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/p2m: Collapse early_alloc_p2m_middle redundant checks.Konrad Rzeszutek Wilk2012-04-061-13/+12Star
| | | | | | | | | At the start of the function we were checking for idx != 0 and bailing out. And later calling extend_brk if idx != 0. That is unnecessary so remove that checks. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argumentKonrad Rzeszutek Wilk2012-04-061-5/+5
| | | | | | | | | | | | | For identity cases we want to call reserve_brk only on the boundary conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is to work around identify regions (PCI spaces, gaps in E820) which are not aligned on 2MB regions. However for the case were we want to allocate P2M middle leafs at the early bootup stage, irregardless of this alignment check we need some means of doing that. For that we provide the new argument. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/p2m: Move code around to allow for better re-usage.Konrad Rzeszutek Wilk2012-04-061-28/+34
| | | | | | | | | | | We are going to be using the early_alloc_p2m (and early_alloc_p2m_middle) code in follow up patches which are not related to setting identity pages. Hence lets move the code out in its own function and rename them as appropiate. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*-. Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of ↵Linus Torvalds2011-10-251-19/+109
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m/debugfs: Make type_name more obvious. xen/p2m/debugfs: Fix potential pointer exception. xen/enlighten: Fix compile warnings and set cx to known value. xen/xenbus: Remove the unnecessary check. xen/irq: If we fail during msi_capability_init return proper error code. xen/events: Don't check the info for NULL as it is already done. xen/events: BUG() when we can't allocate our event->irq array. * 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Fix selfballooning and ensure it doesn't go too far xen/gntdev: Fix sleep-inside-spinlock xen: modify kernel mappings corresponding to granted pages xen: add an "highmem" parameter to alloc_xenballooned_pages xen/p2m: Use SetPagePrivate and its friends for M2P overrides. xen/p2m: Make debug/xen/mmu/p2m visible again. Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
| | * xen: modify kernel mappings corresponding to granted pagesStefano Stabellini2011-09-291-10/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/p2m: Use SetPagePrivate and its friends for M2P overrides.Konrad Rzeszutek Wilk2011-09-241-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | We use the page->private field and hence should use the proper macros and set proper bits. Also WARN_ON in case somebody tries to overwrite our data. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * xen/p2m: Make debug/xen/mmu/p2m visible again.Konrad Rzeszutek Wilk2011-09-241-3/+32
| |/ |/| | | | | | | | | | | | | | | | | We dropped a lot of the MMU debugfs in favour of using tracing API - but there is one which just provides mostly static information that was made invisible by this change. Bring it back. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/p2m/debugfs: Make type_name more obvious.Konrad Rzeszutek Wilk2011-10-191-2/+5
| | | | | | | | | | | | | | | | | | Per Ian Campbell suggestion to defend against future breakage in case we expand the P2M values, incorporate the defines in the string array. Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen/p2m/debugfs: Fix potential pointer exception.Konrad Rzeszutek Wilk2011-10-191-1/+1
|/ | | | | | | | | We could be referencing the last + 1 element of level_name[] array which would cause a pointer exception, because of the initial setup of lvl=4. [v1: No need to do this for type_name, pointed out by Ian Campbell] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
*---. Merge branches 'stable/irq', 'stable/p2m.bugfixes', 'stable/e820.bugfixes' ↵Linus Torvalds2011-05-201-2/+28
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and 'stable/mmu.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/irq' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: do not clear and mask evtchns in __xen_evtchn_do_upcall * 'stable/p2m.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappings * 'stable/e820.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings. * 'stable/mmu.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen mmu: fix a race window causing leave_mm BUG()
| | * | xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappingsKonrad Rzeszutek Wilk2011-05-121-2/+28
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. when applicable. We need to track in the p2m_mfn and p2m_mfn_p the MFNs and pointers, respectivly, for the P2M entries that are allocated for the identity mappings. Without this, a PV domain with an E820 that triggers the 1-1 mapping to kick in, won't be able to be restored as the P2M won't have the identity mappings. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.Konrad Rzeszutek Wilk2011-04-201-1/+2
| | | | | | | | | | | | | | | | | | | | | If the backends, which use these two functions, are compiled as a module we need these two functions to be exported. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.Konrad Rzeszutek Wilk2011-04-181-6/+4Star
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only supported the M2P (and P2M) override only for the GNTMAP_contains_pte type mappings. Meaning that we grants operations would "contain the machine address of the PTE to update" If the flag is unset, then the grant operation is "contains a host virtual address". The latter case means that the Hypervisor takes care of updating our page table (specifically the PTE entry) with the guest's MFN. As such we should not try to do anything with the PTE. Previous to this patch we would try to clear the PTE which resulted in Xen hypervisor being upset with us: (XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067 (XEN) domain_crash called from mm.c:1067 (XEN) Domain 0 (vcpu#0) crashed on cpu#3: (XEN) ----[ Xen-4.0-110228 x86_64 debug=y Not tainted ]---- and crashing us. This patch allows us to inhibit the PTE clearing in the PV guest if the GNTMAP_contains_pte is not set. On the m2p_remove_override path we provide the same parameter. Sadly in the grant-table driver we do not have a mechanism to tell m2p_remove_override whether to clear the PTE or not. Since the grant-table driver is used by user-space, we can safely assume that it operates only on PTE's. Hence the implementation for it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future we can implement the support for this. It will require some extra accounting structure to keep track of the page[i], and the flag. [v1: Added documentation details, made it return -EOPNOTSUPP instead of trying to do a half-way implementation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen: fix p2m section mismatchesRandy Dunlap2011-03-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix section mismatch warnings: set_phys_range_identity() is called by __init xen_set_identity(), so also mark set_phys_range_identity() as __init. then: __early_alloc_p2m() is called set_phys_range_identity(), so also mark __early_alloc_p2m() as __init. WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk() The function __early_alloc_p2m() references the function __init extend_brk(). This is often because __early_alloc_p2m lacks a __init annotation or the annotation of extend_brk is wrong. WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk() The function set_phys_range_identity() references the function __init extend_brk(). This is often because set_phys_range_identity lacks a __init annotation or the annotation of extend_brk is wrong. [v2: Per Stephen Hemming recommonedation made __early_alloc_p2m static] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen/p2m: Allocate p2m tracking pages on overrideDaniel De Graaf2011-03-241-2/+4
| | | | | | | | | | | | | | | | | | | | It is possible to add a p2m override on pages that are currently mapped to INVALID_P2M_ENTRY; in particular, this will happen when using ballooned pages in gntdev. This means that set_phys_to_machine must be used instead of __set_phys_to_machine. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| |
| \
| \
| \
*---. \ Merge branches 'stable/ia64', 'stable/blkfront-cleanup' and 'stable/cleanup' ↵Linus Torvalds2011-03-151-3/+3
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: ia64 build broken due to "xen: switch to new schedop hypercall by default." * 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: Union the blkif_request request specific fields * 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: annotate functions which only call into __init at start of day xen p2m: annotate variable which appears unused xen: events: mark cpu_evtchn_mask_p as __refdata
| | | * xen: annotate functions which only call into __init at start of dayIan Campbell2011-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both xen_hvm_init_shared_info and xen_build_mfn_list_list can be called at resume time as well as at start of day but only reference __init functions (extend_brk) at start of day. Hence annotate with __ref. WARNING: arch/x86/built-in.o(.text+0x4f1): Section mismatch in reference from the function xen_hvm_init_shared_info() to the function .init.text:extend_brk() The function xen_hvm_init_shared_info() references the function __init extend_brk(). This is often because xen_hvm_init_shared_info lacks a __init annotation or the annotation of extend_brk is wrong. xen_hvm_init_shared_info calls extend_brk() iff !shared_info_page and initialises shared_info_page with the result. This happens at start of day only. WARNING: arch/x86/built-in.o(.text+0x599b): Section mismatch in reference from the function xen_build_mfn_list_list() to the function .init.text:extend_brk() The function xen_build_mfn_list_list() references the function __init extend_brk(). This is often because xen_build_mfn_list_list lacks a __init annotation or the annotation of extend_brk is wrong. (this warning occurs multiple times) xen_build_mfn_list_list only calls extend_brk() at boot time, while building the initial mfn list list Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | | * xen p2m: annotate variable which appears unusedIan Campbell2011-02-111-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CC arch/x86/xen/p2m.o arch/x86/xen/p2m.c: In function 'm2p_remove_override': arch/x86/xen/p2m.c:460: warning: 'address' may be used uninitialized in this function arch/x86/xen/p2m.c: In function 'm2p_add_override': arch/x86/xen/p2m.c:426: warning: 'address' may be used uninitialized in this function In actual fact address is inialised in one "if (!PageHighMem(page))" statement and used in a second and so is always initialised before use. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | Merge branches 'stable/p2m-identity.v4.9.1' and 'stable/e820' of ↵Linus Torvalds2011-03-151-7/+317
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set.. xen/m2p: No need to catch exceptions when we know that there is no RAM xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set. xen/debugfs: Add 'p2m' file for printing out the P2M layout. xen/setup: Set identity mapping for non-RAM E820 and E820 gaps. xen/mmu: WARN_ON when racing to swap middle leaf. xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN. xen/mmu: Add the notion of identity (1-1) mapping. xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. * 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow. xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.
| * | xen/debugfs: Add 'p2m' file for printing out the P2M layout.Konrad Rzeszutek Wilk2011-03-141-0/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We walk over the whole P2M tree and construct a simplified view of which PFN regions belong to what level and what type they are. Only enabled if CONFIG_XEN_DEBUG_FS is set. [v2: UNKN->UNKNOWN, use uninitialized_var] [v3: Rebased on top of mmu->p2m code split] [v4: Fixed the else if] Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/mmu: WARN_ON when racing to swap middle leaf.Konrad Rzeszutek Wilk2011-03-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initial bootup code uses set_phys_to_machine quite a lot, and after bootup it would be used by the balloon driver. The balloon driver does have mutex lock so this should not be necessary - but just in case, add a WARN_ON if we do hit this scenario. If we do fail this, it is OK to continue as there is a backup mechanism (VM_IO) that can bypass the P2M and still set the _PAGE_IOMAP flags. [v2: Change from WARN to BUG_ON] [v3: Rebased on top of xen->p2m code split] [v4: Change from BUG_ON to WARN] Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/mmu: Add the notion of identity (1-1) mapping.Konrad Rzeszutek Wilk2011-03-141-2/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our P2M tree structure is a three-level. On the leaf nodes we set the Machine Frame Number (MFN) of the PFN. What this means is that when one does: pfn_to_mfn(pfn), which is used when creating PTE entries, you get the real MFN of the hardware. When Xen sets up a guest it initially populates a array which has descending (or ascending) MFN values, as so: idx: 0, 1, 2 [0x290F, 0x290E, 0x290D, ..] so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list starts looking quite random. We graft this structure on our P2M tree structure and stick in those MFN in the leafs. But for all other leaf entries, or for the top root, or middle one, for which there is a void entry, we assume it is "missing". So pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. We add the possibility of setting 1-1 mappings on certain regions, so that: pfn_to_mfn(0xc0000)=0xc0000 The benefit of this is, that we can assume for non-RAM regions (think PCI BARs, or ACPI spaces), we can create mappings easily b/c we get the PFN value to match the MFN. For this to work efficiently we introduce one new page p2m_identity and allocate (via reserved_brk) any other pages we need to cover the sides (1GB or 4MB boundary violations). All entries in p2m_identity are set to INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs, no other fancy value). On lookup we spot that the entry points to p2m_identity and return the identity value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry points to an allocated page, we just proceed as before and return the PFN. If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions (pfn_to_mfn). The reason for having the IDENTITY_FRAME_BIT instead of just returning the PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a non-identity pfn. To protect ourselves against we elect to set (and get) the IDENTITY_FRAME_BIT on all identity mapped PFNs. This simplistic diagram is used to explain the more subtle piece of code. There is also a digram of the P2M at the end that can help. Imagine your E820 looking as so: 1GB 2GB /-------------------+---------\/----\ /----------\ /---+-----\ | System RAM | Sys RAM ||ACPI| | reserved | | Sys RAM | \-------------------+---------/\----/ \----------/ \---+-----/ ^- 1029MB ^- 2001MB [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)] And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB is actually not present (would have to kick the balloon driver to put it in). When we are told to set the PFNs for identity mapping (see patch: "xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start of the PFN and the end PFN (263424 and 512256 respectively). The first step is to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page covers 512^2 of page estate (1GB) and in case the start or end PFN is not aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to end pfn. We reserve_brk top leaf pages if they are missing (means they point to p2m_mid_missing). With the E820 example above, 263424 is not 1GB aligned so we allocate a reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000. Each entry in the allocate page is "missing" (points to p2m_missing). Next stage is to determine if we need to do a more granular boundary check on the 4MB (or 2MB depending on architecture) off the start and end pfn's. We check if the start pfn and end pfn violate that boundary check, and if so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer granularity of setting which PFNs are missing and which ones are identity. In our example 263424 and 512256 both fail the check so we reserve_brk two pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values) and assign them to p2m[1][2] and p2m[1][488] respectively. At this point we would at minimum reserve_brk one page, but could be up to three. Each call to set_phys_range_identity has at maximum a three page cost. If we were to query the P2M at this stage, all those entries from start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY ("missing"). The next step is to walk from the start pfn to the end pfn setting the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'. If we find that the middle leaf is pointing to p2m_missing we can swap it over to p2m_identity - this way covering 4MB (or 2MB) PFN space. At this point we do not need to worry about boundary aligment (so no need to reserve_brk a middle page, figure out which PFNs are "missing" and which ones are identity), as that has been done earlier. If we find that the middle leaf is not occupied by p2m_identity or p2m_missing, we dereference that page (which covers 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example 263424 and 512256 end up there, and we set from p2m[1][2][256->511] and p2m[1][488][0->256] with IDENTITY_FRAME_BIT set. All other regions that are void (or not filled) either point to p2m_missing (considered missing) or have the default value of INVALID_P2M_ENTRY (also considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511] contain the INVALID_P2M_ENTRY value and are considered "missing." This is what the p2m ends up looking (for the E820 above) with this fabulous drawing: p2m /--------------\ /-----\ | &mfn_list[0],| /-----------------\ | 0 |------>| &mfn_list[1],| /---------------\ | ~0, ~0, .. | |-----| | ..., ~0, ~0 | | ~0, ~0, [x]---+----->| IDENTITY [@256] | | 1 |---\ \--------------/ | [p2m_identity]+\ | IDENTITY [@257] | |-----| \ | [p2m_identity]+\\ | .... | | 2 |--\ \-------------------->| ... | \\ \----------------/ |-----| \ \---------------/ \\ | 3 |\ \ \\ p2m_identity |-----| \ \-------------------->/---------------\ /-----------------\ | .. +->+ | [p2m_identity]+-->| ~0, ~0, ~0, ... | \-----/ / | [p2m_identity]+-->| ..., ~0 | / /---------------\ | .... | \-----------------/ / | IDENTITY[@0] | /-+-[x], ~0, ~0.. | / | IDENTITY[@256]|<----/ \---------------/ / | ~0, ~0, .... | | \---------------/ | p2m_missing p2m_missing /------------------\ /------------\ | [p2m_mid_missing]+---->| ~0, ~0, ~0 | | [p2m_mid_missing]+---->| ..., ~0 | \------------------/ \------------/ where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT) Reviewed-by: Ian Campbell <ian.campbell@citrix.com> [v5: Changed code to use ranges, added ASCII art] [v6: Rebased on top of xen->p2m code split] [v4: Squished patches in just this one] [v7: Added RESERVE_BRK for potentially allocated pages] [v8: Fixed alignment problem] [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X] [v10: Copied git commit description in the p2m code + Add Review tag] [v11: Title had '2-1' - should be '1-1' mapping] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.Konrad Rzeszutek Wilk2011-03-031-5/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch, we diligently set regions that will be used by the balloon driver to be INVALID_P2M_ENTRY and under the ownership of the balloon driver. We are OK using the __set_phys_to_machine as we do not expect to be allocating any P2M middle or entries pages. The set_phys_to_machine has the side-effect of potentially allocating new pages and we do not want that at this stage. We can do this because xen_build_mfn_list_list will have already allocated all such pages up to xen_max_p2m_pfn. We also move the check for auto translated physmap down the stack so it is present in __set_phys_to_machine. [v2: Rebased with mmu->p2m code split] Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | xen/p2m: Mark INVALID_P2M_ENTRY the mfn_list past max_pfn.Stefan Bader2011-01-271-12/+6Star
|/ / | | | | | | | | | | | | | | | | In case the mfn_list does not have enough entries to fill a p2m page we do not want the entries from max_pfn up to the boundary to be filled with unknown values. Hence set them to INVALID_P2M_ENTRY. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* / xen: p2m: correctly initialize partial p2m leafStefan Bader2011-01-211-1/+19
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After changing the p2m mapping to a tree by commit 58e05027b530ff081ecea68e38de8d59db8f87e0 xen: convert p2m to a 3 level tree and trying to boot a DomU with 615MB of memory, the following crash was observed in the dump: kernel direct mapping tables up to 26f00000 @ 1ec4000-1fff000 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c0107397>] xen_set_pte+0x27/0x60 *pdpt = 0000000000000000 *pde = 0000000000000000 Adding further debug statements showed that when trying to set up pfn=0x26700 the returned mapping was invalid. pfn=0x266ff calling set_pte(0xc1fe77f8, 0x6b3003) pfn=0x26700 calling set_pte(0xc1fe7800, 0x3) Although the last_pfn obtained from the startup info is 0x26700, which should in turn not be hit, the additional 8MB which are added as extra memory normally seem to be ok. This lead to looking into the initial p2m tree construction, which uses the smaller value and assuming that there is other code handling the extra memory. When the p2m tree is set up, the leaves are directly pointed to the array which the domain builder set up. But if the mapping is not on a boundary that fits into one p2m page, this will result in the last leaf being only partially valid. And as the invalid entries are not initialized in that case, things go badly wrong. I am trying to fix that by checking whether the current leaf is a complete map and if not, allocate a completely new page and copy only the valid pointers there. This may not be the most efficient or elegant solution, but at least it seems to allow me booting DomUs with memory assignments all over the range. BugLink: http://bugs.launchpad.net/bugs/686692 [v2: Redid a bit of commit wording and fixed a compile warning] Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/p2m: Fix module linking error.Konrad Rzeszutek Wilk2011-01-111-0/+1
| | | | | | | Fixes: ERROR: "m2p_find_override_pfn" [drivers/block/xen-blkfront.ko] undefined! Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen p2m: clear the old pte when adding a page to m2p_overrideJeremy Fitzhardinge2011-01-111-4/+45
| | | | | | | | | | | | | | | | | When adding a page to m2p_override we change the p2m of the page so we need to also clear the old pte of the kernel linear mapping because it doesn't correspond anymore. When we remove the page from m2p_override we restore the original p2m of the page and we also restore the old pte of the kernel linear mapping. Before changing the p2m mappings in m2p_add_override and m2p_remove_override, check that the page passed as argument is valid and return an error if it is not. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen p2m: transparently change the p2m mappings in the m2p overrideStefano Stabellini2011-01-111-0/+12
| | | | | | | | | | In m2p_add_override store the original mfn into page->index and then change the p2m mapping, setting mfns as FOREIGN_FRAME. In m2p_remove_override restore the original mapping. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen: add m2p override mechanismJeremy Fitzhardinge2011-01-111-0/+80
| | | | | | | | | | | | Add a simple hashtable based mechanism to override some portions of the m2p, so that we can find out the pfn corresponding to an mfn of a granted page. In fact entries corresponding to granted pages in the m2p hold the original pfn value of the page in the source domain that granted it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen: move p2m handling to separate fileJeremy Fitzhardinge2011-01-111-0/+376
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>