summaryrefslogtreecommitdiffstats
path: root/util/mmap-alloc.c
Commit message (Collapse)AuthorAgeFilesLines
* memory: alloc RAM from file at offsetJagannathan Raman2021-02-091-3/+5
| | | | | | | | | | | | | Allow RAM MemoryRegion to be created from an offset in a file, instead of allocating at offset of 0 by default. This is needed to synchronize RAM between QEMU & remote process. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* memory: add readonly support to memory_region_init_ram_from_file()Stefan Hajnoczi2021-02-011-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is currently no way to open(O_RDONLY) and mmap(PROT_READ) when creating a memory region from a file. This functionality is needed since the underlying host file may not allow writing. Add a bool readonly argument to memory_region_init_ram_from_file() and the APIs it calls. Extend memory_region_init_ram_from_file() rather than introducing a memory_region_init_rom_from_file() API so that callers can easily make a choice between read/write and read-only at runtime without calling different APIs. No new RAMBlock flag is introduced for read-only because it's unclear whether RAMBlocks need to know that they are read-only. Pass a bool readonly argument instead. Both of these design decisions can be changed in the future. It just seemed like the simplest approach to me. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20210104171320.575838-2-stefanha@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* core: replace getpagesize() with qemu_real_host_page_sizeWei Yang2019-10-261-5/+5
| | | | | | | | | | | | | | | | | | | | | There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()Zhang Yi2019-04-251-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a file supporting DAX is used as vNVDIMM backend, mmap it with MAP_SYNC flag in addition which can ensure file system metadata synced in each guest writes to the backend file, without other QEMU actions (e.g., periodic fsync() by QEMU). Current, We have below different possible use cases: 1. pmem=on is set, shared=on is set, MAP_SYNC supported: a: backend is a dax supporting file. - MAP_SYNC will active. b: backend is not a dax supporting file. - mmap will trigger a warning. then MAP_SYNC flag will be ignored 2. The rest of cases: - we will never pass the MAP_SYNC to mmap2 Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com> [ehabkost: Rebased patch to latest code on master] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Tested-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190422004849.26463-2-richardw.yang@linux.intel.com> [ehabkost: squashed documentation patch] Message-Id: <20190422004849.26463-3-richardw.yang@linux.intel.com> [ehabkost: documentation fixup] Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmapZhang Yi2019-04-251-1/+5
| | | | | | | | | | | | | | besides the existing 'shared' flags, we are going to add 'is_pmem' to qemu_ram_mmap(), which indicated the memory backend file is a persist memory. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* mmap-alloc: fix hugetlbfs misaligned length in ppc64Murilo Opsfelder Araujo2019-02-041-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 7197fb4058bcb68986bae2bb2c04d6370f3e7218 ("util/mmap-alloc: fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64. However, we still need to consider the underlying huge page size during munmap() because it requires that both address and length be a multiple of the underlying huge page size for Huge TLB mappings. Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES section of the munmap(2) manual: "For munmap(), addr and length must both be a multiple of the underlying huge page size." On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB mappings because the mapped segment can be aligned with the underlying huge page size, not aligned with the native system page size, as returned by getpagesize(). This has the side effect of not releasing huge pages back to the pool after a hugetlbfs file-backed memory device is hot-unplugged. This patch fixes the situation in qemu_ram_mmap() and qemu_ram_munmap() by considering the underlying page size on ppc64. After this patch, memory hot-unplug releases huge pages back to the pool. Fixes: 7197fb4058bcb68986bae2bb2c04d6370f3e7218 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* mmap-alloc: unfold qemu_ram_mmap()Murilo Opsfelder Araujo2019-02-041-19/+34
| | | | | | | | | | | | | | Unfold parts of qemu_ram_mmap() for the sake of understanding, moving declarations to the top, and keeping architecture-specifics in the ifdef-else blocks. No changes in the function behaviour. Give ptr and ptr1 meaningful names: ptr -> guardptr : pointer to the PROT_NONE guard region ptr1 -> ptr : pointer to the mapped memory returned to caller Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* Make qemu_mempath_getpagesize() accept NULLDavid Gibson2018-04-271-12/+14
| | | | | | | | | | | | | | | | | | qemu_mempath_getpagesize() gets the effective (host side) page size for a block of memory backed by an mmap()ed file on the host. It requires the mem_path parameter to be non-NULL. This ends up meaning all the callers need a different case for handling anonymous memory (for memory-backend-ram or default memory with -mem-path is not specified). We can make all those callers a little simpler by having qemu_mempath_getpagesize() accept NULL, and treat that as the anonymous memory case. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* sparc: Make sure we mmap at SHMLBA alignmentPeter Maydell2017-12-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | SPARC Linux has an oddity that it insists that mmap() of MAP_FIXED memory must be at an alignment defined by SHMLBA, which is more aligned than the page size (typically, SHMLBA alignment is to 16K, and pages are 8K). This is a relic of ancient hardware that had cache aliasing constraints, but even on modern hardware the kernel still insists on the alignment. To ensure that we get mmap() alignment sufficient to make the kernel happy, change QEMU_VMALLOC_ALIGN, qemu_fd_getpagesize() and qemu_mempath_getpagesize() to use the maximum of getpagesize() and SHMLBA. In particular, this allows 'make check' to pass on Sparc: we were previously failing the ivshmem tests. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 1512752248-17857-1-git-send-email-peter.maydell@linaro.org
* exec, kvm, target-ppc: Move getrampagesize() to common codeAlexey Kardashevskiy2017-03-031-0/+25
| | | | | | | | | | | | | | | | getrampagesize() returns the largest supported page size and mainly used to know if huge pages are enabled. However is implemented in target-ppc/kvm.c and not available in TCG or other architectures. This renames and moves gethugepagesize() to mmap-alloc.c where fd-based analog of it is already implemented. This renames and moves getrampagesize() to exec.c as it seems to be the common place for helpers like this. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* util/mmap-alloc: refactor a little bit for readabilityCao jin2017-01-241-6/+4Star
| | | | | | | | | | | | | | | | | | | | | | | 1st mmap returns *ptr* which aligns to host page size, | size + align | ------------------------------------------ ptr input param *align* could be 1M, or 2M, or host page size. After QEMU_ALIGN_UP, offset will >= 0 2nd mmap use flag MAP_FIXED, then it return ptr+offset, or else fail. If it success, then we will have something like: | offset | size | -------------------------------------- ptr ptr1 *ptr1* is what we really want to return, it equals ptr+offset. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* util/mmap-alloc: check parameter before usingCao jin2017-01-241-3/+4
| | | | | | Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* Use #include "..." for our own headers, <...> for othersMarkus Armbruster2016-07-121-1/+2
| | | | | | | | | | | | Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
* os-posix: include sys/mman.hPaolo Bonzini2016-06-161-1/+0Star
| | | | | | | | | qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* util: Clean up includesPeter Maydell2016-02-041-2/+1Star
| | | | | | | | | | Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-6-git-send-email-peter.maydell@linaro.org
* mmap-alloc: tweak a comment on ppc64Michael S. Tsirkin2015-12-221-4/+5
| | | | | | | | | | | | | | | | The comment I put in mmap-alloc to document the ppc64 rules refers to the previous revision of the patch: we don't look at memory alignment anymore, we check the fs from which the fd is mapped, instead. It's also not clear what does "in this case" refer to, rearrange text to make it clearer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* util/mmap-alloc: fix hugetlb support on ppc64Michael S. Tsirkin2015-12-021-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM", it is no longer possible to back guest RAM with hugepages on ppc64 hosts: mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000 mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy) This is because on ppc64, Linux fixes a page size for a virtual address at mmap time, so we can't switch a range of memory from anonymous small pages to hugetlbs with MAP_FIXED. See commit d0f13e3c20b6fb73ccb467bdca97fa7cf5a574cd ("[POWERPC] Introduce address space "slices"") in Linux history for the details. Detect this and create the PROT_NONE mapping using the same fd. Naturally, this makes the guard page bigger with hugetlbfs. Based on patch by Greg Kurz. Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* mmap-alloc: fix error handlingMichael S. Tsirkin2015-10-291-2/+2
| | | | | | | | | | Existing callers are checking for MAP_FAILED, so we should return that on error. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* exec: factor out duplicate mmap codeMichael S. Tsirkin2015-10-211-0/+71
Anonymous and file-backed RAM allocation are now almost exactly the same. Reduce code duplication by moving RAM mmap code out of oslib-posix.c and exec.c. Reported-by: Marc-André Lureau <mlureau@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>