summaryrefslogtreecommitdiffstats
path: root/arch/arm64/mm/mmu.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'arm64-upstream' of ↵Linus Torvalds2018-02-081-0/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: "As I mentioned in the last pull request, there's a second batch of security updates for arm64 with mitigations for Spectre/v1 and an improved one for Spectre/v2 (via a newly defined firmware interface API). Spectre v1 mitigation: - back-end version of array_index_mask_nospec() - masking of the syscall number to restrict speculation through the syscall table - masking of __user pointers prior to deference in uaccess routines Spectre v2 mitigation update: - using the new firmware SMC calling convention specification update - removing the current PSCI GET_VERSION firmware call mitigation as vendors are deploying new SMCCC-capable firmware - additional branch predictor hardening for synchronous exceptions and interrupts while in user mode Meltdown v3 mitigation update: - Cavium Thunder X is unaffected but a hardware erratum gets in the way. The kernel now starts with the page tables mapped as global and switches to non-global if kpti needs to be enabled. Other: - Theoretical trylock bug fixed" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits) arm64: Kill PSCI_GET_VERSION as a variant-2 workaround arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: smccc: Implement SMCCC v1.1 inline primitive arm/arm64: smccc: Make function identifiers an unsigned quantity firmware/psci: Expose SMCCC version through psci_ops firmware/psci: Expose PSCI conduit arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: KVM: Turn kvm_psci_version into a static inline arm/arm64: KVM: Advertise SMCCC v1.1 arm/arm64: KVM: Implement PSCI 1.0 support arm/arm64: KVM: Add smccc accessors to PSCI code arm/arm64: KVM: Add PSCI_VERSION helper arm/arm64: KVM: Consolidate the PSCI include files arm64: KVM: Increment PC after handling an SMC trap arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: entry: Apply BP hardening for suspicious interrupts from EL0 arm64: entry: Apply BP hardening for high-priority synchronous exceptions arm64: futex: Mask __user pointers prior to dereference ...
| * arm64: mm: Permit transitioning from Global to Non-Global without BBMWill Deacon2018-02-061-0/+4
| | | | | | | | | | | | | | | | Break-before-make is not needed when transitioning from Global to Non-Global mappings, provided that the contiguous hint is not being used. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | Merge tag 'libnvdimm-for-4.16' of ↵Linus Torvalds2018-02-061-3/+6
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ross Zwisler: - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits) libnvdimm, namespace: remove redundant initialization of 'nd_mapping' acpi, nfit: fix register dimm error handling libnvdimm, namespace: make min namespace size 4K tools/testing/nvdimm: force nfit_test to depend on instrumented modules libnvdimm/nfit_test: adding support for unit testing enable LSS status libnvdimm/nfit_test: add firmware download emulation nfit-test: Add platform cap support from ACPI 6.2a to test libnvdimm: expose platform persistence attribute for nd_region acpi: nfit: add persistent memory control flag for nd_region acpi: nfit: Add support for detect platform CPU cache flush on power loss device-dax: Fix trailing semicolon libnvdimm, btt: fix uninitialized err_lock dax: require 'struct page' by default for filesystem dax ext2: auto disable dax instead of failing mount ext4: auto disable dax instead of failing mount mm, dax: introduce pfn_t_special() mm: Fix devm_memremap_pages() collision handling mm: Fix memory size alignment in devm_memremap_pages_release() memremap: merge find_dev_pagemap into get_dev_pagemap memremap: change devm_memremap_pages interface to use struct dev_pagemap ...
| * mm: pass the vmem_altmap to vmemmap_freeChristoph Hellwig2018-01-081-1/+2
| | | | | | | | | | | | | | | | We can just pass this on instead of having to do a radix tree lookup without proper locking a few levels into the callchain. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * mm: pass the vmem_altmap to vmemmap_populateChristoph Hellwig2018-01-081-2/+4
| | | | | | | | | | | | | | | | We can just pass this on instead of having to do a radix tree lookup without proper locking a few levels into the callchain. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | arm64: Extend early page table code to allow for larger kernelsSteve Capper2018-01-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the early assembler page table code assumes that precisely 1xpgd, 1xpud, 1xpmd are sufficient to represent the early kernel text mappings. Unfortunately this is rarely the case when running with a 16KB granule, and we also run into limits with 4KB granule when building much larger kernels. This patch re-writes the early page table logic to compute indices of mappings for each level of page table, and if multiple indices are required, the next-level page table is scaled up accordingly. Also the required size of the swapper_pg_dir is computed at link time to cover the mapping [KIMAGE_ADDR + VOFFSET, _end]. When KASLR is enabled, an extra page is set aside for each level that may require extra entries at runtime. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | arm64: mmu: add the entry trampolines start/end section markers into sections.hJames Morse2018-01-141-2/+0Star
| | | | | | | | | | | | | | | | | | | | SDEI needs to calculate an offset in the trampoline page too. Move the extern char[] to sections.h. This patch just moves code around. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | Merge branch 'for-next/52-bit-pa' into for-next/coreCatalin Marinas2017-12-221-5/+10
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * for-next/52-bit-pa: arm64: enable 52-bit physical address support arm64: allow ID map to be extended to 52 bits arm64: handle 52-bit physical addresses in page table entries arm64: don't open code page table entry creation arm64: head.S: handle 52-bit PAs in PTEs in early page table setup arm64: handle 52-bit addresses in TTBR arm64: limit PA size to supported range arm64: add kconfig symbol to configure physical address size
| * | arm64: allow ID map to be extended to 52 bitsKristina Martsenko2017-12-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when using VA_BITS < 48, if the ID map text happens to be placed in physical memory above VA_BITS, we increase the VA size (up to 48) and create a new table level, in order to map in the ID map text. This is okay because the system always supports 48 bits of VA. This patch extends the code such that if the system supports 52 bits of VA, and the ID map text is placed that high up, then we increase the VA size accordingly, up to 52. One difference from the current implementation is that so far the condition of VA_BITS < 48 has meant that the top level table is always "full", with the maximum number of entries, and an extra table level is always needed. Now, when VA_BITS = 48 (and using 64k pages), the top level table is not full, and we simply need to increase the number of entries in it, instead of creating a new table level. Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> [catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()] [catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | arm64: don't open code page table entry creationKristina Martsenko2017-12-221-5/+9
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of open coding the generation of page table entries, use the macros/functions that exist for this - pfn_p*d and p*d_populate. Most code in the kernel already uses these macros, this patch tries to fix up the few places that don't. This is useful for the next patch in this series, which needs to change the page table entry logic, and it's better to have that logic in one place. The KVM extended ID map is special, since we're creating a level above CONFIG_PGTABLE_LEVELS and the required function isn't available. Leave it as is and add a comment to explain it. (The normal kernel ID map code doesn't need this change because its page tables are created in assembly (__create_page_tables)). Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | arm64: kaslr: Put kernel vectors address in separate data pageWill Deacon2017-12-111-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The literal pool entry for identifying the vectors base is the only piece of information in the trampoline page that identifies the true location of the kernel. This patch moves it into a page-aligned region of the .rodata section and maps this adjacent to the trampoline text via an additional fixmap entry, which protects against any accidental leakage of the trampoline contents. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* | arm64: mm: Map entry trampoline into trampoline and kernel page tablesWill Deacon2017-12-111-0/+23
|/ | | | | | | | | | | | | | | | The exception entry trampoline needs to be mapped at the same virtual address in both the trampoline page table (which maps nothing else) and also the kernel page table, so that we can swizzle TTBR1_EL1 on exceptions from and return to EL0. This patch maps the trampoline at a fixed virtual address in the fixmap area of the kernel virtual address space, which allows the kernel proper to be randomized with respect to the trampoline when KASLR is enabled. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: mm: Remove arch_apei_flush_tlb_one()James Morse2017-11-071-0/+4
| | | | | | | | | | | | | Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on __set_fixmap() to do the invalidation. Remove it. Move the IPI-considered-harmful comment to __set_fixmap(). Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: All applicable <stable@vger.kernel.org>
* arm64: mmu: Place guard page after mapping of kernel imageWill Deacon2017-07-281-7/+11
| | | | | | | | | | | | | The vast majority of virtual allocations in the vmalloc region are followed by a guard page, which can help to avoid overruning on vma into another, which may map a read-sensitive device. This patch adds a guard page to the end of the kernel image mapping (i.e. following the data/bss segments). Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: mm: explicity include linux/vmalloc.hTobias Klauser2017-05-301-0/+1
| | | | | | | | | | | arm64's mm/mmu.c uses vm_area_add_early, struct vm_area and other definitions but relies on implict inclusion of linux/vmalloc.h which means that changes in other headers could break the build. Thus, add an explicit include. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: kdump: protect crash dump kernel memoryTakahiro Akashi2017-04-051-49/+54
| | | | | | | | | | | | | | | | | | | | arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres() are meant to be called by kexec_load() in order to protect the memory allocated for crash dump kernel once the image is loaded. The protection is implemented by unmapping the relevant segments in crash dump kernel memory, rather than making it read-only as other archs do, to prevent coherency issues due to potential cache aliasing (with mismatched attributes). Page-level mappings are consistently used here so that we can change the attributes of segments in page granularity as well as shrink the region also in page granularity through /sys/kernel/kexec_crash_size, putting the freed memory back to buddy system. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: set the contiguous bit for kernel mappings where appropriateArd Biesheuvel2017-03-231-43/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the third attempt at enabling the use of contiguous hints for kernel mappings. The most recent attempt 0bfc445dec9d was reverted after it turned out that updating permission attributes on live contiguous ranges may result in TLB conflicts. So this time, the contiguous hint is not set for .rodata or for the linear alias of .text/.rodata, both of which are mapped read-write initially, and remapped read-only at a later stage. (Note that the latter region could also be unmapped and remapped again with updated permission attributes, given that the region, while live, is only mapped for the convenience of the hibernation code, but that also means the TLB footprint is negligible anyway, so why bother) This enables the following contiguous range sizes for the virtual mapping of the kernel image, and for the linear mapping: granule size | cont PTE | cont PMD | -------------+------------+------------+ 4 KB | 64 KB | 32 MB | 16 KB | 2 MB | 1 GB* | 64 KB | 2 MB | 16 GB* | * Only when built for 3 or more levels of translation. This is due to the fact that a 2 level configuration only consists of PGDs and PTEs, and the added complexity of dealing with folded PMDs is not justified considering that 16 GB contiguous ranges are likely to be ignored by the hardware (and 16k/2 levels is a niche configuration) Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64/mm: remove pointless map/unmap sequences when creating page tablesArd Biesheuvel2017-03-231-4/+0Star
| | | | | | | | | | | | The routines __pud_populate and __pmd_populate only create a table entry at their respective level which refers to the next level page by its physical address, so there is no reason to map this page and then unmap it immediately after. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64/mmu: replace 'page_mappings_only' parameter with flags argumentArd Biesheuvel2017-03-231-18/+27
| | | | | | | | | | | | In preparation of extending the policy for manipulating kernel mappings with whether or not contiguous hints may be used in the page tables, replace the bool 'page_mappings_only' with a flags field and a flag NO_BLOCK_MAPPINGS. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64/mmu: add contiguous bit to sanity bug checkArd Biesheuvel2017-03-231-1/+9
| | | | | | | | | | | | A mapping with the contiguous bit cannot be safely manipulated while live, regardless of whether the bit changes between the old and new mapping. So take this into account when deciding whether the change is safe. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64/mmu: ignore debug_pagealloc for kernel segmentsArd Biesheuvel2017-03-231-4/+3Star
| | | | | | | | | | | | | | | | | The debug_pagealloc facility manipulates kernel mappings in the linear region at page granularity to detect out of bounds or use-after-free accesses. Since the kernel segments are not allocated dynamically, there is no point in taking the debug_pagealloc_enabled flag into account for them, and we can use block mappings unconditionally. Note that this applies equally to the linear alias of text/rodata: we will never have dynamic allocations there given that the same memory is statically in use by the kernel image. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64/mmu: align alloc_init_pte prototype with pmd/pud versionsArd Biesheuvel2017-03-231-4/+4
| | | | | | | | | | | Align the function prototype of alloc_init_pte() with its pmd and pud counterparts by replacing the pfn parameter with the equivalent physical address. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mmu: apply strict permissions to .init.text and .init.dataArd Biesheuvel2017-03-231-4/+8
| | | | | | | | | | | | | | | | | | | To avoid having mappings that are writable and executable at the same time, split the init region into a .init.text region that is mapped read-only, and a .init.data region that is mapped non-executable. This is possible now that the alternative patching occurs via the linear mapping, and the linear alias of the init region is always mapped writable (but never executable). Since the alternatives descriptions themselves are read-only data, move those into the .init.text region. Reviewed-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mmu: map .text as read-only from the outsetArd Biesheuvel2017-03-231-4/+14
| | | | | | | | | | | | | | | | | | | Now that alternatives patching code no longer relies on the primary mapping of .text being writable, we can remove the code that removes the writable permissions post-init time, and map it read-only from the outset. To preserve the existing behavior under rodata=off, which is relied upon by external debuggers to manage software breakpoints (as pointed out by Mark), add an early_param() check for rodata=, and use RWX permissions if it set to 'off'. Reviewed-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: alternatives: apply boot time fixups via the linear mappingArd Biesheuvel2017-03-231-5/+17
| | | | | | | | | | | | | | | | | | | | One important rule of thumb when desiging a secure software system is that memory should never be writable and executable at the same time. We mostly adhere to this rule in the kernel, except at boot time, when regions may be mapped RWX until after we are done applying alternatives or making other one-off changes. For the alternative patching, we can improve the situation by applying the fixups via the linear mapping, which is never mapped with executable permissions. So map the linear alias of .text with RW- permissions initially, and remove the write permissions as soon as alternative patching has completed. Reviewed-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mmu: move TLB maintenance from callers to create_mapping_late()Ard Biesheuvel2017-03-231-8/+8
| | | | | | | | | | | | | | | | | | In preparation of refactoring the kernel mapping logic so that text regions are never mapped writable, which would require adding explicit TLB maintenance to new call sites of create_mapping_late() (which is currently invoked twice from the same function), move the TLB maintenance from the call site into create_mapping_late() itself, and change it from a full TLB flush into a flush by VA, which is more appropriate here. Also, given that create_mapping_late() has evolved into a routine that only updates protection bits on existing mappings, rename it to update_mapping_prot() Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* Revert "arm64: mm: set the contiguous bit for kernel mappings where appropriate"Mark Rutland2017-02-241-30/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 0bfc445dec9dd8130d22c9f4476eed7598524129. When we change the permissions of regions mapped using contiguous entries, the architecture requires us to follow a Break-Before-Make strategy, breaking *all* associated entries before we can change any of the following properties from the entries: - presence of the contiguous bit - output address - attributes - permissiones Failure to do so can result in a number of problems (e.g. TLB conflict aborts and/or erroneous results from TLB lookups). See ARM DDI 0487A.k_iss10775, "Misprogramming of the Contiguous bit", page D4-1762. We do not take this into account when altering the permissions of kernel segments in mark_rodata_ro(), where we change the permissions of live contiguous entires one-by-one, leaving them transiently inconsistent. This has been observed to result in failures on some fast model configurations. Unfortunately, we cannot follow Break-Before-Make here as we'd have to unmap kernel text and data used to perform the sequence. For the timebeing, revert commit 0bfc445dec9dd813 so as to avoid issues resulting from this misuse of the contiguous bit. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <Will.Deacon@arm.com> Cc: stable@vger.kernel.org # v4.10 Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: fix warning about swapper_pg_dir overflowArnd Bergmann2017-02-151-1/+1
| | | | | | | | | | | | | | | | | | | With 4 levels of 16KB pages, we get this warning about the fact that we are copying a whole page into an array that is declared as having only two pointers for the top level of the page table: arch/arm64/mm/mmu.c: In function 'paging_init': arch/arm64/mm/mmu.c:528:2: error: 'memcpy' writing 16384 bytes into a region of size 16 overflows the destination [-Werror=stringop-overflow=] This is harmless since we actually reserve a whole page in the definition of the array that comes from, and just the extern declaration is short. The pgdir is initialized to zero either way, so copying the actual entries here seems like the best solution. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: mm: use phys_addr_t instead of unsigned long in __map_memblockMiles Chen2017-01-131-2/+2
| | | | | | | | | | Cosmetic change to use phys_addr_t instead of unsigned long for the return value of __pa_symbol(). Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: Use __pa_symbol for kernel symbolsLaura Abbott2017-01-121-12/+21
| | | | | | | | | | | __pa_symbol is technically the marcro that should be used for kernel symbols. Switch to this as a pre-requisite for DEBUG_VIRTUAL which will do bounds checking. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: dump: Add checking for writable and exectuable pagesLaura Abbott2016-11-071-0/+3
| | | | | | | | | | | | | | | Page mappings with full RWX permissions are a security risk. x86 has an option to walk the page tables and dump any bad pages. (See e1a58320a38d ("x86/mm: Warn on W^X mappings")). Add a similar implementation for arm64. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [catalin.marinas@arm.com: folded fix for KASan out of bounds from Mark Rutland] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: set the contiguous bit for kernel mappings where appropriateArd Biesheuvel2016-11-071-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | Now that we no longer allow live kernel PMDs to be split, it is safe to start using the contiguous bit for kernel mappings. So set the contiguous bit in the kernel page mappings for regions whose size and alignment are suitable for this. This enables the following contiguous range sizes for the virtual mapping of the kernel image, and for the linear mapping: granule size | cont PTE | cont PMD | -------------+------------+------------+ 4 KB | 64 KB | 32 MB | 16 KB | 2 MB | 1 GB* | 64 KB | 2 MB | 16 GB* | * Only when built for 3 or more levels of translation. This is due to the fact that a 2 level configuration only consists of PGDs and PTEs, and the added complexity of dealing with folded PMDs is not justified considering that 16 GB contiguous ranges are likely to be ignored by the hardware (and 16k/2 levels is a niche configuration) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: replace 'block_mappings_allowed' with 'page_mappings_only'Ard Biesheuvel2016-11-071-16/+16
| | | | | | | | | | | In preparation of adding support for contiguous PTE and PMD mappings, let's replace 'block_mappings_allowed' with 'page_mappings_only', which will be a more accurate description of the nature of the setting once we add such contiguous mappings into the mix. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: BUG on unsupported manipulations of live kernel mappingsArd Biesheuvel2016-11-071-27/+43
| | | | | | | | | | | | | | | | | Now that we take care not manipulate the live kernel page tables in a way that may lead to TLB conflicts, the case where a table mapping is replaced by a block mapping can no longer occur. So remove the handling of this at the PUD and PMD levels, and instead, BUG() on any occurrence of live kernel page table manipulations that modify anything other than the permission bits. Since mark_rodata_ro() is the only caller where the kernel mappings that are being manipulated are actually live, drop the various conditional flush_tlb_all() invocations, and add a single call to mark_rodata_ro() instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: drop fixup_init() and mm.hKefeng Wang2016-09-061-12/+0Star
| | | | | | | | | | There is only fixup_init() in mm.h , and it is only called in free_initmem(), so move the codes from fixup_init() into free_initmem(), then drop fixup_init() and mm.h. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: apply __ro_after_init to some objectsJisheng Zhang2016-08-221-1/+2
| | | | | | | | | | | | | | | | | | | | These objects are set during initialization, thereafter are read only. Previously I only want to mark vdso_pages, vdso_spec, vectors_page and cpu_ops as __read_mostly from performance point of view. Then inspired by Kees's patch[1] to apply more __ro_after_init for arm, I think it's better to mark them as __ro_after_init. What's more, I find some more objects are also read only after init. So apply __ro_after_init to all of them. This patch also removes global vdso_pagelist and tries to clean up vdso_spec[] assignment code. [1] http://www.spinics.net/lists/arm-kernel/msg523188.html Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: mm: avoid fdt_check_header() before the FDT is fully mappedArd Biesheuvel2016-08-011-4/+4
| | | | | | | | | | | | | | | | | | As reported by Zijun, the fdt_check_header() call in __fixmap_remap_fdt() is not safe since it is not guaranteed that the FDT header is mapped completely. Due to the minimum alignment of 8 bytes, the only fields we can assume to be mapped are 'magic' and 'totalsize'. Since the OF layer is in charge of validating the FDT image, and we are only interested in making reasonably sure that the size field contains a meaningful value, replace the fdt_check_header() call with an explicit comparison of the magic field's value against the expected value. Cc: <stable@vger.kernel.org> Reported-by: Zijun Hu <zijun_hu@htc.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: mm: run pgtable_page_ctor() on non-swapper translation table pagesArd Biesheuvel2016-07-251-3/+6
| | | | | | | | | | | | | | | | | | The kernel page table creation routines are accessible to other subsystems (e.g., EFI) via the create_pgd_mapping() entry point, which allows mappings to be created that are not covered by init_mm. Since generic code such as apply_to_page_range() may expect translation table pages that are not associated with init_mm to be covered by fully constructed struct pages, add a call to pgtable_page_ctor() in the alloc function used by create_pgd_mapping. Since it is no longer used by create_mapping_late(), also update the name of this function to better reflect its purpose. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Laura Abbott <labbott@redhat.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: make create_mapping_late() non-allocatingArd Biesheuvel2016-07-251-1/+1
| | | | | | | | | | | | | | The only purpose served by create_mapping_late() is to remap the already mapped .text and .rodata kernel segments with read-only permissions. Since we no longer allow block mappings to be split or merged, create_mapping_late() should not pass an allocation function pointer into __create_pgd_mapping(). So pass NULL instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Laura Abbott <labbott@redhat.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: fold init_pgd() into __create_pgd_mapping()Ard Biesheuvel2016-07-011-18/+6Star
| | | | | | | | | The routine __create_pgd_mapping() does nothing except calling init_pgd(), which has no other callers. So fold the latter into the former. Also, drop a comment that has gone stale. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: Remove split_p*d() functionsCatalin Marinas2016-07-011-43/+4Star
| | | | | | | | | | | Since the efi_create_mapping() no longer generates block mappings and being the last user of the split_p*d code, remove these functions and the corresponding TLBI. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [ardb: replace 'overlapping regions' with 'block mappings' in commit log] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: add param to force create_pgd_mapping() to use page mappingsArd Biesheuvel2016-07-011-40/+27Star
| | | | | | | | | | | | | | | | | | Add a bool parameter 'allow_block_mappings' to create_pgd_mapping() and the various helper functions that it descends into, to give the caller control over whether block entries may be used to create the mapping. The UEFI runtime mapping routines will use this to avoid creating block entries that would need to split up into page entries when applying the permissions listed in the Memory Attributes firmware table. This also replaces the block_mappings_allowed() helper function that was added for DEBUG_PAGEALLOC functionality, but the resulting code is functionally equivalent (given that debug_page_alloc does not operate on EFI page table entries anyway) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: remove unnecessary BUG_ONKefeng Wang2016-06-301-1/+0Star
| | | | | | | | | The memblock_alloc() and memblock_alloc_base() will panic on their own if no free memory, remove pointless BUG_ON. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: fix location of _etextArd Biesheuvel2016-06-271-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Kees Cook notes in the ARM counterpart of this patch [0]: The _etext position is defined to be the end of the kernel text code, and should not include any part of the data segments. This interferes with things that might check memory ranges and expect executable code up to _etext. In particular, Kees is referring to the HARDENED_USERCOPY patch set [1], which rejects attempts to call copy_to_user() on kernel ranges containing executable code, but does allow access to the .rodata segment. Regardless of whether one may or may not agree with the distinction, it makes sense for _etext to have the same meaning across architectures. So let's put _etext where it belongs, between .text and .rodata, and fix up existing references to use __init_begin instead, which unlike _end_rodata includes the exception and notes sections as well. The _etext references in kaslr.c are left untouched, since its references to [_stext, _etext) are meant to capture potential jump instruction targets, and so disregarding .rodata is actually an improvement here. [0] http://article.gmane.org/gmane.linux.kernel/2245084 [1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502 Reported-by: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: Move unflatten_device_tree() call earlier.David Daney2016-04-151-2/+0Star
| | | | | | | | | | | | | | | | In order to extract NUMA information from the device tree, we need to have the tree in its unflattened form. Move the call to bootmem_init() in the tail of paging_init() into setup_arch, and adjust header files so that its declaration is visible. Move the unflatten_device_tree() call between the calls to paging_init() and bootmem_init(). Follow on patches add NUMA handling to bootmem_init(). Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: cover the .head.text section in the .text segment mappingArd Biesheuvel2016-04-141-5/+5
| | | | | | | | | | | | | | | | | | | | Keeping .head.text out of the .text mapping buys us very little: its actual payload is only 4 KB, most of which is padding, but the page alignment may add up to 2 MB (in case of CONFIG_DEBUG_ALIGN_RODATA=y) of additional padding to the uncompressed kernel Image. Also, on 4 KB granule kernels, the 4 KB misalignment of .text forces us to map the adjacent 56 KB of code without the PTE_CONT attribute, and since this region contains things like the vector table and the GIC interrupt handling entry point, this region is likely to benefit from the reduced TLB pressure that results from PTE_CONT mappings. So remove the alignment between the .head.text and .text sections, and use the [_text, _etext) rather than the [_stext, _etext) interval for mapping the .text segment. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: use 'segment' rather than 'chunk' to describe mapped kernel regionsArd Biesheuvel2016-04-141-7/+7
| | | | | | | | | | Replace the poorly defined term chunk with segment, which is a term that is already used by the ELF spec to describe contiguous mappings with the same permission attributes of statically allocated ranges of an executable. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: consistently use p?d_set_hugeMark Rutland2016-03-241-4/+2Star
| | | | | | | | | | | | | | | | | | | Commit 324420bf91f60582 ("arm64: add support for ioremap() block mappings") added new p?d_set_huge functions which do the hard work to generate and set a correct block entry. These differ from open-coded huge page creation in the early page table code by explicitly setting the P?D_TYPE_SECT bits (which are implicitly retained by mk_sect_prot() for any valid prot), but are otherwise identical (and cannot fail on arm64). For simplicity and consistency, make use of these in the initial page table creation code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: mm: Mark .rodata as ROJeremy Linton2016-02-261-6/+13
| | | | | | | | | | | | Currently the .rodata section is actually still executable when DEBUG_RODATA is enabled. This changes that so the .rodata is actually read only, no execute. It also adds the .rodata section to the mem_init banner. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: added vm_struct vmlinux_rodata in map_kernel()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: add support for kernel ASLRArd Biesheuvel2016-02-241-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for KASLR is implemented, based on entropy provided by the bootloader in the /chosen/kaslr-seed DT property. Depending on the size of the address space (VA_BITS) and the page size, the entropy in the virtual displacement is up to 13 bits (16k/2 levels) and up to 25 bits (all 4 levels), with the sidenote that displacements that result in the kernel image straddling a 1GB/32MB/512MB alignment boundary (for 4KB/16KB/64KB granule kernels, respectively) are not allowed, and will be rounded up to an acceptable value. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is enabled, the module region is randomized independently from the core kernel. This makes it less likely that the location of core kernel data structures can be determined by an adversary, but causes all function calls from modules into the core kernel to be resolved via entries in the module PLTs. If CONFIG_RANDOMIZE_MODULE_REGION_FULL is not enabled, the module region is randomized by choosing a page aligned 128 MB region inside the interval [_etext - 128 MB, _stext + 128 MB). This gives between 10 and 14 bits of entropy (depending on page size), independently of the kernel randomization, but still guarantees that modules are within the range of relative branch and jump instructions (with the caveat that, since the module region is shared with other uses of the vmalloc area, modules may need to be loaded further away if the module region is exhausted) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>