path: root/arch/arm64/mm
Commit message | Author | Age | Files | Lines
* arm64: mm: make CONFIG_ZONE_DMA32 configurable  (Miles Chen, 2019-07-26; 1 file, -2/+3)

    [ Upstream commit 0c1f14ed12262f45a3af1d588e4d7bd12438b8f5 ]

    This change makes CONFIG_ZONE_DMA32 default y and allows users to override it only when CONFIG_EXPERT=y. For the SoCs that do not need CONFIG_ZONE_DMA32, this is the first step towards managing all available memory with a single zone (the normal zone), to reduce the overhead of multiple zones.

    The change also fixes a build error when CONFIG_NUMA=y and CONFIG_ZONE_DMA32=n:

      arch/arm64/mm/init.c:195:17: error: use of undeclared identifier 'ZONE_DMA32'
                      max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys());

    Changes since v1:
    1. Only expose CONFIG_ZONE_DMA32 when CONFIG_EXPERT=y
    2. Remove the redundant IS_ENABLED(CONFIG_ZONE_DMA32)

    Cc: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Miles Chen <miles.chen@mediatek.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
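
    A minimal sketch of the kind of Kconfig entry this describes (default y, user-visible only under EXPERT); the prompt string is an assumption, not copied from the patch:

      config ZONE_DMA32
              bool "Support DMA32 zone" if EXPERT
              default y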
* arm64/mm: Inhibit huge-vmap with ptdump  (Mark Rutland, 2019-06-19; 1 file, -3/+8)

    [ Upstream commit 7ba36eccb3f83983a651efd570b4f933ecad1b5c ]

    The arm64 ptdump code can race with concurrent modification of the kernel page tables. At the time this was added, this was sound as:

    * Modifications to leaf entries could result in stale information being logged, but would not result in a functional problem.
    * Boot-time modifications to non-leaf entries (e.g. freeing of initmem) were performed when the ptdump code cannot be invoked.
    * At runtime, modifications to non-leaf entries only occurred in the vmalloc region, and these were strictly additive, as intermediate entries were never freed.

    However, since commit 324420bf91f6 ("arm64: add support for ioremap() block mappings"), it has been possible to create huge mappings in the vmalloc area at runtime, and as part of this, existing intermediate levels of table may be removed and freed. It's possible for the ptdump code to race with this, and to continue to walk tables which have been freed (and potentially poisoned or reallocated). As a result, the ptdump code may dereference bogus addresses, which could be fatal.

    Since huge-vmap is a TLB and memory optimization, we can disable it when the runtime ptdump code is in use, to avoid this problem.

    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Fixes: 324420bf91f60582 ("arm64: add support for ioremap() block mappings")
    Acked-by: Ard Biesheuvel <ard.biesheuvel@arm.com>
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
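
    A sketch of how such gating can look, using the arch_ioremap_*_supported() hooks of that era; the exact condition and config symbol are assumptions, not copied from the patch:

      /* Illustrative: refuse huge ioremap() block mappings when runtime ptdump is built in. */
      int __init arch_ioremap_pud_supported(void)
      {
              /*
               * The ptdump walker cannot tolerate intermediate tables being
               * freed under it, which huge-vmap teardown can do.
               */
              return !IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
      }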
* arm64: errata: Add workaround for Cortex-A76 erratum #1463225  (Will Deacon, 2019-05-31; 1 file, -2/+35)

    commit 969f5ea627570e91c9d54403287ee3ed657f58fe upstream.

    Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum that can prevent interrupts from being taken when single-stepping.

    This patch implements a software workaround to prevent userspace from effectively being able to disable interrupts.

    Cc: <stable@vger.kernel.org>
    Cc: Marc Zyngier <marc.zyngier@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable  (Christoph Hellwig, 2019-05-31; 1 file, -0/+10)

    commit a98d9ae937d256ed679a935fc82d9deaa710d98e upstream.

    DMA allocations that can't sleep may return non-remapped addresses, but we do not properly handle them in the mmap and get_sgtable methods. Resolve non-vmalloc addresses using virt_to_page to handle this corner case.

    Cc: <stable@vger.kernel.org>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
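
    A sketch of the described corner case (variable names are illustrative):

      /* Resolve the backing page for both remapped and non-remapped buffers. */
      struct page *page;

      if (is_vmalloc_addr(cpu_addr))
              page = vmalloc_to_page(cpu_addr);   /* remapped allocation */
      else
              page = virt_to_page(cpu_addr);      /* non-remapped linear address */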
* arm64: Save and restore OSDLR_EL1 across suspend/resume  (Jean-Philippe Brucker, 2019-05-22; 1 file, -16/+18)

    commit 827a108e354db633698f0b4a10c1ffd2b1f8d1d0 upstream.

    When the CPU comes out of suspend, the firmware may have modified the OS Double Lock Register. Save it in an unused slot of cpu_suspend_ctx, and restore it on resume.

    Cc: <stable@vger.kernel.org>
    Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: kaslr: Reserve size of ARM64_MEMSTART_ALIGN in linear region  (Yueyi Li, 2019-04-17; 1 file, -1/+1)

    [ Upstream commit c8a43c18a97845e7f94ed7d181c11f41964976a2 ]

    When KASLR is enabled (CONFIG_RANDOMIZE_BASE=y), the top 4K of kernel virtual address space may be mapped to physical addresses despite being reserved for ERR_PTR values. Fix the randomization of the linear region so that we avoid mapping the last page of the virtual address space.

    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: liyueyi <liyueyi@live.com>
    [will: rewrote commit message; merged in suggestion from Ard]
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
* arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals  (Will Deacon, 2019-04-05; 1 file, -4/+5)

    commit b9a4b9d084d978f80eb9210727c81804588b42ff upstream.

    FAR_EL1 is UNKNOWN for all debug exceptions other than those caused by taking a hardware watchpoint. Unfortunately, if a debug handler returns a non-zero value, then we will propagate the UNKNOWN FAR value to userspace via the si_addr field of the SIGTRAP siginfo_t. Instead, let's set si_addr to take on the PC of the faulting instruction, which we have available in the current pt_regs.

    Cc: <stable@vger.kernel.org>
    Reviewed-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
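
    A one-line sketch of the idea, assuming the generic pt_regs accessor (not copied from the patch):

      /* FAR_EL1 is UNKNOWN here; report the faulting PC instead. */
      info.si_addr = (void __user *)instruction_pointer(regs);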
* arm64: Do not issue IPIs for user executable ptes  (Catalin Marinas, 2019-02-06; 1 file, -1/+5)

    commit 132fdc379eb143932d209a20fd581e1ce7630960 upstream.

    Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings") was aimed at fixing the I-cache invalidation for kernel mappings. However, it inadvertently caused all cache maintenance for user mappings via set_pte_at() -> __sync_icache_dcache() -> sync_icache_aliases() to call kick_all_cpus_sync().

    Reported-by: Shijith Thotton <sthotton@marvell.com>
    Tested-by: Shijith Thotton <sthotton@marvell.com>
    Reported-by: Wandun Chen <chenwandun@huawei.com>
    Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
    Cc: <stable@vger.kernel.org> # 4.19.x-
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: Fix minor issues with the dcache_by_line_op macro  (Will Deacon, 2019-01-26; 1 file, -0/+3)

    [ Upstream commit 33309ecda0070506c49182530abe7728850ebe78 ]

    The dcache_by_line_op macro suffers from a couple of small problems:

    First, the GAS directives that are currently being used rely on assembler behavior that is not documented, and probably not guaranteed to produce the correct behavior going forward. As a result, we end up with some undefined symbols in cache.o:

      $ nm arch/arm64/mm/cache.o
        ...
               U civac
        ...
               U cvac
               U cvap
               U cvau

    This is due to the fact that the comparisons used to select the operation type in the dcache_by_line_op macro are comparing symbols, not strings, and even though it seems that GAS is doing the right thing here (undefined symbols by the same name are equal to each other), it seems unwise to rely on this.

    Second, when patching in a DC CVAP instruction on CPUs that support it, the fallback path consists of a DC CVAU instruction which may be affected by CPU errata that require ARM64_WORKAROUND_CLEAN_CACHE.

    Solve these issues by unrolling the various maintenance routines and using the conditional directives that are documented as operating on strings. To avoid the complexity of nested alternatives, we move the DC CVAP patching to __clean_dcache_area_pop, falling back to a branch to __clean_dcache_area_poc if DCPOP is not supported by the CPU.

    Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Suggested-by: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: dma-mapping: Fix FORCE_CONTIGUOUS buffer clearing  (Robin Murphy, 2018-12-19; 1 file, -1/+1)

    commit 3238c359acee4ab57f15abb5a82b8ab38a661ee7 upstream.

    We need to invalidate the caches *before* clearing the buffer via the non-cacheable alias, else in the worst case __dma_flush_area() may write back dirty lines over the top of our nice new zeros.

    Fixes: dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag")
    Cc: <stable@vger.kernel.org> # 4.18.x-
    Acked-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags  (Steve Capper, 2018-09-24; 1 file, -4/+34)

    For contiguous hugetlb, huge_ptep_set_access_flags performs a get_clear_flush (which then flushes the TLBs) even when no change of ptes is necessary.

    Unfortunately, this behaviour can lead to back-to-back page faults being generated when running with multiple threads that access the same contiguous huge page:

      Thread 1                      | Thread 2
      ------------------------------+------------------------------
      hugetlb_fault                 |
      huge_ptep_set_access_flags    |
        -> invalidate pte range     | hugetlb_fault
      continue processing           | wait for hugetlb_fault_mutex
      release mutex and return      | huge_ptep_set_access_flags
                                    |   -> invalidate pte range
                                    | hugetlb_fault ...

    This patch changes huge_ptep_set_access_flags such that we first read the contiguous range of ptes (whilst preserving dirty information); the pte range is only then invalidated where necessary, and this prevents further spurious page faults.

    Fixes: d8bdcff28764 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
    Reported-by: Lei Zhang <zhang.lei@jp.fujitsu.com>
    Signed-off-by: Steve Capper <steve.capper@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: hugetlb: Fix handling of young ptes  (Steve Capper, 2018-09-24; 1 file, -3/+9)

    In the contiguous bit hugetlb break-before-make code, we assume that all hugetlb pages are young. In fact, remove_migration_pte is able to place an old hugetlb pte, so this assumption is not valid.

    This patch fixes the contiguous hugetlb scanning code to preserve young ptes.

    Fixes: d8bdcff28764 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
    Signed-off-by: Steve Capper <steve.capper@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: fix erroneous warnings in page freeing functions  (Mark Rutland, 2018-09-06; 1 file, -4/+6)

    In pmd_free_pte_page() and pud_free_pmd_page() we try to warn if they hit a present non-table entry. In both cases we'll warn for non-present entries, as the VM_WARN_ON() only checks the entry is not a table entry. This has been observed to result in warnings when booting a v4.19-rc2 kernel under qemu.

    Fix this by bailing out earlier for non-present entries.

    Fixes: ec28bb9c9b0826d7 ("arm64: Implement page table free interfaces")
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
* Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds, 2018-08-18; 2 files, -5/+5)

    Merge updates from Andrew Morton:

    - a few misc things
    - a few Y2038 fixes
    - ntfs fixes
    - arch/sh tweaks
    - ocfs2 updates
    - most of MM

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
      mm/hmm.c: remove unused variables align_start and align_end
      fs/userfaultfd.c: remove redundant pointer uwq
      mm, vmacache: hash addresses based on pmd
      mm/list_lru: introduce list_lru_shrink_walk_irq()
      mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one()
      mm/list_lru.c: move locking from __list_lru_walk_one() to its caller
      mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node()
      mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP
      mm/sparse: delete old sparse_init and enable new one
      mm/sparse: add new sparse_init_nid() and sparse_init()
      mm/sparse: move buffer init/fini to the common place
      mm/sparse: use the new sparse buffer functions in non-vmemmap
      mm/sparse: abstract sparse buffer allocations
      mm/hugetlb.c: don't zero 1GiB bootmem pages
      mm, page_alloc: double zone's batchsize
      mm/oom_kill.c: document oom_lock
      mm/hugetlb: remove gigantic page support for HIGHMEM
      mm, oom: remove sleep from under oom_lock
      kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()
      mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
      ...
| * kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()  (Marek Szyprowski, 2018-08-18; 1 file, -2/+2)

    The CMA memory allocator doesn't support standard gfp flags for memory allocation, so there is no point having them as a parameter of the dma_alloc_from_contiguous() function. Replace it by a boolean no_warn argument, which covers everything the underlying cma_alloc() function supports.

    This will help to avoid giving the false impression that this function supports standard gfp flags and that callers can pass __GFP_ZERO to get a zeroed buffer, which has already been an issue: see commit dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").

    Link: http://lkml.kernel.org/r/20180709122020eucas1p21a71b092975cb4a3b9954ffc63f699d1~-sqUFoa-h2939329393eucas1p2Y@eucas1p2.samsung.com
    Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Acked-by: Michał Nazarewicz <mina86@mina86.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Laura Abbott <labbott@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
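
    A sketch of the resulting interface change (prototypes paraphrased from the description, not copied from the patch):

      /* Before: a gfp_t that CMA largely ignored. */
      struct page *dma_alloc_from_contiguous(struct device *dev, size_t count,
                                             unsigned int align, gfp_t gfp_mask);

      /* After: only the one knob cma_alloc() actually honours. */
      struct page *dma_alloc_from_contiguous(struct device *dev, size_t count,
                                             unsigned int align, bool no_warn);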
| * mm: convert return type of handle_mm_fault() caller to vm_fault_t  (Souptick Joarder, 2018-08-18; 1 file, -3/+3)

    Use the new return type vm_fault_t for the fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type.

    Ref: commit 1c8f422059ae ("mm: change return type to vm_fault_t")

    In this patch all the callers of handle_mm_fault() are changed to return the vm_fault_t type.

    Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Richard Henderson <rth@twiddle.net>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Richard Kuo <rkuo@codeaurora.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: James Hogan <jhogan@kernel.org>
    Cc: Ley Foon Tan <lftan@altera.com>
    Cc: Jonas Bonn <jonas@southpole.se>
    Cc: James E.J. Bottomley <jejb@parisc-linux.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Palmer Dabbelt <palmer@sifive.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
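
    A sketch of what the conversion looks like at a call site (locals illustrative; handle_mm_fault() took (vma, address, flags) at the time):

      vm_fault_t fault;

      fault = handle_mm_fault(vma, addr, mm_flags);
      if (fault & VM_FAULT_ERROR)     /* VM_FAULT_* bits, not an errno */
              goto done;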
* | Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds, 2018-08-17; 1 file, -1/+5)

    Pull arm64 fixes from Will Deacon:
    "A couple of arm64 fixes

    - Fix boot on Hikey-960 by avoiding an IPI with interrupts disabled
    - Fix address truncation in pfn_valid() implementation"

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
      arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()
      arm64: Avoid calling stop_machine() when patching jump labels
| * arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()  (Greg Hackmann, 2018-08-17; 1 file, -1/+5)

    ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input before seeing if the PFN is valid. This leads to false positives when some of the upper bits are set, but the lower bits match a valid PFN.

    For example, the following userspace code looks up a bogus entry in /proc/kpageflags:

      int pagemap = open("/proc/self/pagemap", O_RDONLY);
      int pageflags = open("/proc/kpageflags", O_RDONLY);
      uint64_t pfn, val;

      lseek64(pagemap, [...], SEEK_SET);
      read(pagemap, &pfn, sizeof(pfn));
      if (pfn & (1UL << 63)) {                /* valid PFN */
              pfn &= ((1UL << 55) - 1);       /* clear flag bits */
              pfn |= (1UL << 55);
              lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET);
              read(pageflags, &val, sizeof(val));
      }

    On ARM64 this causes the userspace process to crash with SIGSEGV rather than reading (1 << KPF_NOPAGE). kpageflags_read() treats the offset as valid, and stable_page_flags() will try to access an address between the user and kernel address ranges.

    Fixes: c1cc1552616d ("arm64: MMU initialisation")
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Hackmann <ghackmann@google.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
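
    A sketch of the check described above, modelled on the description rather than copied from the patch:

      int pfn_valid(unsigned long pfn)
      {
              phys_addr_t addr = pfn << PAGE_SHIFT;

              /* Reject PFNs whose upper bits were shifted away. */
              if ((addr >> PAGE_SHIFT) != pfn)
                      return 0;

              return memblock_is_map_memory(addr);
      }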
* | Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds, 2018-08-15; 6 files, -28/+71)

    Pull arm64 updates from Will Deacon:
    "A bunch of good stuff in here. Worth noting is that we've pulled in the x86/mm branch from -tip so that we can make use of the core ioremap changes which allow us to put down huge mappings in the vmalloc area without screwing up the TLB. Much of the positive diffstat is because of the rseq selftest for arm64.

    Summary:

    - Wire up support for qspinlock, replacing our trusty ticket lock code
    - Add an IPI to flush_icache_range() to ensure that stale instructions fetched into the pipeline are discarded along with the I-cache lines
    - Support for the GCC "stackleak" plugin
    - Support for restartable sequences, plus an arm64 port for the selftest
    - Kexec/kdump support on systems booting with ACPI
    - Rewrite of our syscall entry code in C, which allows us to zero the GPRs on entry from userspace
    - Support for chained PMU counters, allowing 64-bit event counters to be constructed on current CPUs
    - Ensure scheduler topology information is kept up-to-date with CPU hotplug events
    - Re-enable support for huge vmalloc/IO mappings now that the core code has the correct hooks to use break-before-make sequences
    - Miscellaneous, non-critical fixes and cleanups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
      arm64: alternative: Use true and false for boolean values
      arm64: kexec: Add comment to explain use of __flush_icache_range()
      arm64: sdei: Mark sdei stack helper functions as static
      arm64, kaslr: export offset in VMCOREINFO ELF notes
      arm64: perf: Add cap_user_time aarch64
      efi/libstub: Only disable stackleak plugin for arm64
      arm64: drop unused kernel_neon_begin_partial() macro
      arm64: kexec: machine_kexec should call __flush_icache_range
      arm64: svc: Ensure hardirq tracing is updated before return
      arm64: mm: Export __sync_icache_dcache() for xen-privcmd
      drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
      arm64: Add support for STACKLEAK gcc plugin
      arm64: Add stack information to on_accessible_stack
      drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
      arm64: fix ACPI dependencies
      rseq/selftests: Add support for arm64
      arm64: acpi: fix alignment fault in accessing ACPI
      efi/arm: map UEFI memory map even w/o runtime services enabled
      efi/arm: preserve early mapping of UEFI memory map longer for BGRT
      drivers: acpi: add dependency of EFI for arm64
      ...
| * arm64: mm: Export __sync_icache_dcache() for xen-privcmd  (Ben Hutchings, 2018-07-27; 1 file, -0/+1)

    The xen-privcmd driver, which can be modular, calls set_pte_at() which in turn may call __sync_icache_dcache(). The call to __sync_icache_dcache() may be optimised out because it is conditional on !pte_special(), and xen-privcmd calls pte_mkspecial(). But it seems unwise to rely on this optimisation.

    Fixes: 3ad0876554ca ("xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE")
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: kill config_sctlr_el1()  (Mark Rutland, 2018-07-12; 1 file, -1/+1)

    Now that we have sysreg_clear_set(), we can consistently use this instead of config_sctlr_el1().

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Dave Martin <dave.martin@arm.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: James Morse <james.morse@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
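
    An illustrative use of the helper (the particular bits are an example, not taken from this patch):

      /* sysreg_clear_set(reg, clear_bits, set_bits): read-modify-write SCTLR_EL1 */
      sysreg_clear_set(sctlr_el1, SCTLR_EL1_UCI, 0);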
| * arm64: mm: Export __flush_icache_range() to modules  (Will Deacon, 2018-07-06; 1 file, -1/+1)

    lkdtm calls flush_icache_range(), which results in an out-of-line call to __flush_icache_range(), which is not exported to modules. Export the symbol to modules to fix this build breakage.

    Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
    Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap  (Sudeep Holla, 2018-07-06; 1 file, -8/+21)

    Currently numa_clear_node removes both the cpu information from the NUMA node cpumap and the NUMA node id from the cpu. Similarly, numa_store_cpu_info updates both the percpu nodeid and the NUMA cpumap.

    However, we need to retain the NUMA node id for the cpu and only remove the cpu information from the NUMA node cpumap during CPU hotplug out. The same can be extended for hotplugging in the CPU.

    This patch separates out numa_{add,remove}_cpu from numa_clear_node and numa_store_cpu_info.

    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
    Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
    Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
    Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
| * arm64: Implement page table free interfaces  (Chintan Pandya, 2018-07-06; 1 file, -4/+44)

    arm64 requires break-before-make. Originally, before setting up a new pmd/pud entry for a huge mapping, in a few cases the modified pmd/pud entry was still valid and pointing to a next-level page table, as we only clear off the leaf PTEs in the unmap leg.

    a) This resulted in stale entries in TLBs, as some TLBs also cache intermediate mappings for performance reasons.
    b) Also, the modified pmd/pud was the only reference to the next-level page table, and it was getting lost without being freed. So, page table leaks were happening.

    Implement pud_free_pmd_page() and pmd_free_pte_page() to enforce BBM and also free the leaking page tables. The implementation requires (see the sketch after this entry):

    1) Clearing off the current pud/pmd entry
    2) Invalidation of the TLB
    3) Freeing of the unused next-level page tables

    Reviewed-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
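
    A sketch of the pmd-level sequence; the helper calls illustrate the three steps above and are not copied from the patch:

      int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
      {
              pte_t *table = pte_offset_kernel(pmdp, 0UL);

              pmd_clear(pmdp);                                /* 1) break: clear the entry */
              flush_tlb_kernel_range(addr, addr + PMD_SIZE);  /* 2) invalidate the TLB */
              pte_free_kernel(NULL, table);                   /* 3) free the orphaned table */
              return 1;
      }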
| * Merge branch 'x86/mm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into aarch64/for-next/core  (Will Deacon, 2018-07-06; 1 file, -2/+2)

    Pull in core ioremap changes from -tip, since we depend on these for re-enabling huge I/O mappings on arm64.

    Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | arm64: IPI each CPU after invalidating the I-cache for kernel mappings  (Will Deacon, 2018-07-05; 1 file, -2/+2)

    When invalidating the instruction cache for a kernel mapping via flush_icache_range(), it is also necessary to flush the pipeline for other CPUs so that instructions fetched into the pipeline before the I-cache invalidation are discarded. For example, if module 'foo' is unloaded and then module 'bar' is loaded into the same area of memory, a CPU could end up executing instructions from 'foo' when branching into 'bar' if these instructions were fetched into the pipeline before 'foo' was unloaded.

    Whilst this is highly unlikely to occur in practice, particularly as any exception acts as a context-synchronizing operation, following the letter of the architecture requires us to execute an ISB on each CPU in order for the new instruction stream to be visible.

    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | ARM64: dump: Convert to use DEFINE_SHOW_ATTRIBUTE macro  (Peng Donglin, 2018-07-02; 1 file, -12/+1)

    Use the DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

    Signed-off-by: Peng Donglin <dolinux.peng@gmail.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
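
    For reference, the macro turns a single seq_file show() callback into a full set of file operations; a sketch of the pattern (the ptdump name is illustrative):

      static int ptdump_show(struct seq_file *m, void *v)
      {
              /* ... emit the page table dump into m ... */
              return 0;
      }
      DEFINE_SHOW_ATTRIBUTE(ptdump);  /* generates ptdump_open() and ptdump_fops */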
* | | Merge tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm  (Linus Torvalds, 2018-08-14; 1 file, -6/+1)

    Pull ACPI updates from Rafael Wysocki:
    "These revert two ACPICA commits that are not needed any more, rework the property graphs support in ACPI to be more aligned with the analogous DT code, add some new quirks and remove one that isn't needed any more, add a special platform driver to enumerate multiple I2C devices hooked up to the same device object in the ACPI tables, and update the battery and button drivers.

    Specifics:

    - Revert two ACPICA commits that are not needed any more (Erik Schmauss).
    - Rework property graph support in the ACPI device properties framework to make it behave more like the analogous DT code and update the documentation of it (Sakari Ailus).
    - Change the default ACPI device status after initialization to ACPI_STA_DEFAULT instead of 0 (Hans de Goede).
    - Add a special platform driver for enumerating multiple I2C devices hooked up to the same object in the ACPI tables (Hans de Goede).
    - Fix the ACPI battery driver to avoid reporting full capacity on systems without support for that and clean it up (Hans de Goede, Dmitry Rozhkov, Lucas Rangit Magasweran).
    - Add two system wakeup quirks to the ACPI EC driver (Aaron Ma, Mika Westerberg).
    - Add the touchscreen on Dell Venue Pro 7139 to the list of "always present" devices to make it work (Tristian Celestin).
    - Revert a special tables handling quirk for Dell XPS 9570 and Precision M5530 which is not needed any more (Kai Heng Feng).
    - Add support for a new OEM _OSI string to allow system vendors to work around issues with NVidia HDMI audio (Alex Hung).
    - Prevent the ACPI button driver from reporting excessive system wakeup events and clean it up (Ravi Chandra Sadineni, Randy Dunlap).
    - Clean up two minor code style issues in the ACPI core and GHES handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd)"

    * tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
      platform/x86: Add ACPI i2c-multi-instantiate pseudo driver
      ACPI / x86: utils: Remove status workaround from acpi_device_always_present()
      ACPI / scan: Create platform device for fwnodes with multiple i2c devices
      ACPI / scan: Initialize status to ACPI_STA_DEFAULT
      ACPI / EC: Add another entry for Thinkpad X1 Carbon 6th
      ACPI: bus: Fix a pointer coding style issue
      arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()
      ACPI / scan: Add static attribute to indirect_io_hosts[]
      ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity
      ACPI / EC: Use ec_no_wakeup on ThinkPad X1 Yoga 3rd
      ACPI / battery: get rid of negations in conditions
      ACPI / battery: use specialized print macros
      ACPI / battery: reorder headers alphabetically
      ACPI / battery: drop inclusion of init.h
      ACPI: battery: remove redundant old_present check on insertion
      ACPI: property: graph: Update graph documentation to use generic references
      ACPI: property: graph: Improve graph documentation for port/ep numbering
      ACPI: property: graph: Fix graph documentation
      ACPI: property: Update documentation for hierarchical data extension 1.1
      ACPI: property: Document key numbering for hierarchical data extension refs
      ...
| * | | arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()  (Dongjiu Geng, 2018-08-09; 1 file, -6/+1)

    In order to remove the additional check before calling ghes_notify_sea(), make a stub definition when !CONFIG_ACPI_APEI_SEA. After this cleanup, we can simply call ghes_notify_sea() and let the APEI driver handle the SEA notification.

    Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
    Acked-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2018-08-14; 1 file, -2/+2)

    Pull x86 mm updates from Thomas Gleixner:

    - Make lazy TLB mode even lazier to avoid pointless switch_mm() operations, which reduces CPU load by 1-2% for memcache workloads
    - Small cleanups and improvements all over the place

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/mm: Remove redundant check for kmem_cache_create()
      arm/asm/tlb.h: Fix build error implicit func declaration
      x86/mm/tlb: Make clear_asid_other() static
      x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
      x86/mm/tlb: Always use lazy TLB mode
      x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
      x86/mm/tlb: Make lazy TLB mode lazier
      x86/mm/tlb: Restructure switch_mm_irqs_off()
      x86/mm/tlb: Leave lazy TLB mode at page table free time
      mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
      x86/mm: Add TLB purge to free pmd/pte page interfaces
      ioremap: Update pgtable free interfaces with addr
      x86/mm: Disable ioremap free page handling on x86-PAE
| * | ioremap: Update pgtable free interfaces with addr  (Chintan Pandya, 2018-07-04; 1 file, -2/+2)

    The following kernel panic was observed on an ARM64 platform due to a stale TLB entry:

    1. ioremap with 4K size: a valid pte page table is set.
    2. iounmap it: its pte entry is set to 0.
    3. ioremap the same address with 2M size: its pmd entry is updated with a new value.
    4. A CPU may hit an exception because the old pmd entry is still in the TLB, which leads to a kernel panic.

    Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table") addressed this panic by falling back to pte mappings in the above case on ARM64. To support pmd mappings in all cases, a TLB purge needs to be performed in this case on ARM64.

    Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page() so that the TLB purge can be added later in separate patches.

    [toshi.kani@hpe.com: merge changes, rewrite patch description]
    Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
    Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
    Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: mhocko@suse.com
    Cc: akpm@linux-foundation.org
    Cc: hpa@zytor.com
    Cc: linux-mm@kvack.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: stable@vger.kernel.org
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
* | mm: do not initialize TLB stack vma's with vma_init()  (Linus Torvalds, 2018-08-01; 1 file, -6/+4)

    Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") tried to initialize various left-over ad-hoc vma's "properly", but actually made things worse for the temporary vma's used for TLB flushing.

    vma_init() doesn't actually initialize all of the vma, just a few fields, so doing something like

      - struct vm_area_struct vma = { .vm_mm = tlb->mm, };
      + struct vm_area_struct vma;
      +
      + vma_init(&vma, tlb->mm);

    was actually very bad: instead of having a nicely initialized vma with every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma with only a couple of fields initialized. And they weren't even fields that the code in question mostly cared about.

    The flush_tlb_range() function takes a "struct vma" rather than a "struct mm_struct", because a few architectures actually care about what kind of range it is - being able to only do an ITLB flush if it's a range that doesn't have data accesses enabled, for example. And all the normal users already have the vma for doing the range invalidation.

    But a few people want to call flush_tlb_range() with a range they just made up, so they also end up using a made-up vma. x86 just has a special "flush_tlb_mm_range()" function for this, but other architectures (arm and ia64) do the "use fake vma" thing instead, and thus got caught up in the vma_init() changes.

    At the same time, the TLB flushing code really doesn't care about most other fields in the vma, so vma_init() is just unnecessary and pointless.

    This fixes things by having an explicit "this is just an initializer for the TLB flush" initializer macro, which is used by the arm/arm64/ia64 people who mis-use this interface with just a dummy vma.

    Fixes: 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments")
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
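
    A sketch of such an initializer macro; the name and fields are modelled on the description ("every field but vm_mm zeroed"), not copied from the patch:

      /* Designated initializer: all unnamed fields are zeroed by the compiler. */
      #define TLB_FLUSH_VMA(mm, flags)  { .vm_mm = (mm), .vm_flags = (flags) }

      struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);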
* | Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds, 2018-07-27; 1 file, -2/+5)

    Merge misc fixes from Andrew Morton: "11 fixes"

    * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
      kvm, mm: account shadow page tables to kmemcg
      zswap: re-check zswap_is_full() after do zswap_shrink()
      include/linux/eventfd.h: include linux/errno.h
      mm: fix vma_is_anonymous() false-positives
      mm: use vma_init() to initialize VMAs on stack and data segments
      mm: introduce vma_init()
      mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
      ipc/sem.c: prevent queue.status tearing in semop
      mm: disallow mappings that conflict for devm_memremap_pages()
      kasan: only select SLUB_DEBUG with SYSFS=y
      delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
| * | mm: use vma_init() to initialize VMAs on stack and data segments  (Kirill A. Shutemov, 2018-07-27; 1 file, -2/+5)

    Make sure to initialize all VMAs properly, not only those which come from vm_area_cachep.

    Link: http://lkml.kernel.org/r/20180724121139.62570-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
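
    A sketch of what such a helper looks like; the exact fields are an assumption, and note that it deliberately sets only a few of them, which is what the TLB-flush fix above works around:

      static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
      {
              static const struct vm_operations_struct dummy_vm_ops = {};

              vma->vm_mm = mm;
              vma->vm_ops = &dummy_vm_ops;
              INIT_LIST_HEAD(&vma->anon_vma_chain);
      }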
* / arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups  (Johannes Weiner, 2018-07-25; 1 file, -1/+3)

    Arnd reports the following arm64 randconfig build error with the PSI patches that add another page flag:

      /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
      /git/arm-soc/include/linux/compiler.h:357:38: error: call to '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)

    The additional page flag causes other information stored in page->flags to get bumped into their own struct page member:

      #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
      #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
      #else
      #define LAST_CPUPID_WIDTH 0
      #endif

      #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
      #define LAST_CPUPID_NOT_IN_PAGE_FLAGS
      #endif

    which in turn causes the struct page size to exceed the size set in STRUCT_PAGE_MAX_SHIFT. This value is an estimate used to size the VMEMMAP page array according to address space and struct page size.

    However, the check is performed - and triggers here - on a !VMEMMAP config, which consumes an additional 22 page bits for the sparse section id. When VMEMMAP is enabled, those bits are returned, cpupid doesn't need its own member, and the page passes the VMEMMAP check.

    Restrict that check to the situation it was meant to check: that we are sizing the VMEMMAP page array correctly.

    Says Arnd:

    "Further experiments show that the build error already existed before, but was only triggered with larger values of CONFIG_NR_CPU and/or CONFIG_NODES_SHIFT that might be used in actual configurations but not in randconfig builds.

    With longer CPU and node masks, I could recreate the problem with kernels as old as linux-4.7 when arm64 NUMA support got added."

    Reported-by: Arnd Bergmann <arnd@arndb.de>
    Tested-by: Arnd Bergmann <arnd@arndb.de>
    Cc: stable@vger.kernel.org
    Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.")
    Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region")
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenance  (Will Deacon, 2018-06-22; 1 file, -2/+3)

    When rewriting swapper using nG mappings, we must perform cache maintenance around each page table access in order to avoid coherency problems with the host's cacheable alias under KVM. To ensure correct ordering of the maintenance with respect to Device memory accesses made with the Stage-1 MMU disabled, DMBs need to be added between the maintenance and the corresponding memory access.

    This patch adds a missing DMB between writing a new page table entry and performing a clean+invalidate on the same line.

    Fixes: f992b4dfd58b ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings")
    Cc: <stable@vger.kernel.org> # 4.16.x-
    Acked-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag  (Marek Szyprowski, 2018-06-19; 1 file, -4/+5)

    dma_alloc_*() buffers might be exposed to userspace via an mmap() call, so they should be cleared on allocation. In the case of the IOMMU-based dma-mapping implementation, such buffer clearing was missing in the code path for DMA_ATTR_FORCE_CONTIGUOUS flag handling, because dma_alloc_from_contiguous() doesn't honor the __GFP_ZERO flag. This patch fixes this issue.

    For more information on clearing buffers allocated by the dma_alloc_* functions, see commit 6829e274a623 ("arm64: dma-mapping: always clear allocated buffers").

    Fixes: 44176bb38fa4 ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU")
    Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX  (Stefan Agner, 2018-06-15; 1 file, -3/+3)

    With PHYS_ADDR_MAX there is now a type-safe variant for "all bits set". Make use of it.

    Patch created using a semantic patch as follows:

      // <smpl>
      @@
      typedef phys_addr_t;
      @@
      -(phys_addr_t)ULLONG_MAX
      +PHYS_ADDR_MAX
      // </smpl>

    Link: http://lkml.kernel.org/r/20180419214204.19322-1-stefan@agner.ch
    Signed-off-by: Stefan Agner <stefan@agner.ch>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* treewide: kzalloc() -> kcalloc()  (Kees Cook, 2018-06-13; 1 file, -1/+1)

    The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of:

      kzalloc(a * b, gfp)

    with:

      kcalloc(a * b, gfp)

    as well as handling cases of:

      kzalloc(a * b * c, gfp)

    with:

      kzalloc(array3_size(a, b, c), gfp)

    as it's slightly less ugly than:

      kzalloc_array(array_size(a, b), c, gfp)

    This does, however, attempt to ignore constant size factors like:

      kzalloc(4 * 1024, gfp)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant.

    The Coccinelle script used for this was:

      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      (
        kzalloc(
      - (sizeof(TYPE)) * E
      + sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      - (sizeof(THING)) * E
      + sizeof(THING) * E
        , ...)
      )

      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      (
        kzalloc(
      - sizeof(u8) * (COUNT)
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(__u8) * (COUNT)
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(char) * (COUNT)
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(unsigned char) * (COUNT)
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(u8) * COUNT
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(__u8) * COUNT
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(char) * COUNT
      + COUNT
        , ...)
      |
        kzalloc(
      - sizeof(unsigned char) * COUNT
      + COUNT
        , ...)
      )

      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      (
      - kzalloc
      + kcalloc
        (
      - sizeof(TYPE) * (COUNT_ID)
      + COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(TYPE) * COUNT_ID
      + COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(TYPE) * (COUNT_CONST)
      + COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(TYPE) * COUNT_CONST
      + COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(THING) * (COUNT_ID)
      + COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(THING) * COUNT_ID
      + COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(THING) * (COUNT_CONST)
      + COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(THING) * COUNT_CONST
      + COUNT_CONST, sizeof(THING)
        , ...)
      )

      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      - kzalloc
      + kcalloc
        (
      - SIZE * COUNT
      + COUNT, SIZE
        , ...)

      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      (
        kzalloc(
      - sizeof(TYPE) * (COUNT) * (STRIDE)
      + array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      - sizeof(TYPE) * (COUNT) * STRIDE
      + array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      - sizeof(TYPE) * COUNT * (STRIDE)
      + array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      - sizeof(TYPE) * COUNT * STRIDE
      + array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      - sizeof(THING) * (COUNT) * (STRIDE)
      + array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      - sizeof(THING) * (COUNT) * STRIDE
      + array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      - sizeof(THING) * COUNT * (STRIDE)
      + array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      - sizeof(THING) * COUNT * STRIDE
      + array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )

      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      (
        kzalloc(
      - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      - sizeof(THING1) * sizeof(THING2) * COUNT
      + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      - sizeof(THING1) * sizeof(THING2) * (COUNT)
      + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      - sizeof(TYPE1) * sizeof(THING2) * COUNT
      + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )

      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      (
        kzalloc(
      - (COUNT) * STRIDE * SIZE
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - COUNT * (STRIDE) * SIZE
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - COUNT * STRIDE * (SIZE)
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - (COUNT) * (STRIDE) * SIZE
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - COUNT * (STRIDE) * (SIZE)
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - (COUNT) * STRIDE * (SIZE)
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - (COUNT) * (STRIDE) * (SIZE)
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      - COUNT * STRIDE * SIZE
      + array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )

      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      - (E1) * E2 * E3
      + array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      - (E1) * (E2) * E3
      + array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      - (E1) * (E2) * (E3)
      + array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      - E1 * E2 * E3
      + array3_size(E1, E2, E3)
        , ...)
      )

      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(TYPE) * (E2)
      + E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(TYPE) * E2
      + E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(THING) * (E2)
      + E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - sizeof(THING) * E2
      + E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - (E1) * E2
      + E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - (E1) * (E2)
      + E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      - E1 * E2
      + E1, E2
        , ...)
      )

    Signed-off-by: Kees Cook <keescook@chromium.org>
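
    A minimal before/after example of the transformation:

      /* Before: open-coded multiplication can overflow silently. */
      buf = kzalloc(n * sizeof(*buf), GFP_KERNEL);

      /* After: kcalloc() checks n * sizeof(*buf) for overflow. */
      buf = kcalloc(n, sizeof(*buf), GFP_KERNEL);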
* Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds, 2018-06-08; 2 files, -18/+33)

    Pull arm64 updates from Catalin Marinas:
    "Apart from the core arm64 and perf changes, the Spectre v4 mitigation touches the arm KVM code and the ACPI PPTT support touches drivers/ (acpi and cacheinfo). I should have the maintainers' acks in place.

    Summary:

    - Spectre v4 mitigation (Speculative Store Bypass Disable) support for arm64 using SMC firmware call to set a hardware chicken bit
    - ACPI PPTT (Processor Properties Topology Table) parsing support and enable the feature for arm64
    - Report signal frame size to user via auxv (AT_MINSIGSTKSZ). The primary motivation is Scalable Vector Extensions which requires more space on the signal frame than the currently defined MINSIGSTKSZ
    - ARM perf patches: allow building arm-cci as module, demote dev_warn() to dev_dbg() in arm-ccn event_init(), miscellaneous cleanups
    - cmpwait() WFE optimisation to avoid some spurious wakeups
    - L1_CACHE_BYTES reverted back to 64 (for performance reasons that have to do with some network allocations) while keeping ARCH_DMA_MINALIGN to 128. cache_line_size() returns the actual hardware Cache Writeback Granule
    - Turn LSE atomics on by default in Kconfig
    - Kernel fault reporting tidying
    - Some #include and miscellaneous cleanups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (53 commits)
      arm64: Fix syscall restarting around signal suppressed by tracer
      arm64: topology: Avoid checking numa mask for scheduler MC selection
      ACPI / PPTT: fix build when CONFIG_ACPI_PPTT is not enabled
      arm64: cpu_errata: include required headers
      arm64: KVM: Move VCPU_WORKAROUND_2_FLAG macros to the top of the file
      arm64: signal: Report signal frame size to userspace via auxv
      arm64/sve: Thin out initialisation sanity-checks for sve_max_vl
      arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
      arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
      arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
      arm64: KVM: Add HYP per-cpu accessors
      arm64: ssbd: Add prctl interface for per-thread mitigation
      arm64: ssbd: Introduce thread flag to control userspace mitigation
      arm64: ssbd: Restore mitigation status on CPU resume
      arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
      arm64: ssbd: Add global mitigation state accessor
      arm64: Add 'ssbd' command-line option
      arm64: Add ARCH_WORKAROUND_2 probing
      arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
      arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
      ...
| * arm64: Unify kernel fault reporting  (Mark Rutland, 2018-05-23; 1 file, -14/+23)

    In do_page_fault(), we handle some kernel faults early, and simply die() with a message. For faults handled later, we dump the faulting address, decode the ESR, walk the page tables, and perform a number of steps to ensure that this data is reported.

    Let's unify the handling of fatal kernel faults with a new die_kernel_fault() helper, handling all of these details. This is largely the same as the existing logic in __do_kernel_fault(), except that addresses are consistently padded to 16 hex characters, as would be expected for a 64-bit address.

    The messages currently logged in do_page_fault are adjusted to fit into the die_kernel_fault() message template.

    Acked-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: make is_permission_fault() name clearer  (Mark Rutland, 2018-05-23; 1 file, -4/+5)

    The naming of is_permission_fault() makes it sound like it should return true for permission faults from EL0, but by design, it only does so for faults from EL1. Let's make this clear by adding el1 to the name, as we do for is_el1_instruction_abort().

    Acked-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: Increase ARCH_DMA_MINALIGN to 128  (Catalin Marinas, 2018-05-15; 1 file, -0/+5)

    This patch increases the ARCH_DMA_MINALIGN to 128 so that it covers the currently known Cache Writeback Granule (CTR_EL0.CWG) on arm64, and moves the fallback in cache_line_size() from L1_CACHE_BYTES to this constant. In addition, it warns (and taints) if the CWG is larger than ARCH_DMA_MINALIGN, as this is not safe with non-coherent DMA.

    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
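
    A sketch of the resulting fallback logic; cache_type_cwg() reads CTR_EL0.CWG, and the exact body is an assumption based on the description:

      #define ARCH_DMA_MINALIGN   (128)

      static inline int cache_line_size(void)
      {
              u32 cwg = cache_type_cwg();     /* 0 if the CWG is not reported */

              return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
      }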
* | Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace  (Linus Torvalds, 2018-06-05; 1 file, -6/+12)

    Pull siginfo updates from Eric Biederman:
    "This set of changes closes the known issues with setting si_code to an invalid value, and with not fully initializing struct siginfo. There remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64 and x86 to get the code that generates siginfo into a simpler and more maintainable state. Most of that work involves refactoring the signal handling code and thus careful code review.

    Also not included is the work to shrink the in-kernel version of struct siginfo. That depends on getting the number of places that directly manipulate struct siginfo under control, as it requires the introduction of struct kernel_siginfo for the in-kernel things.

    Overall this set of changes looks like it is making good progress, and with a little luck I will be wrapping up the siginfo work next development cycle"

    * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
      signal/sh: Stop gcc warning about an impossible case in do_divide_error
      signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
      signal/um: More carefully relay signals in relay_signal.
      signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
      signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
      signal/signalfd: Add support for SIGSYS
      signal/signalfd: Remove __put_user from signalfd_copyinfo
      signal/xtensa: Use force_sig_fault where appropriate
      signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
      signal/um: Use force_sig_fault where appropriate
      signal/sparc: Use force_sig_fault where appropriate
      signal/sparc: Use send_sig_fault where appropriate
      signal/sh: Use force_sig_fault where appropriate
      signal/s390: Use force_sig_fault where appropriate
      signal/riscv: Replace do_trap_siginfo with force_sig_fault
      signal/riscv: Use force_sig_fault where appropriate
      signal/parisc: Use force_sig_fault where appropriate
      signal/parisc: Use force_sig_mceerr where appropriate
      signal/openrisc: Use force_sig_fault where appropriate
      signal/nios2: Use force_sig_fault where appropriate
      ...
| * | signal: Ensure every siginfo we send has all bits initializedEric W. Biederman2018-04-251-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call clear_siginfo to ensure every stack-allocated siginfo is properly initialized before being passed to the signal-sending functions. Note: It is not safe to depend on C initializers to initialize struct siginfo on the stack because C is allowed to skip holes when initializing a structure. The initialization of struct siginfo in tracehook_report_syscall_exit was moved from the helper user_single_step_siginfo into tracehook_report_syscall_exit itself, to make it clear that the local variable siginfo gets fully initialized. In a few cases the scope of struct siginfo has been reduced to make it clear that siginfo is not used on other paths in the function in which it is declared. Instances of using memset to initialize siginfo have been replaced with calls to clear_siginfo() for clarity. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
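The enforced pattern looks roughly like the sketch below; the signal number, si_code and the faulting address addr are chosen purely for illustration:

    siginfo_t info;

    clear_siginfo(&info);           /* zero every byte, padding holes included */
    info.si_signo = SIGSEGV;
    info.si_errno = 0;
    info.si_code  = SEGV_MAPERR;
    info.si_addr  = (void __user *)addr;
    force_sig_info(SIGSEGV, &info, current);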
* | | Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2018-06-041-10/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull dma-mapping updates from Christoph Hellwig: - replace the force_dma flag with a dma_configure bus method. (Nipun Gupta, although one patch is incorrectly attributed to me due to a git rebase bug) - use GFP_DMA32 more aggressively in dma-direct. (Takashi Iwai) - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the right thing for bounce buffering. - move dma-debug initialization to common code, and apply a few cleanups to the dma-debug code. - cleanup the Kconfig mess around swiotlb selection - swiotlb comment fixup (Yisheng Xie) - a trivial swiotlb fix. (Dan Carpenter) - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt) - add a new generic dma-noncoherent dma_map_ops implementation and use it for arc, c6x and nds32. - improve scatterlist validity checking in dma-debug. (Robin Murphy) - add a struct device quirk to limit the dma-mask to 32-bit due to bridge/system issues, and switch x86 to use it instead of a local hack for VIA bridges. - handle devices without a dma_mask more gracefully in the dma-direct code. * tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits) dma-direct: don't crash on device without dma_mask nds32: use generic dma_noncoherent_ops nds32: implement the unmap_sg DMA operation nds32: consolidate DMA cache maintainance routines x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag x86/pci-dma: remove the explicit nodac and allowdac option x86/pci-dma: remove the experimental forcesac boot option Documentation/x86: remove a stray reference to pci-nommu.c core, dma-direct: add a flag 32-bit dma limits dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs dma-debug: check scatterlist segments c6x: use generic dma_noncoherent_ops arc: use generic dma_noncoherent_ops arc: fix arc_dma_{map,unmap}_page arc: fix arc_dma_sync_sg_for_{cpu,device} arc: simplify arc_dma_sync_single_for_{cpu,device} dma-mapping: provide a generic dma-noncoherent implementation dma-mapping: simplify Kconfig dependencies riscv: add swiotlb support riscv: only enable ZONE_DMA32 for 64-bit ...
| * | | dma-debug: move initialization to common codeChristoph Hellwig2018-05-081-10/+0
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most mainstream architectures are using 65536 entries, so let's stick to that. If someone is really desperate to override it, that can still be done through <asm/dma-mapping.h>, but I'd rather see a really good rationale for that. dma_debug_init is now called as a core_initcall, which for many architectures means much earlier, and provides dma-debug functionality earlier in the boot process. This should be safe as it only relies on the memory allocator already being available. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
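In common code this amounts to something like the following sketch; the function name is illustrative, while the entry count and initcall level follow the message above:

    #define PREALLOC_DMA_DEBUG_ENTRIES 65536

    static int __init dma_debug_do_init(void)
    {
            /* Preallocate tracking entries early in boot */
            dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
            return 0;
    }
    core_initcall(dma_debug_do_init);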
* | | arm64: Make sure permission updates happen for pmd/pudLaura Abbott2018-05-241-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings") disallowed block mappings for ioremap since that code does not honor break-before-make. The same APIs are also used for permission updating though and the extra checks prevent the permission updates from happening, even though this should be permitted. This results in read-only permissions not being fully applied. Visibly, this can occasionally be seen as a failure on the built-in rodata test when the test data ends up in a section or as an odd RW gap on the page table dump. Fix this by using pgattr_change_is_safe instead of p*d_present for determining if the change is permitted. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Peter Robinson <pbrobinson@gmail.com> Reported-by: Peter Robinson <pbrobinson@gmail.com> Fixes: 15122ee2c515 ("arm64: Enforce BBM for huge IO/VMAP mappings") Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
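A sketch of the fixed pud-level helper, assuming pfn_pud() and a mk_pud_sect_prot() section-attribute helper; the pmd case is analogous:

    int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
    {
            pud_t new_pud = pfn_pud(__phys_to_pfn(phys),
                                    mk_pud_sect_prot(prot));

            /*
             * pgattr_change_is_safe() allows attribute-only updates such
             * as RW -> RO on a live entry, while still rejecting changes
             * that would require break-before-make.
             */
            if (!pgattr_change_is_safe(pud_val(READ_ONCE(*pudp)),
                                       pud_val(new_pud)))
                    return 0;

            set_pud(pudp, new_pud);
            return 1;
    }

Checking attribute safety rather than mere presence is what lets a permission update through while still refusing a live change of output address or memory type.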
* | | arm64: fault: Don't leak data in ESR context for user fault on kernel VAPeter Maydell2018-05-221-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If userspace faults on a kernel address, handing them the raw ESR value on the sigframe as part of the delivered signal can leak data useful to attackers who are using information about the underlying hardware fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. However there are also legitimate uses for the information provided in the ESR -- notably the GCC and LLVM sanitizers use this to report whether wild pointer accesses by the application are reads or writes (since a wild write is a more serious bug than a wild read), so we don't want to drop the ESR information entirely. For faulting addresses in the kernel, sanitize the ESR. We choose to present userspace with the illusion that there is nothing mapped in the kernel's part of the address space at all, by reporting all faults as level 0 translation faults taken to EL1. These fields are safe to pass through to userspace as they depend only on the instruction that userspace used to provoke the fault: EC IL (always) ISV CM WNR (for all data aborts) All the other fields in ESR except DFSC are architecturally RES0 for an L0 translation fault taken to EL1, so can be zeroed out without confusing userspace. The illusion is not entirely perfect, as there is a tiny wrinkle where we will report an alignment fault that was not due to the memory type (for instance a LDREX to an unaligned address) as a translation fault, whereas if you do this on real unmapped memory the alignment fault takes precedence. This is not likely to trip anybody up in practice, as the only users we know of for the ESR information who care about the behaviour for kernel addresses only really want to know about the WnR bit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
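The sanitisation reduces to masking and overriding fields, roughly as in this sketch (mask names from asm/esr.h; ESR_ELx_FSC_FAULT encodes a level 0 translation fault; the data-abort-only bits are kept unconditionally on the assumption that they are ignored for other exception classes):

    /*
     * Sketch: for a user fault on a kernel address, keep only the ESR
     * fields that depend on the faulting instruction and report a
     * level 0 translation fault taken to EL1.
     */
    static unsigned int sanitize_esr(unsigned int esr)
    {
            esr &= ESR_ELx_EC_MASK | ESR_ELx_IL |
                   ESR_ELx_ISV | ESR_ELx_CM | ESR_ELx_WNR;

            return esr | ESR_ELx_FSC_FAULT; /* translation fault, level 0 */
    }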
* | | arm64: To remove initrd reserved area entry from memblockCHANDAN VN2018-05-011-1/+3
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The initrd reserved area entry is not removed from memblock even though the initrd reserved area itself is freed. With this change, after freeing the memory it is also released from memblock; the result can be checked in /sys/kernel/debug/memblock/reserved. The patch makes sure that the initrd entry is removed from memblock when keepinitrd is not enabled. The patch only affects accounting and debugging. This does not fix any memory leak. Acked-by: Laura Abbott <labbott@redhat.com> Signed-off-by: CHANDAN VN <chandan.vn@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
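Pairing the free with a memblock removal looks roughly like this sketch of the !keepinitrd path:

    void __init free_initrd_mem(unsigned long start, unsigned long end)
    {
            free_reserved_area((void *)start, (void *)end, 0, "initrd");

            /* Also drop the stale entry from memblock.reserved */
            memblock_free(__virt_to_phys(start), end - start);
    }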