summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* fs: buffer: move allocation failure loop into the allocatorJohannes Weiner2013-10-172-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Buffer allocation has a very crude indefinite loop around waking the flusher threads and performing global NOFS direct reclaim because it can not handle allocation failures. The most immediate problem with this is that the allocation may fail due to a memory cgroup limit, where flushers + direct reclaim might not make any progress towards resolving the situation at all. Because unlike the global case, a memory cgroup may not have any cache at all, only anonymous pages but no swap. This situation will lead to a reclaim livelock with insane IO from waking the flushers and thrashing unrelated filesystem cache in a tight loop. Use __GFP_NOFAIL allocations for buffers for now. This makes sure that any looping happens in the page allocator, which knows how to orchestrate kswapd, direct reclaim, and the flushers sensibly. It also allows memory cgroups to detect allocations that can't handle failure and will allow them to ultimately bypass the limit if reclaim can not make progress. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: memcg: handle non-error OOM situations more gracefullyJohannes Weiner2013-10-176-148/+79Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full callstack on OOM") assumed that only a few places that can trigger a memcg OOM situation do not return VM_FAULT_OOM, like optional page cache readahead. But there are many more and it's impractical to annotate them all. First of all, we don't want to invoke the OOM killer when the failed allocation is gracefully handled, so defer the actual kill to the end of the fault handling as well. This simplifies the code quite a bit for added bonus. Second, since a failed allocation might not be the abrupt end of the fault, the memcg OOM handler needs to be re-entrant until the fault finishes for subsequent allocation attempts. If an allocation is attempted after the task already OOMed, allow it to bypass the limit so that it can quickly finish the fault and invoke the OOM killer. Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tools/testing/selftests: fix uninitialized variableFelipe Pena2013-10-171-1/+1
| | | | | | | | | | The err variable is intended to receive the timer_create() return before checking it Signed-off-by: Felipe Pena <felipensp@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block/partitions/efi.c: treat size mismatch as a warning, not an errorDoug Anderson2013-10-171-1/+6
| | | | | | | | | | | | | | | | | | | | In commit 27a7c642174e ("partitions/efi: account for pmbr size in lba") we started treating bad sizes in lba field of the partition that has the 0xEE (GPT protective) as errors. However, we may run into these "bad sizes" in the real world if someone uses dd to copy an image from a smaller disk to a bigger disk. Since this case used to work (even without using force_gpt), keep it working and treat the size mismatch as a warning instead of an error. Reported-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pagesAndrea Arcangeli2013-10-171-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 11feeb498086 ("kvm: optimize away THP checks in kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic compound pages. That commit depends on the assumption that PG_reserved is identical for all head and tail pages of a compound page. So that if get_user_pages returns a tail page, we don't need to check the head page in order to know if we deal with a reserved page that requires different refcounting. The assumption that PG_reserved is the same for head and tail pages is certainly correct for THP and regular hugepages, but gigantic hugepages allocated through bootmem don't clear the PG_reserved on the tail pages (the clearing of PG_reserved is done later only if the gigantic hugepage is freed). This patch corrects the gigantic compound page initialization so that we can retain the optimization in 11feeb498086. The cacheline was already modified in order to set PG_tail so this won't affect the boot time of large memory systems. [akpm@linux-foundation.org: tweak comment layout and grammar] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: andy123 <ajs124.ajs124@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/zswap: bugfix: memory leak when re-swaponWeijie Yang2013-10-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon, so a memory leak occurs. Free the memory of zswap_tree in zswap_frontswap_invalidate_area(). Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> From: Weijie Yang <weijie.yang@samsung.com> Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently Consider the following scenario: thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page) thread 1: call zswap_frontswap_invalidate_page to invalidate entry x. finished, entry x and its zbud is not freed as its refcount != 0 now, the swap_map[x] = 0 thread 0: now call zswap_get_swap_cache_page swapcache_prepare return -ENOENT because entry x is not used any more zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM zswap_writeback_entry do nothing except put refcount Now, the memory of zswap_entry x and its zpage leak. Modify: - check the refcount in fail path, free memory if it is not referenced. - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path can be not only caused by nomem but also by invalidate. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pagesCyrill Gorcunov2013-10-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | If a page we are inspecting is in swap we may occasionally report it as having soft dirty bit (even if it is clean). The pte_soft_dirty helper should be called on present pte only. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: migration: do not lose soft dirty bit if page is in migration stateCyrill Gorcunov2013-10-173-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | If page migration is turned on in config and the page is migrating, we may lose the soft dirty bit. If fork and mprotect are called on migrating pages (once migration is complete) pages do not obtain the soft dirty bit in the correspond pte entries. Fix it adding an appropriate test on swap entries. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* gcov: MAINTAINERS: Add an entry for gcovPeter Oberparleiter2013-10-171-0/+6
| | | | | | Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/hugetlb.c: correct missing private flag clearingJoonsoo Kim2013-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | We should clear the page's private flag when returing the page to the hugepage pool. Otherwise, marked hugepage can be allocated to the user who tries to allocate the non-reserved hugepage. If this user fail to map this hugepage, he would try to return the page to the hugepage pool. Since this page has a private flag, resv_huge_pages would mistakenly increase. This patch fixes this situation. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmscan.c: don't forget to free shrinker->nr_deferredAndrew Vagin2013-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This leak was added by commit 1d3d4437eae1 ("vmscan: per-node deferred work"). unreferenced object 0xffff88006ada3bd0 (size 8): comm "criu", pid 14781, jiffies 4295238251 (age 105.641s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<ffffffff8170caee>] kmemleak_alloc+0x5e/0xc0 [<ffffffff811c0527>] __kmalloc+0x247/0x310 [<ffffffff8117848c>] register_shrinker+0x3c/0xa0 [<ffffffff811e115b>] sget+0x5ab/0x670 [<ffffffff812532f4>] proc_mount+0x54/0x170 [<ffffffff811e1893>] mount_fs+0x43/0x1b0 [<ffffffff81202dd2>] vfs_kern_mount+0x72/0x110 [<ffffffff81202e89>] kern_mount_data+0x19/0x30 [<ffffffff812530a0>] pid_ns_prepare_proc+0x20/0x40 [<ffffffff81083c56>] alloc_pid+0x466/0x4a0 [<ffffffff8105aeda>] copy_process+0xc6a/0x1860 [<ffffffff8105beab>] do_fork+0x8b/0x370 [<ffffffff8105c1a6>] SyS_clone+0x16/0x20 [<ffffffff8171f739>] stub_clone+0x69/0x90 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Andrew Vagin <avagin@openvz.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Glauber Costa <glommer@openvz.org> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc/sem.c: synchronize semop and semctl with IPC_RMIDManfred Spraul2013-10-171-13/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After acquiring the semlock spinlock, operations must test that the array is still valid. - semctl() and exit_sem() would walk stale linked lists (ugly, but should be ok: all lists are empty) - semtimedop() would sleep forever - and if woken up due to a signal - access memory after free. The patch also: - standardizes the tests for .deleted, so that all tests in one function leave the function with the same approach. - unconditionally tests for .deleted immediately after every call to sem_lock - even it it means that for semctl(GETALL), .deleted will be tested twice. Both changes make the review simpler: After every sem_lock, there must be a test of .deleted, followed by a goto to the cleanup code (if the function uses "goto cleanup"). The only exception is semctl_down(): If sem_ids().rwsem is locked, then the presence in ids->ipcs_idr is equivalent to !.deleted, thus no additional test is required. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Mike Galbraith <efault@gmx.de> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: update locking scheme commentsDavidlohr Bueso2013-10-171-6/+21
| | | | | | | | | | The initial documentation was a bit incomplete, update accordingly. [akpm@linux-foundation.org: make it more readable in 80 columns] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, memcg: protect mem_cgroup_read_events for cpu hotplugDavid Rientjes2013-10-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | for_each_online_cpu() needs the protection of {get,put}_online_cpus() so cpu_online_mask doesn't change during the iteration. cpu_hotplug.lock is held while a cpu is going down, it's a coarse lock that is used kernel-wide to synchronize cpu hotplug activity. Memcg has a cpu hotplug notifier, called while there may not be any cpu hotplug refcounts, which drains per-cpu event counts to memcg->nocpu_base.events to maintain a cumulative event count as cpus disappear. Without get_online_cpus() in mem_cgroup_read_events(), it's possible to account for the event count on a dying cpu twice, and this value may be significantly large. In fact, all memcg->pcp_counter_lock use should be nested by {get,put}_online_cpus(). This fixes that issue and ensures the reported statistics are not vastly over-reported during cpu hotplug. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linuxLinus Torvalds2013-10-169-384/+1Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull device tree fixes and reverts from Grant Likely: "One bug fix and three reverts. The reverts back out the slightly controversial feeding the entire device tree into the random pool and the reserved-memory binding which isn't fully baked yet. Expect the reserved-memory patches at least to resurface for v3.13. The bug fixes removes a scary but harmless warning on SPARC that was introduced in the v3.12 merge window. v3.13 will contain a proper fix that makes the new code work on SPARC. On the plus side, the diffstat looks *awesome*. I love removing lines of code" * tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux: Revert "drivers: of: add initialization code for dma reserved memory" Revert "ARM: init: add support for reserved memory defined by device tree" Revert "of: Feed entire flattened device tree into the random pool" of: fix unnecessary warning on missing /cpus node
| * Revert "drivers: of: add initialization code for dma reserved memory"Marek Szyprowski2013-10-156-366/+0Star
| | | | | | | | | | | | | | | | | | | | This reverts commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963. There is still no consensus on the bindings for the reserved memory and various drawbacks of the proposed solution has been shown, so the best now is to revert it completely and start again from scratch later. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>
| * Revert "ARM: init: add support for reserved memory defined by device tree"Marek Szyprowski2013-10-151-3/+0Star
| | | | | | | | | | | | | | | | | | This reverts commit 10bcdfb8ba24760f715f0a700c3812747eddddf5. There is no consensus on the bindings for the reserved memory, so the code for handing it will be reverted. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>
| * Revert "of: Feed entire flattened device tree into the random pool"Grant Likely2013-10-141-12/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 109b6236294b53d8eaa50be7d9e9ad37079f5f7e. Tim Bird expressed concern that this will have a bad effect on boot time, and while simple tests have shown it to be okay with simple tree, a device tree blob can potentially be quite large and add_device_randomness() is not a fast function. Rather than do this for all platforms unconditionally, I'm reverting this patch and would like to see it revisited. Instead of feeding the entire tree into the random pool, it would probably be appropriate to hash the tree and feed the hash result into the pool. There really isn't a lot of randomness in a device tree anyway. In the majority of cases only a handful of properties are going to be different between machines with the same baseboard. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
| * of: fix unnecessary warning on missing /cpus nodeGrant Likely2013-10-141-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not all DT platforms have all the cpus collected under a /cpus node. That just happens to be a details of FDT, ePAPR and PowerPC platforms. Sparc does something different, but unfortunately the current code complains with a warning if /cpus isn't there. This became a problem with commit f86e4718, "driver/core cpu: initialize of_node in cpu's device structure", which caused the function to get called for all architectures. This commit is a temporary fix to fail silently if the cpus node isn't present. A proper fix will come later to allow arch code to provide a custom mechanism for decoding the CPU hwid if the 'reg' property isn't appropriate. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: David Miller <davem@davemloft.net> Cc: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Cc: Rob Herring <rob.herring@calxeda.com>
* | Merge branch 'fixes-for-v3.12' of ↵Linus Torvalds2013-10-161-15/+28
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA-mapping fix from Marek Szyprowski: "A bugfix for the IOMMU-based implementation of dma-mapping subsystem for ARM architecture" * 'fixes-for-v3.12' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: dma-mapping: Always pass proper prot flags to iommu_map()
| * | ARM: dma-mapping: Always pass proper prot flags to iommu_map()Andreas Herrmann2013-10-021-15/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... otherwise it is impossible for the low level iommu driver to figure out which pte flags should be used. In __map_sg_chunk we can derive the flags from dma_data_direction. In __iommu_create_mapping we should treat the memory like DMA_BIDIRECTIONAL and pass both IOMMU_READ and IOMMU_WRITE to iommu_map. __iommu_create_mapping is used during dma_alloc_coherent (via arm_iommu_alloc_attrs). AFAIK dma_alloc_coherent is responsible for allocation _and_ mapping. I think this implies that access to the mapped pages should be allowed. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andreas Herrmann <andreas.herrmann@calxeda.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
* | | Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-10-161-3/+14
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fix from Gleb Natapov. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Enable pvspinlock after jump_label_init() to avoid VM hang
| * | | KVM: Enable pvspinlock after jump_label_init() to avoid VM hangRaghavendra K T2013-10-151-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use jump label to enable pv-spinlock. With the changes in (442e0973e927 Merge branch 'x86/jumplabel'), the jump label behaviour has changed that would result in eventual hang of the VM since we would end up in a situation where slow path locks would halt the vcpus but we will not be able to wakeup the vcpu by lock releaser using unlock kick. Similar problem in Xen and more detailed description is available in a945928ea270 (xen: Do not enable spinlocks before jump_label_init() has executed) This patch splits kvm_spinlock_init to separate jump label changes with pvops patching and also make jump label enabling after jump_label_init(). Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | | | Merge tag 'stable/for-linus-3.12-rc4-tag' of ↵Linus Torvalds2013-10-162-0/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from Stefano Stabellini: "A small fix for Xen on x86_32 and a build fix for xen-tpmfront on arm64" * tag 'stable/for-linus-3.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Fix possible user space selector corruption tpm: xen-tpmfront: fix missing declaration of xen_domain
| * | | | xen: Fix possible user space selector corruptionFrediano Ziglio2013-10-101-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the way kernel is initialized under Xen is possible that the ring1 selector used by the kernel for the boot cpu end up to be copied to userspace leading to segmentation fault in the userspace. Xen code in the kernel initialize no-boot cpus with correct selectors (ds and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen). On task context switch (switch_to) we assume that ds, es and cs already point to __USER_DS and __KERNEL_CSso these selector are not changed. If processor is an Intel that support sysenter instruction sysenter/sysexit is used so ds and es are not restored switching back from kernel to userspace. In the case the selectors point to a ring1 instead of __USER_DS the userspace code will crash on first memory access attempt (to be precise Xen on the emulated iret used to do sysexit will detect and set ds and es to zero which lead to GPF anyway). Now if an userspace process call kernel using sysenter and get rescheduled (for me it happen on a specific init calling wait4) could happen that the ring1 selector is set to ds and es. This is quite hard to detect cause after a while these selectors are fixed (__USER_DS seems sticky). Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears to be the first one that have this issue. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
| * | | | tpm: xen-tpmfront: fix missing declaration of xen_domainRob Herring2013-10-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xen-tpmfront fails to build on arm64 with the following error: drivers/char/tpm/xen-tpmfront.c: In function ‘xen_tpmfront_init’: drivers/char/tpm/xen-tpmfront.c:422:2: error: implicit declaration of function ‘xen_domain’ [-Werror=implicit-function-declaration] Add include of xen/xen.h to fix this. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Ashley Lai <adlai@linux.vnet.ibm.com> Acked-by: Ashley Lai <adlai@linux.vnet.ibm.com> Cc: Leonidas Da Silva Barbosa <leosilva@linux.vnet.ibm.com> Cc: Rajiv Andrade <mail@srajiv.net> Cc: Marcel Selhorst <tpmdd@selhorst.net> Cc: Sirrix AG <tpmdd@sirrix.com> Cc: tpmdd-devel@lists.sourceforge.net
* | | | | Merge tag 'vfio-v3.12-rc5' of git://github.com/awilliam/linux-vfioLinus Torvalds2013-10-151-19/+21
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull vfio fix from Alex Williamson: "Fix an incorrect break out of nested loop in iommu mapping code" * tag 'vfio-v3.12-rc5' of git://github.com/awilliam/linux-vfio: VFIO: vfio_iommu_type1: fix bug caused by break in nested loop
| * | | | | VFIO: vfio_iommu_type1: fix bug caused by break in nested loopAntonios Motakis2013-10-111-19/+21
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking that pages are not already mapped. Since the check is being done in a for loop nested within the main loop, breaking out of it does not create the intended behavior. If the underlying IOMMU driver returns a non-NULL value, this will be ignored and mapping the DMA range will be attempted anyway, leading to unpredictable behavior. This interracts badly with the ARM SMMU driver issue fixed in the patch that was submitted with the title: "[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys" Both fixes are required in order to use the vfio_iommu_type1 driver with an ARM SMMU. This patch refactors the function slightly, in order to also make this kind of bug less likely. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | | | | Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-10-1515-141/+126Star
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "Last batch of IB changes for 3.12: many mlx5 hardware driver fixes plus one trivial semicolon cleanup" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB: Remove unnecessary semicolons IB/mlx5: Ensure proper synchronization accessing memory IB/mlx5: Fix alignment of reg umr gather buffers IB/mlx5: Fix eq names to display nicely in /proc/interrupts mlx5: Fix error code translation from firmware to driver IB/mlx5: Fix opt param mask according to firmware spec mlx5: Fix opt param mask for sq err to rts transition IB/mlx5: Disable atomic operations mlx5: Fix layout of struct mlx5_init_seg mlx5: Keep polling to reclaim pages while any returned IB/mlx5: Avoid async events on invalid port number IB/mlx5: Decrease memory consumption of mr caches mlx5: Remove checksum on command interface commands IB/mlx5: Fix memory leak in mlx5_ib_create_srq IB/mlx5: Flush cache workqueue before destroying it IB/mlx5: Fix send work queue size calculation
| * \ \ \ \ Merge branch 'misc' into for-nextRoland Dreier2013-10-145-9/+9
| |\ \ \ \ \
| | * | | | | IB: Remove unnecessary semicolonsJoe Perches2013-10-145-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These aren't necessary after switch blocks. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Ensure proper synchronization accessing memoryEli Cohen2013-10-101-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call mlx5_ib_populate_pas() before mapping the DMA buffer to ensure the hardware reads the values written by the CPU. Found by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Fix alignment of reg umr gather buffersEli Cohen2013-10-101-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hardware requires that gather buffers for UMR work requests be aligned to 2K. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Fix eq names to display nicely in /proc/interruptsSagi Grimberg2013-10-103-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's helpful for a driver to put the pci slot name in its interrupt names, so /proc/interrupts will show the pci slot of the device. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | mlx5: Fix error code translation from firmware to driverEli Cohen2013-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Limits exceeded should be translated to ENOMEM. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Fix opt param mask according to firmware specEli Cohen2013-10-101-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Failed to configure opt mask to configure rre from init to rtr. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | mlx5: Fix opt param mask for sq err to rts transitionEli Cohen2013-10-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing entry in the table for UC transport. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Disable atomic operationsEli Cohen2013-10-102-47/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently Atomic operations don't work properly. Disable them for the time being. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | mlx5: Fix layout of struct mlx5_init_segEli Cohen2013-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The layout of struct health_buffer was not according to firmware specification. Fix it to comply. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | mlx5: Keep polling to reclaim pages while any returnedEli Cohen2013-10-101-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change mlx5_reclaim_startup_pages() to keep polling while any pages are returned. If none are returned, keep polling for five more seconds before exiting with an error message. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Avoid async events on invalid port numberEli Cohen2013-10-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a single ported Connect-IB, its possible for the firmware to issue events on the non-existing 2nd port. Make sure to ignore events generated for such ports. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Decrease memory consumption of mr cachesEli Cohen2013-10-101-35/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the logic so we do not allocate memory nor map the device before actually posting to the REG_UMR QP. In addition, unmap and free the memory after we get completion. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | mlx5: Remove checksum on command interface commandsEli Cohen2013-10-104-32/+21Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Checksum calculations consume CPU resources and can be significant to the rate of resource creation/destruction. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Fix memory leak in mlx5_ib_create_srqMoshe Lazer2013-10-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch fixes the rollback in case of failure in creating SRQ. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Flush cache workqueue before destroying itMoshe Lazer2013-10-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Destroying the workqueue without flushing it first can lead to a case in which the kernel tries to push a delayed work to the workqueue which does not exist anymore. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | | | | IB/mlx5: Fix send work queue size calculationEli Cohen2013-10-101-6/+16
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Make sure wqe_cnt does not exceed the limit published by firmware. 2. There is no requirement that the number of outstanding work requests will be a power of two. Remove the ilog2 in the calculation of sq.max_post to fix that. 3. Add case for IB_QPT_XRC_TGT in sq_overhead and return 0 as XRC target QPs do not have a send queue. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | | | | | Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds2013-10-146-9/+44
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ARM fixes from Russell King: "Some more ARM fixes, nothing particularly major here. The biggest change is to fix the SMP_ON_UP code so that it works with TI's Aegis cores" * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7851/1: check for number of arguments in syscall_get/set_arguments() ARM: 7846/1: Update SMP_ON_UP code to detect A9MPCore with 1 CPU devices ARM: 7845/1: sharpsl_param.c: fix invalid memory access for pxa devices ARM: 7843/1: drop asm/types.h from generic-y ARM: 7842/1: MCPM: don't explode if invoked without being initialized first
| * | | | | | ARM: 7851/1: check for number of arguments in syscall_get/set_arguments()AKASHI Takahiro2013-10-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ftrace_syscall_enter(), syscall_get_arguments(..., 0, n, ...) if (i == 0) { <handle ORIG_r0> ...; n--;} memcpy(..., n * sizeof(args[0])); If 'number of arguments(n)' is zero and 'argument index(i)' is also zero in syscall_get_arguments(), none of arguments should be copied by memcpy(). Otherwise 'n--' can be a big positive number and unexpected amount of data will be copied. Tracing system calls which take no argument, say sync(void), may hit this case and eventually make the system corrupted. This patch fixes the issue both in syscall_get_arguments() and syscall_set_arguments(). Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | ARM: 7846/1: Update SMP_ON_UP code to detect A9MPCore with 1 CPU devicesSantosh Shilimkar2013-10-031-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic code is well equipped to differentiate between SMP and UP configurations.However, there are some devices which use Cortex-A9 MP core IP with 1 CPU as configuration. To let these SOCs to co-exist in a CONFIG_SMP=y build by leveraging the SMP_ON_UP support, we need to additionally check the number the cores in Cortex-A9 MPCore configuration. Without such a check in place, the startup code tries to execute ALT_SMP() set of instructions which lead to CPU faults. The issue was spotted on TI's Aegis device and this patch makes now the device work with omap2plus_defconfig which enables SMP by default. The change is kept limited to only Cortex-A9 MPCore detection code. Note that if any future SoC *does* use 0x0 as the PERIPH_BASE, then the SCU address check code needs to be #ifdef'd for for the Aegis platform. Acked-by: Sricharan R <r.sricharan@ti.com> Signed-off-by: Vaibhav Bedia <vaibhav.bedia@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | ARM: 7845/1: sharpsl_param.c: fix invalid memory access for pxa devicesAndrea Adami2013-10-031-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression for kernels after v3.2 After commit 72662e01088394577be4a3f14da94cf87bea2591 ARM: head.S: only include __turn_mmu_on in the initial identity mapping Zaurus PXA devices call sharpsl_save_param() during fixup and hang on boot because memcpy refers to physical addresses no longer valid if the MMU is setup. Zaurus collie (SA1100) is unaffected (function is called in init_machine). The code was making assumptions and for PXA the virtual address should have been used before. Signed-off-by: Marko Katic <dromede@gmail.com> Signed-off-by: Andrea Adami <andrea.adami@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>