summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
* x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit tooTejun Heo2011-05-022-11/+12
| | | | | | | | | | | | | | | Now that NUMA init path is unified, amdtopology can be enabled on 32bit. Make amdtopology.c safe on 32bit by explicitly using u64 and drop X86_64 dependency from Kconfig. Inclusion of bootmem.h is added for max_pfn declaration. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Rename amdtopology_64.c to amdtopology.cTejun Heo2011-05-022-1/+1
| | | | | | | | | | | | amdtopology is going to be used by 32bit too drop _64 suffix. This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Make numa_init_array() staticTejun Heo2011-05-022-3/+1Star
| | | | | | | | | | | | numa_init_array() no longer has users outside of numa.c. Make it static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Make 32bit use common NUMA init pathTejun Heo2011-05-023-241/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With both _numa_init() methods converted and the rest of init code adjusted, numa_32.c now can switch from the 32bit only init code to the common one in numa.c. * Shim get_memcfg_*()'s are dropped and initmem_init() calls x86_numa_init(), which is updated to handle NUMAQ. * All boilerplate operations including node range limiting, pgdat alloc/init are handled by numa_init(). 32bit only implementation is removed. * 32bit numa_add_memblk(), numa_set_distance() and memory_add_physaddr_to_nid() removed and common versions in numa_32.c enabled for 32bit. This change causes the following behavior changes. * NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized for 32bit too. * Much more sanity checks and configuration cleanups. * Proper handling of node distances. * The same NUMA init messages as 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Initialize and use remap allocator from setup_node_bootmem()Tejun Heo2011-05-023-15/+34
| | | | | | | | | | | | | | | | | | setup_node_bootmem() is taken from 64bit and doesn't use remap allocator. It's about to be shared with 32bit so add support for it. If NODE_DATA is remapped, it's noted in the debug message and node locality check is skipped as the __pa() of the remapped address doesn't reflect the actual physical address. On 64bit, remap allocator becomes noop and doesn't affect the behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: Add @start and @end to init_alloc_remap()Tejun Heo2011-05-021-15/+14Star
| | | | | | | | | | | | | | Instead of dereferencing node_start/end_pfn[] directly, make init_alloc_remap() take @start and @end and let the caller be responsible for making sure the range is sane. This is to prepare for use from unified NUMA init code. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Remove long 64bit assumption from numa.cTejun Heo2011-05-021-23/+22Star
| | | | | | | | | | | | | | Code moved from numa_64.c has assumption that long is 64bit in several places. This patch removes the assumption by using {s|u}64_t explicity, using PFN_PHYS() for page number -> addr conversions and adjusting printf formats. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Enable build of generic NUMA init code on 32bitTejun Heo2011-05-023-12/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Generic NUMA init code was moved to numa.c from numa_64.c but is still guaraded by CONFIG_X86_64. This patch removes the compile guard and enables compiling on 32bit. * numa_add_memblk() and numa_set_distance() clash with the shim implementation in numa_32.c and are left out. * memory_add_physaddr_to_nid() clashes with 32bit implementation and is left out. * MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32. * node_data definition in numa_32.c removed in favor of the one in numa.c. There are places where ulong is assumed to be 64bit. The next patch will fix them up. Note that although the code is compiled it isn't used yet and this patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Move NUMA init logic from numa_64.c to numa.cTejun Heo2011-05-026-526/+539
| | | | | | | | | | | | | | | | | | | | | | | | | | | Move the generic 64bit NUMA init machinery from numa_64.c to numa.c. * node_data[], numa_mem_info and numa_distance * numa_add_memblk[_to](), numa_remove_memblk[_from]() * numa_set_distance() and friends * numa_init() and all the numa_meminfo handling helpers called from it * dummy_numa_init() * memory_add_physaddr_to_nid() A new function x86_numa_init() is added and the content of numa_64.c::initmem_init() is moved into it. initmem_init() now simply calls x86_numa_init(). Constants and numa_off declaration are moved from numa_{32|64}.h to numa.h. This is code reorganization and doesn't involve any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: Update numaq to use new NUMA init protocolTejun Heo2011-05-023-24/+34
| | | | | | | | | | | | | | | | | | | | | Update numaq such that it calls numa_add_memblk() and sets numa_nodes_parsed instead of directly diddling with NUMA states. The original get_memcfg_numaq() is renamed to numaq_numa_init() and new get_memcfg_numaq() is created in numa_32.c. The shim numa_add_memblk() implementation handles node_start/end_pfn[] and node_set_online() for nodes with memory. The new get_memcfg_numaq() exactly the same with get_memcfg_from_srat() other than calling the numaq init function. Things get_memcfgs_numaq() do are not strictly necessary for numaq but added for consistency and to help unifying NUMA init handling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: Replace srat_32.c with srat.cTejun Heo2011-05-025-326/+24Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SRAT support implementation in srat_32.c and srat.c are generally similar; however, there are some differences. First of all, 64bit implementation supports more types of SRAT entries. 64bit supports x2apic, affinity, memory and SLIT. 32bit only supports processor and memory. Most other differences stem from different initialization protocols employed by 64bit and 32bit NUMA init paths. On 64bit, * Mappings among PXM, node and apicid are directly done in each SRAT entry callback. * Memory affinity information is passed to numa_add_memblk() which takes care of all interfacing with NUMA init. * Doesn't directly initialize NUMA configurations. All the information is recorded in numa_nodes_parsed and memblks. On 32bit, * Checks numa_off. * Things go through one more level of indirection via private tables but eventually end up initializing the same mappings. * node_start/end_pfn[] are initialized and memblock_x86_register_active_regions() is called for each memory chunk. * node_set_online() is called for each online node. * sort_node_map() is called. There are also other minor differences in sanity checking and messages but taking 64bit version should be good enough. This patch drops the 32bit specific implementation and makes the 64bit implementation common for both 32 and 64bit. The init protocol differences are dealt with in two places - the numa_add_memblk() shim added in the previous patch and new temporary numa_32.c:get_memcfg_from_srat() which wraps invocation of x86_acpi_numa_init(). The shim numa_add_memblk() handles the folowings. * node_start/end_pfn[] initialization. * node_set_online() for memory nodes. * Invocation of memblock_x86_register_active_regions(). The shim get_memcfg_from_srat() handles the followings. * numa_off check. * node_set_online() for CPU nodes. * sort_node_map() invocation. * Clearing of numa_nodes_parsed and active_ranges on failure. The shims are temporary and will be removed as the generic NUMA init path in 32bit is replaced with 64bit one. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: implement temporary NUMA init shimsTejun Heo2011-05-023-3/+37
| | | | | | | | | | | | | | | | | | | | To help transition to common NUMA init, implement temporary 32bit shims for numa_add_memblk() and numa_set_distance(). numa_add_memblk() registers the memblk and adjusts node_start/end_pfn[]. numa_set_distance() is noop. These shims will allow using 64bit NUMA init functions on 32bit and gradual transition to common NUMA init path. For detailed description, please read description of commits which make use of the shim functions. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Move numa_nodes_parsed to numa.[hc]Tejun Heo2011-05-024-6/+2Star
| | | | | | | | | | | | Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for NUMA init path unification. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: Move get_memcfg_numa() into numa_32.cTejun Heo2011-05-022-19/+10Star
| | | | | | | | | | | | | | There's no reason get_memcfg_numa() to be implemented inline in mmzone_32.h. Move it to numa_32.c and also make get_memcfg_numa_flag() static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: make srat.c 32bit safeTejun Heo2011-05-021-2/+2
| | | | | | | | | | | | Make srat.c 32bit safe by removing the assumption that unsigned long is 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: rename srat_64.c to srat.cTejun Heo2011-05-022-1/+4
| | | | | | | | | | | | Rename srat_64.c to srat.c. This is to prepare for unification of NUMA init paths between 32 and 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: trivial cleanupsTejun Heo2011-05-025-13/+1Star
| | | | | | | | | | | | | | | * Kill no longer used struct bootnode. * Kill dangling declaration of pxm_to_nid() in numa_32.h. * Make setup_node_bootmem() static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: use sparse_memory_present_with_active_regions()Tejun Heo2011-05-024-10/+2Star
| | | | | | | | | | | | | | | | | | | Instead of calling memory_present() for each region from NUMA init, call sparse_memory_present_with_active_regions() from paging_init() similarly to x86-64. For flat and numaq, this results in exactly the same memory_present() calls. For srat, if there are multiple memory chunks for a node, after this change, memory_present() will be called separately for each chunk instead of being called once to encompass the whole range, which doesn't cause any harm and actually is the better behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optionalTejun Heo2011-05-027-39/+9Star
| | | | | | | | | | | | | | NUMAQ is the only meaningful user of this callback and setup_local_APIC() the only callsite. Stop torturing everyone else by making the callback optional and removing all the boilerplate implementations and assignments. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86, NUMA: Unify 32/64bit numa_cpu_node() implementationTejun Heo2011-05-026-23/+19Star
| | | | | | | | | | | | | | | | | | | Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is NUMAQ which returns valid mapping only after CPU is initialized during SMP bringup; thus, the previous patch to set apicid -> node in setup_local_APIC() makes __apicid_to_node[] always contain the correct mapping whether custom apic->x86_32_numa_cpu_node() is used or not. So, there is no reason to keep separate 32bit implementation. We can always consult __apicid_to_node[]. Move 64bit implementation from numa_64.c to numa.c and remove 32bit implementation from numa_32.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC()Tejun Heo2011-05-021-0/+10
| | | | | | | | | | | | | | | | | | | | | Some x86-32 NUMA implementations (NUMAQ) don't initialize apicid -> node mapping using set_apicid_to_node() during NUMA init but implement custom apic->x86_32_numa_cpu_node() instead. This patch automatically initializes the default apic -> node mapping table from apic->x86_32_numa_cpu_node() from setup_local_APIC() such that the mapping table is in sync with the actual mapping. As the table isn't used by custom implementations, this doesn't make any difference at this point. This is in preparation of unifying numa_cpu_node() between x86-32 and 64. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-64, NUMA: simplify nodedata allocationTejun Heo2011-05-021-36/+17Star
| | | | | | | | | | | | | | | | With top-down memblock allocation, the allocation range limits in ealry_node_mem() can be simplified - try node-local first, then any node but in any case don't allocate below DMA limit. Remove early_node_mem() and implement simplified allocation directly in setup_node_bootmem(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-64, NUMA: trivial cleanups for setup_node_bootmem()Tejun Heo2011-05-021-29/+23Star
| | | | | | | | | | | | | | | Make the following trivial changes in preparation for further updates. * nodeid -> nid, nid -> tnid * use nd_ prefix for nodedata related variables * remove start/end_pfn and use start/end directly Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* x86-64, NUMA: Simplify hotadd memory handlingTejun Heo2011-05-023-86/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only special handling NUMA needs to do for hotadd memory is determining the node for the hotadd memory given the address of it and there's nothing specific to specific config method used. srat_64.c does somewhat elaborate error checking on ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements memory_add_physaddr_to_nid() which determines the node for given hotadd address. This is almost completely redundant. All the information is already available to the generic NUMA code which already performs all the sanity checking and merging. All that's necessary is not using __initdata from numa_meminfo and providing a function which uses it to map address to node. Drop the specific implementation from srat_64.c and add generic memory_add_physaddr_to_nid() in numa_64.c, which is enabled if CONFIG_MEMORY_HOTPLUG is set. Other than dropping the code, srat_64.c doesn't need any change as it already calls numa_add_memblk() for hot pluggable regions which is enough. While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the same. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
* Merge branch 'x86/urgent' into x86-mmTejun Heo2011-05-02533-937/+1482
|\ | | | | | | | | | | | | | | | | | | | | Merge reason: Pick up the following two fix commits. 2be19102b7: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() 765af22da8: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change Scheduled NUMA init 32/64bit unification changes depend on these. Signed-off-by: Tejun Heo <tj@kernel.org>
| * x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()Yinghai Lu2011-05-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | numa_cleanup_meminfo() trims each memblk between low (0) and high (max_pfn) limits and discards empty ones. However, the emptiness detection incorrectly used equality test. If the start of a memblk is higher than max_pfn, it is empty but fails the equality test and doesn't get discarded. The condition triggers when max_pfn is lower than start of a NUMA node and results in memory misconfiguration - leading to WARN_ON()s and other funnies. The bug was discovered in devel branch where 32bit too uses this code path for NUMA init. If a node is above the addressing limit, max_pfn ends up lower than the node triggering this problem. The failure hasn't been observed on x86-64 but is still possible with broken hardware e820/NUMA info. As the fix is very low risk, it would be better to apply it even for 64bit. Fix it by using >= instead of ==. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Extracted the actual fix from the original patch and rewrote patch description. ] Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processorsBoris Ostrovsky2011-05-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Older AMD K8 processors (Revisions A-E) are affected by erratum 400 (APIC timer interrupts don't occur in C states greater than C1). This, for example, means that X86_FEATURE_ARAT flag should not be set for these parts. This addresses regression introduced by commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 ("x86, AMD: Set ARAT feature on AMD processors") where the system may become unresponsive until external interrupt (such as keyboard input) occurs. This results, for example, in time not being reported correctly, lack of progress on the system and other lockups. Reported-by: Joerg-Volker Peetz <jvpeetz@web.de> Tested-by: Joerg-Volker Peetz <jvpeetz@web.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2011-04-303-10/+52
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, nmi: Move LVT un-masking into irq handlers perf events, x86: Work around the Nehalem AAJ80 erratum perf, x86: Fix BTS condition ftrace: Build without frame pointers on Microblaze
| | * perf, x86, nmi: Move LVT un-masking into irq handlersDon Zickus2011-04-273-6/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was noticed that P4 machines were generating double NMIs for each perf event. These extra NMIs lead to 'Dazed and confused' messages on the screen. I tracked this down to a P4 quirk that said the overflow bit had to be cleared before re-enabling the apic LVT mask. My first attempt was to move the un-masking inside the perf nmi handler from before the chipset NMI handler to after. This broke Nehalem boxes that seem to like the unmasking before the counters themselves are re-enabled. In order to keep this change simple for 2.6.39, I decided to just simply move the apic LVT un-masking to the beginning of all the chipset NMI handlers, with the exception of Pentium4's to fix the double NMI issue. Later on we can move the un-masking to later in the handlers to save a number of 'extra' NMIs on those particular chipsets. I tested this change on a P4 machine, an AMD machine, a Nehalem box, and a core2quad box. 'perf top' worked correctly along with various other small 'perf record' runs. Anything high stress breaks all the machines but that is a different problem. Thanks to various people for testing different versions of this patch. Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> CC: Cyrill Gorcunov <gorcunov@gmail.com>
| | * perf events, x86: Work around the Nehalem AAJ80 erratumIngo Molnar2011-04-261-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Nehalem CPUs the retired branch-misses event can be completely bogus, when there are no branch-misses occuring. When there are a lot of branch misses then the count is pretty accurate. Still, this leaves us with an event that over-counts a lot. Detect this erratum and work it around by using BR_MISP_EXEC.ANY events. These will also count speculated branches but still it's a lot more precise in practice than the architectural event. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-yyfg0bxo9jsqxd6a0ovfny27@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * perf, x86: Fix BTS conditionPeter Zijlstra2011-04-262-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the x86 backend incorrectly assumes that any BRANCH_INSN with sample_period==1 is a BTS request. This is not true when we do frequency driven profiling such as 'perf record -e branches'. Solves this error: $ perf record -e branches ./array Error: sys_perf_event_open() syscall returned with 95 (Operation not supported). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Ingo Molnar <mingo@elte.hu> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2011-04-305-11/+11
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: ce4100: Configure IOAPIC pins for USB and SATA to level type x86: devicetree: Configure IOAPIC pin only once x86, setup: When probing memory with e801, use ax/bx as a pair
| | * | x86: ce4100: Configure IOAPIC pins for USB and SATA to level typeSebastian Andrzej Siewior2011-04-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The USB and SATA ioapic interrrupt pins are configured as edge type, but need to be level type interrupts to work correctly. [ tglx: Split out from the combo patch ] Cc: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86: devicetree: Configure IOAPIC pin only onceSebastian Andrzej Siewior2011-04-283-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use io_apic_setup_irq_pin() in order to configure pin's interrupt number polarity and type. This is done on every irq_create_of_mapping() which happens for instance during pci enable calls. Level typed interrupts are masked by default, edge are unmasked. On the first ->xlate() call the level interrupt is configured and masked. The driver calls request_irq() and the line is unmasked. Lets assume the interrupt line is shared with another device and we call pci_enable_device() for this device. The ->xlate() configures the pin again and it is masked. request_irq() does not unmask the line because it _is_ already unmasked according to its internal state. So the interrupt will never be unmasked again. This patch is based on an earlier work by Torben Hohn and solves the problem by configuring the pin only once. Since all devices must agree on the same type and polarity there is no point in configuring the pin more than once. [ tglx: Split out the ce4100 part into a separate patch ] Cc: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86, setup: When probing memory with e801, use ax/bx as a pairH. Peter Anvin2011-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we use BIOS function e801 to probe memory, we should use ax/bx (or cx/dx) as a pair, not mix and match. This was a typo during the translation from assembly code, and breaks at least one set of machines in the field (which return cx = dx = 0). Reported-and-tested-by: Chris Samuel <chris@csamuel.org> Fix-proposed-by: Thomas Meyer <thomas@m3y3r.de> Link: http://lkml.kernel.org/r/1303566747.12067.10.camel@localhost.localdomain
| * | | Merge branch 'omap-fixes-for-linus' of ↵Linus Torvalds2011-04-2912-18/+56
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: OMAP3+: voltage: remove initial voltage OMAP4: Intialize IVA Device in addition to DSP device. omap: rx51: mark reserved memory earlier OMAP3: l3: fix for "irq 10: nobody cared" message arm: omap2: enable smc instruction for sleep34xx OMAP2/3: hwmod: fix gpio-reset timeouts seen during bootup. OMAP3: PM: Do not rely on ROM code to restore CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL OMAP2+: PM: Fix the saving of CM_AUTOIDLE_PLL register on scratchpad area OMAP4: clock data: Change DSS clock aliases OMAP2+: hwmod data: Fix wrong dma_system end address
| | * \ \ Merge branch 'for_tony_a_2.6.39rc' of git://git.pwsan.com/linux-2.6 into ↵Tony Lindgren2011-04-277-12/+45
| | |\ \ \ | | | | | | | | | | | | | | | | | | devel-fixes
| | | * | | OMAP2/3: hwmod: fix gpio-reset timeouts seen during bootup.Avinash.H.M2011-04-203-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GPIO module expects the debounce clocks to be enabled during reset. It doesn't reset properly and timeouts are seen, if this clock isn't enabled during reset. Add the HWMOD_CONTROL_OPT_CLKS_IN_RESET flags to the GPIO HWMODs, with which the debounce clocks are enabled during reset. Cc: Rajendra Nayak <rnayak@ti.com> Cc: Paul Walmsley <paul@pwsan.com> Cc: Benoit Cousson <b-cousson@ti.com> Cc: Kevin Hilman <khilman@ti.com> Signed-off-by: Avinash.H.M <avinashhm@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
| | | * | | OMAP3: PM: Do not rely on ROM code to restore CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLLEduardo Valentin2011-04-202-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per OMAP3 erratum (i671), ROM code adds extra latencies while restoring CM_AUTOIDLE_PLL register, if AUTO_PERIPH_DPLL is equal to 1. This patch stores 0's in scratchpad content area corresponding to AUTO_PERIPH_DPLL, to prevent ROM code to try to lock per DPLL, since it won't respect proper programing scheme. This register is then stored in prcm context. The saving and restore is now done by kernel side. Here follow the erratum description DESCRIPTION After OFF mode transition, among many restorations, the ROM Code restores the CM_AUTOIDLE_PLL register, and after that, it tries to relock the PER DPLL. In case the restoration data stored in scratchpad memory contains a field CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL = 1, then the way the ROM Code restores and locks the PER DPLL does not respect the PER DPLL programming scheme. In that case, the DPLL might not lock. Meanwhile, when trying to lock the PER DPLL, the ROM Code does not hang. Only extra latencies are introduced at wake-up. WORKAROUND When saving the context-restore structure in scratchpad memory, in order to respect the PER DPLL programming scheme, it is advised to store 0 in the CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL field of the saved structure. After wake-up, the application should store in CM_AUTOIDLE_PLL register the right desired value. Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
| | | * | | OMAP2+: PM: Fix the saving of CM_AUTOIDLE_PLL register on scratchpad areaEduardo Valentin2011-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The saving of CCR.CM_AUTOIDLE_PLL is done in scratchpad area. However, in current code, the saving is done for CM_AUTOIDLE2_PLL (offset 0x34) instead of CM_AUTOIDLE_PLL (offset 0x30). This patch changes the code to save the correct register. Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
| | | * | | OMAP4: clock data: Change DSS clock aliasesTomi Valkeinen2011-04-201-7/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DSS driver has used fck and ick clocks on OMAP2/3 to get DSS HW up and running, and also to get the pixel clock's source clock rate from the fck. On OMAP4 the clock data is set up in a different way, as there's no ick, dss_fck points to a fake clock which just affects DSS's MODULEMODE, and dss_dss_clk if the DSS_FCK. >From DSS driver's point of view the dss_fck sounds like an ick, and dss_dss_clk is the fck. While this is not entirely correct from HW point of view, especially for the ick, configuring the clock aliases that way makes DSS "just work" with OMAP4's clock setup. In the (hopefully near) future DSS driver will be reworked to use pm_runtime support which should clean up the clock code. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Benoît Cousson <b-cousson@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
| | | * | | OMAP2+: hwmod data: Fix wrong dma_system end addressBenoit Cousson2011-04-194-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OMAP2420, 2430 and 3xxx were using the OMAP4 end address that unfortunately is not located at the same base address. Moreover the OMAP4 size was set to 256 instead of 4096. Change all .pa_end to set them to .pa_start + 0xfff Cc: "G, Manjunath Kondaiah" <manjugk@ti.com> Cc: Benoit Cousson <b-cousson@ti.com> Reported-by: Michael Fillinger <m-fillinger@ti.com> Signed-off-by: Benoit Cousson <b-cousson@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>
| | * | | | OMAP3+: voltage: remove initial voltageNishanth Menon2011-04-261-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blindly setting 1.2V in the initial structure may not even match the default voltages stored in the voltage table which are supported for the domain. For example, OMAP3430 core domain does not use 1.2V and ends up generating a warning on the first transition. Further, since omap2_set_init_voltage is called as part of the pm framework's initialization sequence to configure the voltage required for the current OPP, the call does(and has to) setup the system voltage(curr_volt as a result) using the right mechanisms appropriate for the system at that point of time. This also overrides initialization we are currently doing in voltage.c making it redundant. So, remove the wrong and redundant initialization. Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | OMAP4: Intialize IVA Device in addition to DSP device.Shweta Gulati2011-04-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OMAP4 has two different Devices IVA and DSP. DSP is bound with IVA for DVFS. The registration of IVA dev in API 'omap2_init_processor_devices' was missing. Init dev for 'iva_dev' is added. This also fixes the following error seen during boot as omap2_set_init_voltage can now find the iva device omap2_set_init_voltage: Invalid parameters! omap2_set_init_voltage: Unable to put vdd_iva to its init voltage Signed-off-by: Shweta Gulati <shweta.gulati@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | omap: rx51: mark reserved memory earlierFelipe Contreras2011-04-261-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So that omap_vram_set_sdram_vram() is called before omap_vram_reserve_sdram_memblock(). Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | OMAP3: l3: fix for "irq 10: nobody cared" messageomar ramirez2011-04-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an error occurs in the L3 on any other initiator than MPU, the interrupt goes unhandled given that the 'base' register was calculated with the initialized err_source value (which coincidentally points to MPU) and not with the actual source of the error. Removed parenthesis that are not needed for the touched lines. Signed-off-by: Omar Ramirez Luna <omar.ramirez@ti.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| | * | | | arm: omap2: enable smc instruction for sleep34xxOskar Andero2011-04-261-1/+1
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes broken build when using binutils 2.21. Signed-off-by: Oskar Andero <oskar.andero@sonyericsson.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
| * | | | Merge branch 'for-2.6.39' of ↵Linus Torvalds2011-04-281-0/+2
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k/mm: Set all online nodes in N_NORMAL_MEMORY
| | * | | | m68k/mm: Set all online nodes in N_NORMAL_MEMORYMichael Schmitz2011-04-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For m68k, N_NORMAL_MEMORY represents all nodes that have present memory since it does not support HIGHMEM. This patch sets the bit at the time node_present_pages has been set by free_area_init_node. At the time the node is brought online, the node state would have to be done unconditionally since information about present memory has not yet been recorded. If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it uses this nodemask to setup per-cache kmem_cache_node data structures. This pach is an alternative to the one proposed by David Rientjes <rientjes@google.com> attempting to set node state immediately when bringing the node online. Signed-off-by: Michael Schmitz <schmitz@debian.org> Tested-by: Thorsten Glaser <tg@debian.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> CC: stable@kernel.org
| * | | | | um: adjust current_thread_info() for newer gcc versionsRichard Weinberger2011-04-281-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases gcc >= 4.5.2 will optimize away current_thread_info(). To prevent gcc from doing so the stack address has to be obtained via inline asm. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>