summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel
Commit message (Collapse)AuthorAgeFilesLines
* kvm: kvmclock: use get_cpu() and put_cpu()Tiejun Chen2014-11-031-11/+8Star
| | | | | | | | | We can use get_cpu() and put_cpu() to replace preempt_disable()/cpu = smp_processor_id() and preempt_enable() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2014-10-3110-28/+31
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fixes from all around the place: - hyper-V 32-bit PAE guest kernel fix - two IRQ allocation fixes on certain x86 boards - intel-mid boot crash fix - intel-quark quirk - /proc/interrupts duplicate irq chip name fix - cma boot crash fix - syscall audit fix - boot crash fix with certain TSC configurations (seen on Qemu) - smpboot.c build warning fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi() x86, intel-mid: Create IRQs for APB timers and RTC timers x86: Don't enable F00F workaround on Intel Quark processors x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts x86, cma: Reserve DMA contiguous area after initmem_init() i386/audit: stop scribbling on the stack frame x86, apic: Handle a bad TSC more gracefully x86: ACPI: Do not translate GSI number if IOAPIC is disabled x86/smpboot: Move data structure to its primary usage scope
| * ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()Jiang Liu2014-10-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function mp_register_gsi() returns blindly the GSI number for the ACPI SCI interrupt. That causes a regression when the GSI for ACPI SCI is shared with other devices. The regression was caused by commit 84245af7297ced9e8fe "x86, irq, ACPI: Change __acpi_register_gsi to return IRQ number instead of GSI" and exposed on a SuperMicro system, which shares one GSI between ACPI SCI and PCI device, with following failure: http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410 [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low level) [ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt 20 Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number. Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: <stable@vger.kernel.org> # 3.17 Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86, intel-mid: Create IRQs for APB timers and RTC timersJiang Liu2014-10-291-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel MID platforms has no legacy interrupts, so no IRQ descriptors preallocated. We need to call mp_map_gsi_to_irq() to create IRQ descriptors for APB timers and RTC timers, otherwise it may cause invalid memory access as: [ 0.116839] BUG: unable to handle kernel NULL pointer dereference at 0000003a [ 0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: <stable@vger.kernel.org> # 3.17 Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86: Don't enable F00F workaround on Intel Quark processorsDave Jones2014-10-291-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | The Intel Quark processor is a part of family 5, but does not have the F00F bug present in Pentiums of the same family. Pentiums were models 0 through 8, Quark is model 9. Signed-off-by: Dave Jones <davej@redhat.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/irq: Fix XT-PIC-XT-PIC in /proc/interruptsMaciej W. Rozycki2014-10-282-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems that make use of 8259A Programmable Interrupt Controllers. Specifically convert output like this: CPU0 0: 76573 XT-PIC-XT-PIC timer 1: 11 XT-PIC-XT-PIC i8042 2: 0 XT-PIC-XT-PIC cascade 4: 8 XT-PIC-XT-PIC serial 6: 3 XT-PIC-XT-PIC floppy 7: 0 XT-PIC-XT-PIC parport0 8: 1 XT-PIC-XT-PIC rtc0 10: 448 XT-PIC-XT-PIC fddi0 12: 23 XT-PIC-XT-PIC eth0 14: 2464 XT-PIC-XT-PIC ide0 NMI: 0 Non-maskable interrupts ERR: 0 to one like this: CPU0 0: 122033 XT-PIC timer 1: 11 XT-PIC i8042 2: 0 XT-PIC cascade 4: 8 XT-PIC serial 6: 3 XT-PIC floppy 7: 0 XT-PIC parport0 8: 1 XT-PIC rtc0 10: 145 XT-PIC fddi0 12: 31 XT-PIC eth0 14: 2245 XT-PIC ide0 NMI: 0 Non-maskable interrupts ERR: 0 that is one like we used to have from ~2.2 till it was changed sometime. The rationale is there is no value in this duplicate information, it merely clutters output and looks ugly. We only have one handler for 8259A interrupts so there is no need to give it a name separate from the name already given to irq_chip. We could define meaningful names for handlers based on bits in the ELCR register on systems that have it or the value of the LTIM bit we use in ICW1 otherwise (hardcoded to 0 though with MCA support gone), to tell edge-triggered and level-triggered inputs apart. While that information does not affect 8259A interrupt handlers it could help people determine which lines are shareable and which are not. That is material for a separate change though. Any tools that parse /proc/interrupts are supposed not to be affected since it was many years we used the format this change converts back to. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86, cma: Reserve DMA contiguous area after initmem_init()Weijie Yang2014-10-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengguang Wu reported a boot crash on the x86 platform via the 0-day Linux Kernel Performance Test: cma: dma_contiguous_reserve: reserving 31 MiB for global area BUG: Int 6: CR2 (null) [<41850786>] dump_stack+0x16/0x18 [<41d2b1db>] early_idt_handler+0x6b/0x6b [<41072227>] ? __phys_addr+0x2e/0xca [<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7 [<41d6d359>] dma_contiguous_reserve_area+0x27/0x47 [<41d6d4d1>] dma_contiguous_reserve+0x158/0x163 [<41d33e0f>] setup_arch+0x79b/0xc68 [<41d2b7cf>] start_kernel+0x9c/0x456 [<41d2b2ca>] i386_start_kernel+0x79/0x7d (See details at: https://lkml.org/lkml/2014/10/8/708) It is because dma_contiguous_reserve() is called before initmem_init() in x86, the variable high_memory is not initialized but accessed by __pa(high_memory) in dma_contiguous_reserve(). This patch moves dma_contiguous_reserve() after initmem_init() so that high_memory is initialized before accessed. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: iamjoonsoo.kim@lge.com Cc: 'Linux-MM' <linux-mm@kvack.org> Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com> Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * i386/audit: stop scribbling on the stack frameEric Paris2014-10-241-8/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git commit b4f0d3755c5e9cc86292d5fd78261903b4f23d4a was very very dumb. It was writing over %esp/pt_regs semi-randomly on i686 with the expected "system can't boot" results. As noted in: https://bugs.freedesktop.org/show_bug.cgi?id=85277 This patch stops fscking with pt_regs. Instead it sets up the registers for the call to __audit_syscall_entry in the most obvious conceivable way. It then does just a tiny tiny touch of magic. We need to get what started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp). This is as easy as a pair of pushes. After the call to __audit_syscall_entry all we need to do is get that now useless junk off the stack (pair of pops) and reload %eax with the original syscall so other stuff can keep going about it's business. Reported-by: Paulo Zanoni <przanoni@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com> Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com Cc: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * Merge tag 'v3.18-rc1' into x86/urgentH. Peter Anvin2014-10-2423-132/+144
| |\ | | | | | | | | | | | | | | | | | | Reason: Need to apply audit patch on top of v3.18-rc1. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | x86, apic: Handle a bad TSC more gracefullyAndy Lutomirski2014-10-222-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the TSC is unusable or disabled, then this patch fixes: - Confusion while trying to clear old APIC interrupts. - Division by zero and incorrect programming of the TSC deadline timer. This fixes boot if the CPU has a TSC deadline timer but a missing or broken TSC. The failure to boot can be observed with qemu using -cpu qemu64,-tsc,+tsc-deadline This also happens to me in nested KVM for unknown reasons. With this patch, I can boot cleanly (although without a TSC). Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Bandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86: ACPI: Do not translate GSI number if IOAPIC is disabledJiang Liu2014-10-201-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When IOAPIC is disabled, acpi_gsi_to_irq() should return gsi directly instead of calling mp_map_gsi_to_irq() to translate gsi to IRQ by IOAPIC. It fixes https://bugzilla.kernel.org/show_bug.cgi?id=84381. This regression was introduced with commit 6b9fb7082409 "x86, ACPI, irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number" Reported-and-Tested-by: Thomas Richter <thor@math.tu-berlin.de> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Richter <thor@math.tu-berlin.de> Cc: rui.zhang@intel.com Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> # 3.17 Link: http://lkml.kernel.org/r/1413816327-12850-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/smpboot: Move data structure to its primary usage scopeIngo Molnar2014-10-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Makes the code more readable by moving variable and usage closer to each other, which also avoids this build warning in the !CONFIG_HOTPLUG_CPU case: arch/x86/kernel/smpboot.c:105:42: warning: ‘die_complete’ defined but not used [-Wunused-variable] Cc: Prarit Bhargava <prarit@redhat.com> Cc: Lan Tianyu <tianyu.lan@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: srostedt@redhat.com Cc: toshi.kani@hp.com Cc: imammedo@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | perf/x86/intel: Revert incomplete and undocumented Broadwell client supportIngo Molnar2014-10-293-181/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These patches: 86a349a28b24 ("perf/x86/intel: Add Broadwell core support") c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds") fdda3c4aacec ("perf/x86/intel: Use Broadwell cache event list for Haswell") introduced magic constants and unexplained changes: https://lkml.org/lkml/2014/10/28/1128 https://lkml.org/lkml/2014/10/27/325 https://lkml.org/lkml/2014/8/27/546 https://lkml.org/lkml/2014/10/28/546 Peter Zijlstra has attempted to help out, to clean up the mess: https://lkml.org/lkml/2014/10/28/543 But has not received helpful and constructive replies which makes me doubt wether it can all be finished in time until v3.18 is released. Despite various review feedback the author (Andi Kleen) has answered only few of the review questions and has generally been uncooperative, only giving replies when prompted repeatedly, and only giving minimal answers instead of constructively explaining and helping along the effort. That kind of behavior is not acceptable. There's also a boot crash on Intel E5-1630 v3 CPUs reported for another commit from Andi Kleen: e735b9db12d7 ("perf/x86/intel/uncore: Add Haswell-EP uncore support") https://lkml.org/lkml/2014/10/22/730 Which is not yet resolved. The uncore driver is independent in theory, but the crash makes me worry about how well all these patches were tested and makes me uneasy about the level of interminging that the Broadwell and Haswell code has received by the commits above. As a first step to resolve the mess revert the Broadwell client commits back to the v3.17 version, before we run out of time and problematic code hits a stable upstream kernel. ( If the Haswell-EP crash is not resolved via a simple fix then we'll have to revert the Haswell-EP uncore driver as well. ) The Broadwell client series has to be submitted in a clean fashion, with single, well documented changes per patch. If they are submitted in time and are accepted during review then they can possibly go into v3.19 but will need additional scrutiny due to the rocky history of this patch set. Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | perf/x86: Fix compile warnings for intel_uncorePeter Zijlstra2014-10-281-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uncore drivers require PCI and generate compile time warnings when !CONFIG_PCI. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | perf: Fix bogus kernel printkPeter Zijlstra (Intel)2014-10-281-2/+3
| |/ |/| | | | | | | | | | | | | | | | | | | | | Andy spotted the fail in what was intended as a conditional printk level. Reported-by: Andy Lutomirski <luto@amacapital.net> Fixes: cc6cd47e7395 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge git://git.infradead.org/users/eparis/auditLinus Torvalds2014-10-202-8/+7Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull audit updates from Eric Paris: "So this change across a whole bunch of arches really solves one basic problem. We want to audit when seccomp is killing a process. seccomp hooks in before the audit syscall entry code. audit_syscall_entry took as an argument the arch of the given syscall. Since the arch is part of what makes a syscall number meaningful it's an important part of the record, but it isn't available when seccomp shoots the syscall... For most arch's we have a better way to get the arch (syscall_get_arch) So the solution was two fold: Implement syscall_get_arch() everywhere there is audit which didn't have it. Use syscall_get_arch() in the seccomp audit code. Having syscall_get_arch() everywhere meant it was a useless flag on the stack and we could get rid of it for the typical syscall entry. The other changes inside the audit system aren't grand, fixed some records that had invalid spaces. Better locking around the task comm field. Removing some dead functions and structs. Make some things static. Really minor stuff" * git://git.infradead.org/users/eparis/audit: (31 commits) audit: rename audit_log_remove_rule to disambiguate for trees audit: cull redundancy in audit_rule_change audit: WARN if audit_rule_change called illegally audit: put rule existence check in canonical order next: openrisc: Fix build audit: get comm using lock to avoid race in string printing audit: remove open_arg() function that is never used audit: correct AUDIT_GET_FEATURE return message type audit: set nlmsg_len for multicast messages. audit: use union for audit_field values since they are mutually exclusive audit: invalid op= values for rules audit: use atomic_t to simplify audit_serial() kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0] audit: reduce scope of audit_log_fcaps audit: reduce scope of audit_net_id audit: arm64: Remove the audit arch argument to audit_syscall_entry arm64: audit: Add audit hook in syscall_trace_enter/exit() audit: x86: drop arch from __audit_syscall_entry() interface sparc: implement is_32bit_task sparc: properly conditionalize use of TIF_32BIT ...
| * | audit: x86: drop arch from __audit_syscall_entry() interfaceRichard Guy Briggs2014-09-232-12/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the arch is found locally in __audit_syscall_entry(), there is no need to pass it in as a parameter. Delete it from the parameter list. x86* was the only arch to call __audit_syscall_entry() directly and did so from assembly code. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-audit@redhat.com Signed-off-by: Eric Paris <eparis@redhat.com> --- As this patch relies on changes in the audit tree, I think it appropriate to send it through my tree rather than the x86 tree.
| * | ARCH: AUDIT: audit_syscall_entry() should not require the archEric Paris2014-09-231-6/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a function where the arch can be queried, syscall_get_arch(). So rather than have every single piece of arch specific code use and/or duplicate syscall_get_arch(), just have the audit code use the syscall_get_arch() code. Based-on-patch-by: Richard Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Cc: linux-alpha@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-ia64@vger.kernel.org Cc: microblaze-uclinux@itee.uq.edu.au Cc: linux-mips@linux-mips.org Cc: linux@lists.openrisc.net Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-xtensa@linux-xtensa.org Cc: x86@kernel.org
* | | Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds2014-10-1519-112/+112
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
| * | | percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fixMel Gorman2014-09-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A commit in linux-next was causing boot to fail and bisection identified the patch 4ba2968420fa ("percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_"). One of the changes in that patch looks very suspicious. Reverting the full patch fixes boot as does this fixlet. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com>
| * | | percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_tChristoph Lameter2014-08-281-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var can paper over differences in the definitions of cpumask_var_t and either use the address of the cpumask variable directly or perform a fetch of the address of the struct cpumask allocated elsewhere. This is important particularly when using per cpu cpumask_var_t declarations because in one case we have an offset into a per cpu area to handle and in the other case we need to fetch a pointer from the offset. This patch introduces a new macro this_cpu_cpumask_var_ptr() that is defined where cpumask_var_t is defined and performs the proper actions. All use cases where __get_cpu_var is used with cpumask_var_t are converted to the use of this_cpu_cpumask_var_ptr(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | x86: Replace __get_cpu_var usesChristoph Lameter2014-08-2618-111/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds2014-10-144-12/+25
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge second patch-bomb from Andrew Morton: - a few hotfixes - drivers/dma updates - MAINTAINERS updates - Quite a lot of lib/ updates - checkpatch updates - binfmt updates - autofs4 - drivers/rtc/ - various small tweaks to less used filesystems - ipc/ updates - kernel/watchdog.c changes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits) mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h> ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[] frv: remove unused declarations of __start___ex_table and __stop___ex_table kvm: ensure hard lockup detection is disabled by default kernel/watchdog.c: control hard lockup detection default staging: rtl8192u: use %*pEn to escape buffer staging: rtl8192e: use %*pEn to escape buffer staging: wlan-ng: use %*pEhp to print SN lib80211: remove unused print_ssid() wireless: hostap: proc: print properly escaped SSID wireless: ipw2x00: print SSID via %*pE wireless: libertas: print esaped string via %*pE lib/vsprintf: add %*pE[achnops] format specifier lib / string_helpers: introduce string_escape_mem() lib / string_helpers: refactoring the test suite lib / string_helpers: move documentation to c-file include/linux: remove strict_strto* definitions arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable fs: check bh blocknr earlier when searching lru ...
| * | | kvm: ensure hard lockup detection is disabled by defaultUlrich Obergfell2014-10-141-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use watchdog_enable_hardlockup_detector() to set hard lockup detection's default value to false. It's risky to run this detection in a guest, as false positives are easy to trigger, especially if the host is overcommitted. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | arch/x86/kernel/cpu/common.c: fix unused symbol warningAndrew Morton2014-10-141-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86_64 allnoconfig: arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | kexec-bzimage64: fix sparse warningsVivek Goyal2014-10-141-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Howells brought to my attention the mails generated by kbuild test bot and following sparse warnings were present. This patch fixes these warnings. arch/x86/kernel/kexec-bzimage64.c:270:5: warning: symbol 'bzImage64_probe' was not declared. Should it be static? arch/x86/kernel/kexec-bzimage64.c:328:6: warning: symbol 'bzImage64_load' was not declared. Should it be static? arch/x86/kernel/kexec-bzimage64.c:517:5: warning: symbol 'bzImage64_cleanup' was not declared. Should it be static? arch/x86/kernel/kexec-bzimage64.c:531:5: warning: symbol 'bzImage64_verify_sig' was not declared. Should it be static? arch/x86/kernel/kexec-bzimage64.c:546:23: warning: symbol 'kexec_bzImage64_ops' was not declared. Should it be static? Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | kexec: check if crashk_res_low exists when exclude it from crash mem rangesBaoquan He2014-10-141-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a check if crashk_res_low exists just like GART region does. If crashk_res_low doesn't exist, calling exclude_mem_range is unnecessary. Meanwhile, since crashk_res_low has been initialized at definition, it's safe just use "if (crashk_low_res.end)" to check if it's exist. And this can make it consistent with other places of check. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | |
| \ \ \
| \ \ \
| \ \ \
*---. \ \ \ Merge branches 'x86-ras-for-linus', 'x86-uv-for-linus' and ↵Linus Torvalds2014-10-142-4/+2Star
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ras, uv and vdso fixlets from Ingo Molnar: "ras: tone down a kernel message to only occur during initial bootup, not during suspend/resume cycles. uv: a cleanup commit vdso: a fix to error checking" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Avoid showing repetitive message from intel_init_thermal() * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/uv: Remove unnecessary #ifdef * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Fix vdso2c's special_pages[] error checking
| | * | | | | x86/apic/uv: Remove unnecessary #ifdefAndreas Ruprecht2014-08-201-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the file x2apic_uv_x.c, some code is compiled conditionally depending on CONFIG_SMP. However, the file is only built, if CONFIG_X86_UV is enabled. CONFIG_X86_UV depends on CONFIG_NUMA, which itself depends on CONFIG_SMP, so the #ifdef will always evaluate to true, if the file is compiled. Thus, it is unnecessary and can be removed. Signed-off-by: Andreas Ruprecht <rupran@einserver.de> Cc: David Rientjes <rientjes@google.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Russ Anderson <rja@sgi.com> Link: http://lkml.kernel.org/r/1408522561-23389-1-git-send-email-rupran@einserver.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | x86/mce: Avoid showing repetitive message from intel_init_thermal()Rakib Mullick2014-09-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intel_init_thermal() is called from a) at the time of system initializing and b) at the time of system resume to initialize thermal monitoring. In case when thermal monitoring is handled by SMI, we get to know it via printk(). Currently it gives the message at both cases, but its okay if we get it only once and no need to get the same message at every time system resumes. So, limit showing this message only at system boot time by avoid showing at system resume and reduce abusing kernel log buffer. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1411068135.5121.10.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2014-10-142-2/+2
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc smaller fixes that missed the v3.17 cycle" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Add arch/x86/purgatory/ make generated files to gitignore x86: Fix section conflict for numachip x86: Reject x32 executables if x32 ABI not supported x86_64, entry: Filter RFLAGS.NT on entry from userspace x86, boot, kaslr: Fix nuisance warning on 32-bit builds
| * | | | | | | x86: Fix section conflict for numachipAndi Kleen2014-10-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A variable cannot be both __read_mostly and const. This is a meaningless combination. Just make it only const. This fixes the LTO build with numachip enabled. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1411533139-25708-1-git-send-email-andi@firstfloor.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | x86_64, entry: Filter RFLAGS.NT on entry from userspaceAndy Lutomirski2014-10-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NT flag doesn't do anything in long mode other than causing IRET to #GP. Oddly, CPL3 code can still set NT using popf. Entry via hardware or software interrupt clears NT automatically, so the only relevant entries are fast syscalls. If user code causes kernel code to run with NT set, then there's at least some (small) chance that it could cause trouble. For example, user code could cause a call to EFI code with NT set, and who knows what would happen? Apparently some games on Wine sometimes do this (!), and, if an IRET return happens, they will segfault. That segfault cannot be handled, because signal delivery fails, too. This patch programs the CPU to clear NT on entry via SYSCALL (both 32-bit and 64-bit, by my reading of the AMD APM), and it clears NT in software on entry via SYSENTER. To save a few cycles, this borrows a trick from Jan Beulich in Xen: it checks whether NT is set before trying to clear it. As a result, it seems to have very little effect on SYSENTER performance on my machine. There's another minor bug fix in here: it looks like the CFI annotations were wrong if CONFIG_AUDITSYSCALL=n. Testers beware: on Xen, SYSENTER with NT set turns into a GPF. I haven't touched anything on 32-bit kernels. The syscall mask change comes from a variant of this patch by Anish Bhatt. Note to stable maintainers: there is no known security issue here. A misguided program can set NT and cause the kernel to try and fail to deliver SIGSEGV, crashing the program. This patch fixes Far Cry on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275 Cc: <stable@vger.kernel.org> Reported-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | | | | | Merge branch 'x86-seccomp-for-linus' of ↵Linus Torvalds2014-10-143-63/+155
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 seccomp changes from Ingo Molnar: "This tree includes x86 seccomp filter speedups and related preparatory work, which touches core seccomp facilities as well. The main idea is to split seccomp into two phases, to be able to enter a simple fast path for syscalls with ptrace side effects. There's no substantial user-visible (and ABI) effects expected from this, except a change in how we emit a better audit record for SECCOMP_RET_TRACE events" * 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls x86: Split syscall_trace_enter into two phases x86, entry: Only call user_exit if TIF_NOHZ x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit seccomp: Document two-phase seccomp and arch-provided seccomp_data seccomp: Allow arch code to provide seccomp_data seccomp: Refactor the filter callback and the API seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
| * | | | | | | | x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscallsAndy Lutomirski2014-09-081-23/+15Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On KVM on my box, this reduces the overhead from an always-accept seccomp filter from ~130ns to ~17ns. Most of that comes from avoiding IRET on every syscall when seccomp is enabled. In extremely approximate hacked-up benchmarking, just bypassing IRET saves about 80ns, so there's another 43ns of savings here from simplifying the seccomp path. The diffstat is also rather nice :) Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/a3dbd267ee990110478d349f78cccfdac5497a84.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | | | | | | x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscallsAndy Lutomirski2014-09-081-9/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick the syscall number into regs->orig_ax prior to any possible tracing and syscall execution. This is user-visible ABI used by ptrace syscall emulation and seccomp. For fastpath syscalls, there's no good reason not to do the same thing. It's even slightly simpler than what we're currently doing. It probably has no measureable performance impact. It should have no user-visible effect. The purpose of this patch is to prepare for two-phase syscall tracing, in which the first phase might modify the saved RAX without leaving the fast path. This change is just subtle enough that I'm keeping it separate. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/01218b493f12ae2f98034b78c9ae085e38e94350.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | | | | | | x86: Split syscall_trace_enter into two phasesAndy Lutomirski2014-09-081-24/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This splits syscall_trace_enter into syscall_trace_enter_phase1 and syscall_trace_enter_phase2. Only phase 2 has full pt_regs, and only phase 2 is permitted to modify any of pt_regs except for orig_ax. The intent is that phase 1 can be called from the syscall fast path. In this implementation, phase1 can handle any combination of TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT, unless seccomp requests a ptrace event, in which case phase2 is forced. In principle, this could yield a big speedup for TIF_NOHZ as well as for TIF_SECCOMP if syscall exit work were similarly split up. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | | | | | | x86, entry: Only call user_exit if TIF_NOHZAndy Lutomirski2014-09-081-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RCU context tracking code requires that arch code call user_exit() on any entry into kernel code if TIF_NOHZ is set. This patch adds a check for TIF_NOHZ and a comment to the syscall entry tracing code. The main purpose of this patch is to make the code easier to follow: one can read the body of user_exit and of every function it calls without finding any explanation of why it's called for traced syscalls but not for untraced syscalls. This makes it clear when user_exit() is necessary. Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/0b13e0e24ec0307d67ab7a23b58764f6b1270116.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | | | | | | x86, x32, audit: Fix x32's AUDIT_ARCH wrt auditAndy Lutomirski2014-09-081-10/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is_compat_task() is the wrong check for audit arch; the check should be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not AUDIT_ARCH_I386. CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has no visible effect. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/a0138ed8c709882aec06e4acc30bfa9b623b8717.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | | | | | | | seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computingAndy Lutomirski2014-09-032-2/+2
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The secure_computing function took a syscall number parameter, but it only paid any attention to that parameter if seccomp mode 1 was enabled. Rather than coming up with a kludge to get the parameter to work in mode 2, just remove the parameter. To avoid churn in arches that don't have seccomp filters (and may not even support syscall_get_nr right now), this leaves the parameter in secure_computing_strict, which is now a real function. For ARM, this is a bit ugly due to the fact that ARM conditionally supports seccomp filters. Fixing that would probably only be a couple of lines of code, but it should be coordinated with the audit maintainers. This will be a slight slowdown on some arches. The right fix is to pass in all of seccomp_data instead of trying to make just the syscall nr part be fast. This is a prerequisite for making two-phase seccomp work cleanly. Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>
* | | | | | | | Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds2014-10-145-5/+156
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "The main changes in this tree are: - fix and update Intel Quark [Galileo] SoC platform support - update IOSF chipset side band interface and make it available via debugfs - enable HPETs on Soekris net6501 and other e6xx based systems" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache() x86: Quark: Comment setup_arch() to document TLB/PGE bug x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead x86/platform/intel/iosf: Add debugfs config option for IOSF x86/platform/intel/iosf: Add better description of IOSF driver in config x86/platform/intel/iosf: Add Braswell PCI ID x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n x86: HPET force enable for e6xx based systems x86/iosf: Add debugfs support x86/iosf: Add Kconfig prompt for IOSF_MBI selection
| * | | | | | | | x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()Bryan O'Donoghue2014-10-081-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel processors which don't report cache information via cpuid(2) or cpuid(4) need quirk code in the legacy_cache_size callback to report this data. For Intel that callback is is intel_size_cache(). This patch enables calling of cpu_detect_cache_sizes() inside of init_intel() and hence the calling of the legacy_cache callback in intel_size_cache(). Adding this call will ensure that PIII Tualatin currently in intel_size_cache() and Quark SoC X1000 being added to intel_size_cache() in this patch will report their respective cache sizes. This model of calling cpu_detect_cache_sizes() is consistent with AMD/Via/Cirix/Transmeta and Centaur. Also added is a string to idenitfy the Quark as Quark SoC X1000 giving better and more descriptive output via /proc/cpuinfo Adding cpu_detect_cache_sizes to init_intel() will enable calling of intel_size_cache() on Intel processors which currently no code can reach. Therefore this patch will also re-enable reporting of PIII Tualatin cache size information as well as add Quark SoC X1000 support. Comment text and cache flow logic suggested by Thomas Gleixner Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: davej@redhat.com Cc: hmh@hmh.eng.br Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | x86: Quark: Comment setup_arch() to document TLB/PGE bugBryan O'Donoghue2014-10-081-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quark SoC X1000 advertises Page Global Enable for it's Translation Lookaside Buffer via cpuid. The silicon does not in fact support PGE and hence will not flush the TLB when CR4.PGE is rewritten. The Quark documentation makes clear the necessity to instead rewrite CR3 in order to flush any TLB entries, irrespective of the state of CR4.PGE or an individual PTE.PGE See Intel Quark Core DevMan_001.pdf section 6.4.11 In setup.c setup_arch() the code will load_cr3() and then do a __flush_tlb_all(). On Quark the entire TLB will be flushed at the load_cr3(). The __flush_tlb_all() have no effect and can be safely ignored. Later on in the boot process we switch off the flag for cpu_has_pge() which means that subsequent calls to __flush_tlb_all() will call __flush_tlb() not __flush_tlb_global() flushing the TLB in the correct way via load_cr3() not CR4.PGE rewrite This patch documents the behaviour of flushing the TLB for Quark in setup_arch() Comment text suggested by Thomas Gleixner Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: davej@redhat.com Cc: hmh@hmh.eng.br Link: http://lkml.kernel.org/r/1412641189-12415-2-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 insteadBryan O'Donoghue2014-09-241-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quark x1000 advertises PGE via the standard CPUID method PGE bits exist in Quark X1000's PTEs. In order to flush an individual PTE it is necessary to reload CR3 irrespective of the PTE.PGE bit. See Quark Core_DevMan_001.pdf section 6.4.11 This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to crash and burn on this platform. Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | x86/platform/intel/iosf: Add debugfs config option for IOSFDavid E. Box2014-09-191-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Makes the IOSF sideband available through debugfs. Allows developers to experiment with using the sideband to provide debug and analytical tools for units on the SoC. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1411017231-20807-4-git-send-email-david.e.box@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | x86/platform/intel/iosf: Add Braswell PCI IDDavid E. Box2014-09-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Braswell PCI ID to list of supported ID's for the IOSF driver. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1411017231-20807-2-git-send-email-david.e.box@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=nMartin Kelly2014-09-191-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling with CONFIG_DEBUG_FS=n, GCC emits an unused variable warning for pmc_atom.c because "ret" is used only within the CONFIG_DEBUG_FS block. This patch adds a dummy #ifdef for pmc_dbgfs_register() when CONFIG_DEBUG_FS=n to simplify the code and remove the warning. Signed-off-by: Martin Kelly <martkell@amazon.com> Acked-by: "Li, Aubrey" <aubrey.li@linux.intel.com> Cc: vishwesh.m.rudramuni@intel.com Link: http://lkml.kernel.org/r/1410963476-8360-1-git-send-email-martin@martingkelly.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | x86: HPET force enable for e6xx based systemsPeter Neubauer2014-09-161-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the Soekris net6501 and other e6xx based systems do not have any ACPI implementation, HPET won't get enabled. This patch enables HPET on such platforms. [ 0.430149] pci 0000:00:01.0: Force enabled HPET at 0xfed00000 [ 0.644838] HPET: 3 timers in total, 0 timers will be used for per-cpu timer Original patch by Peter Neubauer (http://www.mail-archive.com/soekris-tech@lists.soekris.com/msg06462.html) slightly modified by Conrad Kostecki <ck@conrad-kostecki.de> and massaged accoring to Thomas Gleixners <tglx@linutronix.de> by me. Suggested-by: Conrad Kostecki <ck@conrad-kostecki.de> Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de> Cc: Peter Neubauer <pneubauer@bluerwhite.org> Link: http://lkml.kernel.org/r/5412D3A5.2030909@lsexperts.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | | | x86/iosf: Add debugfs supportDavid E. Box2014-08-271-0/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allows access to the iosf sideband through debugfs. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1409175640-32426-3-git-send-email-david.e.box@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | | | | | | | | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2014-10-141-4/+3Star
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "This tree includes the following changes: - fix memory hotplug - fix hibernation bootup memory layout assumptions - fix hyperv numa guest kernel messages - remove dead code - update documentation" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Update memory map description to list hypervisor-reserved area x86/mm, hibernate: Do not assume the first e820 area to be RAM x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() x86/mm/hotplug: Modify PGD entry when removing memory x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable() x86: Remove set_pmd_pfn