summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* bpf: allocate local storage buffers using GFP_ATOMICRoman Gushchin2018-12-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 569a933b03f3c48b392fe67c0086b3a6b9306b5a ] Naresh reported an issue with the non-atomic memory allocation of cgroup local storage buffers: [ 73.047526] BUG: sleeping function called from invalid context at /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421 [ 73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto [ 73.068342] INFO: lockdep is turned off. [ 73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted 4.20.0-rc2-next-20181113 #1 [ 73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 73.088018] Call Trace: [ 73.090463] dump_stack+0x70/0xa5 [ 73.093783] ___might_sleep+0x152/0x240 [ 73.097619] __might_sleep+0x4a/0x80 [ 73.101191] __kmalloc_node+0x1cf/0x2f0 [ 73.105031] ? cgroup_storage_update_elem+0x46/0x90 [ 73.109909] cgroup_storage_update_elem+0x46/0x90 cgroup_storage_update_elem() (as well as other update map update callbacks) is called with disabled preemption, so GFP_ATOMIC allocation should be used: e.g. alloc_htab_elem() in hashtab.c. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* tracing/fgraph: Fix set_graph_function from showing interruptsSteven Rostedt (VMware)2018-12-084-3/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5cf99a0f3161bc3ae2391269d134d6bf7e26f00e upstream. The tracefs file set_graph_function is used to only function graph functions that are listed in that file (or all functions if the file is empty). The way this is implemented is that the function graph tracer looks at every function, and if the current depth is zero and the function matches something in the file then it will trace that function. When other functions are called, the depth will be greater than zero (because the original function will be at depth zero), and all functions will be traced where the depth is greater than zero. The issue is that when a function is first entered, and the handler that checks this logic is called, the depth is set to zero. If an interrupt comes in and a function in the interrupt handler is traced, its depth will be greater than zero and it will automatically be traced, even if the original function was not. But because the logic only looks at depth it may trace interrupts when it should not be. The recent design change of the function graph tracer to fix other bugs caused the depth to be zero while the function graph callback handler is being called for a longer time, widening the race of this happening. This bug was actually there for a longer time, but because the race window was so small it seldom happened. The Fixes tag below is for the commit that widen the race window, because that commit belongs to a series that will also help fix the original bug. Cc: stable@kernel.org Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack") Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* uprobes: Fix handle_swbp() vs. unregister() + register() race once moreAndrea Parri2018-12-081-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 09d3f015d1e1b4fee7e9bbdcf54201d239393391 upstream. Commit: 142b18ddc8143 ("uprobes: Fix handle_swbp() vs unregister() + register() race") added the UPROBE_COPY_INSN flag, and corresponding smp_wmb() and smp_rmb() memory barriers, to ensure that handle_swbp() uses fully-initialized uprobes only. However, the smp_rmb() is mis-placed: this barrier should be placed after handle_swbp() has tested for the flag, thus guaranteeing that (program-order) subsequent loads from the uprobe can see the initial stores performed by prepare_uprobe(). Move the smp_rmb() accordingly. Also amend the comments associated to the two memory barriers to indicate their actual locations. Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@kernel.org Fixes: 142b18ddc8143 ("uprobes: Fix handle_swbp() vs unregister() + register() race") Link: http://lkml.kernel.org/r/20181122161031.15179-1-andrea.parri@amarulasolutions.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* function_graph: Reverse the order of pushing the ret_stack and the callbackSteven Rostedt (VMware)2018-12-051-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | commit 7c6ea35ef50810aa12ab26f21cb858d980881576 upstream. The function graph profiler uses the ret_stack to store the "subtime" and reuse it by nested functions and also on the return. But the current logic has the profiler callback called before the ret_stack is updated, and it is just modifying the ret_stack that will later be allocated (it's just lucky that the "subtime" is not touched when it is allocated). This could also cause a crash if we are at the end of the ret_stack when this happens. By reversing the order of the allocating the ret_stack and then calling the callbacks attached to a function being traced, the ret_stack entry is no longer used before it is allocated. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* function_graph: Move return callback before update of curr_ret_stackSteven Rostedt (VMware)2018-12-051-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 552701dd0fa7c3d448142e87210590ba424694a0 upstream. In the past, curr_ret_stack had two functions. One was to denote the depth of the call graph, the other is to keep track of where on the ret_stack the data is used. Although they may be slightly related, there are two cases where they need to be used differently. The one case is that it keeps the ret_stack data from being corrupted by an interrupt coming in and overwriting the data still in use. The other is just to know where the depth of the stack currently is. The function profiler uses the ret_stack to save a "subtime" variable that is part of the data on the ret_stack. If curr_ret_stack is modified too early, then this variable can be corrupted. The "max_depth" option, when set to 1, will record the first functions going into the kernel. To see all top functions (when dealing with timings), the depth variable needs to be lowered before calling the return hook. But by lowering the curr_ret_stack, it makes the data on the ret_stack still being used by the return hook susceptible to being overwritten. Now that there's two variables to handle both cases (curr_ret_depth), we can move them to the locations where they can handle both cases. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* function_graph: Have profiler use curr_ret_stack and not depthSteven Rostedt (VMware)2018-12-051-2/+2
| | | | | | | | | | | | | | | | | | | | commit b1b35f2e218a5b57d03bbc3b0667d5064570dc60 upstream. The profiler uses trace->depth to find its entry on the ret_stack, but the depth may not match the actual location of where its entry is (if an interrupt were to preempt the processing of the profiler for another function, the depth and the curr_ret_stack will be different). Have it use the curr_ret_stack as the index to find its ret_stack entry instead of using the depth variable, as that is no longer guaranteed to be the same. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stackSteven Rostedt (VMware)2018-12-052-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 39eb456dacb543de90d3bc6a8e0ac5cf51ac475e upstream. Currently, the depth of the ret_stack is determined by curr_ret_stack index. The issue is that there's a race between setting of the curr_ret_stack and calling of the callback attached to the return of the function. Commit 03274a3ffb44 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") moved the calling of the callback to after the setting of the curr_ret_stack, even stating that it was safe to do so, when in fact, it was the reason there was a barrier() there (yes, I should have commented that barrier()). Not only does the curr_ret_stack keep track of the current call graph depth, it also keeps the ret_stack content from being overwritten by new data. The function profiler, uses the "subtime" variable of ret_stack structure and by moving the curr_ret_stack, it allows for interrupts to use the same structure it was using, corrupting the data, and breaking the profiler. To fix this, there needs to be two variables to handle the call stack depth and the pointer to where the ret_stack is being used, as they need to change at two different locations. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* function_graph: Make ftrace_push_return_trace() staticSteven Rostedt (VMware)2018-12-051-1/+1
| | | | | | | | | | | | | | | | | commit d125f3f866df88da5a85df00291f88f0baa89f7c upstream. As all architectures now call function_graph_enter() to do the entry work, no architecture should ever call ftrace_push_return_trace(). Make it static. This is needed to prepare for a fix of a design bug on how the curr_ret_stack is used. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* function_graph: Create function_graph_enter() to consolidate architecture codeSteven Rostedt (VMware)2018-12-051-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | commit 8114865ff82e200b383e46821c25cb0625b842b5 upstream. Currently all the architectures do basically the same thing in preparing the function graph tracer on entry to a function. This code can be pulled into a generic location and then this will allow the function graph tracer to be fixed, as well as extended. Create a new function graph helper function_graph_enter() that will call the hook function (ftrace_graph_entry) and the shadow stack operation (ftrace_push_return_trace), and remove the need of the architecture code to manage the shadow stack. This is needed to prepare for a fix of a design bug on how the curr_ret_stack is used. Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRSThomas Gleixner2018-12-051-10/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 46f7ecb1e7359f183f5bbd1e08b90e10e52164f9 upstream The IBPB control code in x86 removed the usage. Remove the functionality which was introduced for this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/speculation: Rework SMT state changeThomas Gleixner2018-12-051-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline. To allow finegrained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled. Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sched/smt: Expose sched_smt_present static keyThomas Gleixner2018-12-051-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 321a874a7ef85655e93b3206d0f36b4a6097f948 upstream Make the scheduler's 'sched_smt_present' static key globaly available, so it can be used in the x86 speculation control code. Provide a query function and a stub for the CONFIG_SMP=n case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sched/smt: Make sched_smt_present track topologyPeter Zijlstra (Intel)2018-12-051-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c5511d03ec090980732e929c318a7a6374b5550e upstream Currently the 'sched_smt_present' static key is enabled when at CPU bringup SMT topology is observed, but it is never disabled. However there is demand to also disable the key when the topology changes such that there is no SMT present anymore. Implement this by making the key count the number of cores that have SMT enabled. In particular, the SMT topology bits are set before interrrupts are enabled and similarly, are cleared after interrupts are disabled for the last time and the CPU dies. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/speculation: Apply IBPB more strictly to avoid cross-process data leakJiri Kosina2018-12-051-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream Currently, IBPB is only issued in cases when switching into a non-dumpable process, the rationale being to protect such 'important and security sensitive' processess (such as GPG) from data leaking into a different userspace process via spectre v2. This is however completely insufficient to provide proper userspace-to-userpace spectrev2 protection, as any process can poison branch buffers before being scheduled out, and the newly scheduled process immediately becomes spectrev2 victim. In order to minimize the performance impact (for usecases that do require spectrev2 protection), issue the barrier only in cases when switching between processess where the victim can't be ptraced by the potential attacker (as in such cases, the attacker doesn't have to bother with branch buffers at all). [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably fine-grained ] Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch") Originally-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigationJiri Kosina2018-12-051-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 53c613fe6349994f023245519265999eed75957f upstream STIBP is a feature provided by certain Intel ucodes / CPUs. This feature (once enabled) prevents cross-hyperthread control of decisions made by indirect branch predictors. Enable this feature if - the CPU is vulnerable to spectre v2 - the CPU supports SMT and has SMT siblings online - spectre_v2 mitigation autoselection is enabled (default) After some previous discussion, this leaves STIBP on all the time, as wrmsr on crossing kernel boundary is a no-no. This could perhaps later be a bit more optimized (like disabling it in NOHZ, experiment with disabling it in idle, etc) if needed. Note that the synchronization of the mask manipulation via newly added spec_ctrl_mutex is currently not strictly needed, as the only updater is already being serialized by cpu_add_remove_lock, but let's make this a little bit more future-proof. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* rcu: Make need_resched() respond to urgent RCU-QS needsPaul E. McKenney2018-12-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 92aa39e9dc77481b90cbef25e547d66cab901496 upstream. The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent need for an RCU quiescent state from the force-quiescent-state processing within the grace-period kthread to context switches and to cond_resched(). Unfortunately, such urgent needs are not communicated to need_resched(), which is sometimes used to decide when to invoke cond_resched(), for but one example, within the KVM vcpu_run() function. As of v4.15, this can result in synchronize_sched() being delayed by up to ten seconds, which can be problematic, to say nothing of annoying. This commit therefore checks rcu_dynticks.rcu_urgent_qs from within rcu_check_callbacks(), which is invoked from the scheduling-clock interrupt handler. If the current task is not an idle task and is not executing in usermode, a context switch is forced, and either way, the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current task is an idle task, then RCU's dyntick-idle code will detect the quiescent state, so no further action is required. Similarly, if the task is executing in usermode, other code in rcu_check_callbacks() and its called functions will report the corresponding quiescent state. Reported-by: Marius Hillenbrand <mhillenb@amazon.de> Reported-by: David Woodhouse <dwmw2@infradead.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Backported to make patch apply cleanly on older versions. ] Tested-by: Marius Hillenbrand <mhillenb@amazon.de> Cc: <stable@vger.kernel.org> # 4.12.x - 4.19.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kdb: Use strscpy with destination buffer sizePrarit Bhargava2018-12-013-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c2b94c72d93d0929f48157eef128c4f9d2e603ce ] gcc 8.1.0 warns with: kernel/debug/kdb/kdb_support.c: In function ‘kallsyms_symbol_next’: kernel/debug/kdb/kdb_support.c:239:4: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=] strncpy(prefix_name, name, strlen(name)+1); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/debug/kdb/kdb_support.c:239:31: note: length computed here Use strscpy() with the destination buffer size, and use ellipses when displaying truncated symbols. v2: Use strscpy() Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Jonathan Toppins <jtoppins@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: kgdb-bugreport@lists.sourceforge.net Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* sched/fair: Fix cpu_util_wake() for 'execl' type workloadsPatrick Bellasi2018-12-011-14/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c469933e772132aad040bd6a2adc8edf9ad6f825 ] A ~10% regression has been reported for UnixBench's execl throughput test by Aaron Lu and Ye Xiaolong: https://lkml.org/lkml/2018/10/30/765 That test is pretty simple, it does a "recursive" execve() syscall on the same binary. Starting from the syscall, this sequence is possible: do_execve() do_execveat_common() __do_execve_file() sched_exec() select_task_rq_fair() <==| Task already enqueued find_idlest_cpu() find_idlest_group() capacity_spare_wake() <==| Functions not called from cpu_util_wake() | the wakeup path which means we can end up calling cpu_util_wake() not only from the "wakeup path", as its name would suggest. Indeed, the task doing an execve() syscall is already enqueued on the CPU we want to get the cpu_util_wake() for. The estimated utilization for a CPU computed in cpu_util_wake() was written under the assumption that function can be called only from the wakeup path. If instead the task is already enqueued, we end up with a utilization which does not remove the current task's contribution from the estimated utilization of the CPU. This will wrongly assume a reduced spare capacity on the current CPU and increase the chances to migrate the task on execve. The regression is tracked down to: commit d519329f72a6 ("sched/fair: Update util_est only on util_avg updates") because in that patch we turn on by default the UTIL_EST sched feature. However, the real issue is introduced by: commit f9be3e5961c5 ("sched/fair: Use util_est in LB and WU paths") Let's fix this by ensuring to always discount the task estimated utilization from the CPU's estimated utilization when the task is also the current one. The same benchmark of the bug report, executed on a dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine, reports these "Execl Throughput" figures (higher the better): mainline : 48136.5 lps mainline+fix : 55376.5 lps which correspond to a 15% speedup. Moreover, since {cpu_util,capacity_spare}_wake() are not really only used from the wakeup path, let's remove this ambiguity by using a better matching name: {cpu_util,capacity_spare}_without(). Since we are at that, let's also improve the existing documentation. Reported-by: Aaron Lu <aaron.lu@intel.com> Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Tested-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Fixes: f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths) Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/ Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* sched/core: Take the hotplug lock in sched_init_smp()Valentin Schneider2018-11-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 40fa3780bac2b654edf23f6b13f4e2dd550aea10 ] When running on linux-next (8c60c36d0b8c ("Add linux-next specific files for 20181019")) + CONFIG_PROVE_LOCKING=y on a big.LITTLE system (e.g. Juno or HiKey960), we get the following report: [ 0.748225] Call trace: [ 0.750685] lockdep_assert_cpus_held+0x30/0x40 [ 0.755236] static_key_enable_cpuslocked+0x20/0xc8 [ 0.760137] build_sched_domains+0x1034/0x1108 [ 0.764601] sched_init_domains+0x68/0x90 [ 0.768628] sched_init_smp+0x30/0x80 [ 0.772309] kernel_init_freeable+0x278/0x51c [ 0.776685] kernel_init+0x10/0x108 [ 0.780190] ret_from_fork+0x10/0x18 The static_key in question is 'sched_asym_cpucapacity' introduced by commit: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations") In this particular case, we enable it because smp_prepare_cpus() will end up fetching the capacity-dmips-mhz entry from the devicetree, so we already have some asymmetry detected when entering sched_init_smp(). This didn't get detected in tip/sched/core because we were missing: commit cb538267ea1e ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations") Calls to build_sched_domains() post sched_init_smp() will hold the hotplug lock, it just so happens that this very first call is a special case. As stated by a comment in sched_init_smp(), "There's no userspace yet to cause hotplug operations" so this is a harmless warning. However, to both respect the semantics of underlying callees and make lockdep happy, take the hotplug lock in sched_init_smp(). This also satisfies the comment atop sched_init_domains() that says "Callers must hold the hotplug lock". Reported-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar.Eggemann@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: morten.rasmussen@arm.com Cc: quentin.perret@arm.com Link: http://lkml.kernel.org/r/1540301851-3048-1-git-send-email-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unprivDaniel Borkmann2018-11-271-0/+1
| | | | | | | | | | | | | | | | | | [ Upstream commit 28c2fae726bf5003cd209b0d5910a642af98316f ] While dbecd7388476 ("bpf: get kernel symbol addresses via syscall") zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall") forgot about doing so and therefore returns the #elems of the user set up buffer which is incorrect. It also needs to indicate a info.nr_jited_func_lens of zero. Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Revert "x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation"Greg Kroah-Hartman2018-11-231-10/+1Star
| | | | | | | | | | | | | | | | | | | This reverts commit 233b9d7df0e114c7e7c3674559fb0fc41ada3e8f which is commit 53c613fe6349994f023245519265999eed75957f upstream. It's not ready for the stable trees as there are major slowdowns involved with this patch. Reported-by: Jiri Kosina <jkosina@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kdb: print real address of pointers instead of hashed addressesChristophe Leroy2018-11-212-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 568fb6f42ac6851320adaea25f8f1b94de14e40a upstream. Since commit ad67b74d2469 ("printk: hash addresses printed with %p"), all pointers printed with %p are printed with hashed addresses instead of real addresses in order to avoid leaking addresses in dmesg and syslog. But this applies to kdb too, with is unfortunate: Entering kdb (current=0x(ptrval), pid 329) due to Keyboard Entry kdb> ps 15 sleeping system daemon (state M) processes suppressed, use 'ps A' to see all. Task Addr Pid Parent [*] cpu State Thread Command 0x(ptrval) 329 328 1 0 R 0x(ptrval) *sh 0x(ptrval) 1 0 0 0 S 0x(ptrval) init 0x(ptrval) 3 2 0 0 D 0x(ptrval) rcu_gp 0x(ptrval) 4 2 0 0 D 0x(ptrval) rcu_par_gp 0x(ptrval) 5 2 0 0 D 0x(ptrval) kworker/0:0 0x(ptrval) 6 2 0 0 D 0x(ptrval) kworker/0:0H 0x(ptrval) 7 2 0 0 D 0x(ptrval) kworker/u2:0 0x(ptrval) 8 2 0 0 D 0x(ptrval) mm_percpu_wq 0x(ptrval) 10 2 0 0 D 0x(ptrval) rcu_preempt The whole purpose of kdb is to debug, and for debugging real addresses need to be known. In addition, data displayed by kdb doesn't go into dmesg. This patch replaces all %p by %px in kdb in order to display real addresses. Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Cc: <stable@vger.kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kdb: use correct pointer when 'btc' calls 'btt'Christophe Leroy2018-11-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit dded2e159208a9edc21dd5c5f583afa28d378d39 upstream. On a powerpc 8xx, 'btc' fails as follows: Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry kdb> btc btc: cpu status: Currently on cpu 0 Available cpus: 0 kdb_getarea: Bad address 0x0 when booting the kernel with 'debug_boot_weak_hash', it fails as well Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry kdb> btc btc: cpu status: Currently on cpu 0 Available cpus: 0 kdb_getarea: Bad address 0xba99ad80 On other platforms, Oopses have been observed too, see https://github.com/linuxppc/linux/issues/139 This is due to btc calling 'btt' with %p pointer as an argument. This patch replaces %p by %px to get the real pointer value as expected by 'btt' Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Cc: <stable@vger.kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing/kprobes: Check the probe on unloaded module correctlyMasami Hiramatsu2018-11-211-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 59158ec4aef7d44be51a6f3e7e17fc64c32604eb ] Current kprobe event doesn't checks correctly whether the given event is on unloaded module or not. It just checks the event has ":" in the name. That is not enough because if we define a probe on non-exist symbol on loaded module, it allows to define that (with warning message) To ensure it correctly, this searches the module name on loaded module list and only if there is not, it allows to define it. (this event will be available when the target module is loaded) Link: http://lkml.kernel.org/r/153547309528.26502.8300278470528281328.stgit@devbox Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* bpf: wait for running BPF programs when updating map-in-mapDaniel Colascione2018-11-131-0/+13
| | | | | | | | | | | | | | | | | | | | | commit 1ae80cf31938c8f77c37a29bbe29e7f1cd492be8 upstream. The map-in-map frequently serves as a mechanism for atomic snapshotting of state that a BPF program might record. The current implementation is dangerous to use in this way, however, since userspace has no way of knowing when all programs that might have retrieved the "old" value of the map may have completed. This change ensures that map update operations on map-in-map map types always wait for all references to the old map to drop before returning to userspace. Signed-off-by: Daniel Colascione <dancol@google.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* userns: also map extents in the reverse map to kernel IDsJann Horn2018-11-131-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d2f007dbe7e4c9583eea6eb04d60001e85c6f1bd upstream. The current logic first clones the extent array and sorts both copies, then maps the lower IDs of the forward mapping into the lower namespace, but doesn't map the lower IDs of the reverse mapping. This means that code in a nested user namespace with >5 extents will see incorrect IDs. It also breaks some access checks, like inode_owner_or_capable() and privileged_wrt_inode_uidgid(), so a process can incorrectly appear to be capable relative to an inode. To fix it, we have to make sure that the "lower_first" members of extents in both arrays are translated; and we have to make sure that the reverse map is sorted *after* the translation (since otherwise the translation can break the sorting). This is CVE-2018-18955. Fixes: 6397fac4915a ("userns: bump idmap limits to 340") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Tested-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tracing: Return -ENOENT if there is no target synthetic eventMasami Hiramatsu2018-11-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | commit 18858511fd8a877303cc34c06efa461b26a0e070 upstream. Return -ENOENT error if there is no target synthetic event. This notices an operation failure to user as below; # echo 'wakeup_latency u64 lat; pid_t pid;' > synthetic_events # echo '!wakeup' >> synthetic_events sh: write error: No such file or directory Link: http://lkml.kernel.org/r/154013449986.25576.9487131386597290172.stgit@devbox Acked-by: Tom Zanussi <zanussi@linux.intel.com> Tested-by: Tom Zanussi <zanussi@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Cc: stable@vger.kernel.org Fixes: 4b147936fa50 ('tracing: Add support for 'synthetic' events') Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* genirq: Fix race on spurious interrupt detectionLukas Wunner2018-11-131-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 746a923b863a1065ef77324e1e43f19b1a3eab5c upstream. Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of threaded irqs") made detection of spurious interrupts work for threaded handlers by: a) incrementing a counter every time the thread returns IRQ_HANDLED, and b) checking whether that counter has increased every time the thread is woken. However for oneshot interrupts, the commit unmasks the interrupt before incrementing the counter. If another interrupt occurs right after unmasking but before the counter is incremented, that interrupt is incorrectly considered spurious: time | irq_thread() | irq_thread_fn() | action->thread_fn() | irq_finalize_oneshot() | unmask_threaded_irq() /* interrupt is unmasked */ | | /* interrupt fires, incorrectly deemed spurious */ | | atomic_inc(&desc->threads_handled); /* counter is incremented */ v This is observed with a hi3110 CAN controller receiving data at high volume (from a separate machine sending with "cangen -g 0 -i -x"): The controller signals a huge number of interrupts (hundreds of millions per day) and every second there are about a dozen which are deemed spurious. In theory with high CPU load and the presence of higher priority tasks, the number of incorrectly detected spurious interrupts might increase beyond the 99,900 threshold and cause disablement of the interrupt. In practice it just increments the spurious interrupt count. But that can cause people to waste time investigating it over and over. Fix it by moving the accounting before the invocation of irq_finalize_oneshot(). [ tglx: Folded change log update ] Fixes: 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of threaded irqs") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mathias Duckeck <m.duckeck@kunbus.de> Cc: Akshay Bhat <akshay.bhat@timesys.com> Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com> Cc: stable@vger.kernel.org # v3.16+ Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* printk: Fix panic caused by passing log_buf_len to command lineHe Zhe2018-11-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 277fcdb2cfee38ccdbe07e705dbd4896ba0c9930 upstream. log_buf_len_setup does not check input argument before passing it to simple_strtoull. The argument would be a NULL pointer if "log_buf_len", without its value, is set in command line and thus causes the following panic. PANIC: early exception 0xe3 IP 10:ffffffffaaeacd0d error 0 cr2 0x0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc4-yocto-standard+ #1 [ 0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70 ... [ 0.000000] Call Trace: [ 0.000000] simple_strtoull+0x29/0x70 [ 0.000000] memparse+0x26/0x90 [ 0.000000] log_buf_len_setup+0x17/0x22 [ 0.000000] do_early_param+0x57/0x8e [ 0.000000] parse_args+0x208/0x320 [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_options+0x29/0x2d [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_param+0x36/0x4d [ 0.000000] setup_arch+0x336/0x99e [ 0.000000] start_kernel+0x6f/0x4ee [ 0.000000] x86_64_start_reservations+0x24/0x26 [ 0.000000] x86_64_start_kernel+0x6f/0x72 [ 0.000000] secondary_startup_64+0xa4/0xb0 This patch adds a check to prevent the panic. Link: http://lkml.kernel.org/r/1538239553-81805-1-git-send-email-zhe.he@windriver.com Cc: stable@vger.kernel.org Cc: rostedt@goodmis.org Cc: linux-kernel@vger.kernel.org Signed-off-by: He Zhe <zhe.he@windriver.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kbuild: fix kernel/bounds.c 'W=1' warningArnd Bergmann2018-11-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6a32c2469c3fbfee8f25bcd20af647326650a6cf upstream. Building any configuration with 'make W=1' produces a warning: kernel/bounds.c:16:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes] When also passing -Werror, this prevents us from building any other files. Nobody ever calls the function, but we can't make it 'static' either since we want the compiler output. Calling it 'main' instead however avoids the warning, because gcc does not insist on having a declaration for main. Link: http://lkml.kernel.org/r/20181005083313.2088252-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* signal: Guard against negative signal numbers in copy_siginfo_from_user32Eric W. Biederman2018-11-131-1/+1
| | | | | | | | | | | | | | | | | | | commit a36700589b85443e28170be59fa11c8a104130a5 upstream. While fixing an out of bounds array access in known_siginfo_layout reported by the kernel test robot it became apparent that the same bug exists in siginfo_layout and affects copy_siginfo_from_user32. The straight forward fix that makes guards against making this mistake in the future and should keep the code size small is to just take an unsigned signal number instead of a signed signal number, as I did to fix known_siginfo_layout. Cc: stable@vger.kernel.org Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* signal: Always deliver the kernel's SIGKILL and SIGSTOP to a pid namespace initEric W. Biederman2018-11-131-1/+1
| | | | | | | | | | | | | | | | | [ Upstream commit 3597dfe01d12f570bc739da67f857fd222a3ea66 ] Instead of playing whack-a-mole and changing SEND_SIG_PRIV to SEND_SIG_FORCED throughout the kernel to ensure a pid namespace init gets signals sent by the kernel, stop allowing a pid namespace init to ignore SIGKILL or SIGSTOP sent by the kernel. A pid namespace init is only supposed to be able to ignore signals sent from itself and children with SIG_DFL. Fixes: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* bpf/verifier: fix verifier instabilityAlexei Starovoitov2018-11-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a9c676bc8fc58d00eea9836fb14ee43c0346416a ] Edward Cree says: In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access() has supplied a reg_type, the other members of the register state are set appropriately. Previously reg.range was set to 0, but as it is in a union with reg.map_ptr, which is larger, upper bytes of the latter were left in place. This then caused the memcmp() in regsafe() to fail, preventing some branches from being pruned (and occasionally causing the same program to take a varying number of processed insns on repeated verifier runs). Fix the instability by clearing bpf_reg_state in __mark_reg_[un]known() Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Debugged-by: Edward Cree <ecree@solarflare.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* kprobes: Return error if we fail to reuse kprobe instead of BUG_ON()Masami Hiramatsu2018-11-131-7/+20
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 819319fc93461c07b9cdb3064f154bd8cfd48172 ] Make reuse_unused_kprobe() to return error code if it fails to reuse unused kprobe for optprobe instead of calling BUG_ON(). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/153666124040.21306.14150398706331307654.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* signal: Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstackWill Deacon2018-11-131-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 22839869f21ab3850fbbac9b425ccc4c0023926f ] The sigaltstack(2) system call fails with -ENOMEM if the new alternative signal stack is found to be smaller than SIGMINSTKSZ. On architectures such as arm64, where the native value for SIGMINSTKSZ is larger than the compat value, this can result in an unexpected error being reported to a compat task. See, for example: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904385 This patch fixes the problem by extending do_sigaltstack to take the minimum signal stack size as an additional parameter, allowing the native and compat system call entry code to pass in their respective values. COMPAT_SIGMINSTKSZ is just defined as SIGMINSTKSZ if it has not been defined by the architecture. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Reported-by: Steve McIntyre <steve.mcintyre@arm.com> Tested-by: Steve McIntyre <93sam@debian.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* locking/lockdep: Fix debug_locks off performance problemWaiman Long2018-11-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9506a7425b094d2f1d9c877ed5a78f416669269b ] It was found that when debug_locks was turned off because of a problem found by the lockdep code, the system performance could drop quite significantly when the lock_stat code was also configured into the kernel. For instance, parallel kernel build time on a 4-socket x86-64 server nearly doubled. Further analysis into the cause of the slowdown traced back to the frequent call to debug_locks_off() from the __lock_acquired() function probably due to some inconsistent lockdep states with debug_locks off. The debug_locks_off() function did an unconditional atomic xchg to write a 0 value into debug_locks which had already been set to 0. This led to severe cacheline contention in the cacheline that held debug_locks. As debug_locks is being referenced in quite a few different places in the kernel, this greatly slow down the system performance. To prevent that trashing of debug_locks cacheline, lock_acquired() and lock_contended() now checks the state of debug_locks before proceeding. The debug_locks_off() function is also modified to check debug_locks before calling __debug_locks_off(). Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigationJiri Kosina2018-11-131-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 53c613fe6349994f023245519265999eed75957f upstream. STIBP is a feature provided by certain Intel ucodes / CPUs. This feature (once enabled) prevents cross-hyperthread control of decisions made by indirect branch predictors. Enable this feature if - the CPU is vulnerable to spectre v2 - the CPU supports SMT and has SMT siblings online - spectre_v2 mitigation autoselection is enabled (default) After some previous discussion, this leaves STIBP on all the time, as wrmsr on crossing kernel boundary is a no-no. This could perhaps later be a bit more optimized (like disabling it in NOHZ, experiment with disabling it in idle, etc) if needed. Note that the synchronization of the mask manipulation via newly added spec_ctrl_mutex is currently not strictly needed, as the only updater is already being serialized by cpu_add_remove_lock, but let's make this a little bit more future-proof. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "WoodhouseDavid" <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: "SchauflerCasey" <casey.schaufler@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* dma-mapping: fix panic caused by passing empty cma command line argumentHe Zhe2018-11-131-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a3ceed87b07769fb80ce9dc6b604e515dba14c4b upstream. early_cma does not check input argument before passing it to simple_strtoull. The argument would be a NULL pointer if "cma", without its value, is set in command line and thus causes the following panic. PANIC: early exception 0xe3 IP 10:ffffffffa3e9db8d error 0 cr2 0x0 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-yocto-standard+ #7 [ 0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70 ... [ 0.000000] Call Trace: [ 0.000000] simple_strtoull+0x29/0x70 [ 0.000000] memparse+0x26/0x90 [ 0.000000] early_cma+0x17/0x6a [ 0.000000] do_early_param+0x57/0x8e [ 0.000000] parse_args+0x208/0x320 [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_options+0x29/0x2d [ 0.000000] ? rdinit_setup+0x30/0x30 [ 0.000000] parse_early_param+0x36/0x4d [ 0.000000] setup_arch+0x336/0x99e [ 0.000000] start_kernel+0x6f/0x4e6 [ 0.000000] x86_64_start_reservations+0x24/0x26 [ 0.000000] x86_64_start_kernel+0x6f/0x72 [ 0.000000] secondary_startup_64+0xa4/0xb0 This patch adds a check to prevent the panic. Signed-off-by: He Zhe <zhe.he@windriver.com> Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* bpf: fix partial copy of map_ptr when dst is scalarDaniel Borkmann2018-11-131-4/+6
| | | | | | | | | | | | | | | | | | | | commit 0962590e553331db2cc0aef2dc35c57f6300dbbe upstream. ALU operations on pointers such as scalar_reg += map_value_ptr are handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr and range in the register state share a union, so transferring state through dst_reg->range = ptr_reg->range is just buggy as any new map_ptr in the dst_reg is then truncated (or null) for subsequent checks. Fix this by adding a raw member and use it for copying state over to dst_reg. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman2018-10-202-4/+22
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fixes: Two fixes: a CFS-throttling bug fix, and an interactivity fix." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix the min_vruntime update logic in dequeue_entity() sched/fair: Fix throttle_list starvation with low CFS quota
| * sched/fair: Fix the min_vruntime update logic in dequeue_entity()Song Muchun2018-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment and the code around the update_min_vruntime() call in dequeue_entity() are not in agreement. From commit: b60205c7c558 ("sched/fair: Fix min_vruntime tracking") I think that we want to update min_vruntime when a task is sleeping/migrating. So, the check is inverted there - fix it. Signed-off-by: Song Muchun <smuchun@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b60205c7c558 ("sched/fair: Fix min_vruntime tracking") Link: http://lkml.kernel.org/r/20181014112612.2614-1-smuchun@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched/fair: Fix throttle_list starvation with low CFS quotaPhil Auld2018-10-112-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With a very low cpu.cfs_quota_us setting, such as the minimum of 1000, distribute_cfs_runtime may not empty the throttled_list before it runs out of runtime to distribute. In that case, due to the change from c06f04c7048 to put throttled entries at the head of the list, later entries on the list will starve. Essentially, the same X processes will get pulled off the list, given CPU time and then, when expired, get put back on the head of the list where distribute_cfs_runtime will give runtime to the same set of processes leaving the rest. Fix the issue by setting a bit in struct cfs_bandwidth when distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can decide to put the throttled entry on the tail or the head of the list. The bit is set/cleared by the callers of distribute_cfs_runtime while they hold cfs_bandwidth->lock. This is easy to reproduce with a handful of CPU consumers. I use 'crash' on the live system. In some cases you can simply look at the throttled list and see the later entries are not changing: crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3 1 ffff90b56cb2d200 -976050 2 ffff90b56cb2cc00 -484925 3 ffff90b56cb2bc00 -658814 4 ffff90b56cb2ba00 -275365 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1" "$4}' | pr -t -n3 1 ffff90b56cb2d200 -994147 2 ffff90b56cb2cc00 -306051 3 ffff90b56cb2bc00 -961321 4 ffff90b56cb2ba00 -24490 5 ffff90b166a45600 -135138 6 ffff90b56cb2da00 -282505 7 ffff90b56cb2e000 -148065 8 ffff90b56cb2fa00 -872591 9 ffff90b56cb2c000 -84687 10 ffff90b56cb2f000 -87237 11 ffff90b166a40a00 -164582 Sometimes it is easier to see by finding a process getting starved and looking at the sched_info: crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, crash> task ffff8eb765994500 sched_info PID: 7800 TASK: ffff8eb765994500 CPU: 16 COMMAND: "cputest" sched_info = { pcount = 8, run_delay = 697094208, last_arrival = 240260125039, last_queued = 240260327513 }, Signed-off-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: c06f04c70489 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge tag 'trace-v4.19-rc8-2' of ↵Greg Kroah-Hartman2018-10-201-7/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: A few small fixes to synthetic events Masami found some issues with the creation of synthetic events. The first two patches fix handling of unsigned type, and handling of a space before an ending semi-colon. The third patch adds a selftest to test the processing of synthetic events." * tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add synthetic event syntax testcase tracing: Fix synthetic event to allow semicolon at end tracing: Fix synthetic event to accept unsigned modifier
| * | tracing: Fix synthetic event to allow semicolon at endMasami Hiramatsu2018-10-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix synthetic event to allow independent semicolon at end. The synthetic_events interface accepts a semicolon after the last word if there is no space. # echo "myevent u64 var;" >> synthetic_events But if there is a space, it returns an error. # echo "myevent u64 var ;" > synthetic_events sh: write error: Invalid argument This behavior is difficult for users to understand. Let's allow the last independent semicolon too. Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| * | tracing: Fix synthetic event to accept unsigned modifierMasami Hiramatsu2018-10-191-6/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix synthetic event to accept unsigned modifier for its field type correctly. Currently, synthetic_events interface returns error for "unsigned" modifiers as below; # echo "myevent unsigned long var" >> synthetic_events sh: write error: Invalid argument This is because argv_split() breaks "unsigned long" into "unsigned" and "long", but parse_synth_field() doesn't expected it. With this fix, synthetic_events can handle the "unsigned long" correctly like as below; # echo "myevent unsigned long var" >> synthetic_events # cat synthetic_events myevent unsigned long var Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman2018-10-191-8/+2Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David writes: "Networking 1) Fix gro_cells leak in xfrm layer, from Li RongQing. 2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From Eric Dumazet. 3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn Töpel. 4) Out of bounds packet access in _decode_session6(), from Alexei Starovoitov. 5) Several ethtool bugs, where we copy a struct into the kernel twice and our validations of the values in the first copy can be invalidated by the second copy due to asynchronous updates to the memory by the user. From Wenwen Wang. 6) Missing netlink attribute validation in cls_api, from Davide Caratti. 7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang. 8) rxrpc operates on wrong kvec, from Yue Haibing. 9) A regression was introduced by the disassosciation of route neighbour references in rt6_probe(), causing probe for neighbourless routes to not be properly rate limited. Fix from Sabrina Dubroca. 10) Unsafe RCU locking in tipc, from Tung Nguyen. 11) Use after free in inet6_mc_check(), from Eric Dumazet. 12) PMTU from icmp packets should update the SCTP transport pathmtu, from Xin Long. 13) Missing peer put on error in rxrpc, from David Howells. 14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren. 15) Fix overflowing shift statement in qla3xxx driver, from Nathan Chancellor. 16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva. 17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return value in an inverted manner, fix from Paolo Abeni. 18) Fix missed unresolved entries in ipmr dumps, from Nikolay Aleksandrov. 19) Fix NAPI handling under high load, we can completely miss events when NAPI has to loop more than one time in a cycle. From Heiner Kallweit." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) ip6_tunnel: Fix encapsulation layout tipc: fix info leak from kernel tipc_event net: socket: fix a missing-check bug net: sched: Fix for duplicate class dump r8169: fix NAPI handling under high load net: ipmr: fix unresolved entry dumps net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion() sctp: fix the data size calculation in sctp_data_size virtio_net: avoid using netif_tx_disable() for serializing tx routine udp6: fix encap return code for resubmitting mlxsw: core: Fix use-after-free when flashing firmware during init sctp: not free the new asoc when sctp_wait_for_connect returns err sctp: fix race on sctp_id2asoc r8169: re-enable MSI-X on RTL8168g net: bpfilter: use get_pid_task instead of pid_task ptp: fix Spectre v1 vulnerability net: qla3xxx: Remove overflowing shift statement geneve, vxlan: Don't set exceptions if skb->len < mtu geneve, vxlan: Don't check skb_dst() twice sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead ...
| * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2018-10-141-8/+2Star
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2018-10-14 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix xsk map update and delete operation to not call synchronize_net() but to piggy back on SOCK_RCU_FREE for sockets instead as we are not allowed to sleep under RCU, from Björn. 2) Do not change RLIMIT_MEMLOCK in reuseport_bpf selftest if the process already has unlimited RLIMIT_MEMLOCK, from Eric. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | xsk: do not call synchronize_net() under RCU read lockBjörn Töpel2018-10-111-8/+2Star
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XSKMAP update and delete functions called synchronize_net(), which can sleep. It is not allowed to sleep during an RCU read section. Instead we need to make sure that the sock sk_destruct (xsk_destruct) function is asynchronously called after an RCU grace period. Setting the SOCK_RCU_FREE flag for XDP sockets takes care of this. Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | | Merge tag 'trace-v4.19-rc8' of ↵Greg Kroah-Hartman2018-10-182-21/+13Star
|\ \ \ \ | |/ / / |/| | / | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: Two fixes for 4.19 This fixes two bugs: - Fix size mismatch of tracepoint array - Have preemptirq test module use same clock source of the selftest" * tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c tracepoint: Fix tracepoint array element size mismatch
| * | tracing: Use trace_clock_local() for looping in preemptirq_delay_test.cSteven Rostedt (VMware)2018-10-171-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The preemptirq_delay_test module is used for the ftrace selftest code that tests the latency tracers. The problem is that it uses ktime for the delay loop, and then checks the tracer to see if the delay loop is caught, but the tracer uses trace_clock_local() which uses various different other clocks to measure the latency. As ktime uses the clock cycles, and the code then converts that to nanoseconds, it causes rounding errors, and the preemptirq latency tests are failing due to being off by 1 (it expects to see a delay of 500000 us, but the delay is only 499999 us). This is happening due to a rounding error in the ktime (which is totally legit). The purpose of the test is to see if it can catch the delay, not to test the accuracy between trace_clock_local() and ktime_get(). Best to use apples to apples, and have the delay loop use the same clock as the latency tracer does. Cc: stable@vger.kernel.org Fixes: f96e8577da102 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>