* powerpc/fadump: reuse crashkernel parameter for fadump memory reservation [Hari Bathini, 2017-05-09, 1 file, -13/+10]

fadump supports specifying memory to reserve for fadump's crash kernel with the fadump_reserve_mem kernel parameter. This parameter currently supports passing a fixed memory size only, as in fadump_reserve_mem=<size>. This patch aims to add support for other syntaxes, like a range-based memory size:

    <range1>:<size1>[,<range2>:<size2>,<range3>:<size3>,...]

which allows using the same parameter to boot the kernel with different system RAM sizes.

As the crashkernel parameter already supports the above-mentioned syntaxes, this patch deprecates the fadump_reserve_mem parameter and reuses the crashkernel parameter instead to specify memory for fadump's crash kernel memory reservation as well. Any offset provided in the crashkernel parameter is ignored in the fadump case, as fadump reserves memory at the end of RAM.

The advantages of using the crashkernel parameter instead of the fadump_reserve_mem parameter are one less kernel parameter overall, code reuse, and support for multiple syntaxes to specify memory.

Suggested-by: Dave Young <dyoung@redhat.com>
Link: http://lkml.kernel.org/r/149035346749.6881.911095631212975718.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powerpc/fadump: remove dependency with CONFIG_KEXEC [Hari Bathini, 2017-05-09, 5 files, -37/+16]

Now that crashkernel parameter parsing and the vmcoreinfo-related code have moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove the dependency on CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid of the definitions of the fadump_append_elf_note() and fadump_final_note() functions, to reuse the similar functions compiled under CONFIG_CRASH_CORE.

Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ia64: reuse append_elf_note() and final_note() functions [Hari Bathini, 2017-05-09, 5 files, -70/+20]

Get rid of the multiple definitions of the append_elf_note() and final_note() functions, and reuse the versions compiled under CONFIG_CRASH_CORE. Also, define Elf_Word and use it instead of the generic u32 or the more specific Elf64_Word.

Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE [Hari Bathini, 2017-05-09, 9 files, -462/+531]

Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse crashkernel parameter for fadump", v4.

Traditionally, kdump is used to save the vmcore in case of a crash. Some architectures, like powerpc, can save the vmcore using architecture-specific support instead of the kexec/kdump mechanism. Such architecture-specific support also needs to reserve memory to be used by the dump capture kernel. The crashkernel parameter can be reused for memory reservation by such architecture-specific infrastructure.

This patchset removes the dependency on CONFIG_KEXEC for the crashkernel parameter and the vmcoreinfo-related code, as both can be reused without kexec support. Also, the crashkernel parameter is reused instead of fadump_reserve_mem to reserve memory for fadump.

The first patch moves crashkernel parameter parsing and the vmcoreinfo-related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second patch reuses the definitions of append_elf_note() & final_note() under CONFIG_CRASH_CORE in the IA64 arch code. The third patch removes the dependency on CONFIG_KEXEC for firmware-assisted dump (fadump) on powerpc. The next patch reuses the crashkernel parameter for reserving memory for fadump, instead of the fadump_reserve_mem parameter; this has the advantage of making all the syntaxes the crashkernel parameter supports available to fadump as well. The last patch updates the fadump kernel documentation about the use of the crashkernel parameter.

This patch (of 5):

Traditionally, kdump is used to save the vmcore in case of a crash. Some architectures, like powerpc, can save the vmcore using architecture-specific support instead of the kexec/kdump mechanism. Such architecture-specific support also needs to reserve memory to be used by the dump capture kernel. The crashkernel parameter can be reused for memory reservation by such architecture-specific infrastructure.

But currently, the code related to vmcoreinfo and the parsing of the crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch introduces CONFIG_CRASH_CORE and moves the above-mentioned code under this config, allowing code reuse without a dependency on CONFIG_KEXEC. There is no functional change with this patch.

Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpumask: make "nr_cpumask_bits" unsigned [Alexey Dobriyan, 2017-05-09, 3 files, -4/+4]

Bit-searching functions accept "unsigned long" indices, but "nr_cpumask_bits" is "int", which is signed, so inevitable sign extensions occur on x86_64. Those MOVSX instructions are the #1 source of MOVSX bloat, by number of uses, across the whole kernel.

Change "nr_cpumask_bits" to unsigned; this number can't be negative after all. It allows implicit zero-extension on x86_64 without MOVSX.

Change signed comparisons into unsigned comparisons where necessary. The other uses look fine because the value is either an argument passed to a function or the comparison is already unsigned.

Net win on an allyesconfig type of kernel: ~2.8 KB (!)

    add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833)
    function                        old     new   delta
    xen_exit_mmap                   691     735     +44
    qstat_read                      426     440     +14
    __cpufreq_cooling_register     1678    1687      +9
    trace_rb_cpu_prepare            447     455      +8
    vermagic                         54      60      +6
    nfp_driver_version               54      60      +6
    rcu_torture_stats_print        1147    1151      +4
    find_next_push_cpu              267     269      +2
    xen_irq_resume                  961     960      -1
    ...
    init_vp_index                   946     906     -40
    od_set_powersave_bias           328     281     -47
    power_cpu_exit                  193     139     -54
    arch_show_interrupts           3538    3484     -54
    select_idle_sibling            1558    1471     -87
    Total: Before=158358910, After=158356077, chg -0.00%

The same arguments apply to "nr_cpu_ids", but I haven't yet found enough courage to delve into that issue (and a proper fix may require a new type "cpu_t", which is a whole separate story).

Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
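To illustrate the codegen point, a minimal sketch (not kernel code; find_first_bit_stub stands in for the real bit-search helpers, which take an unsigned long size):

    unsigned long find_first_bit_stub(const unsigned long *addr,
                                      unsigned long size);

    int nr_bits_signed = 512;            /* old: int, needs sign extension (MOVSX) */
    unsigned int nr_bits_unsigned = 512; /* new: zero-extends for free on x86_64 */

    unsigned long first_old(const unsigned long *mask)
    {
            /* int -> unsigned long conversion sign-extends */
            return find_first_bit_stub(mask, nr_bits_signed);
    }

    unsigned long first_new(const unsigned long *mask)
    {
            /* unsigned int -> unsigned long conversion zero-extends */
            return find_first_bit_stub(mask, nr_bits_unsigned);
    }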
* fork: free vmapped stacks in cache when cpus are offline [Hoeun Ryu, 2017-05-09, 1 file, -0/+23]

With virtually mapped stacks, kernel stacks are allocated via vmalloc. In the current implementation, two stacks per CPU can be cached when tasks are freed, and the cached stacks are used again for task duplication. But the cached stacks may remain unfreed even when the CPU is offline. By adding a CPU hotplug callback that frees the cached stacks when a CPU goes offline, the pages of the cached stacks are no longer wasted.

Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com
Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* reiserfs: use designated initializers [Kees Cook, 2017-05-09, 1 file, -12/+12]

Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity.

Link: http://lkml.kernel.org/r/20170329210419.GA40066@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
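A minimal sketch of the difference, using a hypothetical ops structure (nothing here is from the patch itself):

    struct item_ops {                /* hypothetical example struct */
            int (*open)(void);
            void (*close)(void);
    };

    int my_open(void);
    void my_close(void);

    /* positional: silently breaks if members are reordered or randomized */
    struct item_ops ops_positional = { my_open, my_close };

    /* designated: each member is named, so the layout can change safely */
    struct item_ops ops_designated = {
            .open  = my_open,
            .close = my_close,
    };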
* checkpatch: improve the SUSPECT_CODE_INDENT test [Joe Perches, 2017-05-09, 1 file, -1/+3]

The current SUSPECT_CODE_INDENT test does not recognize several code-style defects where the code following a logical test is inappropriately indented.

Before this patch, for code like:

    if (foo)
    bar();

checkpatch would not emit a warning. Improve the test to warn when code after a logical test has the same indentation as the logical test. Perform the same indentation test for "else" blocks too.

Link: http://lkml.kernel.org/r/df2374b68c4a68af2b7ef08afe486584811f610a.1493683942.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* checkpatch: improve the embedded function name test for patch contexts [Joe Perches, 2017-05-09, 1 file, -9/+8]

The current test works only for a single patch context, as it is done in the foreach ($rawlines) loop that precedes the loop where the actual $context_function variable is used. Move the setting of $context_function into the foreach (@lines) loop, where it is useful for each patch context.

Link: http://lkml.kernel.org/r/6c675a31c74fbfad4fc45b9f462303d60ca2a283.1493486091.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* checkpatch: add --typedefsfile [Jerome Forissier, 2017-05-09, 1 file, -17/+35]

When using checkpatch on out-of-tree code, it may occur that some project-specific types are used, which will cause spurious warnings. Add the --typedefsfile option as a way to extend the known types and deal with this issue.

This was developed for OP-TEE [1]. We run a Travis job on all pull requests [2], and checkpatch is part of that. The typical false warning we get on a regular basis is with some pointers to functions returning TEE_Result [3], which is a typedef from the GlobalPlatform APIs. We consider it acceptable to use GP types in the OP-TEE core implementation; that's why this patch would be helpful for us.

[1] https://github.com/OP-TEE/optee_os
[2] https://travis-ci.org/OP-TEE/optee_os/builds
[3] https://travis-ci.org/OP-TEE/optee_os/builds/193355335#L1733

Link: http://lkml.kernel.org/r/ba1124d6dfa599bb0dd1d8919dd45dd09ce541a4.1492702192.git.jerome.forissier@linaro.org
Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
Cc: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
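A hedged sketch of the kind of declaration that trips checkpatch's type recognition (the exact warning emitted depends on context):

    typedef unsigned int TEE_Result;    /* GlobalPlatform API type,
                                         * unknown to checkpatch's
                                         * built-in type list */

    /* pointer to a function returning a project-specific type */
    TEE_Result (*handler)(void *arg);

Listing TEE_Result in a file passed via the --typedefsfile option teaches the script about the extra type, suppressing the spurious warnings.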
* checkpatch: improve k.alloc with multiplication and sizeof test [Joe Perches, 2017-05-09, 1 file, -3/+10]

Find multi-line uses of k.alloc by using the $stat variable rather than the $line variable. This can still --fix only the single-line variant, though.

Link: http://lkml.kernel.org/r/3f4b23d37cd4c7d8628eefc25afe83ba8fb3ab55.1493167076.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
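The class of defect being matched, as a sketch (element type hypothetical):

    struct item *buf;   /* hypothetical element type */

    /* flagged: open-coded multiplication in the size can overflow */
    buf = kmalloc(sizeof(*buf) * count, GFP_KERNEL);

    /* preferred: kmalloc_array() fails cleanly on overflow */
    buf = kmalloc_array(count, sizeof(*buf), GFP_KERNEL);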
* checkpatch: special audit for revert commit line [Wei Wang, 2017-05-09, 1 file, -0/+1]

Currently checkpatch.pl does not recognize git's default commit-revert message (whose body contains a line of the form "This reverts commit <full sha1>.") and will complain about the hash format. Add a special audit for the revert commit message line to fix it.

Link: http://lkml.kernel.org/r/20170411191532.74381-1-wvw@google.com
Signed-off-by: Wei Wang <wvw@google.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* checkpatch: clarify the EMBEDDED_FUNCTION_NAME message [Joe Perches, 2017-05-09, 1 file, -5/+7]

Try to make the suggested conversion of embedded function names to "%s: ", __func__ a bit clearer. Add a bit more information to the comment describing the test too.

Link: http://lkml.kernel.org/r/38f5d32f0aec1cd98cb9ceeedd6a736cc9a802db.1491759835.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
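The conversion the message suggests, sketched:

    /* flagged: function name embedded in the format string */
    pr_err("my_func: allocation failed\n");

    /* preferred: stays correct if the function is renamed */
    pr_err("%s: allocation failed\n", __func__);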
* checkpatch: improve MULTISTATEMENT_MACRO_USE_DO_WHILE test [Joe Perches, 2017-05-09, 1 file, -2/+4]

The logic currently misses macros that start with an if statement, e.g.:

    #define foo(bar) if (bar) baz;

Add a test for macro content that starts with "if".

Link: http://lkml.kernel.org/r/a9d41aafe1673889caf1a9850208fb7fd74107a0.1491783914.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Andreas Mohr <andi@lisas.de>
Original-patch-by: Alfonso Lima <alfonsolimaastor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
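The conventional fix for such macros, as a sketch of why the warning exists:

    /* unsafe: the bare 'if' can capture an 'else' at the call site */
    #define foo(bar) if (bar) baz;

    /* safe: do/while (0) makes the macro expand to a single statement */
    #define foo(bar) do { if (bar) baz; } while (0)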
* checkpatch: avoid suggesting struct definitions should be const [Joe Perches, 2017-05-09, 1 file, -3/+3]

Many structs are generally used const, and there is a known list of these structs. struct definitions, however, should not generally be declared const.

Add a test for the lack of an open brace immediately after the struct name, to avoid matching definitions. This suppresses the false-positive "struct foo should normally be const" message, but only when the open brace is on the same line as the definition.

Link: http://lkml.kernel.org/r/0dce709150d712e66f1b90b03827634b53b28085.1491845946.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Arthur Brainville <ybalrid@ybalrid.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
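A sketch of the distinction the test now makes:

    /* definition: an open brace follows, so 'const' would be wrong here */
    struct foo_ops {
            void (*run)(void);
    };

    void my_run(void);

    /* instance: this is what should normally be const */
    const struct foo_ops my_ops = { .run = my_run };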
* checkpatch: allow space leading blank lines in email headers [Joe Perches, 2017-05-09, 1 file, -2/+2]

Allow a leading space on an otherwise blank line in the email headers, as it can be a line-wrapped SpamAssassin multiple-line string or any other valid RFC 2822/5322 email header. The line with a space causes checkpatch to erroneously think that it's in the content body, as opposed to the headers, and thus flag a mail header as an unwrapped long comment line.

Link: http://lkml.kernel.org/r/d75a9f0b78b3488078429f4037d9fff3bdfa3b78.1490247180.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reported-by: Darren Hart (VMware) <dvhart@infradead.org>
Tested-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Darren Hart (VMware) <dvhart@vmware.com>
Original-patch-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* checkpatch: improve EMBEDDED_FUNCTION_NAME test [Joe Perches, 2017-05-09, 1 file, -0/+11]

The existing behavior relies on patch context to identify function declarations. Add the ability to find function declarations when there is an open brace in column 1.

This finds function declarations only in specific single-line forms where the function name is on a single line, like:

    int foo(args...)
    {

and

    int foo(args...) {

It does not recognize function declarations like:

    int foo(int bar,
            int baz)
    {

Link: http://lkml.kernel.org/r/738d74bbbe1a06b80f11ed504818107c68903095.1488155636.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* checkpatch: add ability to find bad uses of vsprintf %p<foo> extensions [Joe Perches, 2017-05-09, 2 files, -0/+29]

%pK was at least once misused as %pk in an out-of-tree module. This led to some security concerns. Add the ability to track single- and multiple-line statements for misuses of %p<foo>.

[akpm@linux-foundation.org: add helpful comment into lib/vsprintf.c]
[akpm@linux-foundation.org: text tweak]
Link: http://lkml.kernel.org/r/163a690510e636a23187c0dc9caa09ddac6d4cde.1488228427.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: William Roberts <william.c.roberts@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
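A sketch of the misuse class (%pK is a real vsprintf extension; the lowercase variant is not):

    void *ptr;

    /* valid: %pK prints the pointer subject to kptr_restrict */
    pr_info("handle: %pK\n", ptr);

    /* flagged: %pk is not a recognized extension and can leak
     * the raw pointer value */
    pr_info("handle: %pk\n", ptr);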
* checkpatch: remove obsolete CONFIG_EXPERIMENTAL checks [Ruslan Bilovol, 2017-05-09, 1 file, -13/+0]

Config EXPERIMENTAL was removed from the kernel in 2013 (see commit 3d374d09f16f: "final removal of CONFIG_EXPERIMENTAL"); there is no reason to do these checks now.

Link: http://lkml.kernel.org/r/1488234097-20119-1-git-send-email-ruslan.bilovol@gmail.com
Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* firmware/Makefile: force recompilation if makefile changes [Luis R. Rodriguez, 2017-05-09, 1 file, -1/+2]

If you modify the target asm, we currently do not force recompilation of the firmware files. The target asm is in firmware/Makefile; peg this file as a dependency to require recompilation of the firmware targets when the asm changes.

Link: http://lkml.kernel.org/r/20170123150727.4883-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <mmarek@suse.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tom Gundersen <teg@jklm.no>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lib: add module support to linked list sorting tests [Geert Uytterhoeven, 2017-05-09, 4 files, -152/+155]

Extract the linked-list sorting test code into its own source file, to allow compiling it either as a loadable module or builtin into the kernel.

Link: http://lkml.kernel.org/r/1488287219-15832-4-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lib: add module support to array-based sort tests [Geert Uytterhoeven, 2017-05-09, 1 file, -3/+4]

Allow compiling the array-based sort test code either as a loadable module or builtin into the kernel.

Link: http://lkml.kernel.org/r/1488287219-15832-3-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Revert "lib/test_sort.c: make it explicitly non-modular" [Geert Uytterhoeven, 2017-05-09, 1 file, -6/+5]

Patch series "lib: add module support to sort tests".

This patch series allows compiling the array-based and linked-list sort test code either as loadable modules or builtin into the kernel. It's very valuable to have modular tests, so you can run them just by insmodding the test modules, instead of needing a separate kernel that runs them at boot.

This patch (of 3):

This reverts commit 8893f519330bb073a49c5b4676fce4be6f1be15d. It's very valuable to have modular tests, so you can run them just by insmodding the test modules, instead of needing a separate kernel that runs them at boot.

Link: http://lkml.kernel.org/r/1488287219-15832-2-git-send-email-geert@linux-m68k.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
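A hedged sketch of the module boilerplate such a conversion adds (function names assumed, not taken from the actual patches):

    #include <linux/module.h>
    #include <linux/init.h>

    static int __init test_sort_init(void)
    {
            return run_sort_selftest();  /* assumed test entry point */
    }

    static void __exit test_sort_exit(void)
    {
            /* nothing to tear down; exists so the module can be unloaded */
    }

    module_init(test_sort_init);
    module_exit(test_sort_exit);
    MODULE_LICENSE("GPL");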
* drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() [Dan Carpenter, 2017-05-09, 1 file, -2/+2]

c2port_device_register() never returns NULL; it uses error pointers.

Link: http://lkml.kernel.org/r/20170412083321.GC3250@mwanda
Fixes: 65131cd52b9e ("c2port: add c2port support for Eurotech Duramar 2150")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Rodolfo Giometti <giometti@linux.it>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
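The shape of the fix, sketched (arguments elided):

    c2dev = c2port_device_register(...);

    /* wrong: the function returns ERR_PTR() values, never NULL */
    if (!c2dev)
            return -ENODEV;

    /* right: test for an encoded error pointer instead */
    if (IS_ERR(c2dev))
            return PTR_ERR(c2dev);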
* drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests [Dan Carpenter, 2017-05-09, 1 file, -2/+8]

The "DIV_ROUND_UP(size, PAGE_SIZE)" operation can overflow if "size" is more than ULLONG_MAX - PAGE_SIZE.

Link: http://lkml.kernel.org/r/20170322111950.GA11279@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
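The overflow being guarded against, sketched (the error value placement is illustrative, not the driver's actual code):

    /* DIV_ROUND_UP(size, PAGE_SIZE) expands to
     * (size + PAGE_SIZE - 1) / PAGE_SIZE, so the addition wraps
     * when size > ULLONG_MAX - PAGE_SIZE; check before dividing:
     */
    if (size > ULLONG_MAX - PAGE_SIZE)
            return -EINVAL;
    num_pages = DIV_ROUND_UP(size, PAGE_SIZE);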
* kernel/hung_task.c: defer showing held locks [Tetsuo Handa, 2017-05-09, 1 file, -1/+7]

When I was running my testcase, which may block hundreds of threads on fs locks, I got a lockup due to output from debug_show_all_locks() added by commit b2d4c2edb2e4 ("locking/hung_task: Show all locks").

For example, if 1000 threads were blocked in TASK_UNINTERRUPTIBLE state and 500 out of 1000 threads hold some lock, debug_show_all_locks() called from the for_each_process_thread() loop will report the locks held by the 500 threads 1000 times. This is too much noise.

In order to make sure rcu_lock_break() is called frequently, we should avoid calling debug_show_all_locks() from the for_each_process_thread() loop, because debug_show_all_locks() effectively runs a for_each_process_thread() loop itself. Let's defer calling debug_show_all_locks() until before panic() or before leaving the for_each_process_thread() loop.

Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
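A hedged sketch of the deferral (the flag name is assumed for illustration):

    static bool hung_task_show_lock;   /* assumed flag name */

    /* inside the per-task check, instead of dumping all locks inline: */
    hung_task_show_lock = true;

    /* after leaving the for_each_process_thread() loop, and before
     * any panic(), dump the held locks exactly once:
     */
    if (hung_task_show_lock)
            debug_show_all_locks();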
* make help: add tools help target [Randy Dunlap, 2017-05-09, 1 file, -2/+6]

Add a top-level Makefile help target for Userspace tools. Also make each help "heading" end with a colon ':'.

Link: http://lkml.kernel.org/r/55c986ff-3966-3e47-2984-7349da2cce51@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp [Matthias Kaehlcke, 2017-05-09, 1 file, -8/+3]

jiffies_64 is defined in kernel/time/timer.c with ____cacheline_aligned_in_smp; however, this macro is not part of the declaration of jiffies and jiffies_64 in jiffies.h. As a result, clang generates the following warning:

    kernel/time/timer.c:57:26: error: section does not match previous declaration [-Werror,-Wsection]
    __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
                             ^
    include/linux/cache.h:39:36: note: expanded from macro '__cacheline_aligned_in_smp'
                                       ^
    include/linux/cache.h:34:4: note: expanded from macro '__cacheline_aligned'
                __section__(".data..cacheline_aligned")))
                ^
    include/linux/jiffies.h:77:12: note: previous attribute is here
    extern u64 __jiffy_data jiffies_64;
               ^
    include/linux/jiffies.h:70:38: note: expanded from macro '__jiffy_data'

Link: http://lkml.kernel.org/r/20170403190200.70273-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Michael Davidson <md@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked() [Lorenzo Stoakes, 2017-05-09, 1 file, -5/+2]

Moving from get_user_pages() to get_user_pages_unlocked() simplifies the code and takes advantage of the VM_FAULT_RETRY functionality when faulting in pages.

Link: http://lkml.kernel.org/r/20161101194332.23961-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
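A sketch of the conversion (argument lists follow the signatures of this kernel era; not the driver's exact code):

    /* before: the caller takes mmap_sem around the call */
    down_read(&current->mm->mmap_sem);
    ret = get_user_pages(addr, num_pages, FOLL_WRITE, pages, NULL);
    up_read(&current->mm->mmap_sem);

    /* after: the helper manages mmap_sem itself and can drop it to
     * retry faults (VM_FAULT_RETRY) */
    ret = get_user_pages_unlocked(addr, num_pages, pages, FOLL_WRITE);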
* proc/sysctl: fix the int overflow for jiffies conversion [Gao Feng, 2017-05-09, 1 file, -1/+1]

do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to avoid overflow. But *valp is actually of type int, so an overflow can still occur. For example:

    echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time

Then:

    cat ./sys/net/ipv4/tcp_keepalive_time

The output is "-1", which is not expected. Now use INT_MAX/HZ as the max value instead of LONG_MAX/HZ to fix it.

Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
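The shape of the fix, as a sketch (variable names follow the helper mentioned above; the stored value is an int):

    /* before: the bound matches 'long', not the int being written */
    if (*lvalp > LONG_MAX / HZ)
            return 1;

    /* after: clamp against what *valp can actually hold */
    if (*lvalp > INT_MAX / HZ)
            return 1;
    *valp = *lvalp * HZ;   /* now cannot wrap the int */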
* fs/proc/inode.c: remove cast from memory allocation [Tobin C. Harding, 2017-05-09, 1 file, -1/+1]

Coccinelle emits this warning:

    WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless.

Remove the unnecessary cast.

Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
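The change, sketched (kmem_cache_alloc() returns void *, so no cast is needed in C):

    /* before: useless cast of a void * return value */
    ei = (struct proc_inode *)kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);

    /* after */
    ei = kmem_cache_alloc(proc_inode_cachep, GFP_KERNEL);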
* mm, compaction: finish whole pageblock to reduce fragmentation [Vlastimil Babka, 2017-05-09, 2 files, -2/+35]

The main goal of direct compaction is to form a high-order page for allocation, but it should also help against long-term fragmentation when possible. Most lower-than-pageblock-order compactions are for non-movable allocations, which means that if we compact in a movable pageblock and terminate as soon as we create the high-order page, it's unlikely that the fallback heuristics will claim the whole block. Instead there might be a single unmovable page in a pageblock full of movable pages, and the next unmovable allocation might pick another pageblock and increase long-term fragmentation.

To help against such scenarios, this patch changes the termination criteria for compaction so that the current pageblock is finished even though the high-order page already exists. Note that it might be possible that the high-order page formed elsewhere in the zone due to parallel activity, but this patch doesn't try to detect that.

This is only done with sync compaction, because async compaction is limited to pageblocks of the same migratetype, where it cannot result in a migratetype fallback. (Async compaction also eagerly skips order-aligned blocks where isolation fails, which is against the goal of migrating away as much of the pageblock as possible.)

As a result of this patch, long-term memory fragmentation should be reduced.

In testing based on a 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 20%. The number

Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, compaction: restrict async compaction to pageblocks of same migratetype [Vlastimil Babka, 2017-05-09, 2 files, -9/+22]

The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE pageblocks. This is a heuristic intended to reduce latency, based on the assumption that non-MOVABLE pageblocks are unlikely to contain movable pages.

However, with the exception of THPs, most high-order allocations are not movable. Should the async compaction succeed, this increases the chance that the non-MOVABLE allocations will fall back to a MOVABLE pageblock, making the long-term fragmentation worse.

This patch attempts to help the situation by changing async direct compaction so that the migrate scanner only scans the pageblocks of the requested migratetype. If it's a non-MOVABLE type and there are such pageblocks that do contain movable pages, chances are that the allocation can succeed within one of such pageblocks, removing the need for a fallback. If that fails, the subsequent sync attempt will ignore this restriction.

In testing based on a 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 30%. The number of movable allocations falling back is reduced by 12%.

Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, compaction: add migratetype to compact_control [Vlastimil Babka, 2017-05-09, 2 files, -8/+8]

Preparation patch. We are going to need migratetype at lower layers than compact_zone() and compact_finished().

Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
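A sketch of the plumbing (field placement and comments are illustrative only, not the actual struct layout):

    struct compact_control {
            /* ... existing fields ... */
            int order;              /* order of the requested allocation */
            int migratetype;        /* new: migratetype of the request,
                                     * now available below compact_zone() */
    };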
* mm, compaction: change migrate_async_suitable() to suitable_migration_source() [Vlastimil Babka, 2017-05-09, 2 files, -8/+16]

Preparation for making the decisions more complex and dependent on compact_control flags. No functional change.

Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_alloc: count movable pages when stealing from pageblock [Vlastimil Babka, 2017-05-09, 3 files, -21/+63]

When stealing pages from a pageblock of a different migratetype, we count how many free pages were stolen, and change the pageblock's migratetype if more than half of the pageblock was free. This might be too conservative, as there might be other pages that are not free, but were allocated with the same migratetype as our allocation requested.

While we cannot determine the migratetype of allocated pages precisely (at least without the page_owner functionality enabled), we can count pages that compaction would try to isolate for migration - those are either on LRU or __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be done as part of free page stealing with little additional overhead.

The page stealing code is changed so that it considers free pages plus pages of the "good" migratetype for the decision whether to change the pageblock's migratetype.

The result should be a more accurate migratetype of pageblocks wrt the actual pages in the pageblocks, when stealing from semi-occupied pageblocks. This should help the efficiency of page grouping by mobility.

In testing based on a 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 47%. The number of movable allocations falling back to other pageblocks is increased by 55%, but these events don't cause permanent fragmentation, so the tradeoff should be positive. Later patches also offset the movable fallback increase to some extent.

[akpm@linux-foundation.org: merge fix]
Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_alloc: split smallest stolen page in fallback [Vlastimil Babka, 2017-05-09, 1 file, -25/+37]

The __rmqueue_fallback() function is called when there's no free page of the requested migratetype, and we need to steal from a different one. There are various heuristics to make this event infrequent and reduce permanent fragmentation. The main one is to try stealing from a pageblock that has the most free pages, and possibly steal them all at once and convert the whole pageblock. Precise searching for such a pageblock would be expensive, so instead the heuristic walks the free lists from MAX_ORDER down to the requested order and assumes that the block with the highest-order free page is likely to also have the most free pages in total.

Chances are that together with the highest-order page, we steal also pages of lower orders from the same block. But then we still split the highest-order page. This is wasteful and can contribute to fragmentation instead of avoiding it.

This patch thus changes __rmqueue_fallback() to just steal the page(s) and put them on the freelist of the requested migratetype, and only report whether it was successful. Then we pick (and eventually split) the smallest page with __rmqueue_smallest(). This all happens under the zone lock, so nobody can steal it from us in the process. This should reduce fragmentation due to fallbacks. At worst we are only stealing a single highest-order page and waste some cycles by moving it between lists and then removing it, but fallback is not exactly a hot path, so that should not be a concern. As a side benefit, the patch removes some duplicate code by reusing __rmqueue_smallest().

[vbabka@suse.cz: fix endless loop in the modified __rmqueue()]
Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz
Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, compaction: remove redundant watermark check in compact_finished() [Vlastimil Babka, 2017-05-09, 1 file, -8/+0]

When detecting whether compaction has succeeded in forming a high-order page, __compact_finished() employs a watermark check, followed by its own search for a suitable page in the freelists. This is not ideal for two reasons:

- The watermark check also searches high-order freelists, but has a less strict criteria wrt fallback. It's therefore redundant and a waste of cycles. This was different in the past, when the high-order watermark check attempted to apply reserves to high-order pages.

- The watermark check might actually fail due to lack of order-0 pages. Compaction can't help with that, so there's no point in continuing because of that. It's possible that a high-order page still exists, in which case compaction terminates anyway.

This patch therefore removes the watermark check. This should save some cycles and terminate compaction sooner in some cases.

Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, compaction: reorder fields in struct compact_control [Vlastimil Babka, 2017-05-09, 1 file, -5/+5]

Patch series "try to reduce fragmenting fallbacks", v3.

Last year, Johannes Weiner reported a regression in page mobility grouping [1] and while the exact cause was not found, I've come up with some ways to improve it by reducing the number of allocations falling back to a different migratetype and causing permanent fragmentation.

The series was tested with mmtests stress-highalloc modified to do GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone balance check in prepare_kswapd_sleep" (without that, kcompactd indeed wasn't woken up) on a UMA machine with 4GB of memory. There were 5 repeats of each run, as the extfrag stats are quite volatile (note the stats below are sums, not averages, as it was less perl hacking for me). Success rates are the same, already high due to the low allocation order used, so I'm not including them.

Compaction stats (the patches are stacked, and I haven't measured the non-functional-changes patches separately):

                                  patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
    Compaction stalls               22449       24680       24846       19765       22059       17480
    Compaction success              12971       14836       14608       10475       11632        8757
    Compaction failures              9477        9843       10238        9290       10426        8722
    Page migrate success          3109022     3370438     3312164     1695105     1608435     2111379
    Page migrate failure           911588     1149065     1028264     1112675     1077251     1026367
    Compaction pages isolated     7242983     8015530     7782467     4629063     4402787     5377665
    Compaction migrate scanned  980838938   987367943   957690188   917647238   947155598  1018922197
    Compaction free scanned     557926893   598946443   602236894   594024490   541169699   763651731
    Compaction cost                 10243       10578       10304        8286        8398        9440

Compaction stats are mostly within noise until patch 4, which decreases the number of compactions and migrations. Part of that could be due to more pageblocks marked as unmovable, and async compaction skipping those. This changes a bit with patch 7, but not so much. Patch 8 increases free scanner stats and migrations, which comes from the changed termination criteria. Interestingly, the number of compactions decreases - probably the fully compacted pageblock satisfies multiple subsequent allocations, so it amortizes.

Next comes the extfrag tracepoint, where "fragmenting" means that an allocation had to fall back to a pageblock of another migratetype which wasn't fully free (which is almost all of the fallbacks). I have locally added another tracepoint for "Page steal" into steal_suitable_fallback(), which triggers in situations where we are allowed to do move_freepages_block(). If we decide to also do set_pageblock_migratetype(), it's "Pages steal with pageblock", with a breakdown of which allocation migratetype we are stealing for and from which fallback migratetype. The last part, "due to counting", comes from patch 4 and counts the events where the counting of movable pages allowed us to change the pageblock's migratetype, while the number of free pages alone wouldn't have been enough to cross the threshold.

                                                             patch 1   patch 2   patch 3   patch 4   patch 7   patch 8
    Page alloc extfrag event                                10155066   8522968  10164959  15622080  13727068  13140319
    Extfrag fragmenting                                     10149231   8517025  10159040  15616925  13721391  13134792
    Extfrag fragmenting for unmovable                         159504    168500    184177     97835     70625     56948
    Extfrag fragmenting unmovable placed with movable         153613    163549    172693     91740     64099     50917
    Extfrag fragmenting unmovable placed with reclaim.          5891      4951     11484      6095      6526      6031
    Extfrag fragmenting for reclaimable                         4738      4829      6345      4822      5640      5378
    Extfrag fragmenting reclaimable placed with movable         1836      1902      1851      1579      1739      1760
    Extfrag fragmenting reclaimable placed with unmov.          2902      2927      4494      3243      3901      3618
    Extfrag fragmenting for movable                          9984989   8343696   9968518  15514268  13645126  13072466
    Pages steal                                               179954    192291    210880    123254     94545     81486
    Pages steal with pageblock                                 22153     18943     20154     33562     29969     33444
    Pages steal with pageblock for unmovable                   14350     12858     13256     20660     19003     20852
    Pages steal with pageblock for unmovable from mov.         12812     11402     11683     19072     17467     19298
    Pages steal with pageblock for unmovable from recl.         1538      1456      1573      1588      1536      1554
    Pages steal with pageblock for movable                      7114      5489      5965     11787     10012     11493
    Pages steal with pageblock for movable from unmov.          6885      5291      5541     11179      9525     10885
    Pages steal with pageblock for movable from recl.            229       198       424       608       487       608
    Pages steal with pageblock for reclaimable                   689       596       933      1115       954      1099
    Pages steal with pageblock for reclaimable from unmov.       273       219       537       658       547       667
    Pages steal with pageblock for reclaimable from mov.         416       377       396       457       407       432
    Pages steal with pageblock due to counting                                               11834     10075      7530
    ... for unmovable                                                                         8993      7381      4616
    ... for movable                                                                           2792      2653      2851
    ... for reclaimable                                                                         49        41        63

What we can see is that "Extfrag fragmenting for unmovable" and "... placed with movable" drop with almost every patch, which is good, as we are polluting fewer movable pageblocks with unmovable pages. The most significant change is patch 4 with movable page counting. On the other hand, it increases "Extfrag fragmenting for movable" by 50%. "Pages steal" drops, though, so these movable allocation fallbacks find only small free pages and are not allowed to steal whole pageblocks back. "Pages steal with pageblock" rises, because the patch increases the chances of pageblock migratetype changes happening. This affects all migratetypes.

The summary is that patch 4 is not a clear win wrt these stats, but I believe that the tradeoff it makes is a good one. There's less pollution of movable pageblocks by unmovable allocations. There's less stealing between pageblocks, and those steals that remain have a higher chance of changing the migratetype of the pageblock itself, so it should more faithfully reflect the migratetype of the pages within the pageblock. The increase of movable allocations falling back to unmovable pageblocks might look dramatic, but those allocations can be migrated by compaction when needed, and other patches in the series (7-9) improve that aspect. Patches 7 and 8 continue the trend of reduced unmovable fallbacks and also reduce the impact on movable fallbacks from patch 4.

[1] https://www.spinics.net/lists/linux-mm/msg114237.html

This patch (of 8):

Currently there are (mostly by accident) no holes in struct compact_control (on x86_64), but we are going to add more bool flags, so place them all together at the end of the structure. While at it, order all fields from largest to smallest.

Link: http://lkml.kernel.org/r/20170307131545.28577-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* docs: complete bumping minimal GNU Make version to 3.81 [Max Filippov, 2017-05-07, 1 file, -1/+1]

Commit 37d69ee30808 ("docs: bump minimal GNU Make version to 3.81") changed one entry for the GNU make version in changes.rst; there's still one more entry saying that one needs version 3.80. Fix that.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 [Linus Torvalds, 2017-05-06, 14 files, -105/+465]

Pull cifs fixes from Steve French:
 "Various fixes for stable for CIFS/SMB3, especially for better interoperability for SMB3 to Macs. It also includes Pavel's improvements to SMB3 async i/o support (which is much faster now)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: add missing SFM mapping for doublequote
  SMB3: Work around mount failure when using SMB3 dialect to Macs
  cifs: fix CIFS_IOC_GET_MNT_INFO oops
  CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
  CIFS: fix oplock break deadlocks
  cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops
  cifs: fix leak in FSCTL_ENUM_SNAPS response handling
  Set unicode flag on cifs echo request to avoid Mac error
  CIFS: Add asynchronous write support through kernel AIO
  CIFS: Add asynchronous read support through kernel AIO
  CIFS: Add asynchronous context to support kernel AIO
  cifs: fix IPv6 link local, with scope id, address parsing
  cifs: small underflow in cnvrtDosUnixTm()
  * CIFS: add missing SFM mapping for doublequote [Björn Jacke, 2017-05-05, 2 files, -0/+7]

SFM maps the doublequote to 0xF020. Without this patch, creating files with a doublequote fails against Windows/Mac.

Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: stable <stable@vger.kernel.org>
  * SMB3: Work around mount failure when using SMB3 dialect to Macs [Steve French, 2017-05-04, 1 file, -3/+11]

Macs send the maximum buffer size in the response to the ioctl that validates negotiate security information, which causes us to fail the mount, as the response buffer is larger than the expected response.

Change the ioctl response processing to allow for padding of the validate negotiate ioctl response, and limit the maximum response size to the maximum buffer size.

Signed-off-by: Steve French <steve.french@primarydata.com>
CC: Stable <stable@vger.kernel.org>
  * cifs: fix CIFS_IOC_GET_MNT_INFO oops [David Disseldorp, 2017-05-04, 1 file, -0/+2]

An open directory may have a NULL private_data pointer prior to readdir.

Fixes: 0de1f4c6f6c0 ("Add way to query server fs info for smb3")
Cc: stable@vger.kernel.org
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
  * CIFS: fix mapping of SFM_SPACE and SFM_PERIOD [Björn Jacke, 2017-05-04, 1 file, -2/+2]

- trailing space maps to 0xF028
- trailing period maps to 0xF029

This fix corrects the mapping of file names which have a trailing character that would otherwise be illegal (period or space) but is allowed by POSIX.

Signed-off-by: Bjoern Jacke <bjacke@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
  * CIFS: fix oplock break deadlocks [Rabin Vincent, 2017-05-03, 4 files, -5/+18]

When the final cifsFileInfo_put() is called from cifsiod and an oplock break work is queued, lockdep complains loudly:

    =============================================
    [ INFO: possible recursive locking detected ]
    4.11.0+ #21 Not tainted
    ---------------------------------------------
    kworker/0:2/78 is trying to acquire lock:
     ("cifsiod"){++++.+}, at: flush_work+0x215/0x350

    but task is already holding lock:
     ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock("cifsiod");
      lock("cifsiod");

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    2 locks held by kworker/0:2/78:
     #0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
     #1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0

    stack backtrace:
    CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
    Workqueue: cifsiod cifs_writev_complete
    Call Trace:
     dump_stack+0x85/0xc2
     __lock_acquire+0x17dd/0x2260
     ? match_held_lock+0x20/0x2b0
     ? trace_hardirqs_off_caller+0x86/0x130
     ? mark_lock+0xa6/0x920
     lock_acquire+0xcc/0x260
     ? lock_acquire+0xcc/0x260
     ? flush_work+0x215/0x350
     flush_work+0x236/0x350
     ? flush_work+0x215/0x350
     ? destroy_worker+0x170/0x170
     __cancel_work_timer+0x17d/0x210
     ? ___preempt_schedule+0x16/0x18
     cancel_work_sync+0x10/0x20
     cifsFileInfo_put+0x338/0x7f0
     cifs_writedata_release+0x2a/0x40
     ? cifs_writedata_release+0x2a/0x40
     cifs_writev_complete+0x29d/0x850
     ? preempt_count_sub+0x18/0xd0
     process_one_work+0x304/0x8e0
     worker_thread+0x9b/0x6a0
     kthread+0x1b2/0x200
     ? process_one_work+0x8e0/0x8e0
     ? kthread_create_on_node+0x40/0x40
     ret_from_fork+0x31/0x40

This is a real warning. Since the oplock is queued on the same workqueue, this can deadlock if there is only one worker thread active for the workqueue (which will be the case during memory pressure, when the rescuer thread is handling it).

Furthermore, there is at least one other kind of hang possible due to the oplock break handling if there is only one worker. (This can be reproduced without introducing memory pressure by passing 1 for the max_active parameter of cifsiod.) cifs_oplock_break() can wait indefinitely in filemap_fdatawait() while the cifs_writev_complete() work is blocked:

    sysrq: SysRq : Show Blocked State
     task                        PC stack   pid father
    kworker/0:1     D    0    16      2 0x00000000
    Workqueue: cifsiod cifs_oplock_break
    Call Trace:
     __schedule+0x562/0xf40
     ? mark_held_locks+0x4a/0xb0
     schedule+0x57/0xe0
     io_schedule+0x21/0x50
     wait_on_page_bit+0x143/0x190
     ? add_to_page_cache_lru+0x150/0x150
     __filemap_fdatawait_range+0x134/0x190
     ? do_writepages+0x51/0x70
     filemap_fdatawait_range+0x14/0x30
     filemap_fdatawait+0x3b/0x40
     cifs_oplock_break+0x651/0x710
     ? preempt_count_sub+0x18/0xd0
     process_one_work+0x304/0x8e0
     worker_thread+0x9b/0x6a0
     kthread+0x1b2/0x200
     ? process_one_work+0x8e0/0x8e0
     ? kthread_create_on_node+0x40/0x40
     ret_from_fork+0x31/0x40
    dd              D    0   683    171 0x00000000
    Call Trace:
     __schedule+0x562/0xf40
     ? mark_held_locks+0x29/0xb0
     schedule+0x57/0xe0
     io_schedule+0x21/0x50
     wait_on_page_bit+0x143/0x190
     ? add_to_page_cache_lru+0x150/0x150
     __filemap_fdatawait_range+0x134/0x190
     ? do_writepages+0x51/0x70
     filemap_fdatawait_range+0x14/0x30
     filemap_fdatawait+0x3b/0x40
     filemap_write_and_wait+0x4e/0x70
     cifs_flush+0x6a/0xb0
     filp_close+0x52/0xa0
     __close_fd+0xdc/0x150
     SyS_close+0x33/0x60
     entry_SYSCALL_64_fastpath+0x1f/0xbe

    Showing all locks held in the system:
    2 locks held by kworker/0:1/16:
     #0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
     #1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0

    Showing busy workqueues and worker pools:
    workqueue cifsiod: flags=0xc
      pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
        in-flight: 16:cifs_oplock_break
        delayed: cifs_writev_complete, cifs_echo_request
    pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3

Fix these problems by creating a new workqueue (with a rescuer) for the oplock break work.

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
  * cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops [David Disseldorp, 2017-05-03, 1 file, -0/+2]

As with 618763958b22, an open directory may have a NULL private_data pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this before dereferencing it.

Fixes: 834170c85978 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
  * cifs: fix leak in FSCTL_ENUM_SNAPS response handling [David Disseldorp, 2017-05-03, 1 file, -0/+1]

The server may respond with success, and an output buffer less than sizeof(struct smb_snapshot_array) in length. Do not leak the output buffer in this case.

Fixes: 834170c85978 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
  * Set unicode flag on cifs echo request to avoid Mac error [Steve French, 2017-05-02, 1 file, -0/+3]

Macs require the unicode flag to be set for cifs, even for the smb echo request (which doesn't contain strings). Without this, Macs reject the periodic echo requests (when mounting with cifs) that we use to check whether the server is down.

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
  * CIFS: Add asynchronous write support through kernel AIO [Pavel Shilovsky, 2017-05-02, 2 files, -51/+139]

This patch adds support for processing write calls passed by io_submit() asynchronously. It is based on the previously introduced async context that allows processing i/o responses in a separate thread and returning to the caller immediately for asynchronous calls. This improves the write performance of single-threaded applications as the i/o queue depth increases.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>