summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2019-03-247-59/+58Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for 5.1" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: prohibit fstrim in norecovery mode ext4: cleanup bh release code in ext4_ind_remove_space() ext4: brelse all indirect buffer in ext4_ind_remove_space() ext4: report real fs size after failed resize ext4: add missing brelse() in add_new_gdb_meta_bg() ext4: remove useless ext4_pin_inode() ext4: avoid panic during forced reboot ext4: fix data corruption caused by unaligned direct AIO ext4: fix NULL pointer dereference while journal is aborted
| * ext4: prohibit fstrim in norecovery modeDarrick J. Wong2019-03-231-0/+7
| | | | | | | | | | | | | | | | | | | | The ext4 fstrim implementation uses the block bitmaps to find free space that can be discarded. If we haven't replayed the journal, the bitmaps will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: cleanup bh release code in ext4_ind_remove_space()zhangyi (F)2019-03-231-25/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we are releasing the indirect buffer where we are done with it in ext4_ind_remove_space(), so we can see the brelse() and BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we may probably forget to release the buffer some day. This patch cleans up the code by putting of the code which releases the buffers to the end of the function. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * ext4: brelse all indirect buffer in ext4_ind_remove_space()zhangyi (F)2019-03-231-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All indirect buffers get by ext4_find_shared() should be released no mater the branch should be freed or not. But now, we forget to release the lower depth indirect buffers when removing space from the same higher depth indirect block. It will lead to buffer leak and futher more, it may lead to quota information corruption when using old quota, consider the following case. - Create and mount an empty ext4 filesystem without extent and quota features, - quotacheck and enable the user & group quota, - Create some files and write some data to them, and then punch hole to some files of them, it may trigger the buffer leak problem mentioned above. - Disable quota and run quotacheck again, it will create two new aquota files and write the checked quota information to them, which probably may reuse the freed indirect block(the buffer and page cache was not freed) as data block. - Enable quota again, it will invoke vfs_load_quota_inode()->invalidate_bdev() to try to clean unused buffers and pagecache. Unfortunately, because of the buffer of quota data block is still referenced, quota code cannot read the up to date quota info from the device and lead to quota information corruption. This problem can be reproduced by xfstests generic/231 on ext3 file system or ext4 file system without extent and quota features. This patch fix this problem by releasing the missing indirect buffers, in ext4_ind_remove_space(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
| * ext4: report real fs size after failed resizeLukas Czerner2019-03-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when the file system resize using ext4_resize_fs() fails it will report into log that "resized filesystem to <requested block count>". However this may not be true in the case of failure. Use the current block count as returned by ext4_blocks_count() to report the block count. Additionally, report a warning that "error occurred during file system resize" Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add missing brelse() in add_new_gdb_meta_bg()Lukas Czerner2019-03-151-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh in case ext4_journal_get_write_access() fails. Additionally kvfree() is missing in the same error path. Fix it by moving the ext4_journal_get_write_access() before the ext4 sb update as Ted suggested and release n_group_desc and gdb_bh in case it fails. Fixes: 61a9c11e5e7a ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: remove useless ext4_pin_inode()Jason Yan2019-03-151-30/+0Star
| | | | | | | | | | | | | | | | This function is never used from the beginning (and is commented out); let's remove it. Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: avoid panic during forced rebootJan Kara2019-03-151-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When admin calls "reboot -f" - i.e., does a hard system reboot by directly calling reboot(2) - ext4 filesystem mounted with errors=panic can panic the system. This happens because the underlying device gets disabled without unmounting the filesystem and thus some syscall running in parallel to reboot(2) can result in the filesystem getting IO errors. This is somewhat surprising to the users so try improve the behavior by switching to errors=remount-ro behavior when the system is running reboot(2). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix data corruption caused by unaligned direct AIOLukas Czerner2019-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ext4 needs to serialize unaligned direct AIO because the zeroing of partial blocks of two competing unaligned AIOs can result in data corruption. However it decides not to serialize if the potentially unaligned aio is past i_size with the rationale that no pending writes are possible past i_size. Unfortunately if the i_size is not block aligned and the second unaligned write lands past i_size, but still into the same block, it has the potential of corrupting the previous unaligned write to the same block. This is (very simplified) reproducer from Frank // 41472 = (10 * 4096) + 512 // 37376 = 41472 - 4096 ftruncate(fd, 41472); io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376); io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472); io_submit(io_ctx, 1, &iocbs[1]); io_submit(io_ctx, 1, &iocbs[2]); io_getevents(io_ctx, 2, 2, events, NULL); Without this patch the 512B range from 40960 up to the start of the second unaligned write (41472) is going to be zeroed overwriting the data written by the first write. This is a data corruption. 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 * 0000a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 With this patch the data corruption is avoided because we will recognize the unaligned_aio and wait for the unwritten extent conversion. 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 * 0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 * 0000b200 Reported-by: Frank Sorenson <fsorenso@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO") Cc: stable@vger.kernel.org
| * ext4: fix NULL pointer dereference while journal is abortedJiufei Xue2019-03-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We see the following NULL pointer dereference while running xfstests generic/475: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10 RIP: 0010:ext4_do_update_inode+0x4ec/0x760 ... Call Trace: ? jbd2_journal_get_write_access+0x42/0x50 ? __ext4_journal_get_write_access+0x2c/0x70 ? ext4_truncate+0x186/0x3f0 ext4_mark_iloc_dirty+0x61/0x80 ext4_mark_inode_dirty+0x62/0x1b0 ext4_truncate+0x186/0x3f0 ? unmap_mapping_pages+0x56/0x100 ext4_setattr+0x817/0x8b0 notify_change+0x1df/0x430 do_truncate+0x5e/0x90 ? generic_permission+0x12b/0x1a0 This is triggered because the NULL pointer handle->h_transaction was dereferenced in function ext4_update_inode_fsync_trans(). I found that the h_transaction was set to NULL in jbd2__journal_restart but failed to attached to a new transaction while the journal is aborted. Fix this by checking the handle before updating the inode. Fixes: b436b9bef84d ("ext4: Wait for proper transaction commit on fsync") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: stable@kernel.org
* | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2019-03-241-0/+27
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Prevent potential NULL pointer dereferences in the HPET and HyperV code - Exclude the GART aperture from /proc/kcore to prevent kernel crashes on access - Use the correct macros for Cyrix I/O on Geode processors - Remove yet another kernel address printk leak - Announce microcode reload completion as requested by quite some people. Microcode loading has become popular recently. - Some 'Make Clang' happy fixlets - A few cleanups for recently added code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/gart: Exclude GART aperture from kcore x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error x86/mm/pti: Make local symbols static x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors x86/microcode: Announce reload operation's completion x86/hyperv: Prevent potential NULL pointer dereference x86/hpet: Prevent potential NULL pointer dereference x86/lib: Fix indentation issue, remove extra tab x86/boot: Restrict header scope to make Clang happy x86/mm: Don't leak kernel addresses x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
| * | x86/gart: Exclude GART aperture from kcoreKairui Song2019-03-231-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
* | | Merge tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2019-03-246-70/+102
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull smb3 fixes from Steve French: - two fixes for stable for guest mount problems with smb3.1.1 - two fixes for crediting (SMB3 flow control) on resent requests - a byte range lock leak fix - two fixes for incorrect rc mappings * tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number SMB3: Fix SMB3.1.1 guest mounts to Samba cifs: Fix slab-out-of-bounds when tracing SMB tcon cifs: allow guest mounts to work for smb3.11 fix incorrect error code mapping for OBJECTID_NOT_FOUND cifs: fix that return -EINVAL when do dedupe operation CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN
| * | | cifs: update internal module version numberSteve French2019-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | To 2.19 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | SMB3: Fix SMB3.1.1 guest mounts to SambaSteve French2019-03-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Workaround problem with Samba responses to SMB3.1.1 null user (guest) mounts. The server doesn't set the expected flag in the session setup response so we have to do a similar check to what is done in smb3_validate_negotiate where we also check if the user is a null user (but not sec=krb5 since username might not be passed in on mount for Kerberos case). Note that the commit below tightened the conditions and forced signing for the SMB2-TreeConnect commands as per MS-SMB2. However, this should only apply to normal user sessions and not for cases where there is no user (even if server forgets to set the flag in the response) since we don't have anything useful to sign with. This is especially important now that the more secure SMB3.1.1 protocol is in the default dialect list. An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed the guest mounts to Windows. Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: Fix slab-out-of-bounds when tracing SMB tconPaulo Alcantara (SUSE)2019-03-231-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following KASAN report: [ 779.044746] BUG: KASAN: slab-out-of-bounds in string+0xab/0x180 [ 779.044750] Read of size 1 at addr ffff88814f327968 by task trace-cmd/2812 [ 779.044756] CPU: 1 PID: 2812 Comm: trace-cmd Not tainted 5.1.0-rc1+ #62 [ 779.044760] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014 [ 779.044761] Call Trace: [ 779.044769] dump_stack+0x5b/0x90 [ 779.044775] ? string+0xab/0x180 [ 779.044781] print_address_description+0x6c/0x23c [ 779.044787] ? string+0xab/0x180 [ 779.044792] ? string+0xab/0x180 [ 779.044797] kasan_report.cold.3+0x1a/0x32 [ 779.044803] ? string+0xab/0x180 [ 779.044809] string+0xab/0x180 [ 779.044816] ? widen_string+0x160/0x160 [ 779.044822] ? vsnprintf+0x5bf/0x7f0 [ 779.044829] vsnprintf+0x4e7/0x7f0 [ 779.044836] ? pointer+0x4a0/0x4a0 [ 779.044841] ? seq_buf_vprintf+0x79/0xc0 [ 779.044848] seq_buf_vprintf+0x62/0xc0 [ 779.044855] trace_seq_printf+0x113/0x210 [ 779.044861] ? trace_seq_puts+0x110/0x110 [ 779.044867] ? trace_raw_output_prep+0xd8/0x110 [ 779.044876] trace_raw_output_smb3_tcon_class+0x9f/0xc0 [ 779.044882] print_trace_line+0x377/0x890 [ 779.044888] ? tracing_buffers_read+0x300/0x300 [ 779.044893] ? ring_buffer_read+0x58/0x70 [ 779.044899] s_show+0x6e/0x140 [ 779.044906] seq_read+0x505/0x6a0 [ 779.044913] vfs_read+0xaf/0x1b0 [ 779.044919] ksys_read+0xa1/0x130 [ 779.044925] ? kernel_write+0xa0/0xa0 [ 779.044931] ? __do_page_fault+0x3d5/0x620 [ 779.044938] do_syscall_64+0x63/0x150 [ 779.044944] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.044949] RIP: 0033:0x7f62c2c2db31 [ 779.044955] Code: fe ff ff 48 8d 3d 17 9e 09 00 48 83 ec 08 e8 96 02 02 00 66 0f 1f 44 00 00 8b 05 fa fc 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [ 779.044958] RSP: 002b:00007ffd6e116678 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 779.044964] RAX: ffffffffffffffda RBX: 0000560a38be9260 RCX: 00007f62c2c2db31 [ 779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003 [ 779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003 [ 779.044969] RBP: 00007f62c2ef5420 R08: 0000000000000000 R09: 0000000000000003 [ 779.044972] R10: ffffffffffffffa8 R11: 0000000000000246 R12: 00007ffd6e116710 [ 779.044975] R13: 0000000000002000 R14: 0000000000000d68 R15: 0000000000002000 [ 779.044981] Allocated by task 1257: [ 779.044987] __kasan_kmalloc.constprop.5+0xc1/0xd0 [ 779.044992] kmem_cache_alloc+0xad/0x1a0 [ 779.044997] getname_flags+0x6c/0x2a0 [ 779.045003] user_path_at_empty+0x1d/0x40 [ 779.045008] do_faccessat+0x12a/0x330 [ 779.045012] do_syscall_64+0x63/0x150 [ 779.045017] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.045019] Freed by task 1257: [ 779.045023] __kasan_slab_free+0x12e/0x180 [ 779.045029] kmem_cache_free+0x85/0x1b0 [ 779.045034] filename_lookup.part.70+0x176/0x250 [ 779.045039] do_faccessat+0x12a/0x330 [ 779.045043] do_syscall_64+0x63/0x150 [ 779.045048] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.045052] The buggy address belongs to the object at ffff88814f326600 which belongs to the cache names_cache of size 4096 [ 779.045057] The buggy address is located 872 bytes to the right of 4096-byte region [ffff88814f326600, ffff88814f327600) [ 779.045058] The buggy address belongs to the page: [ 779.045062] page:ffffea00053cc800 count:1 mapcount:0 mapping:ffff88815b191b40 index:0x0 compound_mapcount: 0 [ 779.045067] flags: 0x200000000010200(slab|head) [ 779.045075] raw: 0200000000010200 dead000000000100 dead000000000200 ffff88815b191b40 [ 779.045081] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 779.045083] page dumped because: kasan: bad access detected [ 779.045085] Memory state around the buggy address: [ 779.045089] ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045093] ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045099] ^ [ 779.045103] ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045107] ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045109] ================================================================== [ 779.045110] Disabling lock debugging due to kernel taint Correctly assign tree name str for smb3_tcon event. Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: allow guest mounts to work for smb3.11Ronnie Sahlberg2019-03-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix Guest/Anonymous sessions so that they work with SMB 3.11. The commit noted below tightened the conditions and forced signing for the SMB2-TreeConnect commands as per MS-SMB2. However, this should only apply to normal user sessions and not for Guest/Anonumous sessions. Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | fix incorrect error code mapping for OBJECTID_NOT_FOUNDSteve French2019-03-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was mapped to EIO which can be confusing when user space queries for an object GUID for an object for which the server file system doesn't support (or hasn't saved one). As Amir Goldstein suggested this is similar to ENOATTR (equivalently ENODATA in Linux errno definitions) so changing NT STATUS code mapping for OBJECTID_NOT_FOUND to ENODATA. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Amir Goldstein <amir73il@gmail.com>
| * | | cifs: fix that return -EINVAL when do dedupe operationXiaoli Feng2019-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dedupe_file_range operations is combiled into remap_file_range. But it's always skipped for dedupe operations in function cifs_remap_file_range. Example to test: Before this patch: # dd if=/dev/zero of=cifs/file bs=1M count=1 # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file XFS_IOC_FILE_EXTENT_SAME: Invalid argument After this patch: # dd if=/dev/zero of=cifs/file bs=1M count=1 # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file XFS_IOC_FILE_EXTENT_SAME: Operation not supported Influence for xfstests: generic/091 generic/112 generic/127 generic/263 These tests report this error "do_copy_range:: Invalid argument" instead of "FIDEDUPERANGE: Invalid argument". Because there are still two bugs cause these test failed. https://bugzilla.kernel.org/show_bug.cgi?id=202935 https://bugzilla.kernel.org/show_bug.cgi?id=202785 Signed-off-by: Xiaoli Feng <fengxiaoli0714@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Fix an issue with re-sending rdata when transport returning -EAGAINLong Li2019-03-231-30/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending a rdata, transport may return -EAGAIN. In this case we should re-obtain credits because the session may have been reconnected. Change in v2: adjust_credits before re-sending Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | CIFS: Fix an issue with re-sending wdata when transport returning -EAGAINLong Li2019-03-231-32/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending a wdata, transport may return -EAGAIN. In this case we should re-obtain credits because the session may have been reconnected. Change in v2: adjust_credits before re-sending Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
* | | | Merge tag 'io_uring-20190323' of git://git.kernel.dk/linux-blockLinus Torvalds2019-03-233-233/+230Star
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes and improvements from Jens Axboe: "The first five in this series are heavily inspired by the work Al did on the aio side to fix the races there. The last two re-introduce a feature that was in io_uring before it got merged, but which I pulled since we didn't have a good way to have BVEC iters that already have a stable reference. These aren't necessarily related to block, it's just how io_uring pins fixed buffers" * tag 'io_uring-20190323' of git://git.kernel.dk/linux-block: block: add BIO_NO_PAGE_REF flag iov_iter: add ITER_BVEC_FLAG_NO_REF flag io_uring: mark me as the maintainer io_uring: retry bulk slab allocs as single allocs io_uring: fix poll races io_uring: fix fget/fput handling io_uring: add prepped flag io_uring: make io_read/write return an integer io_uring: use regular request ref counts
| * | | block: add BIO_NO_PAGE_REF flagJens Axboe2019-03-182-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If bio_iov_iter_get_pages() is called on an iov_iter that is flagged with NO_REF, then we don't need to add a page reference for the pages that we add. Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows not to drop a reference to these pages. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | iov_iter: add ITER_BVEC_FLAG_NO_REF flagJens Axboe2019-03-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For ITER_BVEC, if we're holding on to kernel pages, the caller doesn't need to grab a reference to the bvec pages, and drop that same reference on IO completion. This is essentially safe for any ITER_BVEC, but some use cases end up reusing pages and uncondtionally dropping a page reference on completion. And example of that is sendfile(2), that ends up being a splice_in + splice_out on the pipe pages. Add a flag that tells us it's fine to not grab a page reference to the bvec pages, since that caller knows not to drop a reference when it's done with the pages. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: retry bulk slab allocs as single allocsJens Axboe2019-03-181-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've seen cases where bulk alloc fails, since the bulk alloc API is all-or-nothing - either we get the number we ask for, or it returns 0 as number of entries. If we fail a batch bulk alloc, retry a "normal" kmem_cache_alloc() and just use that instead of failing with -EAGAIN. While in there, ensure we use GFP_KERNEL. That was an oversight in the original code, when we switched away from GFP_ATOMIC. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: fix poll racesJens Axboe2019-03-151-55/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a straight port of Al's fix for the aio poll implementation, since the io_uring version is heavily based on that. The below description is almost straight from that patch, just modified to fit the io_uring situation. io_poll() has to cope with several unpleasant problems: * requests that might stay around indefinitely need to be made visible for io_cancel(2); that must not be done to a request already completed, though. * in cases when ->poll() has placed us on a waitqueue, wakeup might have happened (and request completed) before ->poll() returns. * worse, in some early wakeup cases request might end up re-added into the queue later - we can't treat "woken up and currently not in the queue" as "it's not going to stick around indefinitely" * ... moreover, ->poll() might have decided not to put it on any queues to start with, and that needs to be distinguished from the previous case * ->poll() might have tried to put us on more than one queue. Only the first will succeed for io poll, so we might end up missing wakeups. OTOH, we might very well notice that only after the wakeup hits and request gets completed (all before ->poll() gets around to the second poll_wait()). In that case it's too late to decide that we have an error. req->woken was an attempt to deal with that. Unfortunately, it was broken. What we need to keep track of is not that wakeup has happened - the thing might come back after that. It's that async reference is already gone and won't come back, so we can't (and needn't) put the request on the list of cancellables. The easiest case is "request hadn't been put on any waitqueues"; we can tell by seeing NULL apt.head, and in that case there won't be anything async. We should either complete the request ourselves (if vfs_poll() reports anything of interest) or return an error. In all other cases we get exclusion with wakeups by grabbing the queue lock. If request is currently on queue and we have something interesting from vfs_poll(), we can steal it and complete the request ourselves. If it's on queue and vfs_poll() has not reported anything interesting, we either put it on the cancellable list, or, if we know that it hadn't been put on all queues ->poll() wanted it on, we steal it and return an error. If it's _not_ on queue, it's either been already dealt with (in which case we do nothing), or there's io_poll_complete_work() about to be executed. In that case we either put it on the cancellable list, or, if we know it hadn't been put on all queues ->poll() wanted it on, simulate what cancel would've done. Fixes: 221c5eb23382 ("io_uring: add support for IORING_OP_POLL") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: fix fget/fput handlingJens Axboe2019-03-151-133/+97Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This isn't a straight port of commit 84c4e1f89fef for aio.c, since io_uring doesn't use files in exactly the same way. But it's pretty close. See the commit message for that commit. This essentially fixes a use-after-free with the poll command handling, but it takes cue from Linus's approach to just simplifying the file handling. We move the setup of the file into a higher level location, so the individual commands don't have to deal with it. And then we release the reference when we free the associated io_kiocb. Fixes: 221c5eb23382 ("io_uring: add support for IORING_OP_POLL") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: add prepped flagJens Axboe2019-03-151-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use the fact that if ->ki_filp is already set, then we've done the prep. In preparation for moving the file assignment earlier, use a separate flag to tell whether the request has been prepped for IO or not. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: make io_read/write return an integerJens Axboe2019-03-151-10/+9Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The callers all convert to an integer, and we only return 0/-ERROR anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | io_uring: use regular request ref countsJens Axboe2019-03-151-19/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of the special casing of "normal" requests not having any references to the io_kiocb. We initialize the ref count to 2, one for the submission side, and one or the completion side. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'fixes_for_v5.1-rc2' of ↵Linus Torvalds2019-03-213-4/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull udf fixes from Jan Kara: "Two udf error handling fixes" * tag 'fixes_for_v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Propagate errors from udf_truncate_extents() udf: Fix crash on IO error during truncate
| * | | | udf: Propagate errors from udf_truncate_extents()Jan Kara2019-03-183-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make udf_truncate_extents() properly propagate errors to its callers and let udf_setsize() handle the error properly as well. This lets userspace know in case there's some error when truncating blocks. Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | udf: Fix crash on IO error during truncateJan Kara2019-03-181-0/+3
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When truncate(2) hits IO error when reading indirect extent block the code just bugs with: kernel BUG at linux-4.15.0/fs/udf/truncate.c:249! ... Fix the problem by bailing out cleanly in case of IO error. CC: stable@vger.kernel.org Reported-by: jean-luc malet <jeanluc.malet@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | | | Merge tag 'fsnotify_for_v5.1-rc2' of ↵Linus Torvalds2019-03-212-3/+16
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "One inotify and one fanotify fix" * tag 'fsnotify_for_v5.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: Allow copying of file handle to userspace inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()
| * | | fanotify: Allow copying of file handle to userspaceJan Kara2019-03-191-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When file handle is embedded inside fanotify_event and usercopy checks are enabled, we get a warning like: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'fanotify_event' (offset 40, size 8)! WARNING: CPU: 1 PID: 7649 at mm/usercopy.c:78 usercopy_warn+0xeb/0x110 mm/usercopy.c:78 Annotate handling in fanotify_event properly to mark copying it to userspace is fine. Reported-by: syzbot+2c49971e251e36216d1f@syzkaller.appspotmail.com Fixes: a8b13aa20afb ("fanotify: enable FAN_REPORT_FID init flag") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()ZhangXiaoxu2019-03-111-2/+5
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") forgot to call fsnotify_put_mark() with IN_MASK_CREATE after fsnotify_find_mark() Fixes: 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()") Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | | Merge tag '9p-for-5.1' of git://github.com/martinetd/linuxLinus Torvalds2019-03-175-30/+53
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull 9p updates from Dominique Martinet: "Here is a 9p update for 5.1; there honestly hasn't been much. Two fixes (leak on invalid mount argument and possible deadlock on i_size update on 32bit smp) and a fall-through warning cleanup" * tag '9p-for-5.1' of git://github.com/martinetd/linux: 9p/net: fix memory leak in p9_client_create 9p: use inode->i_lock to protect i_size_write() under 32-bit 9p: mark expected switch fall-through
| * | | 9p: use inode->i_lock to protect i_size_write() under 32-bitHou Tao2019-03-035-30/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use inode->i_lock to protect i_size_write(), else i_size_read() in generic_fillattr() may loop infinitely in read_seqcount_begin() when multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl() simultaneously under 32-bit SMP environment, and a soft lockup will be triggered as show below: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217] Modules linked in: CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system PC is at generic_fillattr+0x104/0x108 LR is at 0xec497f00 pc : [<802b8898>] lr : [<ec497f00>] psr: 200c0013 sp : ec497e20 ip : ed608030 fp : ec497e3c r10: 00000000 r9 : ec497f00 r8 : ed608030 r7 : ec497ebc r6 : ec497f00 r5 : ee5c1550 r4 : ee005780 r3 : 0000052d r2 : 00000000 r1 : ec497f00 r0 : ed608030 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: ac48006a DAC: 00000051 CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4 Hardware name: Generic DT based system Backtrace: [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24) [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc) [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20) [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8) [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380) [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0) [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64) [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c) [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc) [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48) [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240) [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44) [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4) [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88) [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98) [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4) [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c) [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48) [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec) [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78) [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28) [dominique.martinet@cea.fr: updated comment to not refer to a function in another subsystem] Link: http://lkml.kernel.org/r/20190124063514.8571-2-houtao1@huawei.com Cc: stable@vger.kernel.org Fixes: 7549ae3e81cc ("9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.") Reported-by: Xing Gaopeng <xingaopeng@huawei.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
* | | | Merge tag 'pidfd-v5.1-rc1' of ↵Linus Torvalds2019-03-161-0/+9
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd system call from Christian Brauner: "This introduces the ability to use file descriptors from /proc/<pid>/ as stable handles on struct pid. Even if a pid is recycled the handle will not change. For a start these fds can be used to send signals to the processes they refer to. With the ability to use /proc/<pid> fds as stable handles on struct pid we can fix a long-standing issue where after a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. With this patchset we enable a variety of use cases. One obvious example is that we can now safely delegate an important part of process management - sending signals - to processes other than the parent of a given process by sending file descriptors around via scm rights and not fearing that the given process will have been recycled in the meantime. It also allows for easy testing whether a given process is still alive or not by sending signal 0 to a pidfd which is quite handy. There has been some interest in this feature e.g. from systems management (systemd, glibc) and container managers. I have requested and gotten comments from glibc to make sure that this syscall is suitable for their needs as well. In the future I expect it to take on most other pid-based signal syscalls. But such features are left for the future once they are needed. This has been sitting in linux-next for quite a while and has not caused any issues. It comes with selftests which verify basic functionality and also test that a recycled pid cannot be signaled via a pidfd. Jon has written about a prior version of this patchset. It should cover the basic functionality since not a lot has changed since then: https://lwn.net/Articles/773459/ The commit message for the syscall itself is extensively documenting the syscall, including it's functionality and extensibility" * tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests: add tests for pidfd_send_signal() signal: add pidfd_send_signal() syscall
| * | | | signal: add pidfd_send_signal() syscallChristian Brauner2019-03-051-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kill() syscall operates on process identifiers (pid). After a process has exited its pid can be reused by another process. If a caller sends a signal to a reused pid it will end up signaling the wrong process. This issue has often surfaced and there has been a push to address this problem [1]. This patch uses file descriptors (fd) from proc/<pid> as stable handles on struct pid. Even if a pid is recycled the handle will not change. The fd can be used to send signals to the process it refers to. Thus, the new syscall pidfd_send_signal() is introduced to solve this problem. Instead of pids it operates on process fds (pidfd). /* prototype and argument /* long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags); /* syscall number 424 */ The syscall number was chosen to be 424 to align with Arnd's rework in his y2038 to minimize merge conflicts (cf. [25]). In addition to the pidfd and signal argument it takes an additional siginfo_t and flags argument. If the siginfo_t argument is NULL then pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo(). The flags argument is added to allow for future extensions of this syscall. It currently needs to be passed as 0. Failing to do so will cause EINVAL. /* pidfd_send_signal() replaces multiple pid-based syscalls */ The pidfd_send_signal() syscall currently takes on the job of rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a positive pid is passed to kill(2). It will however be possible to also replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended. /* sending signals to threads (tid) and process groups (pgid) */ Specifically, the pidfd_send_signal() syscall does currently not operate on process groups or threads. This is left for future extensions. In order to extend the syscall to allow sending signal to threads and process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and PIDFD_TYPE_TID) should be added. This implies that the flags argument will determine what is signaled and not the file descriptor itself. Put in other words, grouping in this api is a property of the flags argument not a property of the file descriptor (cf. [13]). Clarification for this has been requested by Eric (cf. [19]). When appropriate extensions through the flags argument are added then pidfd_send_signal() can additionally replace the part of kill(2) which operates on process groups as well as the tgkill(2) and rt_tgsigqueueinfo(2) syscalls. How such an extension could be implemented has been very roughly sketched in [14], [15], and [16]. However, this should not be taken as a commitment to a particular implementation. There might be better ways to do it. Right now this is intentionally left out to keep this patchset as simple as possible (cf. [4]). /* naming */ The syscall had various names throughout iterations of this patchset: - procfd_signal() - procfd_send_signal() - taskfd_send_signal() In the last round of reviews it was pointed out that given that if the flags argument decides the scope of the signal instead of different types of fds it might make sense to either settle for "procfd_" or "pidfd_" as prefix. The community was willing to accept either (cf. [17] and [18]). Given that one developer expressed strong preference for the "pidfd_" prefix (cf. [13]) and with other developers less opinionated about the name we should settle for "pidfd_" to avoid further bikeshedding. The "_send_signal" suffix was chosen to reflect the fact that the syscall takes on the job of multiple syscalls. It is therefore intentional that the name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer because it might imply that pidfd_send_signal() is a replacement for kill(2), and not the latter because it is a hassle to remember the correct spelling - especially for non-native speakers - and because it is not descriptive enough of what the syscall actually does. The name "pidfd_send_signal" makes it very clear that its job is to send signals. /* zombies */ Zombies can be signaled just as any other process. No special error will be reported since a zombie state is an unreliable state (cf. [3]). However, this can be added as an extension through the @flags argument if the need ever arises. /* cross-namespace signals */ The patch currently enforces that the signaler and signalee either are in the same pid namespace or that the signaler's pid namespace is an ancestor of the signalee's pid namespace. This is done for the sake of simplicity and because it is unclear to what values certain members of struct siginfo_t would need to be set to (cf. [5], [6]). /* compat syscalls */ It became clear that we would like to avoid adding compat syscalls (cf. [7]). The compat syscall handling is now done in kernel/signal.c itself by adding __copy_siginfo_from_user_generic() which lets us avoid compat syscalls (cf. [8]). It should be noted that the addition of __copy_siginfo_from_user_any() is caused by a bug in the original implementation of rt_sigqueueinfo(2) (cf. 12). With upcoming rework for syscall handling things might improve significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain any additional callers. /* testing */ This patch was tested on x64 and x86. /* userspace usage */ An asciinema recording for the basic functionality can be found under [9]. With this patch a process can be killed via: #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags) { #ifdef __NR_pidfd_send_signal return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags); #else return -ENOSYS; #endif } int main(int argc, char *argv[]) { int fd, ret, saved_errno, sig; if (argc < 3) exit(EXIT_FAILURE); fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); if (fd < 0) { printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]); exit(EXIT_FAILURE); } sig = atoi(argv[2]); printf("Sending signal %d to process %s\n", sig, argv[1]); ret = do_pidfd_send_signal(fd, sig, NULL, 0); saved_errno = errno; close(fd); errno = saved_errno; if (ret < 0) { printf("%s - Failed to send signal %d to process %s\n", strerror(errno), sig, argv[1]); exit(EXIT_FAILURE); } exit(EXIT_SUCCESS); } /* Q&A * Given that it seems the same questions get asked again by people who are * late to the party it makes sense to add a Q&A section to the commit * message so it's hopefully easier to avoid duplicate threads. * * For the sake of progress please consider these arguments settled unless * there is a new point that desperately needs to be addressed. Please make * sure to check the links to the threads in this commit message whether * this has not already been covered. */ Q-01: (Florian Weimer [20], Andrew Morton [21]) What happens when the target process has exited? A-01: Sending the signal will fail with ESRCH (cf. [22]). Q-02: (Andrew Morton [21]) Is the task_struct pinned by the fd? A-02: No. A reference to struct pid is kept. struct pid - as far as I understand - was created exactly for the reason to not require to pin struct task_struct (cf. [22]). Q-03: (Andrew Morton [21]) Does the entire procfs directory remain visible? Just one entry within it? A-03: The same thing that happens right now when you hold a file descriptor to /proc/<pid> open (cf. [22]). Q-04: (Andrew Morton [21]) Does the pid remain reserved? A-04: No. This patchset guarantees a stable handle not that pids are not recycled (cf. [22]). Q-05: (Andrew Morton [21]) Do attempts to signal that fd return errors? A-05: See {Q,A}-01. Q-06: (Andrew Morton [22]) Is there a cleaner way of obtaining the fd? Another syscall perhaps. A-06: Userspace can already trivially retrieve file descriptors from procfs so this is something that we will need to support anyway. Hence, there's no immediate need to add another syscalls just to make pidfd_send_signal() not dependent on the presence of procfs. However, adding a syscalls to get such file descriptors is planned for a future patchset (cf. [22]). Q-07: (Andrew Morton [21] and others) This fd-for-a-process sounds like a handy thing and people may well think up other uses for it in the future, probably unrelated to signals. Are the code and the interface designed to permit such future applications? A-07: Yes (cf. [22]). Q-08: (Andrew Morton [21] and others) Now I think about it, why a new syscall? This thing is looking rather like an ioctl? A-08: This has been extensively discussed. It was agreed that a syscall is preferred for a variety or reasons. Here are just a few taken from prior threads. Syscalls are safer than ioctl()s especially when signaling to fds. Processes are a core kernel concept so a syscall seems more appropriate. The layout of the syscall with its four arguments would require the addition of a custom struct for the ioctl() thereby causing at least the same amount or even more complexity for userspace than a simple syscall. The new syscall will replace multiple other pid-based syscalls (see description above). The file-descriptors-for-processes concept introduced with this syscall will be extended with other syscalls in the future. See also [22], [23] and various other threads already linked in here. Q-09: (Florian Weimer [24]) What happens if you use the new interface with an O_PATH descriptor? A-09: pidfds opened as O_PATH fds cannot be used to send signals to a process (cf. [2]). Signaling processes through pidfds is the equivalent of writing to a file. Thus, this is not an operation that operates "purely at the file descriptor level" as required by the open(2) manpage. See also [4]. /* References */ [1]: https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/ [2]: https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/ [3]: https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/ [4]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/ [5]: https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/ [6]: https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/ [7]: https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/ [8]: https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/ [9]: https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy [11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/ [12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/ [13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/ [14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/ [15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/ [16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/ [17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/ [18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/ [19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/ [20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/ [21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/ [22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/ [23]: https://lwn.net/Articles/773459/ [24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/ [25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/ Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Christian Brauner <christian@brauner.io> Reviewed-by: Tycho Andersen <tycho@tycho.ws> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Aleksa Sarai <cyphar@cyphar.com>
* | | | | Merge tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2019-03-161-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Bugfixes: - Fix an Oops in SUNRPC back channel tracepoints - Fix a SUNRPC client regression when handling oversized replies - Fix the minimal size for SUNRPC reply buffer allocation - rpc_decode_header() must always return a non-zero value on error - Fix a typo in pnfs_update_layout() Cleanup: - Remove redundant check for the reply length in call_decode()" * tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Remove redundant check for the reply length in call_decode() SUNRPC: Handle the SYSTEM_ERR rpc error SUNRPC: rpc_decode_header() must always return a non-zero value on error SUNRPC: Use the ENOTCONN error on socket disconnect SUNRPC: Fix the minimal size for reply buffer allocation SUNRPC: Fix a client regression when handling oversized replies pNFS: Fix a typo in pnfs_update_layout fix null pointer deref in tracepoints in back channel
| * | | | | pNFS: Fix a typo in pnfs_update_layoutTrond Myklebust2019-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're supposed to wait for the outstanding layout count to go to zero, but that got lost somehow. Fixes: d03360aaf5cca ("pNFS: Ensure we return the error if someone...") Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* | | | | | Merge branch 'work.mount' of ↵Linus Torvalds2019-03-161-3/+5
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount infrastructure fix from Al Viro: "Fixup for sysfs braino. Capabilities checks for sysfs mount do include those on netns, but only if CONFIG_NET_NS is enabled. Sorry, should've caught that earlier..." * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix sysfs_init_fs_context() in !CONFIG_NET_NS case
| * | | | | | fix sysfs_init_fs_context() in !CONFIG_NET_NS caseAl Viro2019-03-161-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Permission checks on current's netns should be done only when netns are enabled. Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Fixes: 23bf1b6be9c2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | Merge tag '5.1-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2019-03-1615-349/+957
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more smb3 updates from Steve French: "Various tracing and debugging improvements, crediting fixes, some cleanup, and important fallocate fix (fixes three xfstests) and lock fix. Summary: - Various additional dynamic tracing tracepoints - Debugging improvements (including ability to query the server via SMB3 fsctl from userspace tools which can help with stats and debugging) - One minor performance improvement (root directory inode caching) - Crediting (SMB3 flow control) fixes - Some cleanup (docs and to mknod) - Important fixes: one to smb3 implementation of fallocate zero range (which fixes three xfstests) and a POSIX lock fix" * tag '5.1-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: (22 commits) CIFS: fix POSIX lock leak and invalid ptr deref SMB3: Allow SMB3 FSCTL queries to be sent to server from tools cifs: fix incorrect handling of smb2_set_sparse() return in smb3_simple_falloc smb2: fix typo in definition of a few error flags CIFS: make mknod() an smb_version_op cifs: minor documentation updates cifs: remove unused value pointed out by Coverity SMB3: passthru query info doesn't check for SMB3 FSCTL passthru smb3: add dynamic tracepoints for simple fallocate and zero range cifs: fix smb3_zero_range so it can expand the file-size when required cifs: add SMB2_ioctl_init/free helpers to be used with compounding smb3: Add dynamic trace points for various compounded smb3 ops cifs: cache FILE_ALL_INFO for the shared root handle smb3: display volume serial number for shares in /proc/fs/cifs/DebugData cifs: simplify how we handle credits in compound_send_recv() smb3: add dynamic tracepoint for timeout waiting for credits smb3: display security information in /proc/fs/cifs/DebugData more accurately cifs: add a timeout argument to wait_for_free_credits cifs: prevent starvation in wait_for_free_credits for multi-credit requests cifs: wait_for_free_credits() make it possible to wait for >=1 credits ...
| * | | | | | | CIFS: fix POSIX lock leak and invalid ptr derefAurelien Aptel2019-03-151-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a customer reporting crashes in lock_get_status() with many "Leaked POSIX lock" messages preceeding the crash. Leaked POSIX lock on dev=0x0:0x56 ... Leaked POSIX lock on dev=0x0:0x56 ... Leaked POSIX lock on dev=0x0:0x56 ... Leaked POSIX lock on dev=0x0:0x53 ... Leaked POSIX lock on dev=0x0:0x53 ... Leaked POSIX lock on dev=0x0:0x53 ... Leaked POSIX lock on dev=0x0:0x53 ... POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709 Leaked POSIX lock on dev=0x0:0x4b ino... Leaked locks on dev=0x0:0x4b ino=0xf911400000029: POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592 stack segment: 0000 [#1] SMP Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000 RIP: 0010:[<ffffffff8125dcab>] [<ffffffff8125dcab>] lock_get_status+0x9b/0x3b0 RSP: 0018:ffff88c74005fd90 EFLAGS: 00010202 RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50 RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340 R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340 FS: 00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: 0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18 ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340 ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28 Call Trace: [<ffffffff8125e08f>] locks_show+0x2f/0x70 [<ffffffff81230ad1>] seq_read+0x251/0x3a0 [<ffffffff81275bbc>] proc_reg_read+0x3c/0x70 [<ffffffff8120e456>] __vfs_read+0x26/0x140 [<ffffffff8120e9da>] vfs_read+0x7a/0x120 [<ffffffff8120faf2>] SyS_read+0x42/0xa0 [<ffffffff8161cbc3>] entry_SYSCALL_64_fastpath+0x1e/0xb7 When Linux closes a FD (close(), close-on-exec, dup2(), ...) it calls filp_close() which also removes all posix locks. The lock struct is initialized like so in filp_close() and passed down to cifs ... lock.fl_type = F_UNLCK; lock.fl_flags = FL_POSIX | FL_CLOSE; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; ... Note the FL_CLOSE flag, which hints the VFS code that this unlocking is done for closing the fd. filp_close() locks_remove_posix(filp, id); vfs_lock_file(filp, F_SETLK, &lock, NULL); return filp->f_op->lock(filp, cmd, fl) => cifs_lock() rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid); rc = server->ops->mand_unlock_range(cfile, flock, xid); if (flock->fl_flags & FL_POSIX && !rc) rc = locks_lock_file_wait(file, flock) Notice how we don't call locks_lock_file_wait() which does the generic VFS lock/unlock/wait work on the inode if rc != 0. If we are closing the handle, the SMB server is supposed to remove any locks associated with it. Similarly, cifs.ko frees and wakes up any lock and lock waiter when closing the file: cifs_close() cifsFileInfo_put(file->private_data) /* * Delete any outstanding lock records. We'll lose them when the file * is closed anyway. */ down_write(&cifsi->lock_sem); list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) { list_del(&li->llist); cifs_del_lock_waiters(li); kfree(li); } list_del(&cifs_file->llist->llist); kfree(cifs_file->llist); up_write(&cifsi->lock_sem); So we can safely ignore unlocking failures in cifs_lock() if they happen with the FL_CLOSE flag hint set as both the server and the client take care of it during the actual closing. This is not a proper fix for the unlocking failure but it's safe and it seems to prevent the lock leakages and crashes the customer experiences. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
| * | | | | | | SMB3: Allow SMB3 FSCTL queries to be sent to server from toolsRonnie Sahlberg2019-03-151-16/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For debugging purposes we often have to be able to query additional information only available via SMB3 FSCTL from the server from user space tools (e.g. like cifs-utils's smbinfo). See MS-FSCC and MS-SMB2 protocol specifications for more details. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | cifs: fix incorrect handling of smb2_set_sparse() return in smb3_simple_fallocRonnie Sahlberg2019-03-151-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | smb2_set_sparse does not return -errno, it returns a boolean where true means success. Change this to just ignore the return value just like the other callsites. Additionally add code to handle the case where we must set the file sparse and possibly also extending it. Fixes xfstests: generic/236 generic/350 generic/420 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | | | | | smb2: fix typo in definition of a few error flagsSteve French2019-03-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Sergey Senozhatsky pointed out __constant_cpu_to_le32() is misspelled in a few definitions in the list of status codes smb2status.h as __constanst_cpu_to_le32() Signed-off-by: Steve French <stfrench@microsoft.com> CC: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
| * | | | | | | CIFS: make mknod() an smb_version_opAurelien Aptel2019-03-154-104/+239
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cleanup removes cifs specific code from SMB2/SMB3 code paths which is cleaner and easier to maintain as the code to handle special files is improved. Below is an example creating special files using 'sfu' mount option over SMB3 to Windows (with this patch) (Note that to Samba server, support for saving dos attributes has to be enabled for the SFU mount option to work). In the future this will also make implementation of creating special files as reparse points easier (as Windows NFS server does for example). root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/char character special file root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/block block special file Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>