summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* f2fs: show the fault injection mount optionKaixu Xia2017-02-231-0/+5
| | | | | | | | This patch shows the fault injection mount option in f2fs_show_options(). Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix null pointer dereference when issuing flush in ->fsyncChao Yu2017-02-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only allocate flush merge control structure sbi::sm_info::fcc_info when flush_merge option is on, but in f2fs_issue_flush we still try to access member of the control structure without that option, it incurs panic as show below, fix it. Call Trace: __remove_ino_entry+0xa9/0xc0 [f2fs] f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs] f2fs_sync_file+0x18/0x20 [f2fs] vfs_fsync_range+0x3d/0xb0 __do_page_fault+0x261/0x4d0 do_fsync+0x3d/0x70 SyS_fsync+0x10/0x20 do_syscall_64+0x6e/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f18ce260de0 RSP: 002b:00007ffdd4589258 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f18ce260de0 RDX: 0000000000000006 RSI: 00000000016c0360 RDI: 0000000000000003 RBP: 00000000016c0360 R08: 000000000000ffff R09: 000000000000001f R10: 00007ffdd4589020 R11: 0000000000000246 R12: 00000000016c0100 R13: 0000000000000000 R14: 00000000016c1f00 R15: 00000000016c0100 Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4 RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP: ffffc90003b5fd78 CR2: 0000000000000020 ---[ end trace a09314c24f037648 ]--- Reported-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to avoid overflow when left shifting page offsetChao Yu2017-02-231-2/+3
| | | | | | | | | | | | We use following method to calculate size with current page index: size = index << PAGE_SHIFT If type of index has only 32-bits size, left shifting will incur overflow, which makes result incorrect. So let's cast index with 64-bits type to avoid such issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: enhance lookup xattrChao Yu2017-02-232-18/+121
| | | | | | | | | | | | Previously, in getxattr we will load all entries both in inline xattr and xattr node block, and then do the lookup in all entries, but our lookup flow shows low efficiency, since if we can lookup and hit in inline xattr of inode page cache first, we don't need to load and lookup xattr node block, which can obviously save cpu time and IO latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: initialize NULL to avoid warning] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Doc: f2fs: Fix typo in Documentation/filesystems/f2fs.txtMasanari Iida2017-02-231-1/+1
| | | | | | | This patch fix a typo in f2fs.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix a dead loop in f2fs_fiemap()Wei Fang2017-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A dead loop can be triggered in f2fs_fiemap() using the test case as below: ... fd = open(); fallocate(fd, 0, 0, 4294967296); ioctl(fd, FS_IOC_FIEMAP, fiemap_buf); ... It's caused by an overflow in __get_data_block(): ... bh->b_size = map.m_len << inode->i_blkbits; ... map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits on 64 bits archtecture, type conversion from an unsigned int to a size_t will result in an overflow. In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap() will call get_data_block() at block 0 again an again. Fix this by adding a force conversion before left shift. Signed-off-by: Wei Fang <fangwei1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do not preallocate blocks which has wrong bufferJaegeuk Kim2017-02-233-2/+13
| | | | | | | | | | | Sheng Yong reports needless preallocation if write(small_buffer, large_size) is called. In that case, f2fs preallocates large_size, but vfs returns early due to small_buffer size. Let's detect it before preallocation phase in f2fs. Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: show # of on-going flush and discard biosJaegeuk Kim2017-02-233-3/+17
| | | | | | | This patch adds stat information for flush and discard commands. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add a kernel thread to issue discard commands asynchronouslyJaegeuk Kim2017-02-232-31/+109
| | | | | | | | | This patch adds a kernel thread to issue discard commands. It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current bio status. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: factor out discard command info into discard_cmd_controlJaegeuk Kim2017-02-234-22/+69
| | | | | | | This patch adds discard_cmd_control with the existing discarding controls. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: reorganize stat informationJaegeuk Kim2017-02-231-4/+4
| | | | | | | This patch modifies stat information more clearly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: clean up flush/discard command namingsJaegeuk Kim2017-02-233-61/+59Star
| | | | | | | This patch simply cleans up the names for flush/discard commands. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: check in-memory sit version bitmapChao Yu2017-02-232-4/+30
| | | | | | | | | This patch adds a mirror for sit version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: check in-memory nat version bitmapChao Yu2017-02-233-0/+29
| | | | | | | | | This patch adds a mirror for nat version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: check in-memory block bitmapChao Yu2017-02-232-2/+36
| | | | | | | | | This patch adds a mirror for valid block bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce FI_ATOMIC_COMMITChao Yu2017-02-235-9/+26
| | | | | | | | | | | This patch introduces a new flag to indicate inode status of doing atomic write committing, so that, we can keep atomic write status for inode during atomic committing, then we can skip GCing pages of atomic write inode, that avoids random GCed datas being mixed with current transaction, so isolation of transaction can be kept. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: clean up with list_{first, last}_entryChao Yu2017-02-233-5/+5
| | | | | Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: return fs_trim if there is no candidateJaegeuk Kim2017-02-233-5/+34
| | | | | | | | If there is no candidate to submit discard command during f2fs_trim_fs, let's return without checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid needless checkpoint in f2fs_trim_fsJaegeuk Kim2017-02-221-8/+9
| | | | | | | | | The f2fs_trim_fs() doesn't need to do checkpoint if there are newly allocated data blocks only which didn't change the critical checkpoint data such as nat and sit entries. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: relax async discard commands moreJaegeuk Kim2017-01-294-11/+27
| | | | | | | | | | | | | | | | | This patch relaxes async discard commands to avoid waiting its end_io during checkpoint. Instead of waiting them during checkpoint, it will be done when actually reusing them. Test on initial partition of nvme drive. # time fstrim /mnt/test Before : 6.158s After : 4.822s Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: drop exist_data for inline_data when truncated to 0Jaegeuk Kim2017-01-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A test program gets the SEEK_DATA with two values between a new created file and the exist file on f2fs filesystem. F2FS filesystem, (the first "test1" is a new file) SEEK_DATA size != 0 (offset = 8192) SEEK_DATA size != 0 (offset = 4096) PNFS filesystem, (the first "test1" is a new file) SEEK_DATA size != 0 (offset = 4096) SEEK_DATA size != 0 (offset = 4096) int main(int argc, char **argv) { char *filename = argv[1]; int offset = 1, i = 0, fd = -1; if (argc < 2) { printf("Usage: %s f2fsfilename\n", argv[0]); return -1; } /* if (!access(filename, F_OK) || errno != ENOENT) { printf("Needs a new file for test, %m\n"); return -1; }*/ fd = open(filename, O_RDWR | O_CREAT, 0777); if (fd < 0) { printf("Create test file %s failed, %m\n", filename); return -1; } for (i = 0; i < 20; i++) { offset = 1 << i; ftruncate(fd, 0); lseek(fd, offset, SEEK_SET); write(fd, "test", 5); /* Get the alloc size by seek data equal zero*/ if (lseek(fd, 0, SEEK_DATA)) { printf("SEEK_DATA size != 0 (offset = %d)\n", offset); break; } } close(fd); return 0; } Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't allow encrypted operations without keysJaegeuk Kim2017-01-291-0/+12
| | | | | | | | | | This patch fixes the renaming bug on encrypted filenames, which was pointed by (ext4: don't allow encrypted operations without keys) Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: show the max number of atomic operationsJaegeuk Kim2017-01-294-2/+31
| | | | | | | | This patch adds to show the max number of atomic operations which are conducting concurrently. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: get io size bit from mount optionJaegeuk Kim2017-01-292-0/+24
| | | | | | This patch adds to set io_size_bits from mount option. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support IO alignment for DATA and NODE writesJaegeuk Kim2017-01-296-6/+84
| | | | | | | | | | | | | | | This patch implements IO alignment by filling dummy blocks in DATA and NODE write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs, we can eliminate underlying dummy page problem which FTL conducts in order to close MLC or TLC partial written pages. Note that, - it requires "-o mode=lfs". - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256. - read IO is still 4KB. - do checkpoint at fsync, if dummy NODE page was written. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add submit_bio tracepointJaegeuk Kim2017-01-292-19/+38
| | | | | | | This patch adds final submit_bio() tracepoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong tracepoints for op and op_flagsJaegeuk Kim2017-01-291-16/+25
| | | | | | | This patch fixes wrong tracepoints in terms of op and op_flags. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: reassign new segment for mode=lfsJaegeuk Kim2017-01-291-3/+0Star
| | | | | | Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix a missing discard prefree segmentsYunlei He2017-01-291-1/+9
| | | | | | | | | If userspace issue a fstrim with a range not involve prefree segments, it will reuse these segments without discard. This patch fix it. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use rb_entry_safeGeliang Tang2017-01-291-11/+6Star
| | | | | | | Use rb_entry_safe() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add a case of no need to read a page in write beginYunlei He2017-01-291-1/+6
| | | | | | | | | If the range we write cover the whole valid data in the last page, we do not need to read it. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix a problem of using memory after freeYunlei He2017-01-291-2/+3
| | | | | | | | | | | This patch fix a problem of using memory after free in function __try_merge_extent_node. Fixes: 0f825ee6e873 ("f2fs: add new interfaces for extent tree") Cc: <stable@vger.kernel.org> Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unneeded conditionDan Carpenter2017-01-291-3/+3
| | | | | | | | We checked that "inode" is not an error pointer earlier so there is no need to check again here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't cache nat entry if out of memoryChao Yu2017-01-291-7/+20
| | | | | | | | | If we run out of memory, in cache_nat_entry, it's better to avoid loop for allocating memory to cache nat entry, so in low memory scenario, for read path of node block, I expect this can avoid unneeded latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unused values in recover_fsync_dataYunlei He2017-01-291-4/+0Star
| | | | | | | This patch remove unused values in function recover_fsync_data Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Merge branch 'i2c/for-current' of ↵Linus Torvalds2017-01-292-4/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Two I2C driver bugfixes. The 'VLLS mode support' patch should have been entitled 'reconfigure pinctrl after suspend' to make the bugfix more clear. Sorry, I missed that, yet didn't want to rebase" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx-lpi2c: add VLLS mode support i2c: i2c-cadence: Initialize configuration before probing devices
| * i2c: imx-lpi2c: add VLLS mode supportGao Pan2017-01-261-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When system enters VLLS mode, module power is turned off. As a result, all registers are reset to HW default value. After exiting VLLS mode, registers are still in default mode. As a result, the pinctrl settings are incorrect, which will affect the module function. The patch recovers the pinctrl setting when exit VLLS mode. Signed-off-by: Gao Pan <pandy.gao@nxp.com> Reviewed-by: Vladimir Zapolskiy <vz@mleia.com> [wsa: added missing include] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * i2c: i2c-cadence: Initialize configuration before probing devicesMike Looijmans2017-01-251-4/+4
| | | | | | | | | | | | | | | | | | | | The cadence I2C driver calls cdns_i2c_writereg(..) to setup a workaround in the controller, but did so after calling i2c_add_adapter() which starts probing devices on the bus. Change the order so that the configuration is completely finished before using the adapter. Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
* | Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2017-01-287-3/+14
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: "Stable patches: - NFSv4.1: Fix a deadlock in layoutget - NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors - NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode - Fix a memory leak when removing the SUNRPC module Bugfixes: - Fix a reference leak in _pnfs_return_layout" * tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS: Fix a reference leak in _pnfs_return_layout nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED" SUNRPC: cleanup ida information when removing sunrpc module NFSv4.0: always send mode in SETATTR after EXCLUSIVE4 nfs: Don't increment lock sequence ID after NFS4ERR_MOVED NFSv4.1: Fix a deadlock in layoutget
| * | pNFS: Fix a reference leak in _pnfs_return_layoutTrond Myklebust2017-01-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IF NFS_LAYOUT_RETURN_REQUESTED is not set, then we currently exit without freeing the list of invalidated layout segments, leading to a reference leak. Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 24408f5282 ("pNFS: Fix bugs in _pnfs_return_layout") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"Chuck Lever2017-01-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lock sequence IDs are bumped in decode_lock by calling nfs_increment_seqid(). nfs_increment_sequid() does not use the seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't increment lock sequence ID after NFS4ERR_MOVED"). Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | SUNRPC: cleanup ida information when removing sunrpc moduleKinglong Mee2017-01-243-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After removing sunrpc module, I get many kmemleak information as, unreferenced object 0xffff88003316b1e0 (size 544): comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0 [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0 [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150 [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180 [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd] [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd] [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd] [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc] [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc] [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc] [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0 [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0 [<ffffffffb0390f1f>] vfs_write+0xef/0x240 [<ffffffffb0392fbd>] SyS_write+0xad/0x130 [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9 [<ffffffffffffffff>] 0xffffffffffffffff I found, the ida information (dynamic memory) isn't cleanup. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 2f048db4680a ("SUNRPC: Add an identifier for struct rpc_clnt") Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.0: always send mode in SETATTR after EXCLUSIVE4Benjamin Coddington2017-01-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some nfsv4.0 servers may return a mode for the verifier following an open with EXCLUSIVE4 createmode, but this does not mean the client should skip setting the mode in the following SETATTR. It should only do that for EXCLUSIVE4_1 or UNGAURDED createmode. Fixes: 5334c5bdac92 ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | nfs: Don't increment lock sequence ID after NFS4ERR_MOVEDChuck Lever2017-01-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xuan Qi reports that the Linux NFSv4 client failed to lock a file that was migrated. The steps he observed on the wire: 1. The client sent a LOCK request to the source server 2. The source server replied NFS4ERR_MOVED 3. The client switched to the destination server 4. The client sent the same LOCK request to the destination server with a bumped lock sequence ID 5. The destination server rejected the LOCK request with NFS4ERR_BAD_SEQID RFC 3530 section 8.1.5 provides a list of NFS errors which do not bump a lock sequence ID. However, RFC 3530 is now obsoleted by RFC 7530. In RFC 7530 section 9.1.7, this list has been updated by the addition of NFS4ERR_MOVED. Reported-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | NFSv4.1: Fix a deadlock in layoutgetTrond Myklebust2017-01-241-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | We cannot call nfs4_handle_exception() without first ensuring that the slot has been freed. If not, we end up deadlocking with the process waiting for recovery to complete, and recovery waiting for the slot table to drain. Fixes: 2e80dbe7ac51 ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | Merge tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds2017-01-284-45/+194
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MD fixes from Shaohua Li: "This fixes several corner cases for raid5 cache, which is merged into this cycle" * tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/r5cache: disable write back for degraded array md/r5cache: shift complex rmw from read path to write path md/r5cache: flush data only stripes in r5l_recovery_log() md/raid5: move comment of fetch_block to right location md/r5cache: read data into orig_page for prexor of cached data md/raid5-cache: delete meaningless code
| * | md/r5cache: disable write back for degraded arraySong Liu2017-01-243-7/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write-back cache in degraded mode introduces corner cases to the array. Although we try to cover all these corner cases, it is safer to just disable write-back cache when the array is in degraded mode. In this patch, we disable writeback cache for degraded mode: 1. On device failure, if the array enters degraded mode, raid5_error() will submit async job r5c_disable_writeback_async to disable writeback; 2. In r5c_journal_mode_store(), it is invalid to enable writeback in degraded mode; 3. In r5c_try_caching_write(), stripes with s->failed>0 will be handled in write-through mode. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
| * | md/r5cache: shift complex rmw from read path to write pathSong Liu2017-01-241-4/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Write back cache requires a complex RMW mechanism, where old data is read into dev->orig_page for prexor, and then xor is done with dev->page. This logic is already implemented in the write path. However, current read path is not awared of this requirement. When the array is optimal, the RMW is not required, as the data are read from raid disks. However, when the target stripe is degraded, complex RMW is required to generate right data. To keep read path as clean as possible, we handle read path by flushing degraded, in-journal stripes before processing reads to missing dev. Specifically, when there is read requests to a degraded stripe with data in journal, handle_stripe_fill() calls r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying() will do the complex RMW and flush the stripe to RAID disks. After that, read requests are handled. There is one more corner case when there is non-overwrite bio for the missing (or out of sync) dev. handle_stripe_dirtying() will not be able to process the non-overwrite bios without constructing the data in handle_stripe_fill(). This is fixed by delaying non-overwrite bios in handle_stripe_dirtying(). So handle_stripe_fill() works on these bios after the stripe is flushed to raid disks. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
| * | md/r5cache: flush data only stripes in r5l_recovery_log()Song Liu2017-01-242-16/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For safer operation, all arrays start in write-through mode, which has been better tested and is more mature. And actually the write-through/write-mode isn't persistent after array restarted, so we always start array in write-through mode. However, if recovery found data-only stripes before the shutdown (from previous write-back mode), it is not safe to start the array in write-through mode, as write-through mode can not handle stripes with data in write-back cache. To solve this problem, we flush all data-only stripes in r5l_recovery_log(). When r5l_recovery_log() returns, the array starts with empty cache in write-through mode. This logic is implemented in r5c_recovery_flush_data_only_stripes(): 1. enable write back cache 2. flush all stripes 3. wake up conf->mddev->thread 4. wait for all stripes get flushed (reuse wait_for_quiescent) 5. disable write back cache The wait in 4 will be waked up in release_inactive_stripe_list() when conf->active_stripes reaches 0. It is safe to wake up mddev->thread here because all the resource required for the thread has been initialized. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
| * | md/raid5: move comment of fetch_block to right locationSong Liu2017-01-241-7/+6Star
| | | | | | | | | | | | | | | Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>