summaryrefslogtreecommitdiffstats
path: root/fs/ceph/caps.c
Commit message (Collapse)AuthorAgeFilesLines
* ceph: more precise CEPH_CLIENT_CAPS_PENDING_CAPSNAPYan, Zheng2019-07-081-11/+30
| | | | | | | | | | | Client uses this flag to tell mds if there is more cap snap need to flush. It's mainly for the case that client needs to re-send cap/snap flushes after mds failover, but CEPH_CAP_ANY_FILE_WR on corresponding inodes are all released before mds failover. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: kick flushing and flush snaps before sending normal cap messageYan, Zheng2019-07-081-4/+14
| | | | | | | | Otherwise client may send cap flush messages in wrong order. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: clear CEPH_I_KICK_FLUSH flag inside __kick_flushing_caps()Yan, Zheng2019-07-081-9/+4Star
| | | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: handle change_attr in cap messagesJeff Layton2019-07-081-9/+10
| | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: handle btime in cap messagesJeff Layton2019-07-081-6/+12
| | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add selinux supportYan, Zheng2019-07-081-0/+1
| | | | | | | | | | | | | | | When creating new file/directory, use security_dentry_init_security() to prepare selinux context for the new inode, then send openc/mkdir request to MDS, together with selinux xattr. security_dentry_init_security() only supports single security module and only selinux has dentry_init_security hook. So only selinux is supported for now. We can add support for other security modules once kernel has a generic version of dentry_init_security() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: hold i_ceph_lock when removing caps for freeing inodeYan, Zheng2019-07-081-4/+6
| | | | | | | | | ceph_d_revalidate(, LOOKUP_RCU) may call __ceph_caps_issued_mask() on a freeing inode. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix error handling in ceph_get_caps()Yan, Zheng2019-06-051-11/+11
| | | | | | | | | | The function return 0 even when interrupted or try_get_cap_refs() return error. Fixes: 1199d7da2d ("ceph: simplify arguments and return semantics of try_get_cap_refs") Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: avoid iput_final() while holding mutex or in dispatch threadYan, Zheng2019-06-051-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iput_final() may wait for reahahead pages. The wait can cause deadlock. For example: Workqueue: ceph-msgr ceph_con_workfn [libceph] Call Trace: schedule+0x36/0x80 io_schedule+0x16/0x40 __lock_page+0x101/0x140 truncate_inode_pages_range+0x556/0x9f0 truncate_inode_pages_final+0x4d/0x60 evict+0x182/0x1a0 iput+0x1d2/0x220 iterate_session_caps+0x82/0x230 [ceph] dispatch+0x678/0xa80 [ceph] ceph_con_workfn+0x95b/0x1560 [libceph] process_one_work+0x14d/0x410 worker_thread+0x4b/0x460 kthread+0x105/0x140 ret_from_fork+0x22/0x40 Workqueue: ceph-msgr ceph_con_workfn [libceph] Call Trace: __schedule+0x3d6/0x8b0 schedule+0x36/0x80 schedule_preempt_disabled+0xe/0x10 mutex_lock+0x2f/0x40 ceph_check_caps+0x505/0xa80 [ceph] ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph] writepages_finish+0x2d3/0x410 [ceph] __complete_request+0x26/0x60 [libceph] handle_reply+0x6c8/0xa10 [libceph] dispatch+0x29a/0xbb0 [libceph] ceph_con_workfn+0x95b/0x1560 [libceph] process_one_work+0x14d/0x410 worker_thread+0x4b/0x460 kthread+0x105/0x140 ret_from_fork+0x22/0x40 In above example, truncate_inode_pages_range() waits for readahead pages while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks OSD dispatch thread. Later OSD replies (for readahead) can't be handled. ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock can happen if iput_final() is called while holding snap_rwsem. In general, it's not good to call iput_final() inside MDS/OSD dispatch threads or while holding any mutex. The fix is introducing ceph_async_iput(), which calls iput_final() in workqueue. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: print inode number in __caps_issued_mask debugging messagesJeff Layton2019-05-071-6/+6
| | | | | | | | To make it easier to correlate with MDS logs. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: simplify arguments and return semantics of try_get_cap_refsJeff Layton2019-05-071-46/+30Star
| | | | | | | | | | | | | The return of this function is rather complex. It can return 0 or 1, and in the case of a 1 return, the "err" pointer will be filled out. This necessitates a lot of copying of values. We can achieve the same effect by just returning 0, 1 or a negative error code, and drop the "err" argument from this function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix comment over ceph_drop_caps_for_unlinkJeff Layton2019-05-071-1/+1
| | | | | | | | | It's not clear what AUTH_RDCACHE means in this context, and we're clearly just dropping LINK caps here. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: remove superfluous inode_lock in ceph_fsyncJeff Layton2019-05-071-3/+0Star
| | | | | | | | | | | | | | Originally, filemap_write_and_wait took the i_mutex internally, but commit 02c24a82187d pushed the mutex acquisition into the individual fsync routines, leaving it up to the subsystem maintainers to remove it if it wasn't needed. For ceph, I see no reason to take the inode_lock here. All of the operations inside that lock are protected by their own locking. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add mount option to limit caps countYan, Zheng2019-03-051-7/+26
| | | | | | | | | | | | If number of caps exceed the limit, ceph_trim_dentires() also trim dentries with valid leases. Trimming dentry releases references to associated inode, which may evict inode and release caps. By default, there is no limit for caps count. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: touch existing cap when handling replyYan, Zheng2019-03-051-0/+4
| | | | | | | | | Move cap to tail of session->s_caps list. So ceph_trim_caps() will trim older caps first. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: send cap releases more aggressivelyYan, Zheng2019-03-051-17/+12Star
| | | | | | | | | When pending cap releases fill up one message, start a work to send cap release message. (old way is sending cap releases every 5 seconds) Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: split large reconnect into multiple messagesYan, Zheng2019-03-051-0/+6
| | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: clear inode pointer when snap realm gets dropped by its inodeYan, Zheng2019-01-211-0/+2
| | | | | | | | | | | snap realm and corresponding inode have pointers to each other. The two pointer should get clear at the same time. Otherwise, snap realm's pointer may reference freed inode. Cc: stable@vger.kernel.org # 4.17+ Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Luis Henriques <lhenriques@suse.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: update wanted caps after resuming stale sessionYan, Zheng2018-12-261-17/+23
| | | | | | | | | | | | mds contains an optimization, it does not re-issue stale caps if client does not want any cap. A special case of the optimization is that client wants some caps, but skipped updating 'wanted'. For this case, client needs to update 'wanted' when stale session get renewed. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: skip updating 'wanted' caps if caps are already issuedYan, Zheng2018-12-261-10/+17
| | | | | | | | When reading cached inode that already has Fscr caps, this can avoid two cap messages (one updats 'wanted' caps, one clears 'wanted' caps). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't request excl caps when mount is readonlyYan, Zheng2018-12-261-2/+5
| | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: don't update importing cap's mseq when handing cap exportYan, Zheng2018-12-261-1/+0Star
| | | | | | | | | | | | | Updating mseq makes client think importer mds has accepted all prior cap messages and importer mds knows what caps client wants. Actually some cap messages may have been dropped because of mseq mismatch. If mseq is left untouched, importing cap's mds_wanted later will get reset by cap import message. Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add non-blocking parameter to ceph_try_get_caps()Luis Henriques2018-10-221-3/+4
| | | | | | | | | | | | ceph_try_get_caps currently calls try_get_cap_refs with the nonblock parameter always set to 'true'. This change adds a new parameter that allows to set it's value. This will be useful for a follow-up patch that will need to get two sets of capabilities for two different inodes without risking a deadlock. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: set timeout conditionally in __cap_delay_requeueXuehan Xu2018-10-221-6/+8
| | | | | | | | | | | | | __cap_delay_requeue could be invoked through ceph_check_caps when there exists caps that needs to be sent and are delayed by "i_hold_caps_min" or "i_hold_caps_max". If __cap_delay_requeue sets timeout unconditionally, there could be a chance that some "wanted" caps can not be release for a long since their timeouts are reset every time they get delayed. Fixes: http://tracker.ceph.com/issues/36369 Signed-off-by: Xuehan Xu <xuxuehan@360.cn> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: reset cap hold timeout only for requeued inodeChengguang Xu2018-10-221-1/+1
| | | | | | | | | | __cap_delay_requeue() only requeue inode which does not have CEPH_I_FLUSH flag, so avoid reset cap hold timeout for that inode. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: refactor error handling code in ceph_reserve_caps()Chengguang Xu2018-08-131-33/+13Star
| | | | | | | | Call new helper __ceph_unreserve_caps() to reduce duplicated code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: refactor ceph_unreserve_caps()Chengguang Xu2018-08-131-30/+36
| | | | | | | | | | The code of ceph_unreserve_caps() and error handling in ceph_reserve_caps() are duplicated, so introduce a helper __ceph_unreserve_caps() to reduce duplicated code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: use timespec64 for inode timestampArnd Bergmann2018-08-021-13/+13
| | | | | | | | | | | | | | Since the vfs structures are all using timespec64, we can now change the internal representation, using ceph_encode_timespec64 and ceph_decode_timespec64. In case of ceph_aux_inode however, we need to avoid doing a memcmp() on uninitialized padding data, so the members of the i_mtime field get copied individually into 64-bit integers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Merge tag 'vfs-timespec64' of ↵Linus Torvalds2018-06-151-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
| * vfs: change inode times to use struct timespec64Deepa Dinamani2018-06-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
* | ceph: fix wrong check for the case of updating link countYan, Zheng2018-06-041-2/+2
| | | | | | | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: handle the new nfiles/nsubdirs fields in cap messageYan, Zheng2018-06-041-5/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Without these new fields, stale st_size is returned in following case. 1. MDS modifies a directory 2. MDS issues CEPH_CAP_ANY_SHARED to client 3. The client satifies stat(2) by its cached metadata. set st_size to "i_files + i_subdirs". Link: http://tracker.ceph.com/issues/23855 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: define argument structure for handle_cap_grantYan, Zheng2018-06-041-54/+61
| | | | | | | | | | | | | | The data structure includes the versioned feilds of cap message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: always get rstat from auth mdsYan, Zheng2018-06-041-0/+2
|/ | | | | | | | | rstat is not tracked by capability. client can't know if rstat from non-auth mds is uptodate or not. Link: http://tracker.ceph.com/issues/23538 Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: quota: cache inode pointer in ceph_snap_realmLuis Henriques2018-04-021-1/+3
| | | | | | | | | Keep a pointer to the inode in struct ceph_snap_realm. This allows to optimize functions that walk the realms hierarchy (e.g. in quotas). Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: optimizing cap reservationChengguang Xu2018-04-021-29/+59
| | | | | | | | | | | | | When caps_avail_count is in a low level, most newly trimmed caps will probably go into ->caps_list and caps_avail_count will be increased. Hence after trimming, should recheck caps_avail_count to effectly reuse newly trimmed caps. Also, when releasing unnecessary caps follow the same rule of ceph_put_cap. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: release unreserved caps if having enough available capsChengguang Xu2018-04-021-1/+15
| | | | | | | | | When unreserving caps check if there is too mamy available caps in the ->caps_list, if so release unreserved caps. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: optimizing cap allocationChengguang Xu2018-04-021-0/+16
| | | | | | | | | | | | | | When setting high volume of caps_min_count or having many unreserved caps, unused caps may always keep in the ->caps_list even can't get new cap from kmem_cache_alloc because lack of maximum limitation of caps_avail_count. Hence reuse caps in ->caps_list if available, it's maybe better than setting max limitation of caps_avail_count and releasing unused caps when reaching the limit. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: adding protection for showing cap reservation infoChengguang Xu2018-04-021-0/+4
| | | | | | | | | | | | | | Adding spinlock protection during getting cap reservation ralated fields so that the numbers match below BUG_ON condition in the code. BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count + mdsc->caps_reserve_count + mdsc->caps_avail_count); Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: flush dirty caps of unlinked inode ASAPZhi Zhang2018-02-261-0/+26
| | | | | | | | | | Client should release unlinked inode from its cache ASAP. But client can't release inode with dirty caps. Link: http://tracker.ceph.com/issues/22886 Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: improving efficiency of syncfsChengguang Xu2018-01-301-1/+1
| | | | | | | | | | write_inode() could be called variety of reasons, in the case of syncfs(2) there is no need to wait for flush getting completed in write_inode(), ->sync_fs is for guaranteeing flush completion for all inodes at that point. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: try to allocate enough memory for reserved capsZhi Zhang2018-01-291-8/+53
| | | | | | | | | | | | | | | ceph_reserve_caps() may not reserve enough caps under high memory pressure, but it saved the needed caps number that expected to be reserved. When getting caps, crash would happen due to number mismatch. Now we will try to trim more caps when failing to allocate memory for caps need to be reserved, then try again. If still failing to allocate memory, return -ENOMEM. Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix race of queuing delayed capsYan, Zheng2018-01-291-17/+16Star
| | | | | | | | | | | | | | | | | When called with CHECK_CAPS_AUTHONLY flag, ceph_check_caps() only processes auth caps. In that case, it's unsafe to remove inode from mdsc->cap_delay_list, because there can be delayed non-auth caps. Besides, ceph_check_caps() may lock/unlock i_ceph_lock several times, when multiple threads call ceph_check_caps() at the same time. It's possible that one thread calls __cap_delay_requeue(), another thread calls __cap_delay_cancel(). __cap_delay_cancel() should be called at very beginning of ceph_check_caps(), so that it does not race with __cap_delay_requeue(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: delete unreachable code in ceph_check_caps()Yan, Zheng2018-01-291-12/+3Star
| | | | | | | | "revoking & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)" has already been tested before calling try_nonblocking_invalidate() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: limit rate of cap import/export error messagesYan, Zheng2018-01-291-7/+15
| | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix incorrect snaprealm when adding capsYan, Zheng2018-01-291-1/+13
| | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: use atomic_t for ceph_inode_info::i_shared_genYan, Zheng2018-01-291-1/+1
| | | | | | | | | It allows accessing i_shared_gen without holding i_ceph_lock. It is preparation for later patch. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: voluntarily drop Ax cap for requests that create new inodeYan, Zheng2018-01-291-6/+15
| | | | | | | | | MDS need to rdlock directory inode's authlock when handling these requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: remove unused and redundant variable droppingColin Ian King2017-11-131-2/+1Star
| | | | | | | | | | Variable dropping is set but never read and hence is redundant and can be removed. Cleans up clang warning: fs/ceph/caps.c:1170:2: warning: Value stored to 'dropping' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fix bool initialization/comparisonThomas Meyer2017-11-131-3/+3
| | | | | | | | Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>