summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | io_uring: use wait_event_interruptible for cq_wait conditional waitJackie Liu2019-05-161-15/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous patch has ensured that io_cqring_events contain smp_rmb memory barriers, Now we can use wait_event_interruptible to keep the code simple. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | io_uring: adjust smp_rmb inside io_cqring_eventsJackie Liu2019-05-161-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever smp_rmb is required to use io_cqring_events, keep smp_rmb inside the function io_cqring_events. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | io_uring: fix infinite wait in khread_park() on io_finish_async()Roman Penyaev2019-05-161-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes couple of races which lead to infinite wait of park completion with the following backtraces: [20801.303319] Call Trace: [20801.303321] ? __schedule+0x284/0x650 [20801.303323] schedule+0x33/0xc0 [20801.303324] schedule_timeout+0x1bc/0x210 [20801.303326] ? schedule+0x3d/0xc0 [20801.303327] ? schedule_timeout+0x1bc/0x210 [20801.303329] ? preempt_count_add+0x79/0xb0 [20801.303330] wait_for_completion+0xa5/0x120 [20801.303331] ? wake_up_q+0x70/0x70 [20801.303333] kthread_park+0x48/0x80 [20801.303335] io_finish_async+0x2c/0x70 [20801.303336] io_ring_ctx_wait_and_kill+0x95/0x180 [20801.303338] io_uring_release+0x1c/0x20 [20801.303339] __fput+0xad/0x210 [20801.303341] task_work_run+0x8f/0xb0 [20801.303342] exit_to_usermode_loop+0xa0/0xb0 [20801.303343] do_syscall_64+0xe0/0x100 [20801.303349] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [20801.303380] Call Trace: [20801.303383] ? __schedule+0x284/0x650 [20801.303384] schedule+0x33/0xc0 [20801.303386] io_sq_thread+0x38a/0x410 [20801.303388] ? __switch_to_asm+0x40/0x70 [20801.303390] ? wait_woken+0x80/0x80 [20801.303392] ? _raw_spin_lock_irqsave+0x17/0x40 [20801.303394] ? io_submit_sqes+0x120/0x120 [20801.303395] kthread+0x112/0x130 [20801.303396] ? kthread_create_on_node+0x60/0x60 [20801.303398] ret_from_fork+0x35/0x40 o kthread_park() waits for park completion, so io_sq_thread() loop should check kthread_should_park() along with khread_should_stop(), otherwise if kthread_park() is called before prepare_to_wait() the following schedule() never returns: CPU#0 CPU#1 io_sq_thread_stop(): io_sq_thread(): while(!kthread_should_stop() && !ctx->sqo_stop) { ctx->sqo_stop = 1; kthread_park() prepare_to_wait(); if (kthread_should_stop() { } schedule(); <<< nobody checks park flag, <<< so schedule and never return o if the flag ctx->sqo_stop is observed by the io_sq_thread() loop it is quite possible, that kthread_should_park() check and the following kthread_parkme() is never called, because kthread_park() has not been yet called, but few moments later is is called and waits there for park completion, which never happens, because kthread has already exited: CPU#0 CPU#1 io_sq_thread_stop(): io_sq_thread(): ctx->sqo_stop = 1; while(!kthread_should_stop() && !ctx->sqo_stop) { <<< observe sqo_stop and exit the loop } if (kthread_should_park()) kthread_parkme(); <<< never called, since was <<< never parked kthread_park() <<< waits forever for park completion In the current patch we quit the loop by only kthread_should_park() check (kthread_park() is synchronous, so kthread_should_stop() is never observed), and we abandon ->sqo_stop flag, since it is racy. At the end of the io_sq_thread() we unconditionally call parmke(), since we've exited the loop by the park flag. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | io_uring: remove 'ev_flags' argumentJens Axboe2019-05-151-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We always pass in 0 for the cqe flags argument, since the support for "this read hit page cache" hint was dropped. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | io_uring: fix failure to verify SQ_AFF cpuJens Axboe2019-05-151-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test case we have is rightfully failing with the current kernel: io_uring_setup(1, 0x7ffe2cafebe0), flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 4 expected -1, got 3 This is in a vm, and CPU3 is the last valid one, hence asking for 4 should fail the setup with -EINVAL, not succeed. The problem is that we're using array_index_nospec() with nr_cpu_ids as the index, hence we wrap and end up using CPU0 instead of CPU4. This makes the setup succeed where it should be failing. We don't need to use array_index_nospec() as we're not indexing any array with this. Instead just compare with nr_cpu_ids directly. This is fine as we're checking with cpu_online() afterwards. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | io_uring: fix race condition reading SQE dataStefan Bühler2019-05-131-15/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When punting to workers the SQE gets copied after the initial try. There is a race condition between reading SQE data for the initial try and copying it for punting it to the workers. For example io_rw_done calls kiocb->ki_complete even if it was prepared for IORING_OP_FSYNC (and would be NULL). The easiest solution for now is to alway prepare again in the worker. req->file is safe to prepare though as long as it is checked before use. Signed-off-by: Stefan Bühler <source@stbuehler.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | Merge tag 'afs-fixes-b-20190516' of ↵Linus Torvalds2019-05-1720-1381/+1383
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS callback promise fixes from David Howells: "This series fixes a bunch of problems in callback promise handling, where a callback promise indicates a promise on the part of the server to notify the client in the event of some sort of change to a file or volume. In the event of a break, the client has to go and refetch the client status from the server and discard any cached permission information as the ACL might have changed. The problem in the current code is that changes made by other clients aren't always noticed, primarily because the file status information and the callback information aren't updated in the same critical section, even if these are carried in the same reply from an RPC operation, and so the AFS_VNODE_CB_PROMISED flag is unreliable. Arranging for them to be done in the same critical section during reply decoding is tricky because of the FS.InlineBulkStatus op - which has all the statuses in the reply arriving and then all the callbacks, so they have to be buffered. It simplifies things a lot to move the critical section out of the decode phase and do it after the RPC function returns. Also new inodes (either newly fetched or newly created) aren't properly managed against a callback break happening before we get the local inode up and running. Fix this by: - There's now a combined file status and callback record (struct afs_status_cb) to carry both plus some flags. - Each operation wrapper function allocates sufficient afs_status_cb records for all the vnodes it is interested in and passes them into RPC operations to be filled in from the reply. - The FileStatus and CallBack record decoders no longer apply the new/revised status and callback information to the inode/vnode at the point of decoding and instead store the information into the record from (2). - afs_vnode_commit_status() then revises the file status, detects deletion and notes callback information inside of a single critical section. It also checks the callback break counters and cancels the callback promise if they changed during the operation. [*] Note that "callback break counters" are counters of server events that cancel one or more callback promises that the client thinks it has. The client counts the events and compares the counters before and after an operation to see if the callback promise it thinks it just got evaporated before it got recorded under lock. - Volume and server callback break counters are passed into afs_iget() allowing callback breaks concurrent with inode set up to be detected and the callback promise thence to be cancelled. - AFS validation checks are now done under RCU conditions using a read lock on cb_lock. This requires vnode->cb_interest to be made RCU safe. - If the checks in (6) fail, the callback breaker is then called under write lock on the cb_lock - but only if the callback break counter didn't change from the value read before the checks were made. - Results from FS.InlineBulkStatus that correspond to inodes we currently have in memory are now used to update those inodes' status and callback information rather than being discarded. This requires those inodes to be looked up before the RPC op is made and all their callback break values saved. To aid in this, the following changes have also been made: - Don't pass the vnode into the reply delivery functions or the decoders. The vnode shouldn't be altered anywhere in those paths. The only exception, for the moment, is for the call done hook for file lock ops that wants access to both the vnode and the call - this can be fixed at a later time. - Get rid of the call->reply[] void* array and replace it with named and typed members. This avoids confusion since different ops were mapping different reply[] members to different things. - Fix an order-1 kmalloc allocation in afs_do_lookup() and replace it with kvcalloc(). - Always get the reply time. Since callback, lock and fileserver record expiry times are calculated for several RPCs, make this mandatory. - Call afs_pages_written_back() from the operation wrapper rather than from the delivery function. - Don't store the version and type from a callback promise in a reply as the information in them is of very limited use" * tag 'afs-fixes-b-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix application of the results of a inline bulk status fetch afs: Pass pre-fetch server and volume break counts into afs_iget5_set() afs: Fix unlink to handle YFS.RemoveFile2 better afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiry afs: Make vnode->cb_interest RCU safe afs: Split afs_validate() so first part can be used under LOOKUP_RCU afs: Don't save callback version and type fields afs: Fix application of status and callback to be under same lock afs: Always get the reply time afs: Fix order-1 allocation in afs_do_lookup() afs: Get rid of afs_call::reply[] afs: Don't pass the vnode pointer through into the inline bulk status op
| * | | | | | | | afs: Fix application of the results of a inline bulk status fetchDavid Howells2019-05-162-7/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix afs_do_lookup() such that when it does an inline bulk status fetch op, it will update inodes that are already extant (something that afs_iget() doesn't do) and to cache permits for each inode created (thereby avoiding a follow up FS.FetchStatus call to determine this). Extant inodes need looking up in advance so that their cb_break counters before and after the operation can be compared. To this end, the inode pointers are cached so that they don't need looking up again after the op. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Pass pre-fetch server and volume break counts into afs_iget5_set()David Howells2019-05-164-49/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the server and volume break counts from before the status fetch operation that queried the attributes of a file into afs_iget5_set() so that the new vnode's break counters can be initialised appropriately. This allows detection of a volume or server break that happened whilst we were fetching the status or setting up the vnode. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix unlink to handle YFS.RemoveFile2 betterDavid Howells2019-05-166-28/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make use of the status update for the target file that the YFS.RemoveFile2 RPC op returns to correctly update the vnode as to whether the file was actually deleted or just had nlink reduced. Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Clear AFS_VNODE_CB_PROMISED if we detect callback expiryDavid Howells2019-05-161-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix afs_validate() to clear AFS_VNODE_CB_PROMISED on a vnode if we detect any condition that causes the callback promise to be broken implicitly, including server break (cb_s_break), volume break (cb_v_break) or callback expiry. Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Make vnode->cb_interest RCU safeDavid Howells2019-05-167-59/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use RCU-based freeing for afs_cb_interest struct objects and use RCU on vnode->cb_interest. Use that change to allow afs_check_validity() to use read_seqbegin_or_lock() instead of read_seqlock_excl(). This also requires the caller of afs_check_validity() to hold the RCU read lock across the call. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Split afs_validate() so first part can be used under LOOKUP_RCUDavid Howells2019-05-162-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split afs_validate() so that the part that decides if the vnode is still valid can be used under LOOKUP_RCU conditions from afs_d_revalidate(). Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Don't save callback version and type fieldsDavid Howells2019-05-166-16/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't save callback version and type fields as the version is about the format of the callback information and the type is relative to the particular RPC call. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix application of status and callback to be under same lockDavid Howells2019-05-1612-1004/+921Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When applying the status and callback in the response of an operation, apply them in the same critical section so that there's no race between checking the callback state and checking status-dependent state (such as the data version). Fix this by: (1) Allocating a joint {status,callback} record (afs_status_cb) before calling the RPC function for each vnode for which the RPC reply contains a status or a status plus a callback. A flag is set in the record to indicate if a callback was actually received. (2) These records are passed into the RPC functions to be filled in. The afs_decode_status() and yfs_decode_status() functions are removed and the cb_lock is no longer taken. (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer update the vnode. (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update the vnode. (5) vnodes, expected data-version numbers and callback break counters (cb_break) no longer need to be passed to the reply delivery functions. Note that, for the moment, the file locking functions still need access to both the call and the vnode at the same time. (6) afs_vnode_commit_status() is now given the cb_break value and the expected data_version and the task of applying the status and the callback to the vnode are now done here. This is done under a single taking of vnode->cb_lock. (7) afs_pages_written_back() is now called by afs_store_data() rather than by the reply delivery function. afs_pages_written_back() has been moved to before the call point and is now given the first and last page numbers rather than a pointer to the call. (8) The indicator from YFS.RemoveFile2 as to whether the target file actually got removed (status.abort_code == VNOVNODE) rather than merely dropping a link is now checked in afs_unlink rather than in xdr_decode_YFSFetchStatus(). Supplementary fixes: (*) afs_cache_permit() now gets the caller_access mask from the afs_status_cb object rather than picking it out of the vnode's status record. afs_fetch_status() returns caller_access through its argument list for this purpose also. (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather than a read lock and now sets the callback inside the same critical section. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Always get the reply timeDavid Howells2019-05-165-16/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always ask for the reply time from AF_RXRPC as it's used to calculate the callback expiry time and lock expiry times, so it's needed by most FS operations. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix order-1 allocation in afs_do_lookup()David Howells2019-05-165-50/+45Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_do_lookup() will do an order-1 allocation to allocate status records if there are more than 39 vnodes to stat. Fix this by allocating an array of {status,callback} records for each vnode we want to examine using vmalloc() if larger than a page. This not only gets rid of the order-1 allocation, but makes it easier to grow beyond 50 records for YFS servers. It also allows us to move to {status,callback} tuples for other calls too and makes it easier to lock across the application of the status and the callback to the vnode. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Get rid of afs_call::reply[]David Howells2019-05-1610-303/+293Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the afs_call::reply[] array with a bunch of typed members so that the compiler can use type-checking on them. It's also easier for the eye to see what's going on. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Don't pass the vnode pointer through into the inline bulk status opDavid Howells2019-05-162-15/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't pass the vnode pointer through into the inline bulk status op. We want to process the status records outside of it anyway. Signed-off-by: David Howells <dhowells@redhat.com>
* | | | | | | | | Merge tag 'afs-fixes-20190516' of ↵Linus Torvalds2019-05-1726-289/+393
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull misc AFS fixes from David Howells: "This fixes a set of miscellaneous issues in the afs filesystem, including: - leak of keys on file close. - broken error handling in xattr functions. - missing locking when updating VL server list. - volume location server DNS lookup whereby preloaded cells may not ever get a lookup and regular DNS lookups to maintain server lists consume power unnecessarily. - incorrect error propagation and handling in the fileserver iteration code causes operations to sometimes apparently succeed. - interruption of server record check/update side op during fileserver iteration causes uninterruptible main operations to fail unexpectedly. - callback promise expiry time miscalculation. - over invalidation of the callback promise on directories. - double locking on callback break waking up file locking waiters. - double increment of the vnode callback break counter. Note that it makes some changes outside of the afs code, including: - an extra parameter to dns_query() to allow the dns_resolver key just accessed to be immediately invalidated. AFS is caching the results itself, so the key can be discarded. - an interruptible version of wait_var_event(). - an rxrpc function to allow the maximum lifespan to be set on a call. - a way for an rxrpc call to be marked as non-interruptible" * tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix double inc of vnode->cb_break afs: Fix lock-wait/callback-break double locking afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set afs: Fix calculation of callback expiry time afs: Make dynamic root population wait uninterruptibly for proc_cells_lock afs: Make some RPC operations non-interruptible rxrpc: Allow the kernel to mark a call as being non-interruptible afs: Fix error propagation from server record check/update afs: Fix the maximum lifespan of VL and probe calls rxrpc: Provide kernel interface to set max lifespan on a call afs: Fix "kAFS: AFS vnode with undefined type 0" afs: Fix cell DNS lookup Add wait_var_event_interruptible() dns_resolver: Allow used keys to be invalidated afs: Fix afs_cell records to always have a VL server list record afs: Fix missing lock when replacing VL server list afs: Fix afs_xattr_get_yfs() to not try freeing an error value afs: Fix incorrect error handling in afs_xattr_get_acl() afs: Fix key leak in afs_release() and afs_evict_inode()
| * | | | | | | | afs: Fix double inc of vnode->cb_breakDavid Howells2019-05-161-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When __afs_break_callback() clears the CB_PROMISED flag, it increments vnode->cb_break to trigger a future refetch of the status and callback - however it also calls afs_clear_permits(), which also increments vnode->cb_break. Fix this by removing the increment from afs_clear_permits(). Whilst we're at it, fix the conditional call to afs_put_permits() as the function checks to see if the argument is NULL, so the check is redundant. Fixes: be080a6f43c4 ("afs: Overhaul permit caching"); Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix lock-wait/callback-break double lockingDavid Howells2019-05-162-10/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __afs_break_callback() holds vnode->lock around its call of afs_lock_may_be_available() - which also takes that lock. Fix this by not taking the lock in __afs_break_callback(). Also, there's no point checking the granted_locks and pending_locks queues; it's sufficient to check lock_state, so move that check out of afs_lock_may_be_available() into __afs_break_callback() to replace the queue checks. Fixes: e8d6c554126b ("AFS: implement file locking") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not setDavid Howells2019-05-161-5/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't invalidate the callback promise on a directory if the AFS_VNODE_DIR_VALID flag is not set (which indicates that the directory contents are invalid, due to edit failure, callback break, page reclaim). The directory will be reloaded next time the directory is accessed, so clearing the callback flag at this point may race with a reload of the directory and cancel it's recorded callback promise. Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix calculation of callback expiry timeDavid Howells2019-05-162-57/+51Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the calculation of the expiry time of a callback promise, as obtained from operations like FS.FetchStatus and FS.FetchData. The time should be based on the timestamp of the first DATA packet in the reply and the calculation needs to turn the ktime_t timestamp into a time64_t. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Make dynamic root population wait uninterruptibly for proc_cells_lockDavid Howells2019-05-161-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make dynamic root population wait uninterruptibly for proc_cells_lock. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Make some RPC operations non-interruptibleDavid Howells2019-05-1614-35/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make certain RPC operations non-interruptible, including: (*) Set attributes (*) Store data We don't want to get interrupted during a flush on close, flush on unlock, writeback or an inode update, leaving us in a state where we still need to do the writeback or update. (*) Extend lock (*) Release lock We don't want to get lock extension interrupted as the file locks on the server are time-limited. Interruption during lock release is less of an issue since the lock is time-limited, but it's better to complete the release to avoid a several-minute wait to recover it. *Setting* the lock isn't a problem if it's interrupted since we can just return to the user and tell them they were interrupted - at which point they can elect to retry. (*) Silly unlink We want to remove silly unlink files if we can, rather than leaving them for the salvager to clear up. Note that whilst these calls are no longer interruptible, they do have timeouts on them, so if the server stops responding the call will fail with something like ETIME or ECONNRESET. Without this, the following: kAFS: Unexpected error from FS.StoreData -512 appears in dmesg when a pending store data gets interrupted and some processes may just hang. Additionally, make the code that checks/updates the server record ignore failure due to interruption if the main call is uninterruptible and if the server has an address list. The next op will check it again since the expiration time on the old list has past. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | rxrpc: Allow the kernel to mark a call as being non-interruptibleDavid Howells2019-05-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow kernel services using AF_RXRPC to indicate that a call should be non-interruptible. This allows kafs to make things like lock-extension and writeback data storage calls non-interruptible. If this is set, signals will be ignored for operations on that call where possible - such as waiting to get a call channel on an rxrpc connection. It doesn't prevent UDP sendmsg from being interrupted, but that will be handled by packet retransmission. rxrpc_kernel_recv_data() isn't affected by this since that never waits, preferring instead to return -EAGAIN and leave the waiting to the caller. Userspace initiated calls can't be set to be uninterruptible at this time. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix error propagation from server record check/updateDavid Howells2019-05-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_check/update_server_record() should be setting fc->error rather than fc->ac.error as they're called from within the cursor iteration function. afs_fs_cursor::error is where the error code of the attempt to call the operation on multiple servers is integrated and is the final result, whereas afs_addr_cursor::error is used to hold the error from individual iterations of the call loop. (Note there's also an afs_vl_cursor which also wraps afs_addr_cursor for accessing VL servers rather than file servers). Fix this by setting fc->error in the afs_check/update_server_record() so that any error incurred whilst talking to the VL server correctly propagates to the final result. This results in: kAFS: Unexpected error from FS.StoreData -512 being seen, even though the store-data op is non-interruptible. The error is actually coming from the server record update getting interrupted. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix the maximum lifespan of VL and probe callsDavid Howells2019-05-165-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an older AFS server doesn't support an operation, it may accept the call and then sit on it forever, happily responding to pings that make kafs think that the call is still alive. Fix this by setting the maximum lifespan of Volume Location service calls in particular and probe calls in general so that they don't run on endlessly if they're not supported. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix "kAFS: AFS vnode with undefined type 0"David Howells2019-05-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under some circumstances afs_select_fileserver() can return without setting an error in fc->error. The problem is in the no_more_servers segment where the accumulated errors from attempts to contact various servers are integrated into an afs_error-type variable 'e'. The resultant error code is, however, then abandoned. Fix this by getting the error out of e.error and putting it in 'error' so that the next part will store it into fc->error. Not doing this causes a report like the following: kAFS: AFS vnode with undefined type 0 kAFS: A=0 m=0 s=0 v=0 kAFS: vnode 20000025:1:1 because the code following the server selection loop then sees what it thinks is a successful invocation because fc.error is 0. However, it can't apply the status record because it's all zeros. The report is followed on the first instance with a trace looking something like: dump_stack+0x67/0x8e afs_inode_init_from_status.isra.2+0x21b/0x487 afs_fetch_status+0x119/0x1df afs_iget+0x130/0x295 afs_get_tree+0x31d/0x595 vfs_get_tree+0x1f/0xe8 fc_mount+0xe/0x36 afs_d_automount+0x328/0x3c3 follow_managed+0x109/0x20a lookup_fast+0x3bf/0x3f8 do_last+0xc3/0x6a4 path_openat+0x1af/0x236 do_filp_open+0x51/0xae ? _raw_spin_unlock+0x24/0x2d ? __alloc_fd+0x1a5/0x1b7 do_sys_open+0x13b/0x1e8 do_syscall_64+0x7d/0x1b3 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 4584ae96ae30 ("afs: Fix missing net error handling") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix cell DNS lookupDavid Howells2019-05-163-75/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, once configured, AFS cells are looked up in the DNS at regular intervals - which is a waste of resources if those cells aren't being used. It also leads to a problem where cells preloaded, but not configured, before the network is brought up end up effectively statically configured with no VL servers and are unable to get any. Fix this by not doing the DNS lookup until the first time a cell is touched. It is waited for if we don't have any cached records yet, otherwise the DNS lookup to maintain the record is done in the background. This has the downside that the first time you touch a cell, you now have to wait for the upcall to do the required DNS lookups rather than them already being cached. Further, the record is not replaced if the old record has at least one server in it and the new record doesn't have any. Fixes: 0a5143f2f89c ("afs: Implement VL server rotation") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | dns_resolver: Allow used keys to be invalidatedDavid Howells2019-05-154-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow used DNS resolver keys to be invalidated after use if the caller is doing its own caching of the results. This reduces the amount of resources required. Fix AFS to invalidate DNS results to kill off permanent failure records that get lodged in the resolver keyring and prevent future lookups from happening. Fixes: 0a5143f2f89c ("afs: Implement VL server rotation") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix afs_cell records to always have a VL server list recordDavid Howells2019-05-154-24/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix it such that afs_cell records always have a VL server list record attached, even if it's a dummy one, so that various checks can be removed. Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix missing lock when replacing VL server listDavid Howells2019-05-151-3/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When afs_update_cell() replaces the cell->vl_servers list, it uses RCU protocol so that proc is protected, but doesn't take ->vl_servers_lock to protect afs_start_vl_iteration() (which does actually take a shared lock). Fix this by making afs_update_cell() take an exclusive lock when replacing ->vl_servers. Fixes: 0a5143f2f89c ("afs: Implement VL server rotation") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix afs_xattr_get_yfs() to not try freeing an error valueDavid Howells2019-05-153-64/+53Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | afs_xattr_get_yfs() tries to free yacl, which may hold an error value (say if yfs_fs_fetch_opaque_acl() failed and returned an error). Fix this by allocating yacl up front (since it's a fixed-length struct, unlike afs_acl) and passing it in to the RPC function. This also allows the flags to be placed in the object rather than passing them through to the RPC function. Fixes: ae46578b963f ("afs: Get YFS ACLs and information through xattrs") Signed-off-by: David Howells <dhowells@redhat.com>
| * | | | | | | | afs: Fix incorrect error handling in afs_xattr_get_acl()David Howells2019-05-151-5/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix incorrect error handling in afs_xattr_get_acl() where there appears to be a redundant assignment before return, but in fact the return should be a goto to the error handling at the end of the function. Fixes: 260f082bae6d ("afs: Get an AFS3 ACL as an xattr") Addresses-Coverity: ("Unused Value") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Joe Perches <joe@perches.com>
| * | | | | | | | afs: Fix key leak in afs_release() and afs_evict_inode()David Howells2019-05-152-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix afs_release() to go through the cleanup part of the function if FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the cleanup). The cleanup involves discarding the refs on the key used for file ops and the writeback key record. Also fix afs_evict_inode() to clean up any left over wb keys attached to the inode/vnode when it is removed. Fixes: 5a8132761609 ("afs: Do better accretion of small writes on newly created content") Signed-off-by: David Howells <dhowells@redhat.com>
* | | | | | | | | Merge tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2019-05-1712-264/+751
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "On the filesystem side we have: - a fix to enforce quotas set above the mount point (Luis Henriques) - support for exporting snapshots through NFS (Zheng Yan) - proper statx implementation (Jeff Layton). statx flags are mapped to MDS caps, with AT_STATX_{DONT,FORCE}_SYNC taken into account. - some follow-up dentry name handling fixes, in particular elimination of our hand-rolled helper and the switch to __getname() as suggested by Al (Jeff Layton) - a set of MDS client cleanups in preparation for async MDS requests in the future (Jeff Layton) - a fix to sync the filesystem before remounting (Jeff Layton) On the rbd side, work is on-going on object-map and fast-diff image features" * tag 'ceph-for-5.2-rc1' of git://github.com/ceph/ceph-client: (29 commits) ceph: flush dirty inodes before proceeding with remount ceph: fix unaligned access in ceph_send_cap_releases libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer libceph: fix unaligned accesses in ceph_entity_addr handling rbd: don't assert on writes to snapshots rbd: client_mutex is never nested ceph: print inode number in __caps_issued_mask debugging messages ceph: just call get_session in __ceph_lookup_mds_session ceph: simplify arguments and return semantics of try_get_cap_refs ceph: fix comment over ceph_drop_caps_for_unlink ceph: move wait for mds request into helper function ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_request ceph: after an MDS request, do callback and completions ceph: use pathlen values returned by set_request_path_attr ceph: use __getname/__putname in ceph_mdsc_build_path ceph: use ceph_mdsc_build_path instead of clone_dentry_name ceph: fix potential use-after-free in ceph_mdsc_build_path ceph: dump granular cap info in "caps" debugfs file ceph: make iterate_session_caps a public symbol ceph: fix NULL pointer deref when debugging is enabled ...
| * | | | | | | | | ceph: flush dirty inodes before proceeding with remountJeff Layton2019-05-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfstest generic/452 was triggering a "Busy inodes after umount" warning. ceph was allowing the mount to go read-only without first flushing out dirty inodes in the cache. Ensure we sync out the filesystem before allowing a remount to proceed. Cc: stable@vger.kernel.org Link: http://tracker.ceph.com/issues/39571 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: fix unaligned access in ceph_send_cap_releasesJeff Layton2019-05-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | libceph: make ceph_pr_addr take an struct ceph_entity_addr pointerJeff Layton2019-05-072-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC9 is throwing a lot of warnings about unaligned accesses by callers of ceph_pr_addr. All of the current callers are passing a pointer to the sockaddr inside struct ceph_entity_addr. Fix it to take a pointer to a struct ceph_entity_addr instead, and then have the function make a copy of the sockaddr before printing it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: print inode number in __caps_issued_mask debugging messagesJeff Layton2019-05-071-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make it easier to correlate with MDS logs. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: just call get_session in __ceph_lookup_mds_sessionJeff Layton2019-05-071-7/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I originally thought there was a potential race here, but the fact that this is called with the mdsc->mutex held, ensures that the last reference to the session can't be put here. Still, it's clearer to just return the value from get_session here, and may prevent a bug later if we ever rework this code to be less reliant on mutexes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: simplify arguments and return semantics of try_get_cap_refsJeff Layton2019-05-071-46/+30Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return of this function is rather complex. It can return 0 or 1, and in the case of a 1 return, the "err" pointer will be filled out. This necessitates a lot of copying of values. We can achieve the same effect by just returning 0, 1 or a negative error code, and drop the "err" argument from this function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: fix comment over ceph_drop_caps_for_unlinkJeff Layton2019-05-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not clear what AUTH_RDCACHE means in this context, and we're clearly just dropping LINK caps here. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: move wait for mds request into helper functionJeff Layton2019-05-071-15/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: have ceph_mdsc_do_request call ceph_mdsc_submit_requestJeff Layton2019-05-072-23/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing calls ceph_mdsc_submit_request today, but in later patches we'll need to be able to call this separately. Have the helper return an int so we can check the r_err under the mutex, and have the caller just check the error code from the submit. Also move the acquisition of CEPH_CAP_PIN references into the same function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: after an MDS request, do callback and completionsJeff Layton2019-05-071-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No MDS requests use r_callback today, but that will change in the future. The OSD client always does r_callback and then completes r_completion. Let's have the MDS client do the same. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: use pathlen values returned by set_request_path_attrJeff Layton2019-05-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We make copies of the dentry name in set_request_path_attr, but then create_request_message re-fetches the lengths out of the dentry. While we don't currently set the *_drop fields unless the parents are locked, it's still better not to rely on that sort of implicit assumption. Use the pathlen values that set_request_path_attr returned instead, as they will always be correct for the returned paths themselves. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * | | | | | | | | ceph: use __getname/__putname in ceph_mdsc_build_pathJeff Layton2019-05-073-39/+36Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Al suggested we get rid of the kmalloc here and just use __getname and __putname to get a full PATH_MAX pathname buffer. Since we build the path in reverse, we continue to return a pointer to the beginning of the string and the length, and add a new helper to free the thing at the end. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>